How to keep quotes when parsing csv file? - c#

I am using Microsoft.VisualBasic.FileIO.TextFieldParser to read a CSV file, edit it, and then write it back out.
The problem is that the quotes are not kept after parsing.
I tried setting parser.HasFieldsEnclosedInQuotes = true, but it does not seem to keep the quotes.
This breaks things when a field contains a comma, for example:
Before:
"some, field"
After:
some, field
which is then read as two separate fields.
Here is my method:
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;
using Microsoft.VisualBasic.FileIO;

public static void CleanStaffFile()
{
    String path = @"C:\file.csv";
    String dpath = String.Format(@"C:\file_{0}.csv", DateTime.Now.ToString("MMddyyHHmmss"));
    List<String> lines = new List<String>();
    if (File.Exists(path))
    {
        using (TextFieldParser parser = new TextFieldParser(path))
        {
            parser.HasFieldsEnclosedInQuotes = true;
            parser.Delimiters = new string[] { "," };
            while (!parser.EndOfData)
            {
                string[] parts = parser.ReadFields();
                if (parts == null)
                {
                    break;
                }
                if ((parts[12] != "") && (parts[12] != "*,116"))
                {
                    parts[12] = parts[12].Substring(0, 3);
                }
                else
                {
                    parts[12] = "0";
                }
                // The quotes removed by the parser are not restored here.
                lines.Add(string.Join(",", parts));
            }
        }
        using (StreamWriter writer = new StreamWriter(dpath, false))
        {
            foreach (String line in lines)
                writer.WriteLine(line);
        }
    }
    MessageBox.Show("CSV file successfully processed:\n");
}

So you want the quotes back after you have modified the fields and called string.Join(",", parts)? Then it's easy, since only fields which contain the separator were wrapped in quotes before. Just add the quotes again before the string.Join.
So before (and desired):
"some, field"
After (not desired):
some, field
This should work:
string[] fields = parser.ReadFields();
// insert your logic here ....
// Select requires "using System.Linq;"
var newFields = fields
    .Select(f => f.Contains(",") ? string.Format("\"{0}\"", f) : f);
lines.Add(string.Join(",", newFields));
Edit
I would like to keep quotes even if it doesn't contain a comma
Then it's even easier:
var newFields = fields.Select(f => string.Format("\"{0}\"", f));
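Both snippets above wrap a field in quotes without escaping quote characters that are already inside it, so a value such as say "hi" would produce a malformed line. A minimal sketch that follows the usual CSV convention (RFC 4180: quote a field when it contains the delimiter, a quote or a line break, and double any embedded quotes); the class and method names CsvQuoting, QuoteField, AlwaysQuote and JoinFields are hypothetical:

```csharp
using System.Linq;

public static class CsvQuoting
{
    // Quote a field only when it contains a comma, a quote or a line break,
    // doubling any embedded quotes (the RFC 4180 convention).
    public static string QuoteField(string field)
    {
        if (field == null)
            return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Always quote, for the "keep quotes even without a comma" case.
    public static string AlwaysQuote(string field) =>
        "\"" + (field ?? "").Replace("\"", "\"\"") + "\"";

    // Join already-parsed fields back into one CSV line.
    public static string JoinFields(string[] fields, bool alwaysQuote = false) =>
        string.Join(",", fields.Select(f => alwaysQuote ? AlwaysQuote(f) : QuoteField(f)));
}
```

In the question's loop this would replace the plain join, e.g. lines.Add(CsvQuoting.JoinFields(parts));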

The TextFieldParser.HasFieldsEnclosedInQuotes property is used as follows, from the MSDN page:
If the property is True, the parser assumes that fields are enclosed in quotation marks (" ") and may contain line endings.
If a field is enclosed in quotation marks, for example, abc, "field2a,field2b", field3 and this property is True, then all text enclosed in quotation marks will be returned as is; this example would return abc|field2a,field2b|field3. Setting this property to False would make this example return abc|"field2a|field2b"|field3.
The quotes indicate the start and end of a field, which may then contain the character(s) normally used to separate fields. If your data itself contains quote characters that do not enclose a field, you need to set HasFieldsEnclosedInQuotes to false.
If your data fields can contain both separators and quotes, you will need to escape the quotes before parsing, which is a problem. Basically, you're going beyond the capabilities of a simple CSV file.
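A small sketch that runs the MSDN example through TextFieldParser with both settings of HasFieldsEnclosedInQuotes. The QuoteSettingDemo class and ParseLine method are hypothetical, and the spaces after the commas in the MSDN example are dropped for simplicity:

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuoteSettingDemo
{
    // Parse a single delimited line with the given quote setting.
    public static string[] ParseLine(string line, string delimiter, bool fieldsEnclosedInQuotes)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            parser.HasFieldsEnclosedInQuotes = fieldsEnclosedInQuotes;
            return parser.ReadFields();
        }
    }
}
```

With true, abc,"field2a,field2b",field3 gives three fields and the quotes are removed; with false it gives four fields, and the quote characters stay attached to "field2a and field2b".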

Related

Escape comma(,) from a csv cell while exporting its data to a database table

I have a CSV file in which one field contains a comma, e.g. under the office location column I have the value xyz, building. When I check the value in the debugger it only shows "\"xyz". I have tried escaping the comma and the backslash by using Replace(",","") and Replace("\"",""), but it failed. I am also getting an extra \ in the result, as marked by the red circle.
I have attached a screenshot from the debugger showing the structure of the CSV row. The problem is in the red-circled area.
I have also tried the following function:
using System.Text;

public static string RemoveColumnDelimitersInsideValues(string input)
{
    const char valueDelimiter = '"';
    const char columnDelimiter = ',';
    StringBuilder output = new StringBuilder();
    bool isInsideValue = false;
    for (var i = 0; i < input.Length; i++)
    {
        var currentChar = input[i];
        if (currentChar == valueDelimiter)
        {
            // toggle on every quote, and keep the quote itself
            isInsideValue = !isInsideValue;
            output.Append(currentChar);
            continue;
        }
        // drop commas that appear inside a quoted value
        if (currentChar != columnDelimiter || !isInsideValue)
        {
            output.Append(currentChar);
        }
    }
    return output.ToString();
}
Kindly help in resolving the issues. Thanks
The \ character you see is not in the actual string; it's just an escape character added in the debugger view.
Click the magnifying-glass icon (the text visualizer) to see the actual value of the string.
Hope it helps.
Try using TextFieldParser. In CSV, a column value that contains a comma is enclosed in quotes, so setting HasFieldsEnclosedInQuotes to true will read it as a single column.
using Microsoft.VisualBasic.FileIO;

using (TextFieldParser reader = new TextFieldParser(csvpath))
{
    reader.Delimiters = new string[] { "," };
    reader.HasFieldsEnclosedInQuotes = true;
    string[] col = reader.ReadFields();
}
String.Replace doesn't modify the existing string; it returns a new one. Because of that, you still have the same old row string outside the IsNullOrEmpty check.
Also, you say you are trying to escape commas and quotes, but your code removes them.
If you want to remove commas and quotes, your code may look like this:
if (!string.IsNullOrEmpty(row))
{
    row = row.Replace(",", "").Replace("\"", "");
}
If you want to escape quotes and commas, your code may look like this:
if (row != null && (row.Contains(",") || row.Contains("\"")))
{
    // double any embedded quotes, then wrap the whole value in quotes
    row = "\"" + row.Replace("\"", "\"\"") + "\"";
}
There are 3 issues with your code that are worth pointing out.
1. Parsing a CSV can be tricky
Would your code handle a multiline string correctly? Would your code handle a " inside one of the columns (that is, an escaped quote)?
I recommend using a CSV-reading library (i.e. a NuGet package).
2. There is no backslash
Here is a file:
1,"The string in the first row has a comma, and an f, in it"
2,The string in the 2nd row does not have a comma in it
Here is what the debugger shows (I'm using VS Code here).
Here is what Console.WriteLine prints:
1,"The string in the first row has a comma, and an f, in it"
2,The string in the 2nd row does not have a comma in it
3. Replacing commas
Even if you deal with the quotes, wouldn't replacing commas also get rid of the field delimiters?
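To illustrate point 1, here is a minimal sketch of TextFieldParser handling both tricky cases: a quoted field that spans two lines, and a quoted field containing escaped quotes (""), which the parser turns into a single quote. The TrickyCsvDemo class and ReadAll method are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TrickyCsvDemo
{
    // Read every record from CSV text. With HasFieldsEnclosedInQuotes = true,
    // a quoted field may span several physical lines, and a doubled quote ("")
    // inside a quoted field comes back as a single quote.
    public static List<string[]> ReadAll(string csvText)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
                records.Add(parser.ReadFields());
        }
        return records;
    }
}
```

Hand-rolled splitting on ',' or on '\n' gets both of these wrong, which is the main argument for using an existing parser.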

Delimit a string by character unless within quotation marks C#

I need to delimit text by a single character, a comma. But I only want to use that comma as a delimiter if it is not enclosed in quotation marks.
An example:
Method,value1,value2
Would contain three values: Method, value1 and value2
But:
Method,"value1,value2"
Would contain two values: Method and "value1,value2"
I'm not really sure how to go about this as when splitting a string I would use:
String.Split(',');
But that would split on ALL commas. Is this possible without getting overly complicated and having to manually check every character of the string?
Thanks in advance
Copied from my comment: Use an available csv parser like VisualBasic.FileIO.TextFieldParser or this or this.
As requested, here is an example for the TextFieldParser:
var allLineFields = new List<string[]>();
string sampleText = "Method,\"value1,value2\"";
var reader = new System.IO.StringReader(sampleText);
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
    parser.Delimiters = new string[] { "," };
    parser.HasFieldsEnclosedInQuotes = true; // <--- !!!
    string[] fields;
    while ((fields = parser.ReadFields()) != null)
    {
        allLineFields.Add(fields);
    }
}
This list now contains a single string[] with two strings. I used a StringReader because this sample parses a string; if the source is a file, use a StreamReader instead (e.g. via File.OpenText).
You can try Regex.Split() to split the data up using the pattern
",|(\"[^\"]*\")"
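This splits on commas and on quoted sections. Because the quoted section is in a capturing group, Regex.Split includes it in the result array; the empty strings left between adjacent separators are then filtered out.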
Code Sample:
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string data = "Method,\"value1,value2\",Method2";
        string[] pieces = Regex.Split(data, ",|(\"[^\"]*\")")
            .Where(exp => !String.IsNullOrEmpty(exp))
            .ToArray();
        foreach (string piece in pieces)
        {
            Console.WriteLine(piece);
        }
    }
}
Results:
Method
"value1,value2"
Method2

parse csv file, ignoring comma held in quotes c#

I am currently splitting a CSV file that is read into an application on commas; however, there are legitimate commas held in double quotes that are getting split when I don't want them to be.
When using TextFieldParser, it reads the fields that I want it to read; however, it reads all the fields and then I am struggling to get them out on the correct lines.
public string ParseCSVForFields(string dataFileName)
{
    var sb = new StringBuilder();
    var line = new List<string>();
    using (TextFieldParser parser = new TextFieldParser(dataFileName))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true;
        while (!parser.EndOfData)
        {
            //Processing row
            string currentRow = parser.ReadLine();
            string[] fields = parser.ReadFields();
            foreach (var field in fields)
            {
                // this is where i am stuck
            }
        }
    }
    return null;
}
Any and all help would be very much appreciated.
Thanks
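You are calling both ReadLine and ReadFields. Each call consumes a line, so ReadLine reads one row and ReadFields then parses the next one, which means half of the rows are never parsed. Remove the ReadLine call.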
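A minimal sketch of the loop without the ReadLine call, returning one string[] per row so the fields stay grouped by line. The CsvRows class and the TextReader overload are hypothetical additions (the overload makes it easy to test with a StringReader):

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvRows
{
    // Returns one string[] per CSV record. ReadFields() both reads the
    // current record and advances to the next one, so no ReadLine() is needed.
    public static List<string[]> ParseCsvForFields(TextReader source)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields != null)
                    rows.Add(fields);
            }
        }
        return rows;
    }

    // File-based overload, closer to the question's signature.
    public static List<string[]> ParseCsvForFields(string dataFileName) =>
        ParseCsvForFields(new StreamReader(dataFileName));
}
```

Each entry in the returned list is one row, so rows[i][j] is field j of line i.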

CSV change delimiter

I'm reading a CSV file and changing the delimiter from "," to "|". However, I've noticed that in my data (which I have no control over) some values don't follow this rule: they are quoted and contain a comma. How can I best avoid replacing the commas in these cases?
For example:
ABSON TE,Wick Lane,"Abson, Pucklechurch",Bristol,Avon,ENGLAND,BS16 9SD,37030,17563,BS0001A1,,
Should be changed to:
ABSON TE|Wick Lane|"Abson, Pucklechurch"|Bristol|Avon|ENGLAND|BS16 9SD|37030|17563|BS0001A1||
The code to read and replace the CSV file is this:
var contents = File.ReadAllText(filePath).Split(new string[] { "\n", "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToArray();
var formattedContents = contents.Select(line => line.Replace(',', '|'));
For anyone else struggling with this, I ended up using the built-in .NET CSV parser (TextFieldParser). See here for more details and an example: http://coding.abel.nu/2012/06/built-in-net-csv-parser/
My specific code:
// Create new parser object and setup parameters
var parser = new TextFieldParser(new StringReader(File.ReadAllText(filePath)))
{
    HasFieldsEnclosedInQuotes = true,
    Delimiters = new string[] { "," },
    TrimWhiteSpace = true
};
var csvSplitList = new List<string>();
// Reads all fields on the current line of the CSV file and returns as a string array
// Joins each field together with new delimiter "|"
while (!parser.EndOfData)
{
    csvSplitList.Add(String.Join("|", parser.ReadFields()));
}
// Newline characters added to each line and flattens List<string> into single string
var formattedCsvToSave = String.Join(Environment.NewLine, csvSplitList);
// Write single string to file
File.WriteAllText(filePathFormatted, formattedCsvToSave);
parser.Close();
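One caveat: TextFieldParser removes the enclosing quotes, so the loop above writes Abson, Pucklechurch without quotes, unlike the desired output. A sketch that re-adds quotes to any field containing a comma, a pipe or a quote (DelimiterConverter, CommaToPipe and Requote are hypothetical names):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class DelimiterConverter
{
    // Re-quote a field if it contains the old or new delimiter or a quote,
    // so values such as "Abson, Pucklechurch" keep their quotes.
    static string Requote(string field) =>
        field.IndexOfAny(new[] { ',', '|', '"' }) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;

    // Parse comma-separated text and return the same records joined with '|'.
    public static List<string> CommaToPipe(TextReader source)
    {
        var output = new List<string>();
        using (var parser = new TextFieldParser(source))
        {
            parser.HasFieldsEnclosedInQuotes = true;
            parser.SetDelimiters(",");
            parser.TrimWhiteSpace = true;
            while (!parser.EndOfData)
                output.Add(string.Join("|", parser.ReadFields().Select(Requote)));
        }
        return output;
    }
}
```

With a file this might be used as File.WriteAllLines(filePathFormatted, DelimiterConverter.CommaToPipe(File.OpenText(filePath)));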

asp.net Convert CSV string to string[]

Is there an easy way to convert a string from csv format into a string[] or list?
I can guarantee that there are no commas in the data.
String.Split is just not going to cut it, but Regex.Split may. Try this one:
using System.Text.RegularExpressions;
string[] line;
line = Regex.Split( input, ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
Here 'input' is the CSV line. This handles quoted delimiters and should give you back an array of strings, one per field in the line; quoted fields keep their surrounding quotes.
If you want robust CSV handling, check out FileHelpers.
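The lookahead pattern leaves the surrounding quotes on quoted fields and does not un-double escaped quotes. A minimal sketch of a post-processing step, with hypothetical names RegexCsvLine and SplitLine; like the pattern itself, it works on a single line, so quoted fields containing line breaks are not supported:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCsvLine
{
    // Split on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted field.
    static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Split one line, then strip the surrounding quotes and un-double "".
    public static string[] SplitLine(string input) =>
        Separator.Split(input).Select(Unquote).ToArray();

    static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        return field;
    }
}
```

The group inside the lookahead is non-capturing on purpose: Regex.Split adds the text of capturing groups to its result array.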
string[] splitString = origString.Split(',');
(Following comment not added by original answerer)
Please keep in mind that this answer addresses the SPECIFIC case where there are guaranteed to be NO commas in the data.
Try:
Regex rex = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))"); // non-capturing group: Regex.Split would otherwise add the captured text to the result
string[] values = rex.Split( csvLine );
Source: http://weblogs.asp.net/prieck/archive/2004/01/16/59457.aspx
You can take a look at using the Microsoft.VisualBasic assembly with the
Microsoft.VisualBasic.FileIO.TextFieldParser
It handles CSV (or any delimiter) with quotes. I've found it quite handy recently.
There isn't a simple way to do this well, if you want to account for quoted elements with embedded commas, especially if they are mixed with non-quoted fields.
You will also probably want to convert the lines to a dictionary, keyed by the column name.
My code to do this is several hundred lines long.
I think there are some examples on the web, open source projects, etc.
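A minimal sketch of the dictionary idea, letting TextFieldParser handle the quoting and assuming the first row holds the column names (CsvDictionaries and Read are hypothetical names). Rows shorter than the header get empty strings, and extra fields beyond the header are ignored:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvDictionaries
{
    // Read CSV with a header row and return one dictionary per data row,
    // keyed by column name.
    public static List<Dictionary<string, string>> Read(TextReader source)
    {
        var rows = new List<Dictionary<string, string>>();
        using (var parser = new TextFieldParser(source))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            if (parser.EndOfData)
                return rows;
            string[] header = parser.ReadFields();
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                var row = new Dictionary<string, string>();
                for (int i = 0; i < header.Length; i++)
                    row[header[i]] = i < fields.Length ? fields[i] : "";
                rows.Add(row);
            }
        }
        return rows;
    }
}
```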
Try this:
// requires: using System; using System.Collections.Generic;
static IEnumerable<string> CsvParse(string input)
{
    // null strings return a one-element enumeration containing null.
    if (input == null)
    {
        yield return null;
        yield break;
    }
    // we will 'eat' bits of the string until it's gone.
    // Limitations: escaped quotes ("") and a trailing empty field (e.g. "a,") are not handled.
    String remaining = input;
    while (remaining.Length > 0)
    {
        if (remaining.StartsWith("\"")) // deal with quotes
        {
            remaining = remaining.Substring(1); // pass over the initial quote.
            // find the end quote.
            int endQuotePosition = remaining.IndexOf("\"");
            switch (endQuotePosition)
            {
                case -1:
                    // unclosed quote.
                    throw new FormatException("Unclosed quote");
                case 0:
                    // the empty quote
                    yield return "";
                    break;
                default:
                    yield return remaining.Substring(0, endQuotePosition).Trim();
                    break;
            }
            // pass over the end quote and the comma after it (if any),
            // so the next field is not read as an extra empty cell.
            remaining = remaining.Substring(endQuotePosition + 1);
            if (remaining.StartsWith(","))
            {
                remaining = remaining.Substring(1);
            }
        }
        else // deal with commas
        {
            int nextComma = remaining.IndexOf(",");
            switch (nextComma)
            {
                case -1:
                    // no more commas -- read to end
                    yield return remaining.Trim();
                    yield break;
                case 0:
                    // the empty cell
                    yield return "";
                    remaining = remaining.Substring(1);
                    break;
                default:
                    // get everything until next comma
                    string cell = remaining.Substring(0, nextComma).Trim();
                    remaining = remaining.Substring(nextComma + 1);
                    yield return cell;
                    break;
            }
        }
    }
}
CsvString.Split(',');
Get a string[] of all the lines:
string[] lines = System.IO.File.ReadAllLines("yourfile.csv");
Then loop through and split those lines (this is error-prone because it doesn't check for commas in quote-delimited fields):
foreach (string line in lines)
{
    string[] items = line.Split(new char[] { ',' });
}
string s = "1,2,3,4,5";
string[] myStrings = s.Split(new char[] { ',' });
Note that Split() takes an array of characters to split on (the parameter is declared with params, so s.Split(',') works as well).
Some CSV files put double quotes around every value, with commas between them. In that case you can sometimes split on the string literal "," (quote, comma, quote).
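A sketch of that approach, assuming every field is quoted and no value contains the sequence "," itself (QuotedLineSplitter is a hypothetical name). Stripping the first and last quote before splitting avoids leftover quotes on the first and last fields:

```csharp
using System;

public static class QuotedLineSplitter
{
    // Only valid when EVERY field is wrapped in double quotes and no field
    // contains the sequence "," itself: strip the outer quotes, then split
    // on the three-character separator ",".
    public static string[] Split(string line)
    {
        if (line.Length < 2 || line[0] != '"' || line[line.Length - 1] != '"')
            throw new FormatException("Expected every field to be quoted.");
        string inner = line.Substring(1, line.Length - 2);
        return inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
    }
}
```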
A CSV file with quoted fields is not a CSV file. Far more programs (e.g. Excel) output without quotes than with quotes when you select "CSV" in Save As.
If you want one you can use for free, or commit to, here's mine, which also implements IDataReader/IDataRecord. It also uses a DataTable to define/convert/enforce columns and DBNull.
http://github.com/claco/csvdatareader/
It doesn't do quotes yet. I just tossed it together a few days ago to scratch an itch.
Forgotten Semicolon: Nice link. Thanks.
cfeduke: Thanks for the tip to Microsoft.VisualBasic.FileIO.TextFieldParser. Going into CsvDataReader tonight.
http://github.com/claco/csvdatareader/ updated using TextFieldParser suggested by cfeduke.
It's just a few properties away from exposing separators/trim spaces/type, if you just need code to steal.
I was already splitting on tabs, so this did the trick for me:
public static string CsvToTabDelimited(string line) {
    var ret = new StringBuilder(line.Length);
    bool inQuotes = false;
    for (int idx = 0; idx < line.Length; idx++) {
        if (line[idx] == '"') {
            inQuotes = !inQuotes;
        } else {
            if (line[idx] == ',') {
                ret.Append(inQuotes ? ',' : '\t');
            } else {
                ret.Append(line[idx]);
            }
        }
    }
    return ret.ToString();
}
string test = "one,two,three";
string[] okNow = test.Split(',');
char[] separationChar = { ';' }; // or '\t', ',' etc.
var strArray = strCSV.Split(separationChar);
string[] splitStrings = myCsv.Split(",".ToCharArray());
