Splitting CSV files with commas in the values [duplicate] - c#

How do I split a CSV file in C#? And how do I display the result?

I've been using the TextFieldParser Class in the Microsoft.VisualBasic.FileIO namespace for a C# project I'm working on. It will handle complications such as embedded commas or fields that are enclosed in quotes etc. It returns a string[] and, in addition to CSV files, can also be used for parsing just about any type of structured text file.
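For anyone who wants to see roughly what that looks like, here is a minimal sketch. The class and method names (TextFieldParserSketch, ParseCsv) are made up for illustration; the parser members used are the ones from Microsoft.VisualBasic.FileIO, and the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSketch
{
    // Reads comma-delimited text and returns one string[] per record.
    // ParseCsv is a hypothetical helper name, not part of the library.
    public static List<string[]> ParseCsv(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside quoted fields stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Pass a StreamReader for a file on disk, or a StringReader to try it on a literal string. Each returned array can then be written to the console or added to a grid row.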

Display where? About splitting, the best way is to use a good library to that effect.
This library is pretty good, I can recommend it heartily.
The problem with naïve methods is that they usually fail; there are many considerations, even before thinking about performance:
What if the text contains commas
Support for the many existing formats (separated by semicolon, or text surrounded by quotes, or single quotes, etc.)
and many others
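To make the first point concrete, here is a small sketch of what a plain Split does to a quoted value. NaiveSplitSketch and NaiveSplit are illustrative names only.

```csharp
using System;

public static class NaiveSplitSketch
{
    // Splits on every comma and knows nothing about quoting.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Calling NaiveSplit("111,\"33,44\",666") gives four pieces instead of three, because the quoted value is cut at its embedded comma and the quote characters are left attached to the halves.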

Import Microsoft.VisualBasic as a reference (I know, it's not that bad) and use Microsoft.VisualBasic.FileIO.TextFieldParser - this handles CSV files very well, and can be used in any .NET language.

Read the file one line at a time, then ...
foreach (string field in line.Split(','))
Console.WriteLine(field);

This is a CSV parser I use on occasion.
Usage: (dgvMyView is a DataGridView.)
CSVReader reader = new CSVReader(@"C:\MyFile.txt");
reader.DisplayResults(dgvMyView);
Class:
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;
public class CSVReader
{
private const string ESCAPE_SPLIT_REGEX = "({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*";
private string[] FieldNames;
private List<string[]> Records;
private int ReadIndex;
public CSVReader(string File)
{
Records = new List<string[]>();
string[] Record = null;
StreamReader Reader = new StreamReader(File);
int Index = 0;
bool BlankRecord = true;
FieldNames = GetEscapedSVs(Reader.ReadLine());
while (!Reader.EndOfStream)
{
Record = GetEscapedSVs(Reader.ReadLine());
BlankRecord = true;
for (Index = 0; Index <= Record.Length - 1; Index++)
{
if (!string.IsNullOrEmpty(Record[Index])) BlankRecord = false;
}
if (!BlankRecord) Records.Add(Record);
}
ReadIndex = -1;
Reader.Close();
}
private string[] GetEscapedSVs(string Data)
{
return GetEscapedSVs(Data, ",", "\"");
}
private string[] GetEscapedSVs(string Data, string Separator, string Escape)
{
string[] Result = null;
int Index = 0;
int PriorMatchIndex = 0;
MatchCollection Matches = Regex.Matches(Data, string.Format(ESCAPE_SPLIT_REGEX, Separator, Escape));
Result = new string[Matches.Count + 1]; // one more field than there are separators
for (Index = 0; Index <= Result.Length - 2; Index++)
{
Result[Index] = Data.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
Result[Result.Length - 1] = Data.Substring(PriorMatchIndex);
for (Index = 0; Index <= Result.Length - 1; Index++)
{
if (Regex.IsMatch(Result[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) Result[Index] = Result[Index].Substring(1, Result[Index].Length - 2);
Result[Index] = Result[Index].Replace(Escape + Escape, Escape);
if (Result[Index] == null) Result[Index] = "";
}
return Result;
}
public int FieldCount
{
get { return FieldNames.Length; }
}
public string GetString(int Index)
{
return Records[ReadIndex][Index];
}
public string GetName(int Index)
{
return FieldNames[Index];
}
public bool Read()
{
ReadIndex = ReadIndex + 1;
return ReadIndex < Records.Count;
}
public void DisplayResults(DataGridView DataView)
{
DataGridViewColumn col = default(DataGridViewColumn);
DataGridViewRow row = default(DataGridViewRow);
DataGridViewCell cell = default(DataGridViewCell);
DataGridViewColumnHeaderCell header = default(DataGridViewColumnHeaderCell);
int Index = 0;
ReadIndex = -1;
DataView.Rows.Clear();
DataView.Columns.Clear();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
col = new DataGridViewColumn();
col.CellTemplate = new DataGridViewTextBoxCell();
header = new DataGridViewColumnHeaderCell();
header.Value = GetName(Index);
col.HeaderCell = header;
DataView.Columns.Add(col);
}
while (Read())
{
row = new DataGridViewRow();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
cell = new DataGridViewTextBoxCell();
cell.Value = GetString(Index).ToString();
row.Cells.Add(cell);
}
DataView.Rows.Add(row);
}
}
}

I got the result for my query. It is simple: I read the file using System.IO.File, so all the text is stored in a string, and then I split it on a separator. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace CSV
{
class Program
{
static void Main(string[] args)
{
string csv = "user1, user2, user3,user4,user5";
string[] split = csv.Split(new char[] {',',' '});
foreach(string s in split)
{
if (s.Trim() != "")
Console.WriteLine(s);
}
Console.ReadLine();
}
}
}
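As a side note, the empty-string check in the loop can be left to Split itself. A minimal sketch, with SimpleSplitSketch and SplitNames as made-up names; like the code above it assumes the values contain no spaces or quoted commas.

```csharp
using System;

public static class SimpleSplitSketch
{
    public static string[] SplitNames(string csv)
    {
        // RemoveEmptyEntries drops the empty strings produced by ", " pairs.
        return csv.Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the sample string above this returns the five user names without any blank entries.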

The following function takes a line from a CSV file and splits it into a List<string>.
Arguments:
string line = the line to split
string textQualifier = what (if any) text qualifier (e.g. "" or "\"" or "'")
char delim = the field delimiter (e.g. ',' or ';' or '|' or '\t')
int colCount = the expected number of fields (0 means don't check)
Example usage:
List<string> fields = SplitLine(line, "\"", ',', 5);
// or
List<string> fields = SplitLine(line, "'", '|', 10);
// or
List<string> fields = SplitLine(line, "", '\t', 0);
Function:
private List<string> SplitLine(string line, string textQualifier, char delim, int colCount)
{
List<string> fields = new List<string>();
string origLine = line;
char textQual = '"';
bool hasTextQual = false;
if (!String.IsNullOrEmpty(textQualifier))
{
hasTextQual = true;
textQual = textQualifier[0];
}
if (hasTextQual)
{
while (!String.IsNullOrEmpty(line))
{
if (line[0] == textQual) // field is text qualified so look for next unqualified delimiter
{
int fieldLen = 1;
while (true)
{
if (line.Length == 2) // must be final field (zero length)
{
fieldLen = 2;
break;
}
else if (fieldLen + 1 >= line.Length) // must be final field
{
fieldLen += 1;
break;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == textQual) // escaped text qualifier
{
fieldLen += 2;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == delim) // must be end of field
{
fieldLen += 1;
break;
}
else // not a delimiter
{
fieldLen += 1;
}
}
string escapedQual = textQual.ToString() + textQual.ToString();
fields.Add(line.Substring(1, fieldLen - 2).Replace(escapedQual, textQual.ToString())); // replace escaped qualifiers
if (line.Length >= fieldLen + 1)
{
line = line.Substring(fieldLen + 1);
if (line == "") // blank final field
{
fields.Add("");
}
}
else
{
line = "";
}
}
else // field is not text qualified
{
int fieldLen = line.IndexOf(delim);
if (fieldLen != -1) // check next delimiter position
{
fields.Add(line.Substring(0, fieldLen));
line = line.Substring(fieldLen + 1);
if (line == "") // final field must be blank
{
fields.Add("");
}
}
else // must be last field
{
fields.Add(line);
line = "";
}
}
}
}
else // if there is no text qualifier, then use existing split function
{
fields.AddRange(line.Split(delim));
}
if (colCount > 0 && colCount != fields.Count) // count doesn't match expected so throw exception
{
throw new Exception("Field count was:" + fields.Count.ToString() + ", expected:" + colCount.ToString() + ". Line:" + origLine);
}
return fields;
}

Problem: Convert a comma separated string into an array where commas in "quoted strings,,," should not be considered as separators but as part of an entry
Input:
String: First,"Second","Even,With,Commas",,Normal,"Sentence,with ""different"" problems",3,4,5
Output:
String-Array: ['First','Second','Even,With,Commas','','Normal','Sentence,with "different" problems','3','4','5']
Code:
string sLine;
sLine = "First,\"Second\",\"Even,With,Commas\",,Normal,\"Sentence,with \"\"different\"\" problems\",3,4,5";
// 1. Split line by separator; do not split if separator is within quotes
string Separator = ",";
string Escape = '"'.ToString();
MatchCollection Matches = Regex.Matches(sLine,
string.Format("({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*", Separator, Escape));
string[] asColumns = new string[Matches.Count + 1];
int PriorMatchIndex = 0;
for (int Index = 0; Index <= asColumns.Length - 2; Index++)
{
asColumns[Index] = sLine.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
asColumns[asColumns.Length - 1] = sLine.Substring(PriorMatchIndex);
// 2. Remove quotes
for (int Index = 0; Index <= asColumns.Length - 1; Index++)
{
if (Regex.IsMatch(asColumns[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) // If "Text" is surrounded by quotes (but ignore double quotes => "Leave ""inside"" quotes")
{
asColumns[Index] = asColumns[Index].Substring(1, asColumns[Index].Length - 2); // "Text" => Text
}
asColumns[Index] = asColumns[Index].Replace(Escape + Escape, Escape); // Remove double quotes ('My ""special"" text' => 'My "special" text')
if (asColumns[Index] == null) asColumns[Index] = "";
}
The output array is asColumns

Related

Split a string if delimiter is between single quotes [duplicate]

I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split.
String:
111,222,"33,44,55",666,"77,88","99"
I want the output:
111
222
33,44,55
666
77,88
99
I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99
Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!
You can do so with some simple regex
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
This will do the following:
(?:^|,) = Match expression "Beginning of line or string ,"
(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group, this will select between 2 alternatives:
stuff in quotes
stuff between commas
This should give you the output you are looking for.
Example code in C#
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
List<string> list = new List<string>();
foreach (Match match in csvSplit.Matches(input))
{
// TrimStart removes the separating comma; an empty field yields "".
list.Add(match.Value.TrimStart(','));
}
return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
Console.WriteLine(string.Join(Environment.NewLine, SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"")));
}
Warning: as per @MrE's comment, if a rogue newline character appears in a badly formed CSV file and you end up with an unbalanced quote ("string), you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). This can easily be replicated in Visual Studio and, as I've discovered, will crash it. A simple try/catch will not trap this issue either.
You should use:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
instead
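If you want an extra safety net on top of the simplified pattern, one option is a match timeout. This is only a sketch under my own assumptions: the one-second limit is arbitrary, and SafeRegexSplitSketch and Split are illustrative names. The Regex constructor overload with a TimeSpan and RegexMatchTimeoutException are available from .NET 4.5 onwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SafeRegexSplitSketch
{
    // The simplified pattern from above, plus a match timeout.
    // The one-second limit is an arbitrary value chosen for illustration.
    private static readonly Regex CsvSplit = new Regex(
        "(?:^|,)(\"(?:[^\"])*\"|[^,]*)",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(1));

    public static List<string> Split(string input)
    {
        var fields = new List<string>();
        try
        {
            // Matches are evaluated lazily, so a timeout surfaces inside this loop.
            foreach (Match match in CsvSplit.Matches(input))
            {
                // Group 1 is the field without the leading comma (quotes are kept).
                fields.Add(match.Groups[1].Value);
            }
        }
        catch (RegexMatchTimeoutException ex)
        {
            throw new FormatException("CSV line took too long to parse; it may be malformed.", ex);
        }
        return fields;
    }
}
```

With a timeout set, a runaway match raises an exception that an ordinary catch block can handle, rather than spinning indefinitely.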
Fast and easy:
public static string[] SplitCsv(string line)
{
List<string> result = new List<string>();
StringBuilder currentStr = new StringBuilder("");
bool inQuotes = false;
for (int i = 0; i < line.Length; i++) // For each character
{
if (line[i] == '\"') // Quotes are closing or opening
inQuotes = !inQuotes;
else if (line[i] == ',') // Comma
{
if (!inQuotes) // If not in quotes, end of current string, add it to result
{
result.Add(currentStr.ToString());
currentStr.Clear();
}
else
currentStr.Append(line[i]); // If in quotes, just add it
}
else // Add any other character to current string
currentStr.Append(line[i]);
}
result.Add(currentStr.ToString());
return result.ToArray(); // Return array of all strings
}
With this string as input :
111,222,"33,44,55",666,"77,88","99"
It will return :
111
222
33,44,55
666
77,88
99
I really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is:
public IEnumerable<string> SplitCSV(string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
Maybe it's even more useful to have it like an extension method:
public static class StringHelper
{
public static IEnumerable<string> SplitCSV(this string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
}
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
Here is the implementation in C#:
string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
foreach (var match in matches)
{
Console.WriteLine(match);
}
Outputs
111
222
33,44,55
666
77,88
99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.
This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString); //Debug output, disabled.
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
With minor updates to the function provided by "Chad Hedgcock".
Updates are on:
Line 26: character.val == '\"' - This check is redundant because of the check already made on Line 24, i.e. character.val == '"'
Line 28: if (row[character.index + 1] == character.val) added !quoteIsEscaped to escape 3 consecutive quotes.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
bool blockUntilEndQuote2 = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"' && !blockUntilEndQuote2)
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character == '\'' && !blockUntilEndQuote)
{
if (blockUntilEndQuote2 == false)
{
blockUntilEndQuote2 = true;
}
else if (blockUntilEndQuote2 == true)
{
blockUntilEndQuote2 = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) )
(?=\s*([,;\t\r\n]|$))
) |
(?<FULL>
(^|[\s\t\r\n])
( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
(?=[,;\s\t\r\n]|$)
)
))", RegexOptions.Compiled);
This solution can handle pretty chaotic cases too like below:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
Note: The regular expression contains two <FULL> blocks, and each of them contains two <QUODAT> blocks separated by "or" (|). Depending on your task you may only need one of them.
Note: This regular expression gives us one string array, and works on a single line with or without <carriage return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
(?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>)
(?=\s*([,;\t\r\n]|$))
)
))", RegexOptions.Compiled);
It can process complex, easy and empty items too:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything but the <quotation mark><separators><comma> sequence AND each item shall begin and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carriage return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaced the text "CO" with "QUO".
Try this:
string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
if (it.StartsWith(",") || it.EndsWith(","))
{
var tmp = it.TrimEnd(',').TrimStart(',');
result.AddRange(tmp.Split(','));
}
else
{
if(!string.IsNullOrEmpty(it)) result.Add(it);
}
}
//Results:
foreach (var it in result)
{
Console.WriteLine(it);
}
I know I'm a bit late to this, but for searches, here is how I did what you are asking about in C sharp
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"')
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && blockUntilEndQuote == true)
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
Don't reinvent a CSV parser, try FileHelpers.
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.
public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
{
// In-Line for example, but I implemented as string extender in production code
Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
{
if (startIndex >= 0)
{
if (source != null)
{
for (int i = startIndex; i < source.Length; i++)
{
if (!char.IsWhiteSpace(source[i]))
{
return i;
}
}
}
}
return -1;
};
var results = new List<string>();
var result = new StringBuilder();
var inQualifier = false;
var inField = false;
// We add new columns at the delimiter, so append one for the parser.
var row = $"{record}{delimiter}";
for (var idx = 0; idx < row.Length; idx++)
{
// A delimiter character...
if (row[idx]== delimiter[0])
{
// Are we inside qualifier? If not, we've hit the end of a column value.
if (!inQualifier)
{
results.Add(trimData ? result.ToString().Trim() : result.ToString());
result.Clear();
inField = false;
}
else
{
result.Append(row[idx]);
}
}
// NOT a delimiter character...
else
{
// ...Not a space character
if (row[idx] != ' ')
{
// A qualifier character...
if (row[idx] == qualifier[0])
{
// Qualifier is closing qualifier...
if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
{
inQualifier = false;
continue;
}
else
{
// ...Qualifier is opening qualifier
if (!inQualifier)
{
inQualifier = true;
}
// ...It's a qualifier inside a qualifier.
else
{
inField = true;
result.Append(row[idx]);
}
}
}
// Not a qualifier character...
else
{
result.Append(row[idx]);
inField = true;
}
}
// ...A space character
else
{
if (inQualifier || inField)
{
result.Append(row[idx]);
}
}
}
}
return results.ToArray<string>();
}
Some test code:
//var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
var input =
"111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";
Console.WriteLine("Split with trim");
Console.WriteLine("---------------");
var result = SplitRow(input, ",", "\"", true);
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Split 2
Console.WriteLine("Split with no trim");
Console.WriteLine("------------------");
var result2 = SplitRow(input, ",", "\"", false);
foreach (var r in result2)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Time Trial 1
Console.WriteLine("Experimental Process (1,000,000) iterations");
Console.WriteLine("-------------------------------------------");
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
var x1 = SplitRow(input, ",", "\"", false);
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
Console.WriteLine("");
Results
Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds
Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.
If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
Here is my fastest implementation based upon string raw pointer manipulation:
string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
{
string[] oTokens;
if (null == cSeparator)
{
cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
}
if (null == cQuotes)
{
cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
}
unsafe
{
fixed (char* lpText = sText)
{
#region Fast array estimation
char* lpCurrent = lpText;
int nEstimatedSize = 0;
while (0 != *lpCurrent)
{
if (cSeparator == *lpCurrent)
{
nEstimatedSize++;
}
lpCurrent++;
}
nEstimatedSize++; // Add EOL char(s)
string[] oEstimatedTokens = new string[nEstimatedSize];
#endregion
#region Parsing
char[] oBuffer = new char[sText.Length];
int nIndex = 0;
int nTokens = 0;
lpCurrent = lpText;
while (0 != *lpCurrent)
{
if (cQuotes == *lpCurrent)
{
// Quotes parsing
lpCurrent++; // Skip quote
nIndex = 0; // Reset buffer
while (
(0 != *lpCurrent)
&& (cQuotes != *lpCurrent)
)
{
oBuffer[nIndex] = *lpCurrent; // Store char
lpCurrent++; // Move source cursor
nIndex++; // Move target cursor
}
}
else if (cSeparator == *lpCurrent)
{
// Separator char parsing
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
nIndex = 0; // Skip separator and Reset buffer
}
else
{
// Content parsing
oBuffer[nIndex] = *lpCurrent; // Store char
nIndex++; // Move target cursor
}
lpCurrent++; // Move source cursor
}
// Recover pending buffer
if (nIndex > 0)
{
// Store token
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
}
// Build final tokens list
if (nTokens == nEstimatedSize)
{
oTokens = oEstimatedTokens;
}
else
{
oTokens = new string[nTokens];
Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
}
#endregion
}
}
// Epilogue
return oTokens;
}
Try this
private string[] GetCommaSeperatedWords(string sep, string line)
{
List<string> list = new List<string>();
StringBuilder word = new StringBuilder();
int doubleQuoteCount = 0;
for (int i = 0; i < line.Length; i++)
{
string chr = line[i].ToString();
if (chr == "\"")
{
if (doubleQuoteCount == 0)
doubleQuoteCount++;
else
doubleQuoteCount--;
continue;
}
if (chr == sep && doubleQuoteCount == 0)
{
list.Add(word.ToString());
word = new StringBuilder();
continue;
}
word.Append(chr);
}
list.Add(word.ToString());
return list.ToArray();
}
This is Chad's answer rewritten with state-based logic. His answer failed for me when it came across """BRAD""" as a field. That should return "BRAD" but it just ate up all the remaining fields. When I tried to debug it I just ended up rewriting it as state-based logic:
enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield };
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
SplitState state = SplitState.s_begin;
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new { val, index }))
{
//Console.WriteLine("character = " + character.val + " state = " + state);
switch (state)
{
case SplitState.s_begin:
if (character.val == delimiter)
{
/* empty field */
yield return currentString.ToString();
currentString.Clear();
} else if (character.val == '"')
{
state = SplitState.s_inquotefield;
} else
{
currentString.Append(character.val);
state = SplitState.s_infield;
}
break;
case SplitState.s_infield:
if (character.val == delimiter)
{
/* field with data */
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_inquotefield:
if (character.val == '"')
{
// could be end of field, or escaped quote.
state = SplitState.s_foundquoteinfield;
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_foundquoteinfield:
if (character.val == '"')
{
// found escaped quote.
currentString.Append(character.val);
state = SplitState.s_inquotefield;
}
else if (character.val == delimiter)
{
// must have been last quote so we must find delimiter
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
}
else
{
throw new Exception("Quoted field not terminated.");
}
break;
default:
throw new Exception("unknown state:" + state);
}
}
//Console.WriteLine("currentstring = " + currentString.ToString());
}
This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.

How can I use indexof and substring to find words in a string?

In the constructor :
var tempFR = File.ReadAllText(file);
GetResults(tempFR);
Then :
private List<string> GetResults(string file)
{
List<string> results = new List<string>();
string word = textBox1.Text;
string[] words = word.Split(new string[] { ",," }, StringSplitOptions.None);
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
results.Add(file.Substring(start));
}
return results;
}
In this case words contains three words: System, public and test.
I want to find every occurrence of those words in file and add them to the results list using IndexOf and Substring.
As it is now, the value of start is -1 all the time.
To clarify some things.
This is a screenshot of textBox1:
That is why I'm using two commas to split and get the words.
This screenshot shows the words after splitting them from textBox1:
And this is the content of the file string:
I want to add all the words found in the file to the results list.
Judging by the last screenshot there should be 11 results:
three times the word using, three times the word system and five times the word public.
But the variable start is -1.
Update :
I tried Barns' solutions, but they are not working well for me.
First, the code that runs the search, then loops over the files and reports to the BackgroundWorker:
int numberofdirs = 0;
void DirSearch(string rootDirectory, string filesExtension, string[] textToSearch, BackgroundWorker worker, DoWorkEventArgs e)
{
List<string> filePathList = new List<string>();
int numberoffiles = 0;
try
{
filePathList = SearchAccessibleFilesNoDistinct(rootDirectory, null, worker, e).ToList();
}
catch (Exception err)
{
}
label21.Invoke((MethodInvoker)delegate
{
label21.Text = "Phase 2: Searching in files";
});
MyProgress myp = new MyProgress();
myp.Report4 = filePathList.Count.ToString();
foreach (string file in filePathList)
{
try
{
var tempFR = File.ReadAllText(file);
_busy.WaitOne();
if (worker.CancellationPending == true)
{
e.Cancel = true;
return;
}
bool reportedFile = false;
for (int i = 0; i < textToSearch.Length; i++)
{
if (tempFR.IndexOf(textToSearch[i], StringComparison.InvariantCultureIgnoreCase) >= 0)
{
if (!reportedFile)
{
numberoffiles++;
myp.Report1 = file;
myp.Report2 = numberoffiles.ToString();
myp.Report3 = textToSearch[i];
myp.Report5 = FindWordsWithtRegex(tempFR, textToSearch);
backgroundWorker1.ReportProgress(0, myp);
reportedFile = true;
}
}
}
numberofdirs++;
label1.Invoke((MethodInvoker)delegate
{
label1.Text = string.Format("{0}/{1}", numberofdirs, myp.Report4);
label1.Visible = true;
});
}
catch (Exception err)
{
}
}
}
I have the words array already in textToSearch and the file content in tempFR then I'm using the first solution of Barns :
private List<string> FindWordsWithtRegex(string filecontent, string[] words)
{
var res = new List<string>();
foreach (var word in words)
{
Regex reg = new Regex(word);
var c = reg.Matches(filecontent);
int k = 0;
foreach (var g in c)
{
Console.WriteLine(g.ToString());
res.Add(g + ":" + k++);
}
}
Console.WriteLine("Results of FindWordsWithtRegex");
res.ForEach(f => Console.WriteLine(f));
Console.WriteLine();
return res;
}
But the results I'm getting in the List res are not the same as the output in Barns' solution(s). These are the results I'm getting in the List res for the first file:
In this case there are two words, system and using, but it found only using 3 times, although system also appears 3 times in the file content. The output format is also not the same as in Barns' solutions:
Here is an alternative using Regex instead of using IndexOf. Note I have created my own string to parse, so my results will be a bit different.
EDIT
private List<string> FindWordsWithCountRegex(string filecontent, string[] words)
{
var res = new List<string>();
foreach (var word in words)
{
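// Regex.Escape makes sure the word is matched as literal text
Regex reg = new Regex(Regex.Escape(word), RegexOptions.IgnoreCase);
// MatchCollection exposes Count as a property
var c = reg.Matches(filecontent).Count;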
res.Add(word + ":" + c);
}
return res;
}
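If you only want whole words counted (so that "test" does not match inside "testing"), a minimal sketch could look like the following. The class and method names are made up for illustration, and it assumes every search term starts and ends with a letter, digit or underscore, because \b only works at such boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts case-insensitive whole-word occurrences of each search term.
    // Hypothetical helper, not part of the answers above.
    public static Dictionary<string, int> CountWords(string content, IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in words)
        {
            if (string.IsNullOrEmpty(word)) continue; // nothing to search for
            // Regex.Escape treats the term as literal text; \b anchors it to word boundaries.
            var pattern = @"\b" + Regex.Escape(word) + @"\b";
            counts[word] = Regex.Matches(content, pattern, RegexOptions.IgnoreCase).Count;
        }
        return counts;
    }
}
```

Returning a dictionary instead of "word:count" strings keeps the counts as numbers, so the caller can format them however the report needs.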
Simply change this part and split on a single char, typically a space rather than a comma:
string[] words = word.Split(' ');
int start = file.IndexOf(words[i],0);
start will be -1 if the word is not found.
MSDN: IndexOf(String, Int32)
for(int i = 0; i < words.Length; i++)
{
int start = file.IndexOf(words[i], 0);
// only add to results if word is found (index >= 0)
if (start >= 0) results.Add(file.Substring(start));
}
If you want all occurrences of the words, you need an extra loop:
for (int i = 0; i < words.Length; i++)
{
    if (words[i].Length == 0) continue; // an empty word would never advance the search
    int startIdx = 0;
    while (startIdx < file.Length)
    {
        int idx = file.IndexOf(words[i], startIdx);
        if (idx < 0) break; // no further occurrence of this word
        // add to results
        results.Add(file.Substring(idx));
        // and let the search continue just past the end of the word that was found
        startIdx = idx + words[i].Length;
    }
}
Maybe you want a case-insensitive search:
file.IndexOf(words[i], 0, StringComparison.CurrentCultureIgnoreCase);
MSDN: StringComparer Class
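Putting the pieces together, here is a rough, self-contained sketch of the IndexOf loop that collects the position of every occurrence. FindAll and OccurrenceFinder are names I made up, and the default of OrdinalIgnoreCase is my assumption; pass another StringComparison if you need culture rules.

```csharp
using System;
using System.Collections.Generic;

public static class OccurrenceFinder
{
    // Returns the start index of every occurrence of word in text (hypothetical helper).
    public static List<int> FindAll(string text, string word,
        StringComparison comparison = StringComparison.OrdinalIgnoreCase)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(word)) return positions; // an empty word would match everywhere

        int idx = text.IndexOf(word, 0, comparison);
        while (idx >= 0) // IndexOf returns -1 once there are no more matches
        {
            positions.Add(idx);
            // Resume just past the match so the same occurrence is not found again.
            idx = text.IndexOf(word, idx + word.Length, comparison);
        }
        return positions;
    }
}
```

With the positions in hand you can still call text.Substring(position, word.Length) if you want the matched text itself rather than the rest of the file.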

c# Add Specific columns from a TEXT file to DataGridView

Hello Everyone,
As shown in the above image I want to add the decimal numbers column wise from a text file to datagrid control.
Following is my code snippet
List<string> str = new List<string>();
String st = "";
int k = 0;
string[] s ;
//Path to write contents to text file
string filename = @"E:\Vivek\contentcopy\clientlist.txt";
Form.CheckForIllegalCrossThreadCalls = false;
OpenFileDialog ofd = new OpenFileDialog();
ofd.FileName = "";
ofd.ShowDialog();
st = ofd.FileName;
if (string.IsNullOrEmpty(ofd.FileName))
return;
string Name = "", No1 = "",No2="";
string[] lines = File.ReadAllLines(st).Where(sw => !string.IsNullOrWhiteSpace(sw)).ToArray();
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].Contains("VENTURA SECURITIES LIMITED (NSE F&O)")) continue;
if (lines[i].Contains("ALL EXCHANGES DERIVATIVES CLIENTWISE STATEMENT AS ON 16-05-2012")) continue;
if (lines[i].Contains("-------------------------------------------------------")) continue;
s = lines[i].Split(' ');
if (s[0] == "PARTY" || s[0] == "") continue;
int z;
Name = "";
for (z = 1; z < s.Length; z++)
{
if (s[z] == "") continue;
if (s[z].Contains('.'))
{
No1+=s[z]+" ";
No2 = No1 + " ";
}
else
{
Name += s[z];
str.Add(s[0]+" "+Name);
}
}
dataGridView1.Rows.Add();
dataGridView1.Rows[k].Cells[0].Value = s[0];
dataGridView1.Rows[k].Cells[1].Value = Name;
dataGridView1.Rows[k].Cells[2].Value = No1;
dataGridView1.Rows[k].Cells[3].Value = No2;
k++;
}
File.WriteAllLines(filename, str);
dataGridView1.ReadOnly = true;
}
The line No1=s[z] directly takes the last column's values, i.e. 46,123.19 and so on. I want to fetch each column from the text file, store it in a string variable, and then assign it to the DataGridView.
I hope my question is clear. If not, please let me know.
Here is the simplest Solution:
Add a DataGrid View to Form and add a Button:
private void button1_Click(object sender, EventArgs e)
{
ReadAndFileter();
}
private void ReadAndFileter()
{
try
{
using(System.IO.StreamReader reader = new System.IO.StreamReader("file.txt"))
{
string line;
string []array;
int rowcount= 0;
decimal number;
string[] separators = { "\t", " " };
int columnCount = 0;
while ((line = reader.ReadLine()) != null)
{
array = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
dataGridView1.Rows.Add();
foreach (string str in array)
{
if (Decimal.TryParse(str,out number))
{
dataGridView1.Rows[rowcount].Cells[columnCount++].Value = number;
}
}
rowcount++;
columnCount = 0;
}
}
}
catch (Exception ex)
{
}
}
The File Contents are:
Abc 20.122 69.33 0.00 693.25 0.00
def 36.20 96.20 1.15 69.56 8.96
And the final output:
Let's say you have four lines in your test file; then you need to do the following:
Use StreamReader.ReadLine() to read one line at a time.
Split the line using Split(' ') and store the result in an array.
Remove all the empty entries from the array.
Now indexes 2, 3, 4, 5 and 6 of the resulting array hold the string equivalents of the decimal numbers.
Repeat this for each StreamReader.ReadLine().
Hope this will help.
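The steps above can be sketched roughly like this for a single line. LineParser and ParseDecimals are illustrative names, and the sketch assumes the numbers use a dot as the decimal point and, optionally, commas as thousands separators (as in 46,123.19). Note that a name column consisting only of digits would also be picked up as a number.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LineParser
{
    // Splits one line on spaces/tabs and returns every token that parses as a decimal.
    public static List<decimal> ParseDecimals(string line)
    {
        var numbers = new List<decimal>();
        // RemoveEmptyEntries drops the empty strings produced by runs of spaces.
        var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var part in parts)
        {
            // NumberStyles.Number accepts a decimal point and thousands separators.
            if (decimal.TryParse(part, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value))
                numbers.Add(value);
        }
        return numbers;
    }
}
```

Call it once per ReadLine() result and write the returned values into the cells of the current row.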
Your problem is that you are overwriting No1 every time you read a string, which explains why you only get the last value. What you could do is either;
Append the string:
No1 += s[z] + " ";
Which will put all the values one after another, separated by whitespace.
Or, you could make a List<String> and add each value to the list, meaning you have them stored separately:
List<String> values = new List<String>();
foreach(...)
{
if (s[z] == "") continue;
if (s[z].Contains('.'))
{
values.Add(s[z]);
}
else
{
Name += s[z];
str.Add(s[0] + " " + Name);
}
}
You can thereafter loop through the list and add each value to a row. Considering your code piece;
int i = 2;
foreach(string value in values)
{
dataGridView1.Rows[k].Cells[i].Value = value;
i++;
}
This should work.
Hope this helps.
Here is the edited code, but in future I suggest you at least give it a try yourself first.
private void ReadAndFileter1()
{
try
{
using (System.IO.StreamReader reader = new System.IO.StreamReader("file.txt"))
{
string line;
string[] array;
int rowcount = 0;
decimal number;
string[] separators = { "\t", " " };
int columnCount = 1;
string[] lines = File.ReadAllLines("file.txt");
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].Contains("VENTURA SECURITIES LIMITED (NSE F&O)")) continue;
if (lines[i].Contains("ALL EXCHANGES DERIVATIVES CLIENTWISE STATEMENT AS ON 16-05-2012")) continue;
if (lines[i].Contains("-------------------------------------------------------")) continue;
array = lines[i].Split(separators, StringSplitOptions.RemoveEmptyEntries);
if (array[0] == "PARTY" || array[0] == "") continue;
dataGridView1.Rows.Add();
foreach (string str in array)
{
if (Decimal.TryParse(str, out number))
{
dataGridView1.Rows[rowcount].Cells[columnCount++].Value = number;
}
}
dataGridView1.Rows[rowcount].Cells[0].Value = array[0];
rowcount++;
columnCount = 1;
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
}
Here it is:
static void Main(string[] args)
{
Decimal result;
string[] splitchar = new string[]{" "};
using(StreamReader reader = new StreamReader(@"C:\Users\Dell\Desktop\input.txt"))
{
while(!reader.EndOfStream)
{
string[] splittedArray = reader.ReadLine().Split(splitchar, StringSplitOptions.RemoveEmptyEntries).Where(x => Decimal.TryParse(x, out result)).ToArray();
// put your code here to get insert the values in datagrid
}
}
}

Read Text File from specific places

I have a question about reading a text file, because I don't know if I'm approaching it the right way. I want to read from a specific string up to a specific character.
My text would look like this:
...
...
CM_ "Hello, how are you?
Rules: Don't smoke!
- love others
End";
...
CM_ "Why you?";
...// Many CM_
...
After splitting, it should look like this:
1. CM_
2. "Hello, how are you?
Rules: Don't smoke!
- love others
End"
3. CM_
4. "Why you?"
... // many CM_
I want to read from "CM_" till ";"
My Code i tried so far:
StreamReader fin = new StreamReader("text.txt");
string tmp = "";
tmp = fin.ReadToEnd();
if (tmp.StartsWith("CM_ ") && tmp.EndsWith(";"))
{
var result = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();
}
foreach (string x in result)
{
Console.WriteLine(x);
}
static void PRegex()
{
using (StreamReader fin = new StreamReader("text.txt"))
{
string tmp = fin.ReadToEnd();
var matches = Regex.Matches(tmp, "(CM_) ([^;]*);", RegexOptions.Singleline);
for (int i = 0; i < matches.Count; i++)
if (matches[i].Groups.Count == 3)
Console.WriteLine((2 * i + 1).ToString() + ". " + matches[i].Groups[1].Value + "\r\n" + (2 * (i + 1)).ToString() + ". " + matches[i].Groups[2].Value);
}
Console.ReadLine();
}
static void PLineByLine()
{
using (StreamReader fin = new StreamReader("text.txt"))
{
int index = 0;
string line = null;
string currentCMBlock = null;
bool endOfBlock = true;
while ((line = fin.ReadLine()) != null)
{
bool endOfLine = false;
while (!endOfLine)
{
if (endOfBlock)
{
int startIndex = line.IndexOf("CM_ ");
if (startIndex == -1)
{
endOfLine = true;
continue;
}
line = line.Substring(startIndex + 4, line.Length - startIndex - 4);
endOfBlock = false;
}
if (!endOfBlock)
{
int startIndex = line.IndexOf(";");
if (startIndex == -1)
{
currentCMBlock += line + "\r\n";
endOfLine = true;
continue;
}
currentCMBlock += line.Substring(0, startIndex);
if (!string.IsNullOrEmpty(currentCMBlock))
Console.WriteLine((++index) + ". CM_\r\n" + (++index) + ". " + currentCMBlock);
currentCMBlock = null;
line = line.Substring(startIndex + 1, line.Length - startIndex - 1);
endOfBlock = true;
}
}
}
}
Console.ReadLine();
}
You are reading the whole file into tmp. So, if there is any text before "CM_" then your conditional statement won't be entered.
Instead, try reading line by line with fin.ReadLine in a loop over all lines.
Read the whole file:
string FileToRead = File.ReadAllText("Path");
string GetContent(string StartAt, string EndAt, bool LastIndex)
{
    // Start at the last occurrence of StartAt when LastIndex is true, otherwise at the first.
    int start = LastIndex ? FileToRead.LastIndexOf(StartAt) : FileToRead.IndexOf(StartAt);
    if (start < 0) return null; // StartAt not found
    // Look for EndAt only after the start marker.
    int end = FileToRead.IndexOf(EndAt, start + StartAt.Length);
    if (end < 0) return null; // EndAt not found
    return FileToRead.Substring(start, end - start);
}
-Hope I didn't do anything wrong here. (Free mind typing)
You read the file, then drop everything in front of the start marker and everything from the end marker onwards.
The LastIndex flag selects whether the FIRST match found is returned, or the last.
NOTE: I think it would be better to use a StringReader (if I remember correctly) if you care about the memory usage of your application.
I tried something else, but I don't know if it is any good. It still only reads the first line, and I don't know what I did wrong here.
My code:
while ((tmp = fin.ReadLine()) != null)
{
if (tmp.StartsWith("CM_ "))
{
//string[] tmpList = tmp.Split(new Char[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
var result = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();
if (tmp.EndsWith(";")) break;
fin.ReadLine();
if (tmp.EndsWith(";"))
{
result.ToList();
break;
}
else
{
result.ToList();
fin.ReadLine();
}
foreach (string x in result)
{
Console.WriteLine(x);
}
}
I suggest you look into using Regular Expressions. It may be just what you need and much more flexible than Split().
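As a rough illustration of the regex route, here is a small sketch that returns the blocks as a list instead of printing them. CmBlocks and Extract are made-up names, and it assumes the quoted text never contains a semicolon itself, since the first ';' after "CM_" ends the block.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CmBlocks
{
    // Returns the text between each "CM_" marker and the next ';' (hypothetical helper).
    public static List<string> Extract(string text)
    {
        var blocks = new List<string>();
        // [^;]* also matches line breaks, so multi-line blocks are captured whole.
        foreach (Match m in Regex.Matches(text, @"CM_\s+([^;]*);"))
            blocks.Add(m.Groups[1].Value);
        return blocks;
    }
}
```

Pass it the result of File.ReadAllText and each list entry is one quoted block, quotes included.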

Reading CSV file and storing values into an array

I am trying to read a *.csv-file.
The *.csv-file consist of two columns separated by semicolon (";").
I am able to read the *.csv-file using StreamReader and able to separate each line by using the Split() function. I want to store each column into a separate array and then display it.
Is it possible to do that?
You can do it like this:
using System.Collections.Generic;
using System.IO;
static void Main(string[] args)
{
using(var reader = new StreamReader(@"C:\test.csv"))
{
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
listB.Add(values[1]);
}
}
}
My favourite CSV parser is the one built into the .NET library. It is a hidden treasure inside the Microsoft.VisualBasic namespace.
Below is some sample code:
using Microsoft.VisualBasic.FileIO;
var path = @"C:\Person.csv"; // Habeeb, "Dubai Media City, Dubai"
using (TextFieldParser csvParser = new TextFieldParser(path))
{
csvParser.CommentTokens = new string[] { "#" };
csvParser.SetDelimiters(new string[] { "," });
csvParser.HasFieldsEnclosedInQuotes = true;
// Skip the row with the column names
csvParser.ReadLine();
while (!csvParser.EndOfData)
{
// Read current line fields, pointer moves to the next line.
string[] fields = csvParser.ReadFields();
string Name = fields[0];
string Address = fields[1];
}
}
Remember to add a reference to Microsoft.VisualBasic.
More details about the parser are given here: http://codeskaters.blogspot.ae/2015/11/c-easiest-csv-parser-built-in-net.html
LINQ way:
var lines = File.ReadAllLines("test.txt").Select(a => a.Split(';'));
var csv = from line in lines
select (from piece in line
select piece);
^^Wrong - Edit by Nick
It appears the original answerer was attempting to populate csv with a 2 dimensional array - an array containing arrays. Each item in the first array contains an array representing that line number with each item in the nested array containing the data for that specific column.
var csv = from line in lines
select (line.Split(',')).ToArray();
Just came across this library: https://github.com/JoshClose/CsvHelper
Very intuitive and easy to use. It has a NuGet package too, which made it quick to implement: https://www.nuget.org/packages/CsvHelper/27.2.1. It also appears to be actively maintained, which I like.
Configuring it to use a semi-colon is easy: https://github.com/JoshClose/CsvHelper/wiki/Custom-Configurations
You can't create an array immediately because you need to know the number of rows from the beginning (and this would require to read the csv file twice)
You can store values in two List<T> and then use them or convert into an array using List<T>.ToArray()
Very simple example:
var column1 = new List<string>();
var column2 = new List<string>();
using (var rd = new StreamReader("filename.csv"))
{
while (!rd.EndOfStream)
{
var splits = rd.ReadLine().Split(';');
column1.Add(splits[0]);
column2.Add(splits[1]);
}
}
// print column1
Console.WriteLine("Column 1:");
foreach (var element in column1)
Console.WriteLine(element);
// print column2
Console.WriteLine("Column 2:");
foreach (var element in column2)
Console.WriteLine(element);
N.B.
Please note that this is just a very simple example. Using string.Split does not account for cases where some records contain the separator ; inside it.
For a safer approach, consider using some csv specific libraries like CsvHelper on nuget.
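If you would rather not add a package, the TextFieldParser mentioned in other answers can do the quote-aware split for the two-column case. The following is only a sketch: TwoColumnCsv and ReadColumns are names I invented, rows with fewer than two fields are simply skipped (my assumption), and it needs a reference to Microsoft.VisualBasic.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TwoColumnCsv
{
    // Reads a semicolon-separated source and returns its first two columns as arrays.
    public static (string[] First, string[] Second) ReadColumns(TextReader input)
    {
        var first = new List<string>();
        var second = new List<string>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true; // a quoted "b;c" stays one field
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields == null || fields.Length < 2) continue; // skip short rows
                first.Add(fields[0]);
                second.Add(fields[1]);
            }
        }
        return (first.ToArray(), second.ToArray());
    }
}
```

For a file on disk you would call it as TwoColumnCsv.ReadColumns(new StreamReader("filename.csv")).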
I usually use this parser from codeproject, since there's a bunch of character escapes and similar that it handles for me.
Here is my variation of the top voted answer:
var contents = File.ReadAllText(filename).Split('\n');
var csv = from line in contents
select line.Split(',').ToArray();
The csv variable can then be used as in the following example:
int headerRows = 5;
foreach (var row in csv.Skip(headerRows)
.TakeWhile(r => r.Length > 1 && r.Last().Trim().Length > 0))
{
String zerothColumnValue = row[0]; // leftmost column
var firstColumnValue = row[1];
}
If you need to skip (head-)lines and/or columns, you can use this to create a 2-dimensional array:
var lines = File.ReadAllLines(path).Select(a => a.Split(';'));
var csv = (from line in lines
select (from col in line
select col).Skip(1).ToArray() // skip the first column
).Skip(2).ToArray(); // skip 2 headlines
This is quite useful if you need to shape the data before you process it further (assuming the first 2 lines consist of the headline, and the first column is a row title - which you don't need to have in the array because you just want to regard the data).
N.B. You can easily get the headlines and the 1st column by using the following code:
var coltitle = (from line in lines
select line.Skip(1).ToArray() // skip 1st column
).Skip(1).Take(1).FirstOrDefault().ToArray(); // take the 2nd row
var rowtitle = (from line in lines select line[0] // take 1st column
).Skip(2).ToArray(); // skip 2 headlines
This code example assumes the following structure of your *.csv file:
Note: If you need to skip empty rows, which can be handy sometimes, you can do so by inserting
where line.Any(a=>!string.IsNullOrWhiteSpace(a))
between the from and the select statement in the LINQ code examples above.
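To make the placement of that where clause concrete, here is a small self-contained sketch. CsvShape and ToRows are illustrative names, and it takes the raw lines as a parameter (for example the result of File.ReadAllLines) so it can be tried without a file.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvShape
{
    // Splits each raw line and drops rows in which every cell is empty or whitespace.
    public static string[][] ToRows(IEnumerable<string> rawLines, char separator = ';')
    {
        var lines = rawLines.Select(l => l.Split(separator));
        return (from line in lines
                where line.Any(a => !string.IsNullOrWhiteSpace(a)) // keep rows with at least one value
                select line).ToArray();
    }
}
```

A completely blank line and a line made up only of separators are both filtered out, since neither contains a non-blank cell.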
You can use the Microsoft.VisualBasic.FileIO.TextFieldParser class (from the Microsoft.VisualBasic dll) in C# for better performance.
The code example below is taken from the article mentioned above:
static void Main()
{
string csv_file_path=@"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
Hi all, I created a static class for doing this.
+ column count check
+ quote sign removal
public static class CSV
{
public static List<string[]> Import(string file, char csvDelimiter, bool ignoreHeadline, bool removeQuoteSign)
{
return ReadCSVFile(file, csvDelimiter, ignoreHeadline, removeQuoteSign);
}
private static List<string[]> ReadCSVFile(string filename, char csvDelimiter, bool ignoreHeadline, bool removeQuoteSign)
{
string[] result = new string[0];
List<string[]> lst = new List<string[]>();
string line;
int currentLineNumner = 0;
int columnCount = 0;
// Read the file and display it line by line.
using (System.IO.StreamReader file = new System.IO.StreamReader(filename))
{
while ((line = file.ReadLine()) != null)
{
currentLineNumner++;
string[] strAr = line.Split(csvDelimiter);
// save column count of first line
if (currentLineNumner == 1)
{
columnCount = strAr.Count();
}
else
{
//Check column count of every other lines
if (strAr.Count() != columnCount)
{
throw new Exception(string.Format("CSV Import Exception: Wrong column count in line {0}", currentLineNumner));
}
}
if (removeQuoteSign) strAr = RemoveQouteSign(strAr);
if (ignoreHeadline)
{
if(currentLineNumner !=1) lst.Add(strAr);
}
else
{
lst.Add(strAr);
}
}
}
return lst;
}
private static string[] RemoveQouteSign(string[] ar)
{
for (int i = 0;i< ar.Count() ; i++)
{
if (ar[i].StartsWith("\"") || ar[i].StartsWith("'")) ar[i] = ar[i].Substring(1);
if (ar[i].EndsWith("\"") || ar[i].EndsWith("'")) ar[i] = ar[i].Substring(0,ar[i].Length-1);
}
return ar;
}
}
I spent a few hours searching for the right library, but finally I wrote my own code :)
You can read the file (or database) with whatever tools you want and then apply the following routine to each line:
private static string[] SmartSplit(string line, char separator = ',')
{
var inQuotes = false;
var token = "";
var lines = new List<string>();
for (var i = 0; i < line.Length; i++) {
var ch = line[i];
if (inQuotes) // process string in quotes,
{
if (ch == '"') {
if (i<line.Length-1 && line[i + 1] == '"') {
i++;
token += '"';
}
else inQuotes = false;
} else token += ch;
} else {
if (ch == '"') inQuotes = true;
else if (ch == separator) {
lines.Add(token);
token = "";
} else token += ch;
}
}
lines.Add(token);
return lines.ToArray();
}
var firstColumn = new List<string>();
var lastColumn = new List<string>();
// your code for reading CSV file
foreach(var line in file)
{
var array = line.Split(';');
firstColumn.Add(array[0]);
lastColumn.Add(array[1]);
}
var firstArray = firstColumn.ToArray();
var lastArray = lastColumn.ToArray();
Here's a special case where one of the data fields has a semicolon (";") as part of its data; in that case most of the answers above will fail.
The solution in that case is:
string[] csvRows = System.IO.File.ReadAllLines(FullyQaulifiedFileName);
string[] fields = null;
List<string> lstFields;
string field;
bool quoteStarted = false;
foreach (string csvRow in csvRows)
{
lstFields = new List<string>();
field = "";
for (int i = 0; i < csvRow.Length; i++)
{
string tmp = csvRow.ElementAt(i).ToString();
if(String.Compare(tmp,"\"")==0)
{
quoteStarted = !quoteStarted;
}
if (String.Compare(tmp, ";") == 0 && !quoteStarted)
{
lstFields.Add(field);
field = "";
}
else if (String.Compare(tmp, "\"") != 0)
{
field += tmp;
}
}
if(!string.IsNullOrEmpty(field))
{
lstFields.Add(field);
field = "";
}
// This will hold values for each column for current row under processing
fields = lstFields.ToArray();
}
The open-source Angara.Table library allows to load CSV into typed columns, so you can get the arrays from the columns. Each column can be indexed both by name or index. See http://predictionmachines.github.io/Angara.Table/saveload.html.
The library follows RFC4180 for CSV; it enables type inference and multiline strings.
Example:
using System.Collections.Immutable;
using Angara.Data;
using Angara.Data.DelimitedFile;
...
ReadSettings settings = new ReadSettings(Delimiter.Semicolon, false, true, null, null);
Table table = Table.Load("data.csv", settings);
ImmutableArray<double> a = table["double-column-name"].Rows.AsReal;
for(int i = 0; i < a.Length; i++)
{
Console.WriteLine("{0}: {1}", i, a[i]);
}
You can see a column type using the type Column, e.g.
Column c = table["double-column-name"];
Console.WriteLine("Column {0} is double: {1}", c.Name, c.Rows.IsRealColumn);
Since the library is focused on F#, you might need to add a reference to the FSharp.Core 4.4 assembly; click 'Add Reference' on the project and choose FSharp.Core 4.4 under "Assemblies" -> "Extensions".
I have been using csvreader.com(paid component) for years, and I have never had a problem. It is solid, small and fast, but you do have to pay for it. You can set the delimiter to whatever you like.
using (CsvReader reader = new CsvReader(s))
{
reader.Settings.Delimiter = ';';
reader.ReadHeaders(); // if headers on a line by themselves. Makes reader.Headers[] available
while (reader.ReadRecord())
... use reader.Values[col_i] ...
}
I am just a student working on my master's thesis, but this is the way I solved it and it worked well for me. First you select your file from a directory (csv format only) and then you put the data into the lists.
List<float> t = new List<float>();
List<float> SensorI = new List<float>();
List<float> SensorII = new List<float>();
List<float> SensorIII = new List<float>();
using (OpenFileDialog dialog = new OpenFileDialog())
{
try
{
dialog.Filter = "csv files (*.csv)|*.csv";
dialog.Multiselect = false;
dialog.InitialDirectory = ".";
dialog.Title = "Select file (only in csv format)";
if (dialog.ShowDialog() == DialogResult.OK)
{
var fs = File.ReadAllLines(dialog.FileName).Select(a => a.Split(';'));
int counter = 0;
foreach (var line in fs)
{
counter++;
if (counter > 2) // Skip first two header lines
{
this.t.Add(float.Parse(line[0]));
this.SensorI.Add(float.Parse(line[1]));
this.SensorII.Add(float.Parse(line[2]));
this.SensorIII.Add(float.Parse(line[3]));
}
}
}
}
catch (Exception exc)
{
MessageBox.Show(
"Error while opening the file.\n" + exc.Message,
this.Text,
MessageBoxButtons.OK,
MessageBoxIcon.Error
);
}
}
These are my 2 simple static methods to convert text from a csv file to List<List<string>> and vice versa. Each method uses a row converter.
This code should take into account all the possibilities of a csv file. You can define your own csv separator, and the methods try to correctly escape the double quote char. They also handle the situation where all the text in quotes is one cell and the csv separator is inside the quoted string, including multiple lines in one cell, and they can ignore empty rows.
The last method is only for testing, so you can ignore it, or use it to test your own or others' solutions :). For testing I used this hard csv with 2 rows on 4 lines:
0,a,""bc,d
"e, f",g,"this,is, o
ne ""lo
ng, cell""",h
This is final code. For simplicity, I removed all try catch blocks.
using System;
using System.Collections.Generic;
using System.Linq;
public static class Csv {
public static string FromListToString(List<List<string>> csv, string separator = ",", char quotation = '"', bool returnFirstRow = true)
{
string content = "";
for (int row = 0; row < csv.Count; row++) {
content += (row > 0 ? Environment.NewLine : "") + RowFromListToString(csv[row], separator, quotation);
}
return content;
}
public static List<List<string>> FromStringToList(string content, string separator = ",", char quotation = '"', bool returnFirstRow = true, bool ignoreEmptyRows = true)
{
List<List<string>> csv = new List<List<string>>();
string[] rows = content.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
if (rows.Length <= (returnFirstRow ? 0 : 1)) { return csv; }
List<string> csvRow = null;
for (int rowIndex = 0; rowIndex < rows.Length; rowIndex++) {
(List<string> row, bool rowClosed) = RowFromStringToList(rows[rowIndex], csvRow, separator, quotation);
if (rowClosed) { if (!ignoreEmptyRows || row.Any(rowItem => rowItem.Length > 0)) { csv.Add(row); csvRow = null; } } // row ok, add to list
else { csvRow = row; } // not fully created, continue
}
if (!returnFirstRow) { csv.RemoveAt(0); } // remove header
return csv;
}
public static string RowFromListToString(List<string> csvData, string separator = ",", char quotation = '"')
{
csvData = csvData.Select(element =>
{
if (element.Contains(quotation)) {
element = element.Replace(quotation.ToString(), quotation.ToString() + quotation.ToString());
}
if (element.Contains(separator) || element.Contains(Environment.NewLine)) {
element = "\"" + element + "\"";
}
return element;
}).ToList();
return string.Join(separator, csvData);
}
public static (List<string>, bool) RowFromStringToList(string csvRow, List<string> continueWithRow = null, string separator = ",", char quotation = '"')
{
bool rowClosed = true;
if (continueWithRow != null && continueWithRow.Count > 0) {
// in previous result quotation are fixed so i need convert back to double quotation
string previousCell = quotation.ToString() + continueWithRow.Last().Replace(quotation.ToString(), quotation.ToString() + quotation.ToString()) + Environment.NewLine;
continueWithRow.RemoveAt(continueWithRow.Count - 1);
csvRow = previousCell + csvRow;
}
char tempQuote = (char)162;
while (csvRow.Contains(tempQuote)) { tempQuote = (char)(tempQuote + 1); }
char tempSeparator = (char)(tempQuote + 1);
while (csvRow.Contains(tempSeparator)) { tempSeparator = (char)(tempSeparator + 1); }
csvRow = csvRow.Replace(quotation.ToString() + quotation.ToString(), tempQuote.ToString());
if(csvRow.Split(new char[] { quotation }, StringSplitOptions.None).Length % 2 == 0) { rowClosed = !rowClosed; }
string[] csvSplit = csvRow.Split(new string[] { separator }, StringSplitOptions.None);
List<string> csvList = csvSplit
.ToList()
.Aggregate("",
(string row, string item) => {
if (row.Count((ch) => ch == quotation) % 2 == 0) { return row + (row.Length > 0 ? tempSeparator.ToString() : "") + item; }
else { return row + separator + item; }
},
(string row) => row.Split(tempSeparator).Select((string item) => item.Trim(quotation).Replace(tempQuote, quotation))
).ToList();
if (continueWithRow != null && continueWithRow.Count > 0) {
return (continueWithRow.Concat(csvList).ToList(), rowClosed);
}
return (csvList, rowClosed);
}
public static bool Test()
{
string csvText = "0,a,\"\"bc,d" + Environment.NewLine + "\"e, f\",g,\"this,is, o" + Environment.NewLine + "ne \"\"lo" + Environment.NewLine + "ng, cell\"\"\",h";
List<List<string>> csvList = new List<List<string>>() { new List<string>() { "0", "a", "\"bc", "d" }, new List<string>() { "e, f", "g", "this,is, o" + Environment.NewLine + "ne \"lo" + Environment.NewLine + "ng, cell\"", "h" } };
List<List<string>> csvTextAsList = Csv.FromStringToList(csvText);
bool ok = Enumerable.SequenceEqual(csvList[0], csvTextAsList[0]) && Enumerable.SequenceEqual(csvList[1], csvTextAsList[1]);
string csvListAsText = Csv.FromListToString(csvList);
return ok && csvListAsText == csvText;
}
}
Usage examples:
// get List<List<string>> representation of csv
var csvFromText = Csv.FromStringToList(csvAsText);
// read csv file with custom separator and quote
// return no header and ignore empty rows
var csvFile = File.ReadAllText(csvFileFullPath);
var csvFromFile = Csv.FromStringToList(csvFile, ";", '"', false, false);
// get text representation of csvData from List<List<string>>
var csvAsText = Csv.FromListToString(csvData);
Notes:
This: char tempQuote = (char)162; is the first rarely used character after the ASCII range (code 162, the cent sign). The code searches for this character, or the next few characters after it, that is NOT in the text and uses them as the temporary quote and separator characters.
Still wrong. You need to compensate for "" in quotes.
Here is my solution for Microsoft style csv.
/// <summary>
/// Microsoft style csv file. " is the quote character, "" is an escaped quote.
/// </summary>
/// <param name="fileName"></param>
/// <param name="sepChar"></param>
/// <param name="quoteChar"></param>
/// <returns></returns>
public static List<string[]> ReadCSVFileMSStyle(string fileName, char sepChar = ',', char quoteChar = '"')
{
List<string[]> ret = new List<string[]>();
string[] csvRows = System.IO.File.ReadAllLines(fileName);
foreach (string csvRow in csvRows)
{
bool inQuotes = false;
List<string> fields = new List<string>();
string field = "";
for (int i = 0; i < csvRow.Length; i++)
{
if (inQuotes)
{
// Is it a "" inside quoted area? (escaped litteral quote)
if(i < csvRow.Length - 1 && csvRow[i] == quoteChar && csvRow[i+1] == quoteChar)
{
i++;
field += quoteChar;
}
else if(csvRow[i] == quoteChar)
{
inQuotes = false;
}
else
{
field += csvRow[i];
}
}
else // Not in quoted region
{
if (csvRow[i] == quoteChar)
{
inQuotes = true;
}
else if (csvRow[i] == sepChar)
{
fields.Add(field);
field = "";
}
else
{
field += csvRow[i];
}
}
}
if (!string.IsNullOrEmpty(field))
{
fields.Add(field);
field = "";
}
ret.Add(fields.ToArray());
}
return ret;
}
}
I have a library that does exactly what you need.
Some time ago I wrote a simple and reasonably fast library for working with CSV files. You can find it at the following link: https://github.com/ukushu/DataExporter/blob/master/Csv.cs
It works with CSV as with a 2-dimensional array, exactly like you need.
For example, if you need all the values of the 3rd row, all you need to write is:
Csv csv = new Csv();
csv.FileOpen("c:\\file1.csv");
var allValuesOf3rdRow = csv.Rows[2];
Or, to read the 2nd cell of the 3rd row:
var value = csv.Rows[2][1];
Headers are required in the CSV for the JSON conversion in the code below.
You can use the code as is, without making any changes.
It works with either a one-row header or a two-row header.
The code reads the uploaded IFormFile and copies it into a memory stream.
If you want to use a file path instead of an uploaded file, you can replace
new StreamReader(ms, System.Text.Encoding.UTF8, true) with new StreamReader("../../examplefilepath").
using (var ms = new MemoryStream())
{
administrativesViewModel.csvFile.CopyTo(ms);
ms.Position = 0;
using (StreamReader csvReader = new StreamReader(ms, System.Text.Encoding.UTF8, true))
{
List<string> lines = new List<string>();
while (!csvReader.EndOfStream)
{
var line = csvReader.ReadLine();
var values = line.Split(';');
if (values[0] != "" && values[0] != null)
{
lines.Add(values[0]);
}
}
var csv = new List<string[]>();
foreach (string item in lines)
{
csv.Add(item.Split(','));
}
var properties = lines[0].Split(',');
int csvI = 1;
var listObjResult = new List<Dictionary<string, string>>();
if (lines.Count() > 1)
{
var ln = lines[0].Substring(0, lines[0].Count() - 1);
var ln1 = lines[1].Substring(0, lines[1].Count() - 1);
var lnSplit = ln.Split(',');
var ln1Split = ln1.Split(',');
if (lnSplit.Count() != ln1Split.Count())
{
properties = lines[1].Split(',');
csvI = 2;
}
}
for (int i = csvI; i < csv.Count(); i++)
{
var objResult = new Dictionary<string, string>();
if (csvI > 0)
{
var splitProp = lines[0].Split(":");
if (splitProp.Count() > 1)
{
if (splitProp[0] != "" && splitProp[0] != null && splitProp[1] != "" && splitProp[1] != null)
{
objResult.Add(splitProp[0], splitProp[1]);
}
}
}
for (int j = 0; j < properties.Length; j++)
if (!properties[j].Contains(":"))
{
objResult.Add(properties[j], csv[i][j]);
}
listObjResult.Add(objResult);
}
var result = JsonConvert.SerializeObject(listObjResult);
var result2 = JArray.Parse(result);
Console.WriteLine(result2);
}
}
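The core of the conversion above is pairing each value with its header name. Here is a stripped-down sketch of just that step, assuming a single header row and plain comma-separated lines without quoted fields; CsvRowsSketch and ToRecords are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;

public static class CsvRowsSketch
{
    // Maps each data line to a dictionary of header name -> value.
    // Assumes plain comma-separated lines with no quoted fields.
    public static List<Dictionary<string, string>> ToRecords(IReadOnlyList<string> lines)
    {
        var records = new List<Dictionary<string, string>>();
        if (lines.Count == 0)
        {
            return records;
        }
        string[] headers = lines[0].Split(',');
        for (int i = 1; i < lines.Count; i++)
        {
            string[] values = lines[i].Split(',');
            var record = new Dictionary<string, string>();
            // Stop at the shorter of the two so a short row cannot throw
            int count = Math.Min(headers.Length, values.Length);
            for (int j = 0; j < count; j++)
            {
                record[headers[j]] = values[j];
            }
            records.Add(record);
        }
        return records;
    }
}
```

The resulting list can be passed to JsonConvert.SerializeObject in the same way as listObjResult above.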
Take a look at this example, which uses CsvFramework:
using CsvFramework;
using System.Collections.Generic;
namespace CvsParser
{
public class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public List<Order> Orders { get; set; }
}
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public int Quantity { get; set; }
public int Amount { get; set; }
public List<OrderItem> OrderItems { get; set; }
}
public class Address
{
public int Id { get; set; }
public int CustomerId { get; set; }
public string Name { get; set; }
}
public class OrderItem
{
public int Id { get; set; }
public int OrderId { get; set; }
public string ProductName { get; set; }
}
class Program
{
static void Main(string[] args)
{
var customerLines = System.IO.File.ReadAllLines(@"Customers.csv");
var orderLines = System.IO.File.ReadAllLines(@"Orders.csv");
var orderItemLines = System.IO.File.ReadAllLines(@"OrderItemLines.csv");
CsvFactory.Register<Customer>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.Name).Type(typeof(string)).Index(1);
builder.AddNavigation(n => n.Orders).RelationKey<Order, int>(k => k.CustomerId);
}, false, ',', customerLines);
CsvFactory.Register<Order>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.CustomerId).Type(typeof(int)).Index(1);
builder.Add(a => a.Quantity).Type(typeof(int)).Index(2);
builder.Add(a => a.Amount).Type(typeof(int)).Index(3);
builder.AddNavigation(n => n.OrderItems).RelationKey<OrderItem, int>(k => k.OrderId);
}, true, ',', orderLines);
CsvFactory.Register<OrderItem>(builder =>
{
builder.Add(a => a.Id).Type(typeof(int)).Index(0).IsKey(true);
builder.Add(a => a.OrderId).Type(typeof(int)).Index(1);
builder.Add(a => a.ProductName).Type(typeof(string)).Index(2);
}, false, ',', orderItemLines);
var customers = CsvFactory.Parse<Customer>();
}
}
}
