Regex behaving strangely .net - c#

I have some code that reads every row of a CSV file; if a value doesn't match the expected format, it is added to an error list that is shown on the user's screen. The problem I am having is with the regex itself.
protected void ReadData(string filePath, bool upload)
{
StringBuilder sb = new StringBuilder();
#region upload
if (upload == true) // CSV file upload chosen
{
using (CsvReader csv = new CsvReader(new StreamReader(filePath), true)) // Cache CSV file to memory
{
int fieldCount = csv.FieldCount; // Total number of fields per row
string[] headers = csv.GetFieldHeaders(); // Correct CSV headers stored in array
SortedList<int, string> errorList = new SortedList<int, string>(); // This list will contain error values
bool errorFlag = false;
int errorCount = 0;
// Check if headers are correct first before reading data
if (headers[0] != "first name" || headers[1] != "last name" || headers[2] != "job title" || headers[3] != "email address" || headers[4] != "telephone number" || headers[5] != "company" || headers[6] != "research manager" || headers[7] != "user card number")
{
sb.Append("Headers are incorrect");
}
else
{
while (csv.ReadNextRecord())
try
{
//Check csv obj data for valid values
for (int i = 0; i < fieldCount; i++)
{
if (i == 0 || i == 1) // FirstName and LastName
{
if (Regex.IsMatch(csv[i].ToString(), "[a-zA-Z]", RegexOptions.IgnoreCase)) //REGEX letters only min of 5 char max of 20
{
errorList.Add(errorCount, csv[i]);
errorCount += 1;
errorFlag = true;
string text = csv[i].ToString();
}
}
else if (i == 5) // Company name
{
string text = csv[i];
text.Replace("&", "and");
}
}
if (errorFlag == true)
{
sb.Append("<b>" + "Number of Error: " + errorCount + "</b>");
sb.Append("<ul>");
foreach (KeyValuePair<int, string> key in errorList)
{
sb.Append("<li>" + key.Value + "</li>");
}
}
else // All validation checks equaled to false. Create User
{
ORCLdap.CreateUserAccount(rootLDAPPath, svcUsername, svcPassword, csv[0], csv[1], csv[2], csv[3], csv[4], csv[5], csv[7]);
sb.Append("<b>New user data uploaded successfully</b>");
}
}// end of try
catch (Exception ex)
{
sb.Append(ex.ToString());
}
finally
{
lblMessage.Text = sb.ToString();
sb.Remove(0, sb.Length);
}
}
}
#endregion
The lblMessage.text contains this html:
Number of Error: 4
David1212
smith
Nick444
Gowdy333
It should report 3 errors, because smith doesn't contain a number.
Does anyone have suggestions for this?

You also have a logic error:
if (Regex.IsMatch(csv[i].ToString(), "[a-zA-Z]", RegexOptions.IgnoreCase)) //REGEX letters only min of 5 char max of 20
should be
if (!Regex.IsMatch(csv[i].ToString(), "^[a-zA-Z]+$", RegexOptions.IgnoreCase)) //REGEX letters only min of 5 char max of 20
because it is only an error if the name has other characters than [a-zA-Z] in it, right?
(and if you use RegexOptions.IgnoreCase you don't need [a-zA-Z], [a-z] would do)
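To make the difference concrete, here is a minimal sketch that can be run against the four names from the question. The class and method names (`NameCheck`, `IsValidName`, `UnanchoredMatch`) are made up for illustration, and the `{5,20}` quantifier is an assumption taken from the code comment about a minimum of 5 and a maximum of 20 characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCheck
{
    // Hypothetical helper: true only when the whole string is 5 to 20 letters.
    // The ^ and $ anchors force the pattern to cover the entire value.
    public static bool IsValidName(string value)
    {
        return Regex.IsMatch(value, "^[a-z]{5,20}$", RegexOptions.IgnoreCase);
    }

    // The pattern from the question: true as soon as one letter appears anywhere.
    public static bool UnanchoredMatch(string value)
    {
        return Regex.IsMatch(value, "[a-zA-Z]", RegexOptions.IgnoreCase);
    }
}
```

With these helpers, `UnanchoredMatch` returns true for all of David1212, smith, Nick444 and Gowdy333, which is why all four ended up in the error list, while `IsValidName` returns true only for smith, so `!NameCheck.IsValidName(csv[i])` would flag the other three.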

You need to anchor your regex, either with word boundaries or with a start '^' and an end '$' anchor,
i.e.
^[a-zA-Z]+$
http://regexr.com?3298g
Your current regex is incorrect: it matches any string that contains a letter (a-z or A-Z) at any position.
http://regexr.com?3298j

Related

Split a string if delimiter is between single quotes [duplicate]

I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split.
String:
111,222,"33,44,55",666,"77,88","99"
I want the output:
111
222
33,44,55
666
77,88
99
I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99
Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!
You can do so with some simple regex
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
This will do the following:
(?:^|,) = a non-capturing group that matches either the beginning of the string or a comma
(\"(?:[^\"]+|\"\")*\"|[^,]*) = a numbered capture group that selects between 2 alternatives:
stuff in quotes
stuff between commas
This should give you the output you are looking for.
Example code in C#
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
List<string> list = new List<string>();
string curr = null;
foreach (Match match in csvSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add(""); // empty match: record a single empty field
}
else
{
list.Add(curr.TrimStart(',')); // drop the leading delimiter
}
}
return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
Console.WriteLine(string.Join(Environment.NewLine, SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\""))); // print one field per line
}
Warning: as per @MrE's comment, if a rogue newline character appears in a badly formed CSV file and you end up with an unbalanced quote ("string), you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). This is easy to reproduce in Visual Studio, which, as I discovered, will crash as well. A simple try/catch will not trap this issue either.
You should use:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
instead
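As an extra safeguard against runaway backtracking, .NET 4.5 and later let you give a Regex a match timeout, which turns a hang into a catchable RegexMatchTimeoutException. The following is only a sketch under assumptions: the class name `SafeCsvRegex`, the helper `TrySplit` and the one-second limit are illustrative choices, not part of the answers above. It uses the simplified pattern and does not handle doubled quotes ("") inside a quoted field.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SafeCsvRegex
{
    // Simplified pattern from above, with a one-second match timeout (an assumed value).
    private static readonly Regex CsvSplit = new Regex(
        "(?:^|,)(\"[^\"]*\"|[^,]*)",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(1));

    // Hypothetical helper: returns false instead of hanging when the regex times out.
    public static bool TrySplit(string line, out string[] fields)
    {
        var list = new List<string>();
        try
        {
            // Matches() is evaluated lazily, so a timeout surfaces while enumerating.
            foreach (Match match in CsvSplit.Matches(line))
            {
                string value = match.Groups[1].Value;
                // Strip the surrounding quotes of a quoted field.
                if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
                {
                    value = value.Substring(1, value.Length - 2);
                }
                list.Add(value);
            }
        }
        catch (RegexMatchTimeoutException)
        {
            fields = new string[0];
            return false;
        }
        fields = list.ToArray();
        return true;
    }
}
```

Reading capture group 1 instead of match.Value also removes the need for TrimStart(','), because the leading comma sits outside the group.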
Fast and easy:
public static string[] SplitCsv(string line)
{
List<string> result = new List<string>();
StringBuilder currentStr = new StringBuilder("");
bool inQuotes = false;
for (int i = 0; i < line.Length; i++) // For each character
{
if (line[i] == '\"') // Quotes are closing or opening
inQuotes = !inQuotes;
else if (line[i] == ',') // Comma
{
if (!inQuotes) // If not in quotes, end of current string, add it to result
{
result.Add(currentStr.ToString());
currentStr.Clear();
}
else
currentStr.Append(line[i]); // If in quotes, just add it
}
else // Add any other character to current string
currentStr.Append(line[i]);
}
result.Add(currentStr.ToString());
return result.ToArray(); // Return array of all strings
}
With this string as input :
111,222,"33,44,55",666,"77,88","99"
It will return :
111
222
33,44,55
666
77,88
99
I really like jimplode's answer, but I think a version with yield return is a little more useful, so here it is:
public IEnumerable<string> SplitCSV(string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
Maybe it's even more useful to have it as an extension method:
public static class StringHelper
{
public static IEnumerable<string> SplitCSV(this string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
}
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
Here is the implementation in C#:
string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
foreach (var match in matches)
{
Console.WriteLine(match);
}
Outputs
111
222
33,44,55
666
77,88
99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.
This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
Here is the function provided by "Chad Hedgcock" with minor updates.
The updates are on:
Line 26: character.val == '\"' - This check is redundant because of the check made on Line 24 (character.val == '"'); '\"' and '"' are the same character.
Line 28: added !quoteIsEscaped to the condition if (row[character.index + 1] == character.val) so that 3 consecutive quotes are handled.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
bool blockUntilEndQuote2 = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"' && !blockUntilEndQuote2)
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character == '\'' && !blockUntilEndQuote)
{
if (blockUntilEndQuote2 == false)
{
blockUntilEndQuote2 = true;
}
else if (blockUntilEndQuote2 == true)
{
blockUntilEndQuote2 = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) )
(?=\s*([,;\t\r\n]|$))
) |
(?<FULL>
(^|[\s\t\r\n])
( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
(?=[,;\s\t\r\n]|$)
)
))", RegexOptions.Compiled);
This solution can handle pretty chaotic cases too like below:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
Note: The regular expression contains two <FULL> blocks, each of which contains two <QUODAT> blocks separated by "or" (|). Depending on your task you may need only one of them.
Note: This regular expression yields a single string array and works on a single line, with or without <carriage return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
(?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>)
(?=\s*([,;\t\r\n]|$))
)
))", RegexOptions.Compiled);
It can process complex, easy and empty items too:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything except the <quotation mark><separators><comma> sequence, AND each item shall begin and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carriage return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaced the text "CO" with "QUO".
Try this:
string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
if (it.StartsWith(",") || it.EndsWith(","))
{
var tmp = it.TrimEnd(',').TrimStart(',');
result.AddRange(tmp.Split(','));
}
else
{
if(!string.IsNullOrEmpty(it)) result.Add(it);
}
}
//Results:
foreach (var it in result)
{
Console.WriteLine(it);
}
I know I'm a bit late to this, but for anyone searching, here is how I did what you are asking about in C#:
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"')
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && blockUntilEndQuote == true)
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
Don't reinvent a CSV parser, try FileHelpers.
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.
public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
{
// In-Line for example, but I implemented as string extender in production code
Func<string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
{
if (startIndex >= 0)
{
if (source != null)
{
for (int i = startIndex; i < source.Length; i++)
{
if (!char.IsWhiteSpace(source[i]))
{
return i;
}
}
}
}
return -1;
};
var results = new List<string>();
var result = new StringBuilder();
var inQualifier = false;
var inField = false;
// We add new columns at the delimiter, so append one for the parser.
var row = $"{record}{delimiter}";
for (var idx = 0; idx < row.Length; idx++)
{
// A delimiter character...
if (row[idx]== delimiter[0])
{
// Are we inside qualifier? If not, we've hit the end of a column value.
if (!inQualifier)
{
results.Add(trimData ? result.ToString().Trim() : result.ToString());
result.Clear();
inField = false;
}
else
{
result.Append(row[idx]);
}
}
// NOT a delimiter character...
else
{
// ...Not a space character
if (row[idx] != ' ')
{
// A qualifier character...
if (row[idx] == qualifier[0])
{
// Qualifier is closing qualifier...
if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
{
inQualifier = false;
continue;
}
else
{
// ...Qualifier is opening qualifier
if (!inQualifier)
{
inQualifier = true;
}
// ...It's a qualifier inside a qualifier.
else
{
inField = true;
result.Append(row[idx]);
}
}
}
// Not a qualifier character...
else
{
result.Append(row[idx]);
inField = true;
}
}
// ...A space character
else
{
if (inQualifier || inField)
{
result.Append(row[idx]);
}
}
}
}
return results.ToArray<string>();
}
Some test code:
//var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
var input =
"111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";
Console.WriteLine("Split with trim");
Console.WriteLine("---------------");
var result = SplitRow(input, ",", "\"", true);
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Split 2
Console.WriteLine("Split with no trim");
Console.WriteLine("------------------");
var result2 = SplitRow(input, ",", "\"", false);
foreach (var r in result2)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Time Trial 1
Console.WriteLine("Experimental Process (1,000,000) iterations");
Console.WriteLine("-------------------------------------------");
var watch = Stopwatch.StartNew(); // requires using System.Diagnostics;
for (var i = 0; i < 1000000; i++)
{
var x1 = SplitRow(input, ",", "\"", false);
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
Console.WriteLine("");
Results
Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds
Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.
If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
Here is my fastest implementation based upon string raw pointer manipulation:
string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
{
string[] oTokens;
if (null == cSeparator)
{
cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
}
if (null == cQuotes)
{
cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
}
unsafe
{
fixed (char* lpText = sText)
{
#region Fast array estimation
char* lpCurrent = lpText;
int nEstimatedSize = 0;
while (0 != *lpCurrent)
{
if (cSeparator == *lpCurrent)
{
nEstimatedSize++;
}
lpCurrent++;
}
nEstimatedSize++; // Add EOL char(s)
string[] oEstimatedTokens = new string[nEstimatedSize];
#endregion
#region Parsing
char[] oBuffer = new char[sText.Length];
int nIndex = 0;
int nTokens = 0;
lpCurrent = lpText;
while (0 != *lpCurrent)
{
if (cQuotes == *lpCurrent)
{
// Quotes parsing
lpCurrent++; // Skip quote
nIndex = 0; // Reset buffer
while (
(0 != *lpCurrent)
&& (cQuotes != *lpCurrent)
)
{
oBuffer[nIndex] = *lpCurrent; // Store char
lpCurrent++; // Move source cursor
nIndex++; // Move target cursor
}
}
else if (cSeparator == *lpCurrent)
{
// Separator char parsing
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
nIndex = 0; // Skip separator and Reset buffer
}
else
{
// Content parsing
oBuffer[nIndex] = *lpCurrent; // Store char
nIndex++; // Move target cursor
}
lpCurrent++; // Move source cursor
}
// Recover pending buffer
if (nIndex > 0)
{
// Store token
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
}
// Build final tokens list
if (nTokens == nEstimatedSize)
{
oTokens = oEstimatedTokens;
}
else
{
oTokens = new string[nTokens];
Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
}
#endregion
}
}
// Epilogue
return oTokens;
}
Try this
private string[] GetCommaSeperatedWords(string sep, string line)
{
List<string> list = new List<string>();
StringBuilder word = new StringBuilder();
int doubleQuoteCount = 0;
for (int i = 0; i < line.Length; i++)
{
string chr = line[i].ToString();
if (chr == "\"")
{
if (doubleQuoteCount == 0)
doubleQuoteCount++;
else
doubleQuoteCount--;
continue;
}
if (chr == sep && doubleQuoteCount == 0)
{
list.Add(word.ToString());
word = new StringBuilder();
continue;
}
word.Append(chr);
}
list.Add(word.ToString());
return list.ToArray();
}
This is Chad's answer rewritten with state-based logic. His answer failed for me when it came across """BRAD""" as a field. That should return "BRAD", but it just ate up all the remaining fields. When I tried to debug it, I ended up rewriting it as state-based logic:
enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield };
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
SplitState state = SplitState.s_begin;
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new { val, index }))
{
//Console.WriteLine("character = " + character.val + " state = " + state);
switch (state)
{
case SplitState.s_begin:
if (character.val == delimiter)
{
/* empty field */
yield return currentString.ToString();
currentString.Clear();
} else if (character.val == '"')
{
state = SplitState.s_inquotefield;
} else
{
currentString.Append(character.val);
state = SplitState.s_infield;
}
break;
case SplitState.s_infield:
if (character.val == delimiter)
{
/* field with data */
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_inquotefield:
if (character.val == '"')
{
// could be end of field, or escaped quote.
state = SplitState.s_foundquoteinfield;
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_foundquoteinfield:
if (character.val == '"')
{
// found escaped quote.
currentString.Append(character.val);
state = SplitState.s_inquotefield;
}
else if (character.val == delimiter)
{
// must have been last quote so we must find delimiter
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
}
else
{
throw new Exception("Quoted field not terminated.");
}
break;
default:
throw new Exception("unknown state:" + state);
}
}
//Console.WriteLine("currentstring = " + currentString.ToString());
}
This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.

c# check textbox for unique values

I have a textbox where the user enters values, one per row. I want to check that the entered values are unique, but the check does not seem to work if the duplicated value is the last value, and I don't know why. Any tips?
Let's say the input is:
1
2
3
3
The duplicate will not be detected, but
1
2
3
3
5
will work and show a duplicate error.
Here is the code I use.
First I split the textbox into an array of strings:
string[] linesValues = textBoxValues.Text.Split(new Char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
Then I check for duplicates and show an error:
if (linesValues.Distinct().Count() != linesValues.Count()) { MessageBox.Show("Question values must be unique!", "Duplicated values found", MessageBoxButtons.OK, MessageBoxIcon.Error); return; }
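I suggest preprocessing the values: remove empty and whitespace-only strings, and at least trim leading and trailing spaces (note that "3 " != " 3" != "3"):
new Char[] { '\n', '\r' }: '\r' can also appear as part of a line break. A Windows text box typically separates lines with "\r\n", so splitting on '\n' alone leaves a trailing '\r' on every line except the last; "3\r" and "3" then compare as different, which is why a duplicate in the last row goes undetected.
.Where(item => !string.IsNullOrWhiteSpace(item)): we don't want whitespace-only lines (empty lines included), such as " ".
item => item.Trim(): normalizes each value, so that "3" equals "3 ".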
Code:
string[] linesValues = textBoxValues
.Text
.Split(new Char[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
.Where(item => !string.IsNullOrWhiteSpace(item))
.Select(item => item.Trim())
.ToArray();
and then check as you do
using System.Linq;
...
if (linesValues.Distinct().Count() != linesValues.Count()) {
MessageBox.Show("Question values must be unique!",
"Duplicated values found",
MessageBoxButtons.OK,
MessageBoxIcon.Error);
return;
}
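To see why only a duplicate in the last row slips through, the sketch below compares the question's split with a normalized one. It assumes the text box separates lines with "\r\n"; the class and method names (`DuplicateCheck`, `NaiveHasDuplicates`, `FindDuplicates`) are hypothetical. As a small extra, the second helper uses a HashSet to report which values are duplicated rather than only whether any are.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCheck
{
    // The question's original split, kept to show why the last line is missed:
    // with "\r\n" line breaks every entry except the last keeps a trailing '\r'.
    public static bool NaiveHasDuplicates(string text)
    {
        string[] values = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return values.Distinct().Count() != values.Length;
    }

    // Hypothetical helper: splits on both newline characters, trims each entry,
    // and returns the values that occur more than once.
    public static List<string> FindDuplicates(string text)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        IEnumerable<string> values = text
            .Split(new[] { '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item.Trim())
            .Where(item => item.Length > 0);
        foreach (string value in values)
        {
            // HashSet.Add returns false when the value is already present.
            if (!seen.Add(value) && !duplicates.Contains(value))
            {
                duplicates.Add(value);
            }
        }
        return duplicates;
    }
}
```

For the input "1\r\n2\r\n3\r\n3", the naive split produces "1\r", "2\r", "3\r" and "3", which are all distinct, so no duplicate is reported; the normalized version returns a list containing "3".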
private void button1_Click(object sender, EventArgs e)
{
int[] linesValues = new int[] { 1, 2, 3, 4, 1 };
if (AnyDuplicate(linesValues))
{
MessageBox.Show("Question values must be unique!", "Duplicated values found", MessageBoxButtons.OK, MessageBoxIcon.Error);
return;
}
}
private bool AnyDuplicate(int[] numbers)
{
for (int i = 0; i < numbers.Length; i++)
{
for (int j = i + 1; j < numbers.Length; j++)
{
if (numbers[i] == numbers[j])
{
return true;
}
}
}
return false;
}

Writing a program in C# where depending on the Id (txtBxNumber) it will either update or create a new record in a text file and a Rich Text Box

fileName = txtBxFileNamePath.Text;
if (File.Exists(fileName))
{
if (txtBxDate.Text != null && txtBxNumber.Text != null && txtBxUnit.Text != null && txtBxUnitPrice.Text != null && txtBxShipTo.Text != null
&& txtBxOrdered.Text != null && richTxBxDesc.Text != null)
{
try
{
int higherThanZero = Int32.Parse(txtBxNumber.Text);
if (higherThanZero > 0)
{
using (StreamReader reader = File.OpenText(fileName))
{
string[] lines = File.ReadAllLines(fileName);
for (int i = 0; i < lines.Length - 1; i++)
{
string firstNum = lines[i].Substring(0, 2);
if (firstNum == txtBxNumber.Text)
{
string record = "hello ";
lines[i].Replace(lines[i], record);
}
else
{
int orderNum = Int32.Parse(txtBxOrdered.Text);
int unitPriceNum = Int32.Parse(txtBxUnitPrice.Text);
double tax = .13;
int taxInt = (int)tax;
int amount = orderNum * unitPriceNum;
string amountStr = amount.ToString();
int amountTotal = amount * taxInt;
string amountTotalStr = amountTotal.ToString();
amountList.Add(amountStr);
amountTotalList.Add(amountTotalStr);
string record = amountTotalStr.PadRight(30) + amountStr.PadRight(30);
richTxtBxRecord.Text += record + "\n";
using (StreamWriter write = new StreamWriter(fileName, true))
{
write.WriteLine(record + "\n");
write.Close();
}
}
}
}
}
else
{
richTxtBxError.Text += "Textbox Number must contain a digit higher than 0 ";
}
}
catch
{
richTxtBxError.Text += "Please make sure number text box is a digit";
}
}
else
{
richTxtBxError.Text += "please make sure that no text boxes are empty";
}
}
else
{
richTxtBxError.Text += "Please select a file that already exists";
}
I am having an issue where, once I get past the try-catch statement ("Please make sure number text box is a digit"), no code executes. I am trying to read the first few characters of each line in a text file and match them against the user's input. If the input matches what is already in the text file, I update the whole record. If there is no match (a non-existent number), I write a brand new record.
I can't quite follow your logic, but I tried. You should be able to take this code and do what you want (whatever it is).
I started by declaring some class level variables.
private DateTime _dateValue;
private int _numberValue;
private decimal _unitPrice;
private int _numberOrdered;
Then, since you have so many preconditions and so many text boxes, I factored out the validation and the setting of these variables. It makes the logic (whatever it is supposed to be) much easier to follow:
private bool ValidateUserEntry()
{
bool isError = false;
if (!File.Exists(txtBxFileNamePath.Text))
{
AddError("File Name must exist");
isError = true;
}
if (txtBxDate.Text == string.Empty || !DateTime.TryParse(txtBxDate.Text, out _dateValue)) // assign the class-level field, not a new local
{
AddError("The date must be a valid date");
isError = true;
}
if (txtBxNumber.Text == string.Empty || !int.TryParse(txtBxNumber.Text, out _numberValue) ||
_numberValue <= 0)
{
AddError("You must enter a number greater than 0 for [Number]");
isError = true;
}
if (txtBxUnitPrice.Text == string.Empty || !decimal.TryParse(txtBxUnitPrice.Text, out _unitPrice) ||
_unitPrice <= 0.0m)
{
AddError("The unit price must be a positive decimal number");
isError = true;
}
if (txtBxShipTo.Text == string.Empty)
{
AddError("A ship to address is required");
isError = true;
}
if (txtBxOrdered.Text == string.Empty || !int.TryParse(txtBxOrdered.Text, out _numberOrdered) ||
_numberOrdered <= 0)
{
AddError("The Number ordered must be a number greater than 0");
isError = true;
}
if (richTxBxDesc.Text == string.Empty)
{
AddError("A description is required");
isError = true;
}
return !isError;
}
I also added two utility functions for managing the error list:
private void ClearError()
{
richTxtBxError.Text = string.Empty;
}
private void AddError(string errorMessage)
{
richTxtBxError.Text += (errorMessage + Environment.NewLine);
richTxtBxError.SelectionStart = richTxtBxError.Text.Length;
richTxtBxError.SelectionLength = 0;
}
Now comes the real code. Near as I can tell, you want to scan a text file. If the number in the first few character positions matches a number in your input, then you change the line to some constant text. Otherwise, you want to do a calculation and put the results of the calculation on the line of text.
My input file looks like this:
1 First
2 Second
3 Third
12 Twelth
13 Thirteenth
34 Thirty-fourth
and the code that I run looks like what's below. The logic makes no sense, but it was what I could discern from your code. Instead of trying to do things on the fly to a file (which never really turns out well unless you are really careful), I gather the output into a List<string>. Once I have all the output, I put it in a text box control and overwrite the file.
ClearError();
//check pre-conditions
if (!ValidateUserEntry())
{
return;
}
// File.ReadAllLines opens and closes the file itself, so no StreamReader is needed
string[] lines = File.ReadAllLines(txtBxFileNamePath.Text);
List<string> newLines = new List<string>();
for (var lineIndex = 0; lineIndex < lines.Length; ++lineIndex)
{
var line = lines[lineIndex];
if (line.Length > 2 && int.TryParse(line.Substring(0, 2), out var linePrefixNumber) &&
linePrefixNumber == _numberValue)
{
newLines.Add("Bingo, hit the right record");
}
else
{
decimal tax = .13m;
var amount = _numberOrdered * _unitPrice;
var amountTotal = amount * (1m + tax);
//amountList.Add(amount.ToString());
//amountTotalList.Add(amountTotal.ToString());
var newRecord = $"{amountTotal,30:C}{amount,30:C}";
newLines.Add(newRecord); //every record but one will be the same, but, such is life
}
}
//at this point, the newLines list has what I want
//put it in the text box
richTxtBxRecord.Text = string.Join(Environment.NewLine, newLines);
//and write it out
using (StreamWriter write = new StreamWriter(txtBxFileNamePath.Text, append:false))
{
write.Write(richTxtBxRecord.Text);
write.Flush();
}
With inputs that look like:
Number: 12
Number Ordered: 3
Unit Price: 1.23
The output (oddly enough - but it's what I could figure from your code) looks like:
$4.17 $3.69
$4.17 $3.69
$4.17 $3.69
Bingo, hit the right record
$4.17 $3.69
$4.17 $3.69
You can see that the input line that had the 12 at the start gets switched for bingo. The rest get the same information. I'm sure that's not what you want. But, with this code, you should be able to get something that you'd like.
Also note that I treat all the currency values as decimal (not int or double). For the life of me, I have no idea what you were trying to do with the taxInt variable (it will always be zero the way you have coded it). Instead, I did a rational tax calculation.
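To make the decimal point concrete, here is a minimal sketch of the same calculation as a standalone helper. The names TaxDemo, TotalWithTax and IntegerPercent are made up for illustration and are not from your code. IntegerPercent shows one common way a tax variable ends up as zero (integer division); whether that is what happened with taxInt is only a guess on my part.

```csharp
using System;

public static class TaxDemo
{
    // Hypothetical helper: total for an order including tax, rounded to cents.
    public static decimal TotalWithTax(int numberOrdered, decimal unitPrice, decimal taxRate)
    {
        decimal amount = numberOrdered * unitPrice;   // int * decimal gives decimal
        return Math.Round(amount * (1m + taxRate), 2);
    }

    // Integer division truncates toward zero, so any percent below 100 gives 0.
    public static int IntegerPercent(int percent)
    {
        return percent / 100;
    }
}
```

With the inputs above, TaxDemo.TotalWithTax(3, 1.23m, 0.13m) gives 4.17, matching the first column of the output, while TaxDemo.IntegerPercent(13) gives 0.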
All of the code below the catch block is inside an else block, so I wouldn't expect it to execute. If you want something to execute after the catch, remove it from the else block.

Splitting CSV files with commas in the values [duplicate]

How to split the CSV file in c sharp? And how to display this?
I've been using the TextFieldParser Class in the Microsoft.VisualBasic.FileIO namespace for a C# project I'm working on. It will handle complications such as embedded commas or fields that are enclosed in quotes etc. It returns a string[] and, in addition to CSV files, can also be used for parsing just about any type of structured text file.
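A minimal sketch of how that can look, assuming comma-delimited input with optional double quotes around fields. The wrapper class TextFieldParserDemo and the method ParseCsv are names I made up for the example; the TextFieldParser members used are the ones from Microsoft.VisualBasic.FileIO.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Reads all rows from CSV text; quoted fields may contain commas.
    public static List<string[]> ParseCsv(TextReader source)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                // ReadFields returns the fields of the next row as a string[]
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk you could pass new StreamReader(path) as the source, then loop over the returned rows to display them wherever you need.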
Display where? About splitting, the best way is to use a good library to that effect.
This library is pretty good, I can recommend it heartily.
The problem with naïve methods is that they usually fail. There are plenty of things to consider, even before thinking about performance:
What if the text contains commas
Support for the many existing formats (separated by semicolon, or text surrounded by quotes, or single quotes, etc.)
and many others
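To illustrate the first point, here is a small sketch of what a plain Split does to a quoted field that contains commas. NaiveSplitDemo and NaiveSplit are hypothetical names used only for this example.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Splits on every comma, with no awareness of quotes.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }
}
```

Calling NaiveSplitDemo.NaiveSplit("First,\"Even,With,Commas\",3") returns five pieces instead of three, because the quoted field is cut at each of its embedded commas and the quote characters are left in place.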
Import Microsoft.VisualBasic as a reference (I know, it's not that bad) and use Microsoft.VisualBasic.FileIO.TextFieldParser - this handles CSV files very well, and can be used in any .NET language.
Read the file one line at a time, then ...
foreach (string field in line.Split(new char[] { ',' }))
Console.WriteLine(field);
This is a CSV parser I use on occasion.
Usage: (dgvMyView is a datagrid type.)
CSVReader reader = new CSVReader(@"C:\MyFile.txt");
reader.DisplayResults(dgvMyView);
Class:
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows.Forms;
public class CSVReader
{
private const string ESCAPE_SPLIT_REGEX = "({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*";
private string[] FieldNames;
private List<string[]> Records;
private int ReadIndex;
public CSVReader(string File)
{
Records = new List<string[]>();
string[] Record = null;
StreamReader Reader = new StreamReader(File);
int Index = 0;
bool BlankRecord = true;
FieldNames = GetEscapedSVs(Reader.ReadLine());
while (!Reader.EndOfStream)
{
Record = GetEscapedSVs(Reader.ReadLine());
BlankRecord = true;
for (Index = 0; Index <= Record.Length - 1; Index++)
{
if (!string.IsNullOrEmpty(Record[Index])) BlankRecord = false;
}
if (!BlankRecord) Records.Add(Record);
}
ReadIndex = -1;
Reader.Close();
}
private string[] GetEscapedSVs(string Data)
{
return GetEscapedSVs(Data, ",", "\"");
}
private string[] GetEscapedSVs(string Data, string Separator, string Escape)
{
string[] Result = null;
int Index = 0;
int PriorMatchIndex = 0;
MatchCollection Matches = Regex.Matches(Data, string.Format(ESCAPE_SPLIT_REGEX, Separator, Escape));
Result = new string[Matches.Count + 1]; // N separators delimit N + 1 fields
for (Index = 0; Index <= Result.Length - 2; Index++)
{
Result[Index] = Data.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
Result[Result.Length - 1] = Data.Substring(PriorMatchIndex);
for (Index = 0; Index <= Result.Length - 1; Index++)
{
if (Regex.IsMatch(Result[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) Result[Index] = Result[Index].Substring(1, Result[Index].Length - 2);
Result[Index] = Result[Index].Replace(Escape + Escape, Escape);
if (Result[Index] == null) Result[Index] = "";
}
return Result;
}
public int FieldCount
{
get { return FieldNames.Length; }
}
public string GetString(int Index)
{
return Records[ReadIndex][Index];
}
public string GetName(int Index)
{
return FieldNames[Index];
}
public bool Read()
{
ReadIndex = ReadIndex + 1;
return ReadIndex < Records.Count;
}
public void DisplayResults(DataGridView DataView)
{
DataGridViewColumn col = default(DataGridViewColumn);
DataGridViewRow row = default(DataGridViewRow);
DataGridViewCell cell = default(DataGridViewCell);
DataGridViewColumnHeaderCell header = default(DataGridViewColumnHeaderCell);
int Index = 0;
ReadIndex = -1;
DataView.Rows.Clear();
DataView.Columns.Clear();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
col = new DataGridViewColumn();
col.CellTemplate = new DataGridViewTextBoxCell();
header = new DataGridViewColumnHeaderCell();
header.Value = GetName(Index);
col.HeaderCell = header;
DataView.Columns.Add(col);
}
while (Read())
{
row = new DataGridViewRow();
for (Index = 0; Index <= FieldCount - 1; Index++)
{
cell = new DataGridViewTextBoxCell();
cell.Value = GetString(Index).ToString();
row.Cells.Add(cell);
}
DataView.Rows.Add(row);
}
}
}
I got my query working. It is simple: I read the file using System.IO.File so that all the text is stored in a string, and then I split it on a separator. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace CSV
{
class Program
{
static void Main(string[] args)
{
string csv = "user1, user2, user3,user4,user5";
string[] split = csv.Split(new char[] {',',' '});
foreach(string s in split)
{
if (s.Trim() != "")
Console.WriteLine(s);
}
Console.ReadLine();
}
}
}
The following function takes a line from a CSV file and splits it into a List<string>.
Arguments:
string line = the line to split
string textQualifier = what (if any) text qualifier (e.g. "" or "\"" or "'")
char delim = the field delimiter (e.g. ',' or ';' or '|' or '\t')
int colCount = the expected number of fields (0 means don't check)
Example usage:
List<string> fields = SplitLine(line, "\"", ',', 5);
// or
List<string> fields = SplitLine(line, "'", '|', 10);
// or
List<string> fields = SplitLine(line, "", '\t', 0);
Function:
private List<string> SplitLine(string line, string textQualifier, char delim, int colCount)
{
List<string> fields = new List<string>();
string origLine = line;
char textQual = '"';
bool hasTextQual = false;
if (!String.IsNullOrEmpty(textQualifier))
{
hasTextQual = true;
textQual = textQualifier[0];
}
if (hasTextQual)
{
while (!String.IsNullOrEmpty(line))
{
if (line[0] == textQual) // field is text qualified so look for next unqualified delimiter
{
int fieldLen = 1;
while (true)
{
if (line.Length == 2) // must be final field (zero length)
{
fieldLen = 2;
break;
}
else if (fieldLen + 1 >= line.Length) // must be final field
{
fieldLen += 1;
break;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == textQual) // escaped text qualifier
{
fieldLen += 2;
}
else if (line[fieldLen] == textQual && line[fieldLen + 1] == delim) // must be end of field
{
fieldLen += 1;
break;
}
else // not a delimiter
{
fieldLen += 1;
}
}
string escapedQual = textQual.ToString() + textQual.ToString();
fields.Add(line.Substring(1, fieldLen - 2).Replace(escapedQual, textQual.ToString())); // replace escaped qualifiers
if (line.Length >= fieldLen + 1)
{
line = line.Substring(fieldLen + 1);
if (line == "") // blank final field
{
fields.Add("");
}
}
else
{
line = "";
}
}
else // field is not text qualified
{
int fieldLen = line.IndexOf(delim);
if (fieldLen != -1) // check next delimiter position
{
fields.Add(line.Substring(0, fieldLen));
line = line.Substring(fieldLen + 1);
if (line == "") // final field must be blank
{
fields.Add("");
}
}
else // must be last field
{
fields.Add(line);
line = "";
}
}
}
}
else // if there is no text qualifier, then use existing split function
{
fields.AddRange(line.Split(delim));
}
if (colCount > 0 && colCount != fields.Count) // count doesn't match expected so throw exception
{
throw new Exception("Field count was:" + fields.Count.ToString() + ", expected:" + colCount.ToString() + ". Line:" + origLine);
}
return fields;
}
Problem: Convert a comma separated string into an array where commas in "quoted strings,,," should not be considered as separators but as part of an entry
Input:
String: First,"Second","Even,With,Commas",,Normal,"Sentence,with ""different"" problems",3,4,5
Output:
String-Array: ['First','Second','Even,With,Commas','','Normal','Sentence,with "different" problems','3','4','5']
Code:
string sLine;
sLine = "First,\"Second\",\"Even,With,Commas\",,Normal,\"Sentence,with \"\"different\"\" problems\",3,4,5";
// 1. Split line by separator; do not split if separator is within quotes
string Separator = ",";
string Escape = '"'.ToString();
MatchCollection Matches = Regex.Matches(sLine,
string.Format("({1}[^{1}]*{1})*(?<Separator>{0})({1}[^{1}]*{1})*", Separator, Escape));
string[] asColumns = new string[Matches.Count + 1];
int PriorMatchIndex = 0;
for (int Index = 0; Index <= asColumns.Length - 2; Index++)
{
asColumns[Index] = sLine.Substring(PriorMatchIndex, Matches[Index].Groups["Separator"].Index - PriorMatchIndex);
PriorMatchIndex = Matches[Index].Groups["Separator"].Index + Separator.Length;
}
asColumns[asColumns.Length - 1] = sLine.Substring(PriorMatchIndex);
// 2. Remove quotes
for (int Index = 0; Index <= asColumns.Length - 1; Index++)
{
if (Regex.IsMatch(asColumns[Index], string.Format("^{0}[^{0}].*[^{0}]{0}$", Escape))) // If "Text" is surrounded by quotes (but ignore double quotes => "Leave ""inside"" quotes")
{
asColumns[Index] = asColumns[Index].Substring(1, asColumns[Index].Length - 2); // "Text" => Text
}
asColumns[Index] = asColumns[Index].Replace(Escape + Escape, Escape); // Remove double quotes ('My ""special"" text' => 'My "special" text')
if (asColumns[Index] == null) asColumns[Index] = "";
}
The output array is asColumns

Converting multifasta parser from Python to C#

I am trying to convert a multi fasta parser from Python to C#. For the input
>header1
ACTG
GCTA
>header2
GATTACA
it would return the dictionary {'header2': 'GATTACA', 'header1': 'ACTGGCTA'}
The original Python code looks like:
def fastaParser(handle):
    """ Adapted from https://github.com/biopython/biopython/blob/master/Bio/SeqIO/FastaIO.py#L39 """
    fastaDict = {}
    #Skip any text before the first record (e.g. blank lines, comments)
    while True:
        line = handle.readline()
        if line == "":
            return # Premature end of file, or just empty?
        if line[0] == ">":
            break
    while True:
        if line[0] != ">":
            raise ValueError("Records in Fasta files should start with '>' character")
        title = line[1:].rstrip()
        lines = []
        line = handle.readline()
        while True:
            if not line:
                break
            if line[0] == ">":
                break
            lines.append(line.rstrip())
            line = handle.readline()
        #Remove trailing whitespace, and any internal spaces
        sequence = "".join(lines).replace(" ", "").replace("\r", "")
        fastaDict[title] = sequence
        if not line:
            return fastaDict

if __name__ == '__main__':
    with open('fasta.txt') as f:
        print fastaParser(f)
What I have as C# code is (my code expects a string instead of an open filehandle):
public Dictionary<int, string> parseFasta(string multiFasta)
{
Dictionary<int, string> fastaDict = new Dictionary<int, string>();
using (System.IO.StringReader multiFastaReader = new System.IO.StringReader(multiFasta))
{
// Skip any text before the first record (e.g. blank lines, comments)
while (true)
{
string line = multiFastaReader.ReadLine();
if (line == "")
{
return fastaDict; // Premature end of file, or just empty?
}
if (line[0] == '>')
{
break;
}
}
while (true)
{
if (line[0] != '>') // <- Here I get the error: "the name 'line' does not exist in the current context
{
throw new Exception("Records in Fasta files should start with '>' character");
}
string title= line[1:].TrimEnd();
List<string> lines = new List<string>();
line = multiFastaReader.ReadLine();
while (true)
{
if (!line)
{
break;
}
if (line[0] == '>')
{
break;
}
lines.Add(line.TrimEnd());
line = multiFastaReader.ReadLine();
}
// Remove trailing whitespace, and any internal spaces
string sequence = String.Join("", lines).Replace(" ", "").Replace("\r", "");
fastaDict.Add(title, sequence);
if (!line)
{
return fastaDict;
}
}
}
}
The error that I'm getting is that Visual Studio says that the variables called line after the second while (true) don't exist in the current context.
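For what it's worth, the message comes from C# block scoping: a variable declared inside the braces of the first while loop stops existing at that loop's closing brace, so the second loop cannot see it. Below is a minimal sketch of the usual fix, which is to declare the variable before the loop. ScopeDemo and FirstHeader are hypothetical names, and the method only finds the first header rather than parsing the whole input. It also uses Substring(1) in place of the Python slice line[1:] and an explicit null check in place of "not line", since neither Python form is valid C#.

```csharp
using System;
using System.IO;

public static class ScopeDemo
{
    // Returns the text of the first '>' header line, or null if there is none.
    public static string FirstHeader(string text)
    {
        using (var reader = new StringReader(text))
        {
            // Declared outside the loop, so it is still in scope after the loop ends.
            string line = reader.ReadLine();
            while (line != null && !(line.Length > 0 && line[0] == '>'))
            {
                line = reader.ReadLine(); // ReadLine returns null at end of input
            }
            return line == null ? null : line.Substring(1).TrimEnd();
        }
    }
}
```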
I finally got it to work with this code:
public Dictionary<string, string> parseFasta(string multiFasta)
{
Dictionary<string, string> fastaDict = new Dictionary<string, string>();
using (System.IO.StringReader multiFastaReader = new System.IO.StringReader(multiFasta))
{
// Skip any text before the first record (e.g. blank lines, comments)
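string line = multiFastaReader.ReadLine();
while (true)
{
if (line == null) // ReadLine returns null (not "") at end of input
{
return fastaDict; // Premature end of file, or just empty?
}
if (line.Length > 0 && line[0] == '>')
{
break;
}
line = multiFastaReader.ReadLine(); // keep reading, otherwise this loop never ends
}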
while (true)
{
if (line[0] != '>')
{
throw new Exception("Records in Fasta files should start with '>' character");
}
string title= line.Substring(1, line.Length-1).TrimEnd();
List<string> lines = new List<string>();
line = multiFastaReader.ReadLine();
while (true)
{
if (line == null) // end of input
{
break;
}
if (line.Length > 0 && line[0] == '>') // start of the next record
{
break;
}
lines.Add(line.TrimEnd()); // blank lines add nothing to the joined sequence
line = multiFastaReader.ReadLine();
}
// Remove trailing whitespace, and any internal spaces
string sequence = String.Join("", lines).Replace(" ", "").Replace("\r", "");
fastaDict.Add(title, sequence);
if (line == null)
{
return fastaDict;
}
}
}
}
