Display special (non printable) characters in WPF control - c#

I have raw binary data received from a device. I would like to display that data the way hex editors do: show the hex values, but also the corresponding characters.
I found fonts that have glyphs for ASCII codes 0 - 32, but I cannot get them to show on screen.
I tried this with the WPF ListBox, ItemsControl and TextBox.
Is there some setting that can make this work?
Or maybe some WPF control that will show these characters?
Edit:
After some thinking and testing, the only characters that cause problems are line feed, form feed, carriage return, backspace, horizontal tab and vertical tab. As a quick solution I decided to replace those characters with the ASCII 16 (10 hex) character. I tested this with ASCII, UTF-8 and Unicode files and it works with all three formats.
Here is the regex that I am using for this:
rawLine = Regex.Replace(inputLine, "[\t\n\r\f\b\v]", '\x0010'.ToString());
It replaces all occurrences of these 6 problematic characters with a boxy sign. That shows that the character is not a "regular printable" one, and it works for me.

Not sure if that's exactly what you want, but I would recommend having a look at the #develop project. Their editor can display spaces, tabs and end-of-line markers.
I had a quick look at the source code, and in the namespace ICSharpCode.AvalonEdit.Rendering the SingleCharacterElementGenerator class seems to do what you want.

This should help; you can expand it as needed:
private static string GetPrintableCharacter(char character)
{
switch (character)
{
case '\a':
{
return "\\a";
}
case '\b':
{
return "\\b";
}
case '\t':
{
return "\\t";
}
case '\n':
{
return "\\n";
}
case '\v':
{
return "\\v";
}
case '\f':
{
return "\\f";
}
case '\r':
{
return "\\r";
}
default:
{
if (character == ' ')
{
break;
}
else
{
throw new InvalidArgumentException(Resources.NOTSUPPORTCHAR, new object[] { character });
}
}
}
return "\\x20";
}
public static string GetPrintableText(string text)
{
StringBuilder stringBuilder = new StringBuilder(1024);
if (text == null)
{
return "[~NULL~]";
}
if (text.Length == 0)
{
return "[~EMPTY~]";
}
stringBuilder.Remove(0, stringBuilder.Length);
int num = 0;
for (int i = 0; i < text.Length; i++)
{
if (text[i] == '\a' || text[i] == '\b' || text[i] == '\f' || text[i] == '\v' || text[i] == '\t' || text[i] == '\n' || text[i] == '\r' || text[i] == ' ')
{
num += 3;
}
}
int length = text.Length + num;
if (stringBuilder.Capacity < length)
{
stringBuilder = new StringBuilder(length);
}
string str = text;
for (int j = 0; j < str.Length; j++)
{
char chr = str[j];
if (chr > ' ')
{
stringBuilder.Append(chr);
}
else
{
stringBuilder.Append(StringHelper.GetPrintableCharacter(chr));
}
}
return stringBuilder.ToString();
}
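As an alternative to escape sequences, the Unicode Control Pictures block (U+2400 to U+241F, plus U+2421 for DEL) has one visible glyph per control code. The following is only a sketch: `ControlPictures` and `ToVisible` are illustrative names that do not come from the answers above, and it assumes the font used by the control actually contains glyphs for that block.

```csharp
using System.Text;

public static class ControlPictures
{
    // Hypothetical helper: replaces C0 control characters (0x00-0x1F) and DEL (0x7F)
    // with the matching glyph from the Unicode Control Pictures block.
    public static string ToVisible(string text)
    {
        if (text == null)
        {
            return string.Empty;
        }
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c < 0x20)
            {
                sb.Append((char)(0x2400 + c)); // e.g. '\t' (0x09) becomes U+2409
            }
            else if (c == 0x7F)
            {
                sb.Append('\u2421');           // SYMBOL FOR DELETE
            }
            else
            {
                sb.Append(c);                  // printable characters pass through
            }
        }
        return sb.ToString();
    }
}
```

Because every control character maps to exactly one glyph, the character column stays aligned with the hex column, which the multi-character escapes above would not guarantee.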

Split a string if delimiter is between single quotes [duplicate]

I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split.
String:
111,222,"33,44,55",666,"77,88","99"
I want the output:
111
222
33,44,55
666
77,88
99
I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99
Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!
You can do so with some simple regex
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
This will do the following:
(?:^|,) = A non-capturing group that matches either the beginning of the string or a comma
(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group that selects between 2 alternatives:
stuff in quotes
stuff between commas
This should give you the output you are looking for.
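The numbered capture group described above can also be read directly through `Groups[1]`, which leaves out the leading comma without any trimming. A minimal sketch, assuming that enclosing quotes should be stripped and doubled quotes unescaped; `CsvGroupSketch` and `Split` are hypothetical names, and the code inherits whatever limitations the pattern itself has.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvGroupSketch
{
    // Same pattern as above, written as a verbatim string ("" stands for one quote).
    private static readonly Regex CsvField =
        new Regex(@"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)");

    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match match in CsvField.Matches(line))
        {
            // Group 1 holds the field without the leading comma.
            string field = match.Groups[1].Value;
            if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            {
                // Strip the enclosing quotes and turn "" back into a single quote.
                field = field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(field);
        }
        return fields;
    }
}
```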
Example code in C#
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
List<string> list = new List<string>();
string curr = null;
foreach (Match match in csvSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add("");
}
list.Add(curr.TrimStart(','));
}
return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
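// Join the fields before printing; passing the array itself would only print its type name.
Console.WriteLine(string.Join(Environment.NewLine, SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"")));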
}
Warning: As per @MrE's comment, if a rogue newline character appears in a badly formed CSV file and you end up with an unbalanced quote ("string), you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). This can easily be replicated in Visual Studio and, as I've discovered, will crash it. A simple try/catch will not trap this issue either.
You should use:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
instead
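Whichever pattern is chosen, a match timeout is another way to contain runaway backtracking: since .NET 4.5 the `Regex` constructor accepts a `TimeSpan`, and matching throws `RegexMatchTimeoutException` once that budget is exceeded. The following is only a sketch; the 250 ms budget and the names `CsvTimeoutSketch` and `TrySplit` are arbitrary assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvTimeoutSketch
{
    // The third constructor argument caps how long a single match attempt may run.
    private static readonly Regex CsvField = new Regex(
        @"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)",
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250));

    // Returns false instead of hanging when the input makes the regex backtrack too long.
    public static bool TrySplit(string line, out List<string> fields)
    {
        fields = new List<string>();
        try
        {
            foreach (Match match in CsvField.Matches(line))
            {
                fields.Add(match.Groups[1].Value);
            }
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            fields.Clear(); // discard the partial result
            return false;
        }
    }
}
```

The caller can then log or reject the offending line rather than letting one malformed record stall the process.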
Fast and easy:
public static string[] SplitCsv(string line)
{
List<string> result = new List<string>();
StringBuilder currentStr = new StringBuilder("");
bool inQuotes = false;
for (int i = 0; i < line.Length; i++) // For each character
{
if (line[i] == '\"') // Quotes are closing or opening
inQuotes = !inQuotes;
else if (line[i] == ',') // Comma
{
if (!inQuotes) // If not in quotes, end of current string, add it to result
{
result.Add(currentStr.ToString());
currentStr.Clear();
}
else
currentStr.Append(line[i]); // If in quotes, just add it
}
else // Add any other character to current string
currentStr.Append(line[i]);
}
result.Add(currentStr.ToString());
return result.ToArray(); // Return array of all strings
}
With this string as input :
111,222,"33,44,55",666,"77,88","99"
It will return :
111
222
33,44,55
666
77,88
99
I really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is:
public IEnumerable<string> SplitCSV(string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
Maybe it's even more useful to have it like an extension method:
public static class StringHelper
{
public static IEnumerable<string> SplitCSV(this string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
}
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
Here is the implementation in C#:
string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
foreach (var match in matches)
{
Console.WriteLine(match);
}
Outputs
111
222
33,44,55
666
77,88
99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.
This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
With minor updates to the function provided by "Chad Hedgcock".
Updates are on:
Line 26: character.val == '\"' - This can never be true due to the check made on Line 24. i.e. character.val == '"'
Line 28: if (row[character.index + 1] == character.val) added !quoteIsEscaped to escape 3 consecutive quotes.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
bool blockUntilEndQuote2 = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"' && !blockUntilEndQuote2)
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character == '\'' && !blockUntilEndQuote)
{
if (blockUntilEndQuote2 == false)
{
blockUntilEndQuote2 = true;
}
else if (blockUntilEndQuote2 == true)
{
blockUntilEndQuote2 = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) )
(?=\s*([,;\t\r\n]|$))
) |
(?<FULL>
(^|[\s\t\r\n])
( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
(?=[,;\s\t\r\n]|$)
)
))", RegexOptions.Compiled);
This solution can handle pretty chaotic input too.
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
Note: The regular expression contains two <FULL> blocks, and each of them contains two <QUODAT> blocks separated by "or" (|). Depending on your task you may only need one of them.
Note: This regular expression gives us one string array, and works on a single line with or without <carriage return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
(?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>)
(?=\s*([,;\t\r\n]|$))
)
))", RegexOptions.Compiled);
It can process complex, simple and empty items too.
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything but the <quotation mark><separators><comma> sequence, AND each item shall begin and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carriage return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaced the text "CO" with "QUO".
Try this:
string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
if (it.StartsWith(",") || it.EndsWith(","))
{
var tmp = it.TrimEnd(',').TrimStart(',');
result.AddRange(tmp.Split(','));
}
else
{
if(!string.IsNullOrEmpty(it)) result.Add(it);
}
}
//Results:
foreach (var it in result)
{
Console.WriteLine(it);
}
I know I'm a bit late to this, but for anyone searching, here is how I did what you are asking about in C#:
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"')
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && blockUntilEndQuote == true)
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
Don't reinvent a CSV parser, try FileHelpers.
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.
public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
{
// In-Line for example, but I implemented as string extender in production code
Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
{
if (startIndex >= 0)
{
if (source != null)
{
for (int i = startIndex; i < source.Length; i++)
{
if (!char.IsWhiteSpace(source[i]))
{
return i;
}
}
}
}
return -1;
};
var results = new List<string>();
var result = new StringBuilder();
var inQualifier = false;
var inField = false;
// We add new columns at the delimiter, so append one for the parser.
var row = $"{record}{delimiter}";
for (var idx = 0; idx < row.Length; idx++)
{
// A delimiter character...
if (row[idx]== delimiter[0])
{
// Are we inside qualifier? If not, we've hit the end of a column value.
if (!inQualifier)
{
results.Add(trimData ? result.ToString().Trim() : result.ToString());
result.Clear();
inField = false;
}
else
{
result.Append(row[idx]);
}
}
// NOT a delimiter character...
else
{
// ...Not a space character
if (row[idx] != ' ')
{
// A qualifier character...
if (row[idx] == qualifier[0])
{
// Qualifier is closing qualifier...
if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
{
inQualifier = false;
continue;
}
else
{
// ...Qualifier is opening qualifier
if (!inQualifier)
{
inQualifier = true;
}
// ...It's a qualifier inside a qualifier.
else
{
inField = true;
result.Append(row[idx]);
}
}
}
// Not a qualifier character...
else
{
result.Append(row[idx]);
inField = true;
}
}
// ...A space character
else
{
if (inQualifier || inField)
{
result.Append(row[idx]);
}
}
}
}
return results.ToArray<string>();
}
Some test code:
//var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
var input =
"111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";
Console.WriteLine("Split with trim");
Console.WriteLine("---------------");
var result = SplitRow(input, ",", "\"", true);
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Split 2
Console.WriteLine("Split with no trim");
Console.WriteLine("------------------");
var result2 = SplitRow(input, ",", "\"", false);
foreach (var r in result2)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Time Trial 1
Console.WriteLine("Experimental Process (1,000,000) iterations");
Console.WriteLine("-------------------------------------------");
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
var x1 = SplitRow(input, ",", "\"", false);
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
Console.WriteLine("");
Results
Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds
Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.
If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
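For illustration, one parser that ships with .NET is `TextFieldParser` in the `Microsoft.VisualBasic.FileIO` namespace (it requires a reference to the `Microsoft.VisualBasic` assembly). The advice above does not name a specific parser, so this choice is an assumption, and `CsvParserSketch.ParseLine` is a hypothetical wrapper.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvParserSketch
{
    // Parses a single CSV line with the framework's delimited-text parser.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            // ReadFields returns null at end of input, so fall back to an empty array.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Calling `ParseLine` on the sample string from the question yields the six fields with the enclosing quotes already removed.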
Here is my fastest implementation based upon string raw pointer manipulation:
string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
{
string[] oTokens;
if (null == cSeparator)
{
cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
}
if (null == cQuotes)
{
cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
}
unsafe
{
fixed (char* lpText = sText)
{
#region Fast array estimation
char* lpCurrent = lpText;
int nEstimatedSize = 0;
while (0 != *lpCurrent)
{
if (cSeparator == *lpCurrent)
{
nEstimatedSize++;
}
lpCurrent++;
}
nEstimatedSize++; // Add EOL char(s)
string[] oEstimatedTokens = new string[nEstimatedSize];
#endregion
#region Parsing
char[] oBuffer = new char[sText.Length];
int nIndex = 0;
int nTokens = 0;
lpCurrent = lpText;
while (0 != *lpCurrent)
{
if (cQuotes == *lpCurrent)
{
// Quotes parsing
lpCurrent++; // Skip quote
nIndex = 0; // Reset buffer
while (
(0 != *lpCurrent)
&& (cQuotes != *lpCurrent)
)
{
oBuffer[nIndex] = *lpCurrent; // Store char
lpCurrent++; // Move source cursor
nIndex++; // Move target cursor
}
}
else if (cSeparator == *lpCurrent)
{
// Separator char parsing
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
nIndex = 0; // Skip separator and Reset buffer
}
else
{
// Content parsing
oBuffer[nIndex] = *lpCurrent; // Store char
nIndex++; // Move target cursor
}
lpCurrent++; // Move source cursor
}
// Recover pending buffer
if (nIndex > 0)
{
// Store token
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
}
// Build final tokens list
if (nTokens == nEstimatedSize)
{
oTokens = oEstimatedTokens;
}
else
{
oTokens = new string[nTokens];
Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
}
#endregion
}
}
// Epilogue
return oTokens;
}
Try this
private string[] GetCommaSeperatedWords(string sep, string line)
{
List<string> list = new List<string>();
StringBuilder word = new StringBuilder();
int doubleQuoteCount = 0;
for (int i = 0; i < line.Length; i++)
{
string chr = line[i].ToString();
if (chr == "\"")
{
if (doubleQuoteCount == 0)
doubleQuoteCount++;
else
doubleQuoteCount--;
continue;
}
if (chr == sep && doubleQuoteCount == 0)
{
list.Add(word.ToString());
word = new StringBuilder();
continue;
}
word.Append(chr);
}
list.Add(word.ToString());
return list.ToArray();
}
This is Chad's answer rewritten with state-based logic. His answer failed for me when it came across """BRAD""" as a field. That should return "BRAD", but it just ate up all the remaining fields. When I tried to debug it, I ended up rewriting it as state-based logic:
enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield };
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
SplitState state = SplitState.s_begin;
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new { val, index }))
{
//Console.WriteLine("character = " + character.val + " state = " + state);
switch (state)
{
case SplitState.s_begin:
if (character.val == delimiter)
{
/* empty field */
yield return currentString.ToString();
currentString.Clear();
} else if (character.val == '"')
{
state = SplitState.s_inquotefield;
} else
{
currentString.Append(character.val);
state = SplitState.s_infield;
}
break;
case SplitState.s_infield:
if (character.val == delimiter)
{
/* field with data */
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_inquotefield:
if (character.val == '"')
{
// could be end of field, or escaped quote.
state = SplitState.s_foundquoteinfield;
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_foundquoteinfield:
if (character.val == '"')
{
// found escaped quote.
currentString.Append(character.val);
state = SplitState.s_inquotefield;
}
else if (character.val == delimiter)
{
// must have been last quote so we must find delimiter
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
}
else
{
throw new Exception("Quoted field not terminated.");
}
break;
default:
throw new Exception("unknown state:" + state);
}
}
//Console.WriteLine("currentstring = " + currentString.ToString());
}
This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.

Parse string in C# extension method with regex

I need to create an extension method which parses (splits) my string.
For example:
If I have string
COMMAND 1 PROCESSED "JOB command" 20160801 09:05:24
It should be split like this
COMMAND
1
PROCESSED
"JOB command"
20160801
09:05:24
Another example:
If I have string:
COMMAND 2 ERROR 06 00000032 "Message window is still active." 20160801
09:05:24
It should be split like this:
COMMAND
2
ERROR
06
00000032
"Message window is still active."
20160801 09:05:24
I have a solution for this, but I am sure that there is a much cleaner one.
My solution:
public static List<string> GetTokens(this string line)
{
// TODO: Code refactoring:
var res = new List<string>();
var parts = Regex.Split(line, "/[^\\s\"']+|\"([^\"]*)\"|'([^']*)'/g");
var subParts = parts[0].Split(' ');
foreach (var val in subParts)
{
res.Add(val);
}
res.Add(parts[1]);
subParts = parts[2].Split(' ');
foreach (var val in subParts)
{
res.Add(val);
}
res.RemoveAll(f => f.Trim() == "");
return res;
}
I would like to implement a cleaner solution. Any ideas?
I suggest implementing a simple loop instead of complex regular expression:
public static IEnumerable<String> GetTokens(this string value) {
if (string.IsNullOrEmpty(value))
yield break; // or throw exception in case of value == null
bool inQuotation = false;
int index = 0;
for (int i = 0; i < value.Length; ++i) {
char ch = value[i];
if (ch == '"')
inQuotation = !inQuotation;
else if ((ch == ' ') && (!inQuotation)) {
yield return value.Substring(index, i - index);
index = i + 1;
}
}
if (index < value.Length)
yield return value.Substring(index, value.Length - index);
}
Test
var source =
"COMMAND 2 ERROR 06 00000032 \"Message window is still active.\" 20160801 09:05:24";
Console.Write(string.Join(Environment.NewLine, GetTokens(source)));
Output
COMMAND
2
ERROR
06
00000032
"Message window is still active."
20160801
09:05:24
Edit: in case you want two quotation types with " (double) as well as ' (single):
public static IEnumerable<String> GetTokens(string value) {
if (string.IsNullOrEmpty(value))
yield break;
bool inQuotation = false;
bool inApostroph = false;
int index = 0;
for (int i = 0; i < value.Length; ++i) {
char ch = value[i];
if (inQuotation)
inQuotation = ch != '"';
else if (inApostroph)
inApostroph = ch != '\'';
else if (ch == '"')
inQuotation = true;
else if (ch == '\'')
inApostroph = true;
else if ((ch == ' ') && (!inQuotation)) {
yield return value.Substring(index, i - index);
index = i + 1;
}
}
if (index < value.Length)
yield return value.Substring(index, value.Length - index);
}
After a while I figured out some simple code:
public static List<string> GetTokens(this string line)
{
return Regex.Matches(line, @"([^\s""]+|""([^""]*)"")").OfType<Match>().Select(l => l.Groups[1].Value).ToList();
}
I tested the code with a MessageBox which showed the List with | between each item.
You can use a regex like ([^\s"]+|"[^"]*") with the global flag.
A pure regex solution:
public static List<string> GetTokens(this string line)
{
return Regex.Matches(line,
@""".*?""|\S+").Cast<Match>().Select(m => m.Value).ToList();
}
The ".*?"|\S+ regex matches either a quoted string or a sequence of non-space characters. These matches can then be returned as a collection in one go.
Here is a demo: https://ideone.com/hmLQIt.

WebUtility.HtmlDecode vs HttpUtility.HtmlDecode

I was using WebUtility.HtmlDecode to decode HTML. It turns out that it doesn't decode properly: for example, &ndash; is supposed to decode to a "–" character, but WebUtility.HtmlDecode does not decode it. HttpUtility.HtmlDecode, however, does.
Debug.WriteLine(WebUtility.HtmlDecode("&ndash;"));
Debug.WriteLine(HttpUtility.HtmlDecode("&ndash;"));
> &ndash;
> –
The documentation for both of these is the same:
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
Why are they different, which one should I be using, and what will change if I switch to HttpUtility.HtmlDecode to get "&ndash;" to decode correctly?
The implementations of the two methods are indeed different on Windows Phone.
WebUtility.HtmlDecode:
public static void HtmlDecode(string value, TextWriter output)
{
if (value != null)
{
if (output == null)
{
throw new ArgumentNullException("output");
}
if (!StringRequiresHtmlDecoding(value))
{
output.Write(value);
}
else
{
int length = value.Length;
for (int i = 0; i < length; i++)
{
bool flag;
uint num4;
char ch = value[i];
if (ch != '&')
{
goto Label_01B6;
}
int num3 = value.IndexOfAny(_htmlEntityEndingChars, i + 1);
if ((num3 <= 0) || (value[num3] != ';'))
{
goto Label_01B6;
}
string entity = value.Substring(i + 1, (num3 - i) - 1);
if ((entity.Length <= 1) || (entity[0] != '#'))
{
goto Label_0188;
}
if ((entity[1] == 'x') || (entity[1] == 'X'))
{
flag = uint.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out num4);
}
else
{
flag = uint.TryParse(entity.Substring(1), NumberStyles.Integer, NumberFormatInfo.InvariantInfo, out num4);
}
if (flag)
{
switch (_htmlDecodeConformance)
{
case UnicodeDecodingConformance.Strict:
flag = (num4 < 0xd800) || ((0xdfff < num4) && (num4 <= 0x10ffff));
goto Label_0151;
case UnicodeDecodingConformance.Compat:
flag = (0 < num4) && (num4 <= 0xffff);
goto Label_0151;
case UnicodeDecodingConformance.Loose:
flag = num4 <= 0x10ffff;
goto Label_0151;
}
flag = false;
}
Label_0151:
if (!flag)
{
goto Label_01B6;
}
if (num4 <= 0xffff)
{
output.Write((char) num4);
}
else
{
char ch2;
char ch3;
ConvertSmpToUtf16(num4, out ch2, out ch3);
output.Write(ch2);
output.Write(ch3);
}
i = num3;
goto Label_01BD;
Label_0188:
i = num3;
char ch4 = HtmlEntities.Lookup(entity);
if (ch4 != '\0')
{
ch = ch4;
}
else
{
output.Write('&');
output.Write(entity);
output.Write(';');
goto Label_01BD;
}
Label_01B6:
output.Write(ch);
Label_01BD:;
}
}
}
}
HttpUtility.HtmlDecode:
public static string HtmlDecode(string html)
{
if (html == null)
{
return null;
}
if (html.IndexOf('&') < 0)
{
return html;
}
StringBuilder sb = new StringBuilder();
StringWriter writer = new StringWriter(sb, CultureInfo.InvariantCulture);
int length = html.Length;
for (int i = 0; i < length; i++)
{
char ch = html[i];
if (ch == '&')
{
int num3 = html.IndexOfAny(s_entityEndingChars, i + 1);
if ((num3 > 0) && (html[num3] == ';'))
{
string entity = html.Substring(i + 1, (num3 - i) - 1);
if ((entity.Length > 1) && (entity[0] == '#'))
{
try
{
if ((entity[1] == 'x') || (entity[1] == 'X'))
{
ch = (char) int.Parse(entity.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
}
else
{
ch = (char) int.Parse(entity.Substring(1), CultureInfo.InvariantCulture);
}
i = num3;
}
catch (FormatException)
{
i++;
}
catch (ArgumentException)
{
i++;
}
}
else
{
i = num3;
char ch2 = HtmlEntities.Lookup(entity);
if (ch2 != '\0')
{
ch = ch2;
}
else
{
writer.Write('&');
writer.Write(entity);
writer.Write(';');
continue;
}
}
}
}
writer.Write(ch);
}
return sb.ToString();
}
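One visible difference between the two listings is how a numeric entity becomes a character: the HttpUtility version casts the parsed number straight to `char`, while the WebUtility version writes two UTF-16 code units for values above 0xFFFF. The sketch below shows why that matters. It uses the framework's `char.ConvertFromUtf32` as a stand-in for the internal `ConvertSmpToUtf16` helper, which is an assumption since that helper's source is not shown here; `EntityCastSketch` and its methods are hypothetical names.

```csharp
using System;

public static class EntityCastSketch
{
    // Mirrors the (char) cast in the HttpUtility listing: bits above 16 are dropped.
    public static string CastToChar(int codePoint)
    {
        return unchecked((char)codePoint).ToString();
    }

    // Produces a surrogate pair for code points above U+FFFF.
    public static string ToUtf16(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

For the code point 0x1F600, `CastToChar` returns the single unrelated character U+F600, whereas `ToUtf16` returns the two-unit string "\uD83D\uDE00".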
Interestingly, WebUtility doesn't exist on WP7. Also, the WP8 implementation of WebUtility is identical to the desktop one. The desktop implementation of HttpUtility.HtmlDecode is just a wrapper around WebUtility.HtmlDecode. Last but not least, Silverlight 5 has the same implementation of HttpUtility.HtmlDecode as Windows Phone, and does not implement WebUtility.
From there, I can venture a guess: since the Windows Phone 7 runtime is based on Silverlight, WP7 inherited of the Silverlight version of HttpUtility.HtmlDecode, and WebUtility wasn't present. Then came WP8, whose runtime is based on WinRT. WinRT brought WebUtility, and the old version of HttpUtility.HtmlDecode was kept to ensure the compatibility with the legacy WP7 apps.
As to know which one you should use... If you want to target WP7 then you have no choice but to use HttpUtility.HtmlDecode. If you're targeting WP8, then just pick the method whose behavior suits your needs the best. WebUtility is probably the future-proof choice, just in case Microsoft decides to ditch the Silverlight runtime in an upcoming version of Windows Phone. But I'd just go with the practical choice of picking HttpUtility to not have to worry about manually supporting the example you've put in your question.
The methods do exactly the same thing. Moreover, if you decompile them, the implementations look as if one was simply copied from the other.
The only difference is the intended use. HttpUtility lives in the System.Web assembly and is expected to be used in ASP.NET applications, which are built on that assembly. WebUtility lives in the System assembly, which nearly all applications reference, and is provided for more general-purpose or client use.
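As a rough illustration of the client-side use described above, here is a minimal sketch that calls WebUtility.HtmlDecode from System.Net. The wrapper class and method names (HtmlDecodeDemo, Decode) are made up for this example, and it assumes a platform where WebUtility is available (so, per the answer above, not WP7).

```csharp
using System;
using System.Net;

static class HtmlDecodeDemo
{
    // Thin wrapper (hypothetical name) around the framework call,
    // so the call site does not need a reference to System.Web.
    public static string Decode(string html)
    {
        return WebUtility.HtmlDecode(html);
    }
}
```

For instance, HtmlDecodeDemo.Decode("&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;") gives back the markup with the three named entities turned into <, > and &.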
A note for others who find this via search: use any of the functions mentioned in the question, but never use Windows.Data.Html.HtmlUtilities.ConvertToText(string input). It is 70 times slower than WebUtility.HtmlDecode and causes crashes! The crash shows up as mshtml!IEPeekMessage in the DevCenter. It looks like this function calls Internet Explorer to convert the string. Just avoid it.

Logic to decrease character values

I am working on logic that decrements the value of an alphanumeric List<char>. For example, A10 becomes A9, BBA becomes BAZ, and 123 becomes 122. If the value entered is already the last one (like A or 0), I should return -.
An additional complication is that there is a List<char> variable, maintained by the user, containing characters that are to be skipped. For example, if the list contains A, the value GHB should become GGZ and not GHA.
The basis of this logic is simply decrementing a char, but with these conditions I am finding it very difficult.
My project is in Silverlight and the language is C#. Below is the code I have been trying, split across 3 methods:
List<char> lstGetDecrName(List<char> lstVal)//entry point of the value that returns decreased value
{
List<char> lstTmp = lstVal;
subCheckEmpty(ref lstTmp);
switch (lstTmp.Count)
{
case 0:
lstTmp.Add('-');
return lstTmp;
case 1:
if (lstTmp[0] == '-')
{
return lstTmp;
}
break;
case 2:
if (lstTmp[1] == '0')
{
if (lstTmp[0] == '1')
{
lstTmp.Clear();
lstTmp.Add('9');
return lstTmp;
}
if (lstTmp[0] == 'A')
{
lstTmp.Clear();
lstTmp.Add('-');
return lstTmp;
}
}
if (lstTmp[1] == 'A')
{
if (lstTmp[0] == 'A')
{
lstTmp.Clear();
lstTmp.Add('Z');
return lstTmp;
}
}
break;
}
return lstGetDecrValue(lstTmp,lstVal);
}
List<char> lstGetDecrValue(List<char> lstTmp,List<char> lstVal)
{
List<char> lstValue = new List<char>();
switch (lstTmp.Last())
{
case 'A':
lstValue = lstGetDecrTemp('Z', lstTmp, lstVal);
break;
case 'a':
lstValue = lstGetDecrTemp('z', lstTmp, lstVal);
break;
case '0':
lstValue = lstGetDecrTemp('9', lstTmp, lstVal);
break;
default:
char tmp = (char)(lstTmp.Last() - 1);
lstTmp.RemoveAt(lstTmp.Count - 1);
lstTmp.Add(tmp);
lstValue = lstTmp;
break;
}
return lstValue;
}
List<char> lstGetDecrTemp(char chrTemp, List<char> lstTmp, List<char> lstVal)//shifting places eg unit to ten,etc.
{
if (lstTmp.Count == 1)
{
lstTmp.Clear();
lstTmp.Add('-');
return lstTmp;
}
lstTmp.RemoveAt(lstTmp.Count - 1);
lstVal = lstGetDecrName(lstTmp);
lstVal.Insert(lstVal.Count, chrTemp);
return lstVal;
}
I seriously need help with this. Please help me crack it.
The problem you are trying to solve is actually how to decrement discrete sections of a sequence of characters, each with its own counting system, where each section is separated by a change between alpha and numeric. The rest of the problem is easy once you identify this.
The skipping of unwanted characters is simply a matter of repeating the decrement if you get an unwanted character in the result.
One difficulty is the ambiguous definition of the sequences, e.g. what comes next when you get down to, say, A00: "A" or "-"? For the sake of argument I am assuming a practical implementation based loosely on Excel cell names (i.e. each section operates independently of the others).
The code below does 95% of what you wanted; however, there is a bug in the exclusions code, e.g. "ABB" becomes "AAY". I feel the exclusions need to be applied at a higher level (e.g. repeat the decrement until no character is in the exclusions list), but I don't have time to finish it now. It also results in a blank string when it counts down to nothing, rather than the "-" you wanted, but that is trivial to add at the end of the process.
Part 1 (divide the problem into sections):
public static string DecreaseName( string name, string exclusions )
{
if (string.IsNullOrEmpty(name))
{
return name;
}
// Split the problem into sections (reverse order)
List<StringBuilder> sections = new List<StringBuilder>();
StringBuilder result = new StringBuilder(name.Length);
bool isNumeric = char.IsNumber(name[0]);
StringBuilder sb = new StringBuilder();
sections.Add(sb);
foreach (char c in name)
{
// If we change between alpha and number, start new string.
if (char.IsNumber(c) != isNumeric)
{
isNumeric = char.IsNumber(c);
sb = new StringBuilder();
sections.Insert(0, sb);
}
sb.Append(c);
}
// Now process each section
bool cascadeToNext = true;
foreach (StringBuilder section in sections)
{
if (cascadeToNext)
{
result.Insert(0, DecrementString(section, exclusions, out cascadeToNext));
}
else
{
result.Insert(0, section);
}
}
return result.ToString().Replace(" ", "");
}
Part 2 (decrement a given string):
private static string DecrementString(StringBuilder section, string exclusions, out bool cascadeToNext)
{
bool exclusionsExist = false;
do
{
exclusionsExist = false;
cascadeToNext = true;
// Process characters in reverse
for (int i = section.Length - 1; i >= 0 && cascadeToNext; i--)
{
char c = section[i];
switch (c)
{
case 'A':
c = (i > 0) ? 'Z' : ' ';
cascadeToNext = (i > 0);
break;
case 'a':
c = (i > 0) ? 'z' : ' ';
cascadeToNext = (i > 0);
break;
case '0':
c = (i > 0) ? '9' : ' ';
cascadeToNext = (i > 0);
break;
case ' ':
cascadeToNext = false;
break;
default:
c = (char)(((int)c) - 1);
if (i == 0 && c == '0')
{
c = ' ';
}
cascadeToNext = false;
break;
}
section[i] = c;
if (exclusions.Contains(c.ToString()))
{
exclusionsExist = true;
}
}
} while (exclusionsExist);
return section.ToString();
}
The dividing can of course be done more efficiently, by just passing start and end indexes to DecrementString, but this is easier to write and follow, and not much slower in practical terms.
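The higher-level exclusion handling suggested in this answer (repeat the decrement until no excluded character remains, then map the blank result to "-") could be sketched roughly as follows. This is only an illustration: ExclusionLoop, DecrementSkippingExcluded and the decrementOnce delegate are hypothetical names, and decrementOnce stands in for any single-step decrement that ignores exclusions and returns an empty string once the value is exhausted.

```csharp
using System;

static class ExclusionLoop
{
    // decrementOnce is a hypothetical single-step decrement that knows nothing
    // about exclusions and returns "" once the value has counted down to nothing.
    public static string DecrementSkippingExcluded(
        string value, string exclusions, Func<string, string> decrementOnce)
    {
        char[] excluded = exclusions.ToCharArray();
        string current = decrementOnce(value);

        // Keep stepping down while the candidate still contains an excluded character.
        while (current.Length > 0 && current.IndexOfAny(excluded) >= 0)
        {
            current = decrementOnce(current);
        }

        // A blank result means we ran out of values; report that as "-".
        return current.Length == 0 ? "-" : current;
    }
}
```

The loop only terminates if decrementOnce eventually reaches the empty string, so that is an assumption the caller has to guarantee.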
Check whether the value is a number. If so, subtract 1 from the number; if it's a string, convert the character to its char code and subtract 1 from that.
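A minimal sketch of that idea, with made-up names (SimpleDecrement, DecrementSimple), might look like the following. It deliberately does nothing more than the suggestion says: it does not wrap 'A' to 'Z' or '0' to '9', it does not skip excluded characters, and it only touches the last character of a non-numeric value.

```csharp
using System;
using System.Globalization;

static class SimpleDecrement
{
    public static string DecrementSimple(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        // Purely numeric input: do ordinary integer arithmetic.
        int number;
        if (int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out number))
        {
            return (number - 1).ToString(CultureInfo.InvariantCulture);
        }

        // Otherwise lower the char code of the last character by one.
        char[] chars = value.ToCharArray();
        chars[chars.Length - 1] = (char)(chars[chars.Length - 1] - 1);
        return new string(chars);
    }
}
```

With this, "123" becomes "122" and "BBC" becomes "BBB"; the wrap-around and skip rules from the question would still have to be layered on top.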
I couldn't stop thinking about this yesterday, so here's an idea. Note, this is just pseudo-code, and not tested, but I think the idea is valid and should work (with a few modifications).
The main point is to define your "alphabet" directly, and specify which characters in it are illegal and should be skipped, then use a list or array of positions in this alphabet to define the word you start with.
I can't spend any more time on this right now, but please let me know if you decide to use it and get it to work!
// Requires: using System; using System.Linq;
static readonly char[] alphabet = { 'a', 'b', 'c', 'd', 'e' };
static readonly char[] illegal = { 'c', 'd' };
public static string ReduceString(string s)
{
    // Create a list of the alphabet positions for each letter:
    int[] positionList = s.Select(ch => Array.IndexOf(alphabet, ch)).ToArray();
    int[] reducedPositionList = ReduceChar(positionList, positionList.Length - 1);
    string result = "";
    foreach (int pos in reducedPositionList)
    {
        result += alphabet[pos];
    }
    return result;
}
public static int[] ReduceChar(int[] positionList, int posToReduce)
{
    if (posToReduce < 0)
    {
        // Reached the end, reduced everything (or the input was empty), return empty array:
        return new int[0];
    }
    int reducedCharPosition = ReduceToNextLegalChar(positionList[posToReduce]);
    if (reducedCharPosition < 0)
    {
        // Move to the back of the alphabet again (i.e. like the 9 in "11 - 2 = 09"),
        // skipping any illegal characters at the back:
        positionList[posToReduce] = ReduceToNextLegalChar(alphabet.Length);
        // Recurse and reduce the next position (i.e. like the 0 in "11 - 2 = 09"):
        return ReduceChar(positionList, posToReduce - 1);
    }
    // Put the reduced char back in place:
    positionList[posToReduce] = reducedCharPosition;
    return positionList;
}
public static int ReduceToNextLegalChar(int pos)
{
    int nextPos = pos - 1;
    if (nextPos < 0)
    {
        // Ran off the front of the alphabet; the caller has to wrap around.
        return -1;
    }
    return IsLegalChar(nextPos) ? nextPos : ReduceToNextLegalChar(nextPos);
}
public static bool IsLegalChar(int pos)
{
    return !illegal.Contains(alphabet[pos]);
}
Without writing all your code for you, here's a suggestion as to how you can break this down:
char DecrementAlphaNumericChar(char input, out bool hadToWrap)
{
if (input == 'A')
{
hadToWrap = true;
return 'Z';
}
else if (input == '0')
{
hadToWrap = true;
return '9';
}
else if ((input > 'A' && input <= 'Z') || (input > '0' && input <= '9'))
{
hadToWrap = false;
return (char)((int)input - 1);
}
throw new ArgumentException(
"Characters must be digits or capital letters",
"input");
}
char DecrementAvoidingProhibited(
char input, List<char> prohibited, out bool hadToWrap)
{
var potential = DecrementAlphaNumericChar(input, out hadToWrap);
while (prohibited.Contains(potential))
{
bool temp;
potential = DecrementAlphaNumericChar(potential, out temp);
if (potential == input)
{
throw new ArgumentException(
"A whole class of characters was prohibited",
"prohibited");
}
hadToWrap |= temp;
}
return potential;
}
string DecrementString(string input, List<char> prohibited)
{
char[] chrs = input.ToCharArray();
for (int i = chrs.Length - 1; i >= 0; i--)
{
bool wrapped;
chrs[i] = DecrementAvoidingProhibited(
chrs[i], prohibited, out wrapped);
if (!wrapped)
return new string(chrs);
}
return "-";
}
The only issue here is that it will reduce e.g. A10 to A09 not A9. I actually prefer this myself, but it should be simple to write a final pass that removes the extra zeroes.
For a little more performance, replace the List<char>s with HashSet<char>s; they should allow a faster Contains lookup.
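The two follow-ups mentioned here (a final pass that removes the extra zeroes, and a HashSet<char> for the prohibited characters) could look roughly like the sketch below. The names DecrementHelpers, TrimLeadingZeros and ToProhibitedSet are made up for illustration, and the zero-trimming rule (strip leading zeros from each run of digits but keep at least one digit) is an assumption about what "extra zeroes" should mean.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class DecrementHelpers
{
    // Removes leading zeros from every run of digits, keeping at least one digit per run.
    // The lookbehind ensures we only start at the beginning of a digit run,
    // and the lookahead ensures a digit is left over after the removed zeros.
    public static string TrimLeadingZeros(string value)
    {
        return Regex.Replace(value, "(?<![0-9])0+(?=[0-9])", "");
    }

    // Builds the prohibited set once so that each Contains call is a hash lookup.
    public static HashSet<char> ToProhibitedSet(IEnumerable<char> prohibited)
    {
        return new HashSet<char>(prohibited);
    }
}
```

To use the set, the prohibited parameters in the methods above would have to be changed from List<char> to HashSet<char> (or to ICollection<char>, which both types implement).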
I found a solution to my own question, with some other workarounds.
The calling function:
MyFunction()
{
//stuff I do before
strValue = lstGetDecrName(strValue.ToList());//decrease value here
if (strValue.Contains('-'))
{
strValue = "-";
}
//stuff I do after
}
In all there are 4 functions: 2 main functions and 2 helper functions.
List<char> lstGetDecrName(List<char> lstVal)//entry point, returns decreased value
{
if (lstVal.Contains('-'))
{
return "-".ToList();
}
List<char> lstTmp = lstVal;
subCheckEmpty(ref lstTmp);
switch (lstTmp.Count)
{
case 0:
lstTmp.Add('-');
return lstTmp;
case 1:
if (lstTmp[0] == '-')
{
return lstTmp;
}
break;
case 2:
if (lstTmp[1] == '0')
{
if (lstTmp[0] == '1')
{
lstTmp.Clear();
lstTmp.Add('9');
return lstTmp;
}
if (lstTmp[0] == 'A')
{
lstTmp.Clear();
lstTmp.Add('-');
return lstTmp;
}
}
if (lstTmp[1] == 'A')
{
if (lstTmp[0] == 'A')
{
lstTmp.Clear();
lstTmp.Add('Z');
return lstTmp;
}
}
break;
}
List<char> lstValue = new List<char>();
switch (lstTmp.Last())
{
case 'A':
lstValue = lstGetDecrTemp('Z', lstTmp, lstVal);
break;
case 'a':
lstValue = lstGetDecrTemp('z', lstTmp, lstVal);
break;
case '0':
lstValue = lstGetDecrTemp('9', lstTmp, lstVal);
break;
default:
char tmp = (char)(lstTmp.Last() - 1);
lstTmp.RemoveAt(lstTmp.Count - 1);
lstTmp.Add(tmp);
subCheckEmpty(ref lstTmp);
lstValue = lstTmp;
break;
}
lstGetDecrSkipValue(lstValue);
return lstValue;
}
List<char> lstGetDecrSkipValue(List<char> lstValue)
{
bool blnSkip = false;
foreach (char tmpChar in lstValue)
{
if (lstChars.Contains(tmpChar))
{
blnSkip = true;
break;
}
}
if (blnSkip)
{
lstValue = lstGetDecrName(lstValue);
}
return lstValue;
}
void subCheckEmpty(ref List<char> lstTmp)
{
bool blnFirst = true;
int i = -1;
foreach (char tmpChar in lstTmp)
{
if (char.IsDigit(tmpChar) && blnFirst)
{
i = tmpChar == '0' ? lstTmp.IndexOf(tmpChar) : -1;
if (tmpChar == '0')
{
i = lstTmp.IndexOf(tmpChar);
}
blnFirst = false;
}
}
if (!blnFirst && i != -1)
{
lstTmp.RemoveAt(i);
subCheckEmpty(ref lstTmp);
}
}
List<char> lstGetDecrTemp(char chrTemp, List<char> lstTmp, List<char> lstVal)//shifting places eg unit to ten,etc.
{
if (lstTmp.Count == 1)
{
lstTmp.Clear();
lstTmp.Add('-');
return lstTmp;
}
lstTmp.RemoveAt(lstTmp.Count - 1);
lstVal = lstGetDecrName(lstTmp);
lstVal.Insert(lstVal.Count, chrTemp);
subCheckEmpty(ref lstVal);
return lstVal;
}

Replace occurrence of character with all letters in the alphabet

I have created a Scrabble game with a computer opponent. If a blank tile is found in the computer's rack during word generation, it needs to be swapped out for every letter in the alphabet. My current solution is below, but I was wondering if there is a better, more efficient way to accomplish this task.
if (str.Contains("*"))
{
char c = 'A';
String made = "";
while(c < 'Z')
{
made = str.ReplaceFirst("*", c.ToString());
if (!made.Contains("*"))
{
wordsMade.Add(made);
if (theGame.theTrie.Search(made) == Trie.SearchResults.Found)
{
validWords.Add(made);
}
}
else
{
char ch = 'A';
String made2 = "";
while (ch < 'Z')
{
made2 = made.ReplaceFirst("*", c.ToString());
wordsMade.Add(made2);
if (theGame.theTrie.Search(made2) == Trie.SearchResults.Found)
{
validWords.Add(made2);
}
ch++;
}
}
c++;
}
Adam is right that the code could be refactored to make it notationally smaller (a lot smaller, in fact), but fundamentally, you have to examine all 26*26 combinations of wildcard characters. So while it is possible to make the code syntactically more efficient, I don't think you can make it algorithmically more efficient.
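To illustrate the "notationally smaller" version without changing the amount of work, here is a rough recursive sketch that expands any number of '*' tiles. BlankTiles and Expand are hypothetical names; it uses plain Substring rather than the question's ReplaceFirst helper, and it assumes the full range 'A' through 'Z' is wanted (the loops in the question stop before 'Z').

```csharp
using System;
using System.Collections.Generic;

static class BlankTiles
{
    // Yields every string obtained by replacing each '*' in pattern with a letter A..Z.
    public static IEnumerable<string> Expand(string pattern)
    {
        int index = pattern.IndexOf('*');
        if (index < 0)
        {
            // No blank tile left: the pattern is a finished candidate word.
            yield return pattern;
            yield break;
        }

        string head = pattern.Substring(0, index);
        string tail = pattern.Substring(index + 1);
        for (char letter = 'A'; letter <= 'Z'; letter++)
        {
            // Fill the first blank, then recurse to fill any remaining blanks.
            foreach (string expanded in Expand(head + letter + tail))
            {
                yield return expanded;
            }
        }
    }
}
```

Each candidate from Expand(str) can then be added to wordsMade and checked against the trie exactly as in the question. Two blanks still produce 26*26 candidates, so this is tidier, not faster.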
There's a lot of duplicated code here that can be refactored.
This routine is duplicated, and can be put into a separate method:
wordsMade.Add(made2);
if (theGame.theTrie.Search(made2) == Trie.SearchResults.Found)
{
validWords.Add(made2);
}
To something like this
void addWord(string newWordMade){
wordsMade.Add(newWordMade);
if (theGame.theTrie.Search(newWordMade) == Trie.SearchResults.Found)
{
validWords.Add(newWordMade);
}
}
This loop construct is also duplicated:
char ch = 'A';
String made2 = "";
while (ch < 'Z')
{
made2 = made.ReplaceFirst("*", c.ToString());
wordsMade.Add(made2);
if (theGame.theTrie.Search(made2) == Trie.SearchResults.Found)
{
validWords.Add(made2);
}
ch++;
}
Combining the previous refactor with this one, with a slick lambda, would yield something like this:
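void loopCharactersAndDoThis(Action<char> DoThis) {
char ch = 'A';
while (ch <= 'Z') // <= so that 'Z' is tried too; the original loop stopped at 'Y'
{
DoThis(ch);
ch++;
}
}
else
{
loopCharactersAndDoThis(ch => {
// Use the lambda parameter ch here; the original passed the outer c, which looks like a typo.
string made2 = made.ReplaceFirst("*", ch.ToString());
addWord(made2);
});
}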
Or even just:
else
{
loopCharactersAndDoThis(ch => addWord(made.ReplaceFirst("*", ch.ToString())));
}
