Superpower: match a string with tokenizer only if it begins a line - c#

When tokenizing with Superpower, how can I match a string only if it is the first thing on a line? (Note: this is a different question from the one linked in the original post.)
For example, assume I have a language with only the following 4 characters (' ', ':', 'X', 'Y'), each of which is a token. There is also a 'Header' token to capture cases of the following regex pattern /^[XY]+:/ (any number of Xs and Ys followed by a colon, only if they start the line).
Here is a quick class for testing (the 4th test-case fails):
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;
public enum Tokens { Space, Colon, Header, X, Y }
public class XYTokenizer
{
static void Main(string[] args)
{
Test("X", Tokens.X);
Test("XY", Tokens.X, Tokens.Y);
Test("X Y:", Tokens.X, Tokens.Space, Tokens.Y, Tokens.Colon);
Test("X: X", Tokens.Header, Tokens.Space, Tokens.X);
}
public static readonly Tokenizer<Tokens> tokenizer = new TokenizerBuilder<Tokens>()
.Match(Character.EqualTo('X'), Tokens.X)
.Match(Character.EqualTo('Y'), Tokens.Y)
.Match(Character.EqualTo(':'), Tokens.Colon)
.Match(Character.EqualTo(' '), Tokens.Space)
.Build();
static void Test(string input, params Tokens[] expected)
{
var tokens = tokenizer.Tokenize(input);
var i = 0;
foreach (var t in tokens)
{
if (t.Kind != expected[i])
{
Console.WriteLine("tokens[" + i + "] was Tokens." + t.Kind
+ " not Tokens." + expected[i] + " for '" + input + "'");
return;
}
i++;
}
Console.WriteLine("OK");
}
}

I came up with a custom Tokenizer based on the example found here. I added comments throughout the code so you can follow what's happening.
using System.Collections.Generic;
using System.Linq;
using Superpower;
using Superpower.Model;
public class MyTokenizer : Tokenizer<Tokens>
{
protected override IEnumerable<Result<Tokens>> Tokenize(TextSpan input)
{
Result<char> next = input.ConsumeChar();
bool checkForHeader = true;
while (next.HasValue)
{
// need to check for a header when starting a new line
if (checkForHeader)
{
var headerStartLocation = next.Location;
var tokenQueue = new List<Result<Tokens>>();
while (next.HasValue && (next.Value == 'X' || next.Value == 'Y'))
{
tokenQueue.Add(Result.Value(next.Value == 'X' ? Tokens.X : Tokens.Y, next.Location, next.Remainder));
next = next.Remainder.ConsumeChar();
}
// only if we had at least one X or one Y
if (tokenQueue.Any())
{
if (next.HasValue && next.Value == ':')
{
// this is a header token; we have to return a Result of the start location
// along with the remainder at this location
yield return Result.Value(Tokens.Header, headerStartLocation, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else
{
// this isn't a header; we have to return all the tokens we parsed up to this point
foreach (Result<Tokens> tokenResult in tokenQueue)
{
yield return tokenResult;
}
}
}
if (!next.HasValue)
yield break;
}
checkForHeader = false;
if (next.Value == '\r')
{
// skip over the carriage return
next = next.Remainder.ConsumeChar();
continue;
}
if (next.Value == '\n')
{
// line break; check for a header token here
next = next.Remainder.ConsumeChar();
checkForHeader = true;
continue;
}
if (next.Value == 'X')
{
yield return Result.Value(Tokens.X, next.Location, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else if (next.Value == 'Y')
{
yield return Result.Value(Tokens.Y, next.Location, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else if (next.Value == ':')
{
yield return Result.Value(Tokens.Colon, next.Location, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else if (next.Value == ' ')
{
yield return Result.Value(Tokens.Space, next.Location, next.Remainder);
next = next.Remainder.ConsumeChar();
}
else
{
yield return Result.Empty<Tokens>(next.Location, $"unrecognized `{next.Value}`");
next = next.Remainder.ConsumeChar(); // Skip the character anyway
}
}
}
}
And you can call it like this:
var tokens = new MyTokenizer().Tokenize(input);
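The core idea of the answer above (look ahead over a run of X/Y only when positioned at the start of a line, and emit Header only if a colon follows) can be sketched without any Superpower types. This is a minimal, dependency-free illustration, not the Superpower API: the Kind enum, the LineStartSketch class and the Scan method are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds mirroring the question's Tokens enum.
public enum Kind { Space, Colon, Header, X, Y }

public static class LineStartSketch
{
    // Hypothetical helper (not part of Superpower): scans the input by hand and
    // emits Header only when a run of X/Y followed by ':' begins a line.
    public static List<Kind> Scan(string input)
    {
        var kinds = new List<Kind>();
        bool atLineStart = true;
        int i = 0;
        while (i < input.Length)
        {
            if (atLineStart)
            {
                // Look ahead over a run of X/Y without consuming it yet.
                int j = i;
                while (j < input.Length && (input[j] == 'X' || input[j] == 'Y')) j++;
                if (j > i && j < input.Length && input[j] == ':')
                {
                    kinds.Add(Kind.Header);
                    i = j + 1;              // consume the run and the colon
                    atLineStart = false;
                    continue;
                }
            }
            atLineStart = false;
            char c = input[i++];
            switch (c)
            {
                case '\r': break;                       // ignore carriage returns
                case '\n': atLineStart = true; break;   // the next character begins a line
                case 'X': kinds.Add(Kind.X); break;
                case 'Y': kinds.Add(Kind.Y); break;
                case ':': kinds.Add(Kind.Colon); break;
                case ' ': kinds.Add(Kind.Space); break;
                default: throw new FormatException($"unrecognized `{c}` at index {i - 1}");
            }
        }
        return kinds;
    }
}
```

Because the lookahead does not consume anything until the colon is seen, a line such as "X Y:" falls through to the ordinary single-character branches, which is the same effect the token queue achieves in the tokenizer above.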

Related

Split a string if delimiter is between single quotes [duplicate]

I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split.
String:
111,222,"33,44,55",666,"77,88","99"
I want the output:
111
222
33,44,55
666
77,88
99
I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99
Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!
You can do so with some simple regex
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
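This does the following:
(?:^|,) = a non-capturing group that matches either the beginning of the input or a comma
(\"(?:[^\"]+|\"\")*\"|[^,]*) = a numbered capture group that selects between two alternatives:
a quoted field (which may contain commas and doubled quotes)
an unquoted field (everything up to the next comma)
This should give you the output you are looking for, although quoted fields keep their surrounding quotes.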
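To make the role of the capture group concrete, here is a small sketch that reads group 1 instead of the whole match, so no leading comma has to be trimmed. It additionally strips the surrounding quotes and collapses doubled quotes, which is an assumption about the desired output rather than something the pattern does by itself. CsvRegexSketch and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexSketch
{
    // Same pattern as above, written as a verbatim string ("" is one quote).
    private static readonly Regex Field =
        new Regex(@"(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)");

    // Hypothetical helper: returns the fields of one CSV line.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Field.Matches(line))
        {
            // Group 1 holds the field without the leading comma.
            string value = m.Groups[1].Value;
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                // Assumption: callers want the quotes removed and "" unescaped.
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Called as CsvRegexSketch.Split("111,222,\"33,44,55\",666,\"77,88\",\"99\""), this yields the six fields from the question with the quotes removed.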
Example code in C#
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
List<string> list = new List<string>();
string curr = null;
foreach (Match match in csvSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add("");
}
list.Add(curr.TrimStart(','));
}
return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
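// Join the fields before printing; writing the array itself would only print its type name.
Console.WriteLine(string.Join(Environment.NewLine, SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\"")));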
}
Warning: As per @MrE's comment, if a rogue newline character appears in a badly formed CSV file and you end up with an unbalanced quote ("string), you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). This can easily be replicated in Visual Studio and, as I've discovered, will crash it. A simple try/catch will not trap this issue either.
You should use:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
instead
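As an additional safeguard, .NET lets you construct a Regex with a match timeout, in which case a runaway match throws RegexMatchTimeoutException instead of running indefinitely. The sketch below combines the simpler pattern with a one-second timeout; the timeout value, the class name CsvRegexTimeoutSketch and the TrySplit method are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRegexTimeoutSketch
{
    // The simpler pattern from above, plus a match timeout (assumed: 1 second).
    private static readonly Regex Field = new Regex(
        @"(?:^|,)(""(?:[^""])*""|[^,]*)",
        RegexOptions.None,
        TimeSpan.FromSeconds(1));

    // Hypothetical helper: returns false if matching timed out.
    public static bool TrySplit(string line, out List<string> fields)
    {
        fields = new List<string>();
        try
        {
            // Matches() is evaluated lazily, so the timeout can fire inside the loop.
            foreach (Match m in Field.Matches(line))
            {
                fields.Add(m.Groups[1].Value.Trim('"'));
            }
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            fields.Clear();
            return false;
        }
    }
}
```

With a timeout in place the failure becomes an ordinary exception, so the caller can log the offending line and move on.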
Fast and easy:
public static string[] SplitCsv(string line)
{
List<string> result = new List<string>();
StringBuilder currentStr = new StringBuilder("");
bool inQuotes = false;
for (int i = 0; i < line.Length; i++) // For each character
{
if (line[i] == '\"') // Quotes are closing or opening
inQuotes = !inQuotes;
else if (line[i] == ',') // Comma
{
if (!inQuotes) // If not in quotes, end of current string, add it to result
{
result.Add(currentStr.ToString());
currentStr.Clear();
}
else
currentStr.Append(line[i]); // If in quotes, just add it
}
else // Add any other character to current string
currentStr.Append(line[i]);
}
result.Add(currentStr.ToString());
return result.ToArray(); // Return array of all strings
}
With this string as input :
111,222,"33,44,55",666,"77,88","99"
It will return :
111
222
33,44,55
666
77,88
99
I really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is:
public IEnumerable<string> SplitCSV(string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
Maybe it's even more useful to have it like an extension method:
public static class StringHelper
{
public static IEnumerable<string> SplitCSV(this string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
}
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
Here is the implementation in C#:
string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
foreach (var match in matches)
{
Console.WriteLine(match);
}
Outputs
111
222
33,44,55
666
77,88
99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.
This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
This is the function provided by "Chad Hedgcock", with minor updates.
The updates are:
Line 26: character.val == '\"' is redundant, because the check on Line 24 (character.val == '"') has already established it; both literals are the same character.
Line 28: added && !quoteIsEscaped to the condition if (row[character.index + 1] == character.val), so that three consecutive quotes are handled.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
bool blockUntilEndQuote2 = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"' && !blockUntilEndQuote2)
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character == '\'' && !blockUntilEndQuote)
{
if (blockUntilEndQuote2 == false)
{
blockUntilEndQuote2 = true;
}
else if (blockUntilEndQuote2 == true)
{
blockUntilEndQuote2 = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) )
(?=\s*([,;\t\r\n]|$))
) |
(?<FULL>
(^|[\s\t\r\n])
( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
(?=[,;\s\t\r\n]|$)
)
))", RegexOptions.Compiled);
This solution can also handle fairly chaotic input.
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
See this example in action HERE
Note: The regular expression contains two <FULL> blocks, and each of them contains two <QUODAT> blocks separated by "or" (|). Depending on your task you may only need one of them.
Note: This regular expression gives us one string array, and works on a single line with or without <carriage return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(@"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
(?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>)
(?=\s*([,;\t\r\n]|$))
)
))", RegexOptions.Compiled);
See this example in action: HERE
It can process complex, easy and empty items too:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything but the <quotation mark><separators><comma> sequence AND each item shall begin and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carriage return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaced the text "CO" with "QUO".
Try this:
string s = @"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
if (it.StartsWith(",") || it.EndsWith(","))
{
var tmp = it.TrimEnd(',').TrimStart(',');
result.AddRange(tmp.Split(','));
}
else
{
if(!string.IsNullOrEmpty(it)) result.Add(it);
}
}
//Results:
foreach (var it in result)
{
Console.WriteLine(it);
}
I know I'm a bit late to this, but for anyone searching, here is how I did what you are asking about in C#:
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"')
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && blockUntilEndQuote == true)
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
Don't reinvent a CSV parser, try FileHelpers.
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.
public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
{
// In-Line for example, but I implemented as string extender in production code
Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
{
if (startIndex >= 0)
{
if (source != null)
{
for (int i = startIndex; i < source.Length; i++)
{
if (!char.IsWhiteSpace(source[i]))
{
return i;
}
}
}
}
return -1;
};
var results = new List<string>();
var result = new StringBuilder();
var inQualifier = false;
var inField = false;
// We add new columns at the delimiter, so append one for the parser.
var row = $"{record}{delimiter}";
for (var idx = 0; idx < row.Length; idx++)
{
// A delimiter character...
if (row[idx]== delimiter[0])
{
// Are we inside qualifier? If not, we've hit the end of a column value.
if (!inQualifier)
{
results.Add(trimData ? result.ToString().Trim() : result.ToString());
result.Clear();
inField = false;
}
else
{
result.Append(row[idx]);
}
}
// NOT a delimiter character...
else
{
// ...Not a space character
if (row[idx] != ' ')
{
// A qualifier character...
if (row[idx] == qualifier[0])
{
// Qualifier is closing qualifier...
if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
{
inQualifier = false;
continue;
}
else
{
// ...Qualifier is opening qualifier
if (!inQualifier)
{
inQualifier = true;
}
// ...It's a qualifier inside a qualifier.
else
{
inField = true;
result.Append(row[idx]);
}
}
}
// Not a qualifier character...
else
{
result.Append(row[idx]);
inField = true;
}
}
// ...A space character
else
{
if (inQualifier || inField)
{
result.Append(row[idx]);
}
}
}
}
return results.ToArray<string>();
}
Some test code:
//var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
var input =
"111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";
Console.WriteLine("Split with trim");
Console.WriteLine("---------------");
var result = SplitRow(input, ",", "\"", true);
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Split 2
Console.WriteLine("Split with no trim");
Console.WriteLine("------------------");
var result2 = SplitRow(input, ",", "\"", false);
foreach (var r in result2)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Time Trial 1
Console.WriteLine("Experimental Process (1,000,000) iterations");
Console.WriteLine("-------------------------------------------");
var watch = Stopwatch.StartNew(); // requires using System.Diagnostics;
for (var i = 0; i < 1000000; i++)
{
var x1 = SplitRow(input, ",", "\"", false);
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
Console.WriteLine("");
Results
Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds
Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.
If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
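As one illustration of leaning on an existing parser, the sketch below uses TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with .NET. This is a swapped-in example and not the parser any of the answers here used; CsvParserSketch and SplitLine are hypothetical names.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvParserSketch
{
    // Hypothetical helper: splits a single CSV line with the built-in parser.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the question's input, SplitLine("111,222,\"33,44,55\",666,\"77,88\",\"99\"") returns the six fields with the surrounding quotes removed. On .NET Framework this requires a reference to the Microsoft.VisualBasic assembly.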
Here is my fastest implementation based upon string raw pointer manipulation:
string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
{
string[] oTokens;
if (null == cSeparator)
{
cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
}
if (null == cQuotes)
{
cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
}
unsafe
{
fixed (char* lpText = sText)
{
#region Fast array estimation
char* lpCurrent = lpText;
int nEstimatedSize = 0;
while (0 != *lpCurrent)
{
if (cSeparator == *lpCurrent)
{
nEstimatedSize++;
}
lpCurrent++;
}
nEstimatedSize++; // Add EOL char(s)
string[] oEstimatedTokens = new string[nEstimatedSize];
#endregion
#region Parsing
char[] oBuffer = new char[sText.Length];
int nIndex = 0;
int nTokens = 0;
lpCurrent = lpText;
while (0 != *lpCurrent)
{
if (cQuotes == *lpCurrent)
{
// Quotes parsing
lpCurrent++; // Skip quote
nIndex = 0; // Reset buffer
while (
(0 != *lpCurrent)
&& (cQuotes != *lpCurrent)
)
{
oBuffer[nIndex] = *lpCurrent; // Store char
lpCurrent++; // Move source cursor
nIndex++; // Move target cursor
}
}
else if (cSeparator == *lpCurrent)
{
// Separator char parsing
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
nIndex = 0; // Skip separator and Reset buffer
}
else
{
// Content parsing
oBuffer[nIndex] = *lpCurrent; // Store char
nIndex++; // Move target cursor
}
lpCurrent++; // Move source cursor
}
// Recover pending buffer
if (nIndex > 0)
{
// Store token
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
}
// Build final tokens list
if (nTokens == nEstimatedSize)
{
oTokens = oEstimatedTokens;
}
else
{
oTokens = new string[nTokens];
Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
}
#endregion
}
}
// Epilogue
return oTokens;
}
Try this
private string[] GetCommaSeperatedWords(string sep, string line)
{
List<string> list = new List<string>();
StringBuilder word = new StringBuilder();
int doubleQuoteCount = 0;
for (int i = 0; i < line.Length; i++)
{
string chr = line[i].ToString();
if (chr == "\"")
{
if (doubleQuoteCount == 0)
doubleQuoteCount++;
else
doubleQuoteCount--;
continue;
}
if (chr == sep && doubleQuoteCount == 0)
{
list.Add(word.ToString());
word = new StringBuilder();
continue;
}
word.Append(chr);
}
list.Add(word.ToString());
return list.ToArray();
}
This is Chad's answer rewritten with state-based logic. His answer failed for me when it came across """BRAD""" as a field. That should return "BRAD", but it just ate up all the remaining fields. When I tried to debug it, I ended up rewriting it as state-based logic:
enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield };
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
SplitState state = SplitState.s_begin;
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new { val, index }))
{
//Console.WriteLine("character = " + character.val + " state = " + state);
switch (state)
{
case SplitState.s_begin:
if (character.val == delimiter)
{
/* empty field */
yield return currentString.ToString();
currentString.Clear();
} else if (character.val == '"')
{
state = SplitState.s_inquotefield;
} else
{
currentString.Append(character.val);
state = SplitState.s_infield;
}
break;
case SplitState.s_infield:
if (character.val == delimiter)
{
/* field with data */
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_inquotefield:
if (character.val == '"')
{
// could be end of field, or escaped quote.
state = SplitState.s_foundquoteinfield;
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_foundquoteinfield:
if (character.val == '"')
{
// found escaped quote.
currentString.Append(character.val);
state = SplitState.s_inquotefield;
}
else if (character.val == delimiter)
{
// must have been last quote so we must find delimiter
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
}
else
{
throw new Exception("Quoted field not terminated.");
}
break;
default:
throw new Exception("unknown state:" + state);
}
}
//Console.WriteLine("currentstring = " + currentString.ToString());
}
This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.

Split string in square brackets from Google translator

I am receiving data from a Google Language Translator service and need help splitting it.
void Start()
{
translateText("Hello, This is a test!", "en", "fr");
}
void translateText(string text, string fromLanguage, string toLanguage)
{
string url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=" + fromLanguage + "&tl=" + toLanguage + "&dt=t&q=" + Uri.EscapeUriString(text);
StartCoroutine(startTranslator(url));
}
IEnumerator startTranslator(string url)
{
UnityWebRequest www = UnityWebRequest.Get(url);
yield return www.Send();
Debug.Log("Raw string Received: " + www.downloadHandler.text);
LanguageResult tempResult = decodeResult(www.downloadHandler.text);
Debug.Log("Original Text: " + tempResult.originalText);
Debug.Log("Translated Text: " + tempResult.translatedText);
Debug.Log("LanguageIso: " + tempResult.languageIso);
yield return null;
}
LanguageResult decodeResult(string result)
{
char[] delims = { '[', '\"', ']', ',' };
string[] arr = result.Split(delims, StringSplitOptions.RemoveEmptyEntries);
LanguageResult tempLang = null;
if (arr.Length >= 4)
{
tempLang = new LanguageResult();
tempLang.translatedText = arr[0];
tempLang.originalText = arr[1];
tempLang.unknowValue = arr[2];
tempLang.languageIso = arr[3];
}
return tempLang;
}
public class LanguageResult
{
public string translatedText;
public string originalText;
public string unknowValue;
public string languageIso;
}
I call it with translateText("Hello, This is a test!", "en", "fr"); from the Start() function, which translates the English sentence to French. The languages are given as ISO 639-1 codes.
The received data looks like this:
[[["Bonjour, Ceci est un test!","Hello, This is a test!",,,0]],,"en"]
I want to split it like this:
Bonjour, Ceci est un test!
Hello, This is a test!
0
en
and put them into a string array in order.
I currently use this:
char[] delims = { '[', '\"', ']', ',' };
string[] arr = result.Split(delims, StringSplitOptions.RemoveEmptyEntries);
This works if there is no comma in the received string. If there is a comma, the split values are messed up. What's the best way of splitting this?
EDIT:
With Blorgbeard's solution, the final working code is as below. Hopefully, this will help somebody else. This shouldn't be used for commercial purposes, only for personal or school projects.
void Start()
{
//translateText("Hello, This is \" / \\ a test !", "en", "fr");
//translateText("Hello, This is , \\ \" a test !", "en", "fr");
translateText("Hello, This is a test!", "en", "fr");
}
void translateText(string text, string fromLanguage, string toLanguage)
{
string url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=" + fromLanguage + "&tl=" + toLanguage + "&dt=t&q=" + Uri.EscapeUriString(text);
StartCoroutine(startTranslator(url));
}
IEnumerator startTranslator(string url)
{
UnityWebRequest www = UnityWebRequest.Get(url);
yield return www.Send();
Debug.Log("Raw string Received: " + www.downloadHandler.text);
LanguageResult tempResult = decodeResult(www.downloadHandler.text);
displayResult(tempResult);
yield return null;
}
void displayResult(LanguageResult translationResult)
{
Debug.Log("Original Text: " + translationResult.originalText);
Debug.Log("Translated Text: " + translationResult.translatedText);
Debug.Log("LanguageIso: " + translationResult.languageIso);
}
LanguageResult decodeResult(string result)
{
string[] arr = Decode(result);
LanguageResult tempLang = null;
if (arr.Length >= 4)
{
tempLang = new LanguageResult();
tempLang.translatedText = arr[0];
tempLang.originalText = arr[1];
tempLang.unknowValue = arr[2];
tempLang.languageIso = arr[3];
}
return tempLang;
}
public class LanguageResult
{
public string translatedText;
public string originalText;
public string unknowValue;
public string languageIso;
}
private string[] Decode(string input)
{
List<string> finalResult = new List<string>();
bool inToken = false;
bool inString = false;
bool escaped = false;
var seps = ",[]\"".ToArray();
var current = "";
foreach (var chr in input)
{
if (!inString && chr == '"')
{
current = "";
inString = true;
continue;
}
if (inString && !escaped && chr == '"')
{
finalResult.Add(current);
current = "";
inString = false;
continue;
}
if (inString && !escaped && chr == '\\')
{
escaped = true;
continue;
}
if (inString && (chr != '"' || escaped))
{
escaped = false;
current += chr;
continue;
}
if (inToken && seps.Contains(chr))
{
finalResult.Add(current);
current = "";
inToken = false;
continue;
}
if (!inString && chr == '"')
{
inString = true;
current = "";
continue;
}
if (!inToken && !seps.Contains(chr))
{
inToken = true;
current = "";
}
current += chr;
}
return finalResult.ToArray();
}
You could code up a simple parser yourself. Here's one I threw together (could use some cleaning up, but demonstrates the idea):
private static IEnumerable<string> Parse(string input) {
bool inToken = false;
bool inString = false;
bool escaped = false;
var seps = ",[]\"".ToArray();
var current = "";
foreach (var chr in input) {
if (!inString && chr == '"') {
current = "";
inString = true;
continue;
}
if (inString && !escaped && chr == '"') {
yield return current;
current = "";
inString = false;
continue;
}
if (inString && !escaped && chr == '\\') {
escaped = true;
continue;
}
if (inString && (chr != '"' || escaped)) {
escaped = false;
current += chr;
continue;
}
if (inToken && seps.Contains(chr)) {
yield return current;
current = "";
inToken = false;
continue;
}
if (!inString && chr == '"') {
inString = true;
current = "";
continue;
}
if (!inToken && !seps.Contains(chr)) {
inToken = true;
current = "";
}
current += chr;
}
}
Here's a jsfiddle demo.
Using Regex.Split you could do something like this for example:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
var input ="[[[\"Bonjour, Ceci est un test!\",\"Hello, This is a test!\",,,0]],,\"en\"]";
var parse = Regex.Split(input, "\\[|\\]|[^a-zA-Z ],|\",\"|\"|\"");
foreach(var item in parse) {
bool result = !String.IsNullOrEmpty(item) && (Char.IsLetter(item[0]) || Char.IsDigit(item[0]));
if (result) {
Console.WriteLine(item);
}
}
}
}
Output:
Bonjour, Ceci est un test!
Hello, This is a test!
0
en
If you want everything that was split, you can simply remove the bool check for letters and digits.
Here is a crazy idea: split by " and then split the rest by the remaining delimiters (this won't work if there is a " inside one of the quoted strings).
var s = @"[[[""Bonjour, Ceci est un test!"",""Hello, This is a test!"",,,0]],,""en""]";
var a = s.Split('"').Select((x, i) => (i & 1) > 0 ? new[] { x } : x.Split("[],".ToArray(),
StringSplitOptions.RemoveEmptyEntries)).SelectMany(x => x).ToArray();
Debug.Print(string.Join("|", a)); // "Bonjour, Ceci est un test!|Hello, This is a test!|0|en"
You can also split with a regex. I tested this with the sample you provided; the result follows the code.
var str="[[[\"Bonjour, Ceci est un test!\",\"Hello, This is a test!\",,,0]],,\"en\"]";
var splitted=Regex.Split(str,@"\[|\]|\,");
foreach(var split in splitted){
Console.WriteLine(split );
}
"Bonjour Ceci est un test!"
"Hello This is a test!"
0
"en"
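Another option is to match the tokens directly instead of splitting on separators. The sketch below is not taken from any of the answers above. It assumes that strings are double-quoted, that a backslash escapes the next character inside a string, and that bare values never contain brackets, commas, or quotes. TokenExtractor and Extract are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Group 1: contents of a double-quoted string (backslash escapes allowed).
    // Group 2: a bare value such as 0, i.e. a run of non-separator characters.
    private static readonly Regex TokenPattern =
        new Regex("\"((?:[^\"\\\\]|\\\\.)*)\"|([^\\[\\],\"]+)");

    public static List<string> Extract(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            // Only one of the two groups participates in any given match.
            tokens.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return tokens;
    }
}
```

Because quoted strings are matched as a unit, commas inside them are preserved. Escape sequences are returned as written; unescaping them would be a separate step.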

ASP.NET: Implementing ISessionIDManager for cookieless sessions?

Question:
I'm writing a custom session provider.
So far it works excellently.
I decided I wanted to add a customized ISessionIDManager, to control the session id.
It already works fine for cookie sessions.
But when I switch to cookieless, like this:
<sessionState mode="Custom" customProvider="custom_provider" cookieless="true" timeout="1"
sessionIDManagerType="Samples.AspNet.Session.MySessionIDManager"
sqlConnectionString="Data Source=localhost;Initial Catalog=TestDB;User Id=SomeUser;Password=SomePassword;"
sqlCommandTimeout="10"
>
<!-- timeout in minutes-->
<providers>
<add name="custom_provider" type="Test.WebSession.CustomSessionStoreProvider" />
</providers>
</sessionState>
Then it redirects to:
http://localhost:52897/(77bb065f-d2e9-4cfc-8117-8b89a40e00d8)/default.aspx
and this throws HTTP 404.
I understand why, as there is no such folder.
But when you use the default session manager (the one that ships with asp.net), and switch to cookieless, the URL looks like this:
http://localhost:52897/(S(sq2abm453wnasg45pvboee45))/DisplaySessionValues.aspx
and there is no HTTP 404...
I tried wrapping my session ID in the URL as (S(id)), the way the default manager does, but that didn't help.
What am I missing?
using System;
using System.Configuration;
using System.Web.Configuration;
using System.Web;
using System.Web.SessionState;
// http://allantech.blogspot.com/2011/04/cookieless-session-state-in-aspnet.html
// http://forums.asp.net/t/1082784.aspx/1
// http://stackoverflow.com/questions/4612310/implementing-a-custom-sessionidmanager
// http://msdn.microsoft.com/en-us/library/system.web.sessionstate.isessionidmanager.aspx
// http://msdn.microsoft.com/en-us/library/system.web.sessionstate.isessionidmanager(v=vs.80).aspx
namespace Samples.AspNet.Session
{
// Samples.AspNet.Session.MySessionIDManager
public class MySessionIDManager : IHttpModule, ISessionIDManager
{
protected SessionStateSection pConfig = null;
internal const string HeaderName = "AspFilterSessionId";
protected void InitializeModule()
{
// Obtain session-state configuration settings.
if (pConfig == null)
{
Configuration cfg =
WebConfigurationManager.OpenWebConfiguration(System.Web.Hosting.HostingEnvironment.ApplicationVirtualPath);
pConfig = (SessionStateSection)cfg.GetSection("system.web/sessionState");
} // End if (pConfig == null)
}
//
// IHttpModule Members
//
//
// IHttpModule.Init
//
public void Init(HttpApplication app)
{
//InitializeModule();
} // End Sub Init
//
// IHttpModule.Dispose
//
public void Dispose()
{
} // End Sub Dispose
//
// ISessionIDManager Members
//
//
// ISessionIDManager.Initialize
//
public void Initialize()
{
InitializeModule();
} // End Sub Initialize
//
// ISessionIDManager.InitializeRequest
//
public bool InitializeRequest(
HttpContext context,
bool suppressAutoDetectRedirect,
out bool supportSessionIDReissue
)
{
if (pConfig.Cookieless == HttpCookieMode.UseCookies)
{
supportSessionIDReissue = false;
return false;
}
else
{
supportSessionIDReissue = true;
return context.Response.IsRequestBeingRedirected;
}
} // End Function InitializeRequest
//
// ISessionIDManager.GetSessionID
//
public string GetSessionID(HttpContext context)
{
string id = null;
if (pConfig.Cookieless == HttpCookieMode.UseUri)
{
// Retrieve the SessionID from the URI.
string tmp = context.Request.Headers[HeaderName];
if (tmp != null)
id = HttpUtility.UrlDecode(tmp); // decode the header value, not the still-null id
}
else
{
if (context.Request.Cookies.Count > 0)
{
id = context.Request.Cookies[pConfig.CookieName].Value;
id = HttpUtility.UrlDecode(id);
}
}
// Verify that the retrieved SessionID is valid. If not, return null.
if (!Validate(id))
id = null;
return id;
} // End Function GetSessionID
//
// ISessionIDManager.CreateSessionID
//
public string CreateSessionID(HttpContext context)
{
return System.Guid.NewGuid().ToString();
} // End Function CreateSessionID
//
// ISessionIDManager.RemoveSessionID
//
public void RemoveSessionID(HttpContext context)
{
context.Response.Cookies.Remove(pConfig.CookieName);
} // End Sub RemoveSessionID
public static string InsertSessionId(string id, string path)
{
string dir = GetDirectory(path);
if (!dir.EndsWith("/"))
dir += "/";
string appvpath = HttpRuntime.AppDomainAppVirtualPath;
if (!appvpath.EndsWith("/"))
appvpath += "/";
if (path.StartsWith(appvpath))
path = path.Substring(appvpath.Length);
if (path[0] == '/')
path = path.Length > 1 ? path.Substring(1) : "";
// //http://localhost:52897/(S(sq2abm453wnasg45pvboee45))/DisplaySessionValues.aspx
return Canonic(appvpath + "(" + id + ")/" + path);
//return Canonic(appvpath + "(S(" + id + "))/" + path);
}
public static bool IsRooted(string path)
{
if (path == null || path.Length == 0)
return true;
char c = path[0];
if (c == '/' || c == '\\')
return true;
return false;
}
public static string Canonic(string path)
{
char[] path_sep = { '\\', '/' };
bool isRooted = IsRooted(path);
bool endsWithSlash = path.EndsWith("/");
string[] parts = path.Split(path_sep);
int end = parts.Length;
int dest = 0;
for (int i = 0; i < end; i++)
{
string current = parts[i];
if (current.Length == 0)
continue;
if (current == ".")
continue;
if (current == "..")
{
dest--;
continue;
}
if (dest < 0)
if (!isRooted)
throw new HttpException("Invalid path.");
else
dest = 0;
parts[dest++] = current;
}
if (dest < 0)
throw new HttpException("Invalid path.");
if (dest == 0)
return "/";
string str = String.Join("/", parts, 0, dest);
str = RemoveDoubleSlashes(str);
if (isRooted)
str = "/" + str;
if (endsWithSlash)
str = str + "/";
return str;
}
public static string GetDirectory(string url)
{
url = url.Replace('\\', '/');
int last = url.LastIndexOf('/');
if (last > 0)
{
if (last < url.Length)
last++;
return RemoveDoubleSlashes(url.Substring(0, last));
}
return "/";
}
public static string RemoveDoubleSlashes (string input)
{
// MS VirtualPathUtility removes duplicate '/'
int index = -1;
for (int i = 1; i < input.Length; i++)
if (input [i] == '/' && input [i - 1] == '/') {
index = i - 1;
break;
}
if (index == -1) // common case optimization
return input;
System.Text.StringBuilder sb = new System.Text.StringBuilder(input.Length);
sb.Append (input, 0, index);
for (int i = index; i < input.Length; i++) {
if (input [i] == '/') {
int next = i + 1;
if (next < input.Length && input [next] == '/')
continue;
sb.Append ('/');
}
else {
sb.Append (input [i]);
}
}
return sb.ToString ();
}
// http://www.dotnetfunda.com/articles/article1531-how-to-add-custom-headers-into-readonly-httprequest-object-using-httpmodule-.aspx
public void SetHeader(string strHeaderName, string strValue)
{
//get a reference
System.Collections.Specialized.NameValueCollection headers = HttpContext.Current.Request.Headers;
//get a type
Type t = headers.GetType();
//get the property
System.Reflection.PropertyInfo prop = t.GetProperty(
"IsReadOnly",
System.Reflection.BindingFlags.Instance
| System.Reflection.BindingFlags.IgnoreCase
| System.Reflection.BindingFlags.NonPublic
| System.Reflection.BindingFlags.FlattenHierarchy
| System.Reflection.BindingFlags.NonPublic
| System.Reflection.BindingFlags.Public
| System.Reflection.BindingFlags.FlattenHierarchy
);
//unset readonly
prop.SetValue(headers, false, null); // Set Read-Only to false
//add a header
//HttpContext.Current.Request.Headers.Add(strHeaderName, strValue);
//headers.Add(strHeaderName, strValue);
t.InvokeMember("BaseAdd",
System.Reflection.BindingFlags.InvokeMethod
| System.Reflection.BindingFlags.NonPublic
| System.Reflection.BindingFlags.Instance,
null,
headers,
new object[] { strHeaderName, new System.Collections.ArrayList { strValue } }
);
prop.SetValue(headers, true, null); // Reset Read-Only to true
// Victory !
//string strCheckHeaders = string.Join(Environment.NewLine, HttpContext.Current.Request.Headers.AllKeys);
}
//
// ISessionIDManager.SaveSessionID
//
public void SaveSessionID(HttpContext context, string id, out bool redirected, out bool cookieAdded)
{
if (!Validate(id))
throw new HttpException("Invalid session ID");
Type t = base.GetType();
redirected = false;
cookieAdded = false;
if (pConfig.Cookieless == HttpCookieMode.UseUri)
{
// Add the SessionID to the URI. Set the redirected variable as appropriate.
//context.Request.Headers.Add(HeaderName, id);
//context.Request.Headers.Set(HeaderName, id);
SetHeader(HeaderName, id);
cookieAdded = false;
redirected = true;
UriBuilder newUri = new UriBuilder(context.Request.Url);
newUri.Path = InsertSessionId(id, context.Request.FilePath);
//http://localhost:52897/(S(sq2abm453wnasg45pvboee45))/DisplaySessionValues.aspx
context.Response.Redirect(newUri.Uri.PathAndQuery, false);
context.ApplicationInstance.CompleteRequest(); // Important !
return;
}
else
{
context.Response.Cookies.Add(new HttpCookie(pConfig.CookieName, id));
cookieAdded = true;
}
} // End Sub SaveSessionID
//
// ISessionIDManager.Validate
//
public bool Validate(string id)
{
try
{
Guid testGuid = new Guid(id);
if (id == testGuid.ToString())
return true;
}
catch
{
}
return false;
} // End Function Validate
} // End Class MySessionIDManager : IHttpModule, ISessionIDManager
} // End Namespace Samples.AspNet.Session
Creating a custom session id manager from scratch seems like a lot of work. What about inheriting from System.Web.SessionState.SessionIDManager class and overriding the CreateSessionID method?
public class MySessionIDManager : SessionIDManager, ISessionIDManager
{
public override string CreateSessionID(HttpContext context)
{
return System.Guid.NewGuid().ToString("N");
}
}
When all else fails, crack open the .NET implementation with Reflector or ILSpy and see what it does differently.
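One thing to watch when inheriting: as far as I know, the base Validate method is written for the built-in ID format, so a GUID-based ID may be rejected unless Validate is overridden as well. The following is a sketch only, assuming .NET Framework with a reference to System.Web; GuidSessionIDManager is a hypothetical name, and I have not tested it in cookieless mode.

```csharp
using System;
using System.Web;
using System.Web.SessionState;

// Sketch only: requires System.Web (.NET Framework).
public class GuidSessionIDManager : SessionIDManager
{
    public override string CreateSessionID(HttpContext context)
    {
        // 32 hex digits, no dashes or braces.
        return Guid.NewGuid().ToString("N");
    }

    // Accept exactly the format produced by CreateSessionID above.
    public override bool Validate(string id)
    {
        Guid parsed;
        return Guid.TryParseExact(id, "N", out parsed);
    }
}
```

Everything else (reading the ID from the cookie or URL, redirects, removal) stays with the base class, which is the point of inheriting rather than implementing ISessionIDManager from scratch.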

System.StringComparer that supports wildcard (*)

I'm looking for a fast .NET class/library that has a StringComparer that supports wildcards (*) AND case-insensitivity.
Any ideas?
You could use Regex with RegexOptions.IgnoreCase, then compare with the IsMatch method.
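// Regex.Escape keeps characters such as '.' or '(' in prefix/suffix from being read as regex syntax.
var wordRegex = new Regex( "^" + Regex.Escape(prefix) + ".*" + Regex.Escape(suffix) + "$", RegexOptions.IgnoreCase );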
if (wordRegex.IsMatch( testWord ))
{
...
}
This would match prefix*suffix. You might also consider using StartsWith or EndsWith as alternatives.
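For the single-wildcard case, the StartsWith/EndsWith alternative could look like the sketch below. PrefixSuffixMatcher and Matches are hypothetical names, and the choice of StringComparison.OrdinalIgnoreCase is an assumption; a culture-aware comparison may suit some inputs better.

```csharp
using System;

public static class PrefixSuffixMatcher
{
    // Equivalent in spirit to the pattern prefix*suffix, compared case-insensitively.
    public static bool Matches(string word, string prefix, string suffix)
    {
        // The length check stops prefix and suffix from overlapping,
        // e.g. "aba" must not match prefix "ab" plus suffix "ba".
        return word.Length >= prefix.Length + suffix.Length
            && word.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            && word.EndsWith(suffix, StringComparison.OrdinalIgnoreCase);
    }
}
```

This avoids building a Regex at all, which may matter if the comparison runs in a tight loop.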
Alternatively, you can use these extension methods:
public static bool CompareWildcards(this string WildString, string Mask, bool IgnoreCase)
{
int i = 0;
if (String.IsNullOrEmpty(Mask))
return false;
if (Mask == "*")
return true;
while (i != Mask.Length)
{
if (CompareWildcard(WildString, Mask.Substring(i), IgnoreCase))
return true;
while (i != Mask.Length && Mask[i] != ';')
i += 1;
if (i != Mask.Length && Mask[i] == ';')
{
i += 1;
while (i != Mask.Length && Mask[i] == ' ')
i += 1;
}
}
return false;
}
public static bool CompareWildcard(this string WildString, string Mask, bool IgnoreCase)
{
int i = 0, k = 0;
while (k != WildString.Length)
{
if (i > Mask.Length - 1)
return false;
switch (Mask[i])
{
case '*':
if ((i + 1) == Mask.Length)
return true;
while (k != WildString.Length)
{
if (CompareWildcard(WildString.Substring(k + 1), Mask.Substring(i + 1), IgnoreCase))
return true;
k += 1;
}
return false;
case '?':
break;
default:
if (IgnoreCase == false && WildString[k] != Mask[i])
return false;
if (IgnoreCase && Char.ToLower(WildString[k]) != Char.ToLower(Mask[i]))
return false;
break;
}
i += 1;
k += 1;
}
if (k == WildString.Length)
{
if (i == Mask.Length || Mask[i] == ';' || Mask[i] == '*')
return true;
}
return false;
}
CompareWildcards compares a string against multiple wildcard patterns, and CompareWildcard compares a string against a single wildcard pattern.
Example usage:
if (Path.CompareWildcards("*txt;*.zip;", true) == true)
{
// Path matches wildcard
}
Alternatively, you can try the following:
class Wildcard : Regex
{
public Wildcard() { }
public Wildcard(string pattern) : base(WildcardToRegex(pattern)) { }
public Wildcard(string pattern, RegexOptions options) : base(WildcardToRegex(pattern), options) { }
public static string WildcardToRegex(string pattern)
{
return "^" + Regex.Escape(pattern).
Replace("\\*", ".*").
Replace("\\?", ".") + "$";
}
}
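To show how that conversion behaves together with RegexOptions.IgnoreCase, here is a small self-contained sketch. It repeats the same escape-then-replace idea in a static helper; WildcardDemo, ToRegex, and IsMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Escape every regex metacharacter first, then turn the escaped
    // wildcards back into their regex equivalents.
    public static Regex ToRegex(string pattern)
    {
        string body = Regex.Escape(pattern).Replace("\\*", ".*").Replace("\\?", ".");
        return new Regex("^" + body + "$", RegexOptions.IgnoreCase);
    }

    public static bool IsMatch(string pattern, string input)
    {
        return ToRegex(pattern).IsMatch(input);
    }
}
```

With this helper, WildcardDemo.IsMatch("*.TXT", "report.txt") is true, while WildcardDemo.IsMatch("a.c", "abc") is false because the literal dot was escaped before the wildcards were substituted.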

What's the problem in this function? (C#)

I have written the function below to select a numeric string such as 1,23,000.00.
In the WebBrowser control I trap the double-click event and pass the selected range to the function below.
Let's say the initial selection was 000 and my target is to select the whole string mentioned above.
myRange=doc.selection.createRange()
myRange=SelectCSNumbers(myRange)
The function returns a range object. The issue is here:
return tmpRange.duplicate();//here it should terminate
count++;
When I return the final range, the method gets called again.
I don't know how. Can anyone point out my mistake?
private mshtml.IHTMLTxtRange SelectCSNumbers(mshtml.IHTMLTxtRange myRange)
{
mshtml.IHTMLTxtRange tmpRange = myRange.duplicate();
string[] strInt = tmpRange.text.Split(',');
bool result = false;
result = CheckText(tmpRange, strInt, result);
if (result && count==0)//
{
//Expand the Range with a single Character
tmpRange.expand("character");
if (tmpRange.text.Length > myRange.text.Length)
{
if (tmpRange.text.IndexOf(' ') == -1) //if no space is found that means the selection is not proper
{
//Check for ,/.
if (tmpRange.text.IndexOf(',') == -1)//if NO Comma is found
{
if (tmpRange.text.IndexOf('.') == -1)
{
//EOS
}
else
{
//. is found
SelectCSNumbers(tmpRange.duplicate());
}
}
else
{
SelectCSNumbers(tmpRange.duplicate());
}
}
else if (tmpRange.text.IndexOf(' ') != -1)
{
tmpRange = myRange.duplicate();
tmpRange.moveStart("character", -1);
if (tmpRange.text.IndexOf(' ') == -1) //if no space is found that means the selection is not proper
{
//Check for ,/.
if (tmpRange.text.IndexOf(',') == -1)//if NO Comma is found
{
if (tmpRange.text.IndexOf('.') == -1)
{
//EOS
}
else
{
//. is found
SelectCSNumbers(tmpRange.duplicate());
}
}
else
{
SelectCSNumbers(tmpRange.duplicate());
}
}
}
}
else if (tmpRange.text.Length == myRange.text.Length)
{
tmpRange = myRange.duplicate();
tmpRange.moveStart("character", -1);
if (tmpRange.text.Length == myRange.text.Length)
{
//tmpRange = null;
return tmpRange.duplicate();//here it should terminate
count++;
}
else if (tmpRange.text.IndexOf(' ') == -1) //if no space is found that means the selection is not proper
{
if (tmpRange.text.IndexOf(',') == -1)//if NO Comma is found
{
if (tmpRange.text.IndexOf('.') == -1)
{
//EOS
}
else
{
//. is found
SelectCSNumbers(tmpRange.duplicate());
}
}
else
{
SelectCSNumbers(tmpRange.duplicate());
}
}
}
}
return tmpRange.duplicate();
}
This doesn't help immediately, but it addresses a bigger problem.
This code needs to be refactored. It will cause problems for you down the line. The copy-pasted blocks will be a pain to maintain, and they also make it harder for others to help.
Here is a suggested refactoring (not tested):
private mshtml.IHTMLTxtRange SelectCSNumbers(mshtml.IHTMLTxtRange myRange)
{
mshtml.IHTMLTxtRange tmpRange = myRange.duplicate();
string[] strInt = tmpRange.text.Split(',');
bool result = false;
result = CheckText(tmpRange, strInt, result);
if (result && count==0)//
{
//Expand the Range with a single Character
tmpRange.expand("character");
if (tmpRange.text.Length > myRange.text.Length)
{
if (tmpRange.text.IndexOf(' ') == -1) //if no space is found that means the selection is not proper
{
SomeOtherFunction(tmpRange);
}
else if (tmpRange.text.IndexOf(' ') != -1)
{
tmpRange = myRange.duplicate();
tmpRange.moveStart("character", -1);
SomeOtherFunction(tmpRange);
}
}
else if (tmpRange.text.Length == myRange.text.Length)
{
tmpRange = myRange.duplicate();
tmpRange.moveStart("character", -1);
if (tmpRange.text.Length == myRange.text.Length)
{
//tmpRange = null;
return tmpRange.duplicate();//here it should terminate
count++;
}
else if (tmpRange.text.IndexOf(' ') == -1) //if no space is found that means the selection is not proper
{
SomeOtherFunction(tmpRange);
}
}
}
return tmpRange.duplicate();
}
private void SomeOtherFunction(mshtml.IHTMLTxtRange tmpRange)
{
if (tmpRange.text.IndexOf(',') == -1)//if NO Comma is found
{
if (tmpRange.text.IndexOf('.') == -1)
{
//EOS
}
else
{
//. is found
SelectCSNumbers(tmpRange.duplicate());
}
}
else
{
SelectCSNumbers(tmpRange.duplicate());
}
}
Random guess:
if (tmpRange.text.Length == myRange.text.Length)
{
count++;
return tmpRange.duplicate();
}
If you put count++ after the return statement, it will never be executed.
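The expand-until-a-boundary idea can also be written as two loops instead of recursion, which removes the need for a counter to detect re-entry. The sketch below works on a plain string and a start/length pair rather than on an mshtml.IHTMLTxtRange, so it only illustrates the logic; NumberSelection, IsNumberChar, and ExpandToNumber are hypothetical names.

```csharp
using System;

public static class NumberSelection
{
    private static bool IsNumberChar(char c)
    {
        return char.IsDigit(c) || c == ',' || c == '.';
    }

    // Grows the selection [start, start + length) in both directions until it
    // reaches a character that cannot be part of a number such as 1,23,000.00.
    public static string ExpandToNumber(string text, int start, int length)
    {
        int end = start + length; // exclusive end index
        while (start > 0 && IsNumberChar(text[start - 1]))
            start--;
        while (end < text.Length && IsNumberChar(text[end]))
            end++;
        return text.Substring(start, end - start);
    }
}
```

For the text "Total 1,23,000.00 due" with 000 selected (start 11, length 3), this returns 1,23,000.00. The same two loops could be expressed with moveStart and moveEnd on a range, checking the newly included character after each step.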
