Search Through a text Window app crash - c#

I have two methods which search through a text document in my WPF app. When search for a word in the first search it works fine, but when I add a word to it, it will crash and come up with a null exception. Can someone please help?
Crashes on:
TextRange result = new TextRange(start, start.GetPositionAtOffset(searchText.Length));
Stacktrace:
{"Value cannot be null.\r\nParameter name: position2"}
Example:
if the text said this.
And I search for "if the", then I search for "if the text said" it would crash.
private void btnSearch_Click(object sender, RoutedEventArgs e)
{
string searchText = searchBox.Text.Trim();
searchText = searchText.ToLower();
if (String.IsNullOrWhiteSpace(searchText))
{
MessageBox.Show("Please enter a search term!");
searchBox.Clear();
searchBox.Focus();
newSearch = true;
return;
}
if (!String.IsNullOrEmpty(lastSearch))
{
if (lastSearch != searchText)
newSearch = true;
}
TextRange searchRange;
RichTextBox _body = ((DockPanel)((TabItem)tabControl.Items[tabControl.SelectedIndex]).Content).Children[1] as RichTextBox;
_body.Focus();
if (newSearch)
{
searchRange = new TextRange(_body.Document.ContentStart, _body.Document.ContentEnd);
lastSearch = searchText;
TextPointer position2 = _body.Document.ContentEnd;
}
else
{
backupSearchRange = new TextRange(_body.CaretPosition.GetLineStartPosition(1) == null ?
_body.CaretPosition.GetLineStartPosition(0) : _body.CaretPosition.GetLineStartPosition(1), _body.Document.ContentEnd);
TextPointer position1 = _body.Selection.Start.GetPositionAtOffset(1);
TextPointer position2 = _body.Document.ContentEnd;
searchRange = new TextRange(position1, position2);
}
TextRange foundRange = newSearchFunction(searchRange, searchText);
if (foundRange == null)
{
if (newSearch)
{
MessageBox.Show("\'" + searchBox.Text.Trim() + "\' not found!");
newSearch = true;
lastOffset = -1;
}
else
{
MessageBox.Show("No more results!");
newSearch = true;
lastOffset = -1;
}
}
else
{
_body.Selection.Select(foundRange.Start, foundRange.End);
_body.SelectionBrush = selectionHighlighter;
newSearch = false;
}
}
private TextRange newSearchFunction(TextRange searchRange, string searchText)
{
int offset = searchRange.Text.ToLower().IndexOf(searchText);
offset = searchRange.Text.ToLower().IndexOf(searchText);
if (offset < 0)
return null;
if (lastOffset == offset)
{
//searchRange = backupSearchRange;
offset = searchRange.Text.ToLower().IndexOf(searchText);
if (offset < 0)
return null;
for (TextPointer start = searchRange.Start.GetPositionAtOffset(offset); start != searchRange.End; start = start.GetPositionAtOffset(1))
{
TextRange result = new TextRange(start, start.GetPositionAtOffset(searchText.Length));
if (result.Text.ToLower() == searchText)
{
lastOffset = offset;
return result;
}
}
}
for (TextPointer start = searchRange.Start.GetPositionAtOffset(offset); start != searchRange.End; start = start.GetPositionAtOffset(1))
{
TextRange result = new TextRange(start, start.GetPositionAtOffset(searchText.Length));
if (result.Text.ToLower() == searchText)
{
lastOffset = offset;
return result;
}
}
return null;
}

Method GetPositionAtOffset can return null if it cannot find this position. See TextPointer.GetPositionAtOffset. In your case you see it because you do the search until you don't reach end of your search range, but in the case when for example your search range contains 100 symbols and your search text has 10 symbols after you reach pointer at index 91 - you will call GetPositionAtOffset with offset 10 - and this will be 101 symbol, which gives you null in this case.
You can do simple check in your for loops, something like:
for (
TextPointer start = searchRange.Start.GetPositionAtOffset(offset);
start != searchRange.End;
start = start.GetPositionAtOffset(1))
{
var end = start.GetPositionAtOffset(searchText.Length);
if (end == null)
{
break;
}
TextRange result = new TextRange(start, end);
if (result.Text.ToLower() == searchText)
{
lastOffset = offset;
return result;
}
}
You have one more similar for loop, just add this special check in it too.
It looks like you are doing search, so just want to give you two recommendations:
Use string.Compare method instead ToLower. See String.Compare Method (String, String, Boolean, CultureInfo). In your case it should be string.Compare(text1, text2, ignoreCase: true, culture: CultureInfo.CurrentCulture). In this case your application will support all languages.
If you really want to use ToLower and do search in this way - consider to change it to ToUpper, because some languages can be tricky when you do ToLower. Check this article What's Wrong With Turkey?. When you do ToLower(I) with Turkish locale you will get dotless i, which is different from i. Wikipedia about: Dotted and dotless I.

Related

Split a string if delimiter is between single quotes [duplicate]

This question already has answers here:
How to split csv whose columns may contain comma
(9 answers)
Closed 4 years ago.
I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldn't be used in the split.
String:
111,222,"33,44,55",666,"77,88","99"
I want the output:
111
222
33,44,55
666
77,88
99
I have tried this:
(?:,?)((?<=")[^"]+(?=")|[^",]+)
But it reads the comma between "77,88","99" as a hit and I get the following output:
111
222
33,44,55
666
77,88
,
99
Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!
You can do so with some simple regex
(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
This will do the following:
(?:^|,) = Match expression "Beginning of line or string ,"
(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group, this will select between 2 alternatives:
stuff in quotes
stuff between commas
This should give you the output you are looking for.
Example code in C#
static Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
public static string[] SplitCSV(string input)
{
List<string> list = new List<string>();
string curr = null;
foreach (Match match in csvSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add("");
}
list.Add(curr.TrimStart(','));
}
return list.ToArray();
}
private void button1_Click(object sender, RoutedEventArgs e)
{
Console.WriteLine(SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\""));
}
Warning As per #MrE's comment - if a rogue new line character appears in a badly formed csv file and you end up with an uneven ("string) you'll get catastrophic backtracking (https://www.regular-expressions.info/catastrophic.html) in your regex and your system will likely crash (like our production system did). Can easily be replicated in Visual Studio and as I've discovered will crash it. A simple try/catch will not trap this issue either.
You should use:
(?:^|,)(\"(?:[^\"])*\"|[^,]*)
instead
Fast and easy:
public static string[] SplitCsv(string line)
{
List<string> result = new List<string>();
StringBuilder currentStr = new StringBuilder("");
bool inQuotes = false;
for (int i = 0; i < line.Length; i++) // For each character
{
if (line[i] == '\"') // Quotes are closing or opening
inQuotes = !inQuotes;
else if (line[i] == ',') // Comma
{
if (!inQuotes) // If not in quotes, end of current string, add it to result
{
result.Add(currentStr.ToString());
currentStr.Clear();
}
else
currentStr.Append(line[i]); // If in quotes, just add it
}
else // Add any other character to current string
currentStr.Append(line[i]);
}
result.Add(currentStr.ToString());
return result.ToArray(); // Return array of all strings
}
With this string as input :
111,222,"33,44,55",666,"77,88","99"
It will return :
111
222
33,44,55
666
77,88
99
i really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is:
public IEnumerable<string> SplitCSV(string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
Maybe it's even more useful to have it like an extension method:
public static class StringHelper
{
public static IEnumerable<string> SplitCSV(this string input)
{
Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
foreach (Match match in csvSplit.Matches(input))
{
yield return match.Value.TrimStart(',');
}
}
}
This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:
((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))
Here is the implementation in C#:
string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);
foreach (var match in matches)
{
Console.WriteLine(match);
}
Outputs
111
222
33,44,55
666
77,88
99
None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.
This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
With minor updates to the function provided by "Chad Hedgcock".
Updates are on:
Line 26: character.val == '\"' - This can never be true due to the check made on Line 24. i.e. character.val == '"'
Line 28: if (row[character.index + 1] == character.val) added !quoteIsEscaped to escape 3 consecutive quotes.
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
if (character.val == delimiter) //We hit a delimiter character...
{
if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
{
//Console.WriteLine(currentString);
yield return currentString.ToString();
currentString.Clear();
}
else
{
currentString.Append(character.val);
}
} else {
if (character.val != ' ')
{
if(character.val == '"') //If we've hit a quote character...
{
if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
{
if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
{
quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
}
else if (quoteIsEscaped)
{
quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
currentString.Append(character.val);
}
else
{
inQuotes = false;
}
}
else
{
if (!inQuotes)
{
inQuotes = true;
}
else
{
currentString.Append(character.val); //...It's a quote inside a quote.
}
}
}
else
{
currentString.Append(character.val);
}
}
else
{
if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
{
currentString.Append(character.val);
}
}
}
}
}
For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
bool blockUntilEndQuote2 = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"' && !blockUntilEndQuote2)
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character == '\'' && !blockUntilEndQuote)
{
if (blockUntilEndQuote2 == false)
{
blockUntilEndQuote2 = true;
}
else if (blockUntilEndQuote2 == true)
{
blockUntilEndQuote2 = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
The original version
Currently I use the following regex:
public static Regex regexCSVSplit = new Regex(#"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
( (?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\r\n]* )) )
(?=\s*([,;\t\r\n]|$))
) |
(?<FULL>
(^|[\s\t\r\n])
( (?<QUODAT> (?<QUO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<QUO>) |
(?<QUODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
(?=[,;\s\t\r\n]|$)
)
))", RegexOptions.Compiled);
This solution can handle pretty chaotic cases too like below:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
See this example in action HERE
Note: The regular expression contains two set of <FULL> block and each of them contains two <QUODAT> block separated by "or" (|). Depending on your task you may only need one of them.
Note: That this regular expression gives us one string array, and works on single line with or without <carrier return> and/or <line feed>.
Simplified version
The following regular expression will already cover many complex cases:
public static Regex regexCSVSplit = new Regex(#"(?x:(
(?<FULL>
(^|[,;\t\r\n])\s*
(?<QUODAT> (?<QUO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<QUO>\s*)[,;\t\r\n])*)\k<QUO>)
(?=\s*([,;\t\r\n]|$))
)
))", RegexOptions.Compiled);
See this example in action: HERE
It can process complex, easy and empty items too:
This is how to feed the result into an array:
var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().
Select(x => x.Groups["DAT"].Value).ToArray();
The main rule here is that every item may contain anything but the <quotation mark><separators><comma> sequence AND each item shall being and end with the same <quotation mark>.
<quotation mark>: <">, <'>
<comma>: <,>, <;>, <tab>, <carrier return>, <line feed>
Edit notes: I added some more explanation to make it easier to understand and replaces the text "CO" with "QUO".
Try this:
string s = #"111,222,""33,44,55"",666,""77,88"",""99""";
List<string> result = new List<string>();
var splitted = s.Split('"').ToList<string>();
splitted.RemoveAll(x => x == ",");
foreach (var it in splitted)
{
if (it.StartsWith(",") || it.EndsWith(","))
{
var tmp = it.TrimEnd(',').TrimStart(',');
result.AddRange(tmp.Split(','));
}
else
{
if(!string.IsNullOrEmpty(it)) result.Add(it);
}
}
//Results:
foreach (var it in result)
{
Console.WriteLine(it);
}
I know I'm a bit late to this, but for searches, here is how I did what you are asking about in C sharp
private string[] splitString(string stringToSplit)
{
char[] characters = stringToSplit.ToCharArray();
List<string> returnValueList = new List<string>();
string tempString = "";
bool blockUntilEndQuote = false;
int characterCount = 0;
foreach (char character in characters)
{
characterCount = characterCount + 1;
if (character == '"')
{
if (blockUntilEndQuote == false)
{
blockUntilEndQuote = true;
}
else if (blockUntilEndQuote == true)
{
blockUntilEndQuote = false;
}
}
if (character != ',')
{
tempString = tempString + character;
}
else if (character == ',' && blockUntilEndQuote == true)
{
tempString = tempString + character;
}
else
{
returnValueList.Add(tempString);
tempString = "";
}
if (characterCount == characters.Length)
{
returnValueList.Add(tempString);
tempString = "";
}
}
string[] returnValue = returnValueList.ToArray();
return returnValue;
}
Don't reinvent a CSV parser, try FileHelpers.
I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.
public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
{
// In-Line for example, but I implemented as string extender in production code
Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
{
if (startIndex >= 0)
{
if (source != null)
{
for (int i = startIndex; i < source.Length; i++)
{
if (!char.IsWhiteSpace(source[i]))
{
return i;
}
}
}
}
return -1;
};
var results = new List<string>();
var result = new StringBuilder();
var inQualifier = false;
var inField = false;
// We add new columns at the delimiter, so append one for the parser.
var row = $"{record}{delimiter}";
for (var idx = 0; idx < row.Length; idx++)
{
// A delimiter character...
if (row[idx]== delimiter[0])
{
// Are we inside qualifier? If not, we've hit the end of a column value.
if (!inQualifier)
{
results.Add(trimData ? result.ToString().Trim() : result.ToString());
result.Clear();
inField = false;
}
else
{
result.Append(row[idx]);
}
}
// NOT a delimiter character...
else
{
// ...Not a space character
if (row[idx] != ' ')
{
// A qualifier character...
if (row[idx] == qualifier[0])
{
// Qualifier is closing qualifier...
if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
{
inQualifier = false;
continue;
}
else
{
// ...Qualifier is opening qualifier
if (!inQualifier)
{
inQualifier = true;
}
// ...It's a qualifier inside a qualifier.
else
{
inField = true;
result.Append(row[idx]);
}
}
}
// Not a qualifier character...
else
{
result.Append(row[idx]);
inField = true;
}
}
// ...A space character
else
{
if (inQualifier || inField)
{
result.Append(row[idx]);
}
}
}
}
return results.ToArray<string>();
}
Some test code:
//var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
var input =
"111, 222, \"99\",\"33,44,55\" , \"666 \"mark of a man\"\", \" spaces \"77,88\" \"";
Console.WriteLine("Split with trim");
Console.WriteLine("---------------");
var result = SplitRow(input, ",", "\"", true);
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Split 2
Console.WriteLine("Split with no trim");
Console.WriteLine("------------------");
var result2 = SplitRow(input, ",", "\"", false);
foreach (var r in result2)
{
Console.WriteLine(r);
}
Console.WriteLine("");
// Time Trial 1
Console.WriteLine("Experimental Process (1,000,000) iterations");
Console.WriteLine("-------------------------------------------");
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
var x1 = SplitRow(input, ",", "\"", false);
}
watch.Stop();
elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
Console.WriteLine("");
Results
Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"
Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds
Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds
I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.
If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.
Here is my fastest implementation based upon string raw pointer manipulation:
string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
{
string[] oTokens;
if (null == cSeparator)
{
cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
}
if (null == cQuotes)
{
cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
}
unsafe
{
fixed (char* lpText = sText)
{
#region Fast array estimatation
char* lpCurrent = lpText;
int nEstimatedSize = 0;
while (0 != *lpCurrent)
{
if (cSeparator == *lpCurrent)
{
nEstimatedSize++;
}
lpCurrent++;
}
nEstimatedSize++; // Add EOL char(s)
string[] oEstimatedTokens = new string[nEstimatedSize];
#endregion
#region Parsing
char[] oBuffer = new char[sText.Length];
int nIndex = 0;
int nTokens = 0;
lpCurrent = lpText;
while (0 != *lpCurrent)
{
if (cQuotes == *lpCurrent)
{
// Quotes parsing
lpCurrent++; // Skip quote
nIndex = 0; // Reset buffer
while (
(0 != *lpCurrent)
&& (cQuotes != *lpCurrent)
)
{
oBuffer[nIndex] = *lpCurrent; // Store char
lpCurrent++; // Move source cursor
nIndex++; // Move target cursor
}
}
else if (cSeparator == *lpCurrent)
{
// Separator char parsing
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
nIndex = 0; // Skip separator and Reset buffer
}
else
{
// Content parsing
oBuffer[nIndex] = *lpCurrent; // Store char
nIndex++; // Move target cursor
}
lpCurrent++; // Move source cursor
}
// Recover pending buffer
if (nIndex > 0)
{
// Store token
oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
}
// Build final tokens list
if (nTokens == nEstimatedSize)
{
oTokens = oEstimatedTokens;
}
else
{
oTokens = new string[nTokens];
Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
}
#endregion
}
}
// Epilogue
return oTokens;
}
Try this
private string[] GetCommaSeperatedWords(string sep, string line)
{
List<string> list = new List<string>();
StringBuilder word = new StringBuilder();
int doubleQuoteCount = 0;
for (int i = 0; i < line.Length; i++)
{
string chr = line[i].ToString();
if (chr == "\"")
{
if (doubleQuoteCount == 0)
doubleQuoteCount++;
else
doubleQuoteCount--;
continue;
}
if (chr == sep && doubleQuoteCount == 0)
{
list.Add(word.ToString());
word = new StringBuilder();
continue;
}
word.Append(chr);
}
list.Add(word.ToString());
return list.ToArray();
}
This is Chad's answer rewritten with state based logic. His answered failed for me when it came across """BRAD""" as a field. That should return "BRAD" but it just ate up all the remaining fields. When I tried to debug it I just ended up rewriting it as state based logic:
enum SplitState { s_begin, s_infield, s_inquotefield, s_foundquoteinfield };
public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
SplitState state = SplitState.s_begin;
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new { val, index }))
{
//Console.WriteLine("character = " + character.val + " state = " + state);
switch (state)
{
case SplitState.s_begin:
if (character.val == delimiter)
{
/* empty field */
yield return currentString.ToString();
currentString.Clear();
} else if (character.val == '"')
{
state = SplitState.s_inquotefield;
} else
{
currentString.Append(character.val);
state = SplitState.s_infield;
}
break;
case SplitState.s_infield:
if (character.val == delimiter)
{
/* field with data */
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_inquotefield:
if (character.val == '"')
{
// could be end of field, or escaped quote.
state = SplitState.s_foundquoteinfield;
} else
{
currentString.Append(character.val);
}
break;
case SplitState.s_foundquoteinfield:
if (character.val == '"')
{
// found escaped quote.
currentString.Append(character.val);
state = SplitState.s_inquotefield;
}
else if (character.val == delimiter)
{
// must have been last quote so we must find delimiter
yield return currentString.ToString();
state = SplitState.s_begin;
currentString.Clear();
}
else
{
throw new Exception("Quoted field not terminated.");
}
break;
default:
throw new Exception("unknown state:" + state);
}
}
//Console.WriteLine("currentstring = " + currentString.ToString());
}
This is a lot more lines of code than the other solutions, but it is easy to modify to add edge cases.

Trying to detect which html tags are affecting an element in a string

I am trying to write a markup editor in c#, part of is is to detect which tags are affecting where the caret currently is.
Examples:
<b>Tes|ting</b><u><i>Testing</i>Testing</u> (caret '|' at index 6)
Answer: [b]
<b>Testing<u><i>Test|ing</i>Tes</b>ting</u> (caret '|' at index 20)
Answer: [b, u, i]
Here is the code I have currently, inside "GetTags()" I split the string into 2 queues representing what tags are in front and behind where the caret is. I played around with ways to pop the elements from both queues to try to solve the problem but I keep getting wrong results. I think im on the right track creating 2 data structures but I don't know if Queue<> is what I need.
public class Tag
{
public enum TagType
{
bold,
italic,
underline,
}
public TagType type;
public bool opening;
public Tag(TagType type, bool opening)
{
this.type = type;
this.opening = opening;
}
public static Tag GetTagFromString(string str)
{
switch (str)
{
case "<b>": return new Tag(TagType.bold, true);
case "</b>": return new Tag(TagType.bold, false);
case "<i>": return new Tag(TagType.italic, true);
case "</i>": return new Tag(TagType.italic, false);
case "<u>": return new Tag(TagType.underline, true);
case "</u>": return new Tag(TagType.underline, false);
}
return null;
}
public static List<TagType> GetTags(string str, int index)
{
Queue<Tag> backQueue = new Queue<Tag>();
Queue<Tag> forwardQueue = new Queue<Tag>();
// populate the back queue
int i = index;
while (true)
{
int lastOpening = str.LastIndexOf('<', i);
int lastClosing = str.LastIndexOf('>', i);
if (lastOpening != -1 && lastClosing != -1)
{
string tagStr = str.Substring(lastOpening, lastClosing - lastOpening + 1);
Tag tag = GetTagFromString(tagStr);
backQueue.Enqueue(tag);
i = lastOpening - 1;
if (i < 0)
break;
}
else
break;
}
// populate the front stack
i = index;
while (true)
{
int nextOpening = str.IndexOf('<', i);
int nextClosing = str.IndexOf('>', i);
if (nextOpening != -1 && nextClosing != -1)
{
string tagStr = str.Substring(nextOpening, nextClosing - nextOpening + 1);
Tag tag = GetTagFromString(tagStr);
forwardQueue.Enqueue(tag);
i = nextClosing + 1;
}
else
break;
}
List<TagType> tags = new List<TagType>();
// populate 'tags' list with the tags affecting the index here
return tags;
}
public override string ToString()
{
string str = "<";
if (!opening)
str += "/";
switch (type)
{
case TagType.bold:
str += "b";
break;
case TagType.italic:
str += "i";
break;
case TagType.underline:
str += "u";
break;
}
str += ">";
return str;
}
}
Would love any input on how I could solve this and would greatly appreciate any issues anyone has with the code that I've provided.

Highlighting words in RichEditBox

It is needed to highlight by fiven colour a substring in the document RichEditBox. For this purpose I wrote a method:
private async Task ChangeTextColor(string text, Color color)
{
string textStr;
bool theEnd = false;
int startTextPos = 0;
myRichEdit.Document.GetText(TextGetOptions.None, out textStr);
while (theEnd == false)
{
myRichEdit.Document.GetRange(startTextPos, textStr.Length).GetText(TextGetOptions.None, out textStr);
var isFinded = myRichEdit.Document.GetRange(startTextPos, textStr.Length).FindText(text, textStr.Length, FindOptions.None);
if (isFinded != 0)
{
string textStr2;
textStr2 = myRichEdit.Document.Selection.Text;
var dialog = new MessageDialog(textStr2);
await dialog.ShowAsync();
myRichEdit.Document.Selection.CharacterFormat.BackgroundColor = color;
startTextPos = myRichEdit.Document.Selection.EndPosition;
myRichEdit.Document.ApplyDisplayUpdates();
}
else
{
theEnd = true;
}
}
}
In the debugger you can see that there is a substring and isFinded is equal with the number of signs (or symbols) in the found substring. It means the fragment is found and judging by the description of the method FindText should be highlighted but it isn't. In textStr2 an empty line returns and, correspondingly, the colour doesn't change. I cannot identify the reasons of the error.
The code you postted did not set the selection, so the myRichEdit.Document.Selection is null. You can use ITextRange.SetRange to set the selection. And you can use ITextRange.FindText method to find the string in the selection.
For example:
private void ChangeTextColor(string text, Color color)
{
string textStr;
myRichEdit.Document.GetText(TextGetOptions.None, out textStr);
var myRichEditLength = textStr.Length;
myRichEdit.Document.Selection.SetRange(0, myRichEditLength);
int i = 1;
while (i > 0)
{
i = myRichEdit.Document.Selection.FindText(text, myRichEditLength, FindOptions.Case);
ITextSelection selectedText = myRichEdit.Document.Selection;
if (selectedText != null)
{
selectedText.CharacterFormat.BackgroundColor = color;
}
}
}

highlight word in richtextbox on moveover in WPF C#

I am trying to highlight words in richtextbox. When user mouse over the word in richtextbox the word should be highlighted.
Below is the code I am so far. Errors: last highlight point is good but start point is not accurate, When i place a new paragrah or new line then start point goes so away then expected result.
private void richTextBox1_MouseMove(object sender, MouseEventArgs e)
{
richTextBox1.Focus();
selectWordOnMouseOver();
}
public void selectWordOnMouseOver()
{
if (richTextBox1 == null)
return;
TextPointer cursurPosition = richTextBox1.GetPositionFromPoint(Mouse.GetPosition(richTextBox1), false);
if (cursurPosition == null)
return;
int offset = richTextBox1.Document.ContentStart.GetOffsetToPosition(cursurPosition);
// offset = offset;
//MessageBox.Show("Offset = " + offset.ToString());
int spaceAfter = FindSpaceAfterWordFromPosition(cursurPosition, " ");
int spaceBefore = FindSpaceBeforeWordFromPosition(cursurPosition, " ");
TextPointer wholeText = richTextBox1.Document.ContentStart;
TextPointer start = wholeText.GetPositionAtOffset(spaceBefore);
TextPointer end = wholeText.GetPositionAtOffset(spaceAfter);
if (start == null || end == null )
return;
richTextBox1.Selection.ApplyPropertyValue(TextElement.ForegroundProperty, Brushes.Black);
richTextBox1.Selection.Select(start, end);
// MessageBox.Show("Mouse Over On = " + offset.ToString() + ": Word Start = " + (spaceBefore).ToString() + ": Word End = " + (spaceAfter).ToString() + " : Word is = " + richTextBox1.Selection.Text);
}
int FindSpaceBeforeWordFromPosition(TextPointer position, string word)
{
while (position != null)
{
if (position.GetPointerContext(LogicalDirection.Backward) == TextPointerContext.Text)
{
string textRun = position.GetTextInRun(LogicalDirection.Backward);
// Find the starting index of any substring that matches "word".
int spaceIndexBeforeMouseOver = textRun.LastIndexOf(word);
return spaceIndexBeforeMouseOver;
}
position = position.GetNextContextPosition(LogicalDirection.Backward);
}
// position will be null if "word" is not found.
return 0;
}
int FindSpaceAfterWordFromPosition(TextPointer position, string word)
{
while (position != null)
{
if (position.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text)
{
string textRun = position.GetTextInRun(LogicalDirection.Forward);
// Find the starting index of any substring that matches "word".
int spaceIndexAfterMouseOver = textRun.IndexOf(word);
int lastIndexAfterMouseOverIndex = textRun.Count();
int mouseOverIndex = richTextBox1.Document.ContentStart.GetOffsetToPosition(position);
if (spaceIndexAfterMouseOver >= 0)
{
return spaceIndexAfterMouseOver + mouseOverIndex;
}
else//if space index not found the select to to the last word of text box
return mouseOverIndex + lastIndexAfterMouseOverIndex;
}
position = position.GetNextContextPosition(LogicalDirection.Forward);
}
// position will be null if "word" is not found.
return 0;
}
results when no new line.
results when there are 3 new line feeds.
UPDATE 1:
I recently after testing found one thing that I am getting wrong offset.
Point nMousePositionCoordinate = Mouse.GetPosition(richTextBox1);
// System.Diagnostics.Debug.WriteLine(nMousePositionCoordinate.ToString());
TextPointer cursurPosition = richTextBox1.GetPositionFromPoint(nMousePositionCoordinate, false);
if (cursurPosition == null)
return;
int offset = richTextBox1.Document.ContentStart.GetOffsetToPosition(cursurPosition);
System.Diagnostics.Debug.WriteLine(offset.ToString());

WPF Flowdocument "change case" feature

I am implementing a "change case" functionality for my RichTextBox like word has with Shift+F3. All it does is switching between lower->upper->title case, which is very simple once I get access to the string I need.
My question is, how to change (and find it in the first place) a string in flowdocument without losing any embedded elements (losing formatting is not a problem) that may be contained within the string.
Same as word, I need this functionality for 2 cases:
1) Mouse-selected text. I tried simply
this.Selection.Text = newText;
But that of course lost my embedded elements.
2) The last word before caret position. Any non-text element is a word delimiter, however one word can be
"He<weird formatting begin>ll<weird formatting end>o".
SOLUTION
This way it mimics MS WORD Shift+F3 behaviour. Only problem that in very few cases occurs is the carret being moved to the word beginning instead of keeping its position. I suppose that a short sleep after EditingCommands.MoveLeftByWord.Execute(null, this); would fix this, but this would be a dirty hack and I am trying to find out a nicer solution.
private void ChangeCase()
{
try
{
TextPointer start;
TextPointer end;
FindSelectedRange(out start, out end);
List<TextRange> textToChange = SplitToTextRanges(start, end);
ChangeCaseToAllRanges(textToChange);
}
catch (Exception ex)
{
mLog.Error("Change case error", ex);
}
}
private void FindSelectedRange(out TextPointer start, out TextPointer end)
{
if (!this.Selection.IsEmpty)
{
start = this.Selection.Start;
end = this.Selection.End;
}
else
{
end = this.CaretPosition;
EditingCommands.MoveLeftByWord.Execute(null, this);
start = this.CaretPosition;
this.CaretPosition = end;
}
}
private static List<TextRange> SplitToTextRanges(TextPointer start, TextPointer end)
{
List<TextRange> textToChange = new List<TextRange>();
var previousPointer = start;
for (var pointer = start; pointer.CompareTo(end) <= 0; pointer = pointer.GetPositionAtOffset(1, LogicalDirection.Forward))
{
var contextAfter = pointer.GetPointerContext(LogicalDirection.Forward);
var contextBefore = pointer.GetPointerContext(LogicalDirection.Backward);
if (contextBefore != TextPointerContext.Text && contextAfter == TextPointerContext.Text)
{
previousPointer = pointer;
}
if (contextBefore == TextPointerContext.Text && contextAfter != TextPointerContext.Text && previousPointer != pointer)
{
textToChange.Add(new TextRange(previousPointer, pointer));
previousPointer = null;
}
}
textToChange.Add(new TextRange(previousPointer ?? end, end));
return textToChange;
}
private void ChangeCaseToAllRanges(List<TextRange> textToChange)
{
var textInfo = (mCasingCulture ?? CultureInfo.CurrentUICulture).TextInfo;
var allText = String.Join(" ", textToChange.Select(x => x.Text).Where(x => !string.IsNullOrWhiteSpace(x)));
Func<string, string> caseChanger = GetConvertorToNextState(textInfo, allText);
foreach (var range in textToChange)
{
if (!range.IsEmpty && !string.IsNullOrWhiteSpace(range.Text))
{
range.Text = caseChanger(range.Text);
}
}
}
private static Func<string, string> GetConvertorToNextState(TextInfo textInfo, string allText)
{
Func<string, string> caseChanger;
if (textInfo.ToLower(allText) == allText)
{
caseChanger = (text) => textInfo.ToTitleCase(text);
}
else if (textInfo.ToTitleCase(textInfo.ToLower(allText)) == allText)
{
caseChanger = (text) => textInfo.ToUpper(text);
}
else
{
caseChanger = (text) => textInfo.ToLower(text);
}
return caseChanger;
}

Categories

Resources