I have a text file of code from an old third-party system that I'm trying to upgrade. The code is structured text and looks very similar to VB. I'd like to parse the text file and display formatted text in a WPF application. Ideally, it would look similar to the Visual Studio code editor.
Below is a sample of the code I am trying to format:
--this is a comment
LOCAL tag1 --LOCAL would be formatted
LOCAL tag2
LOCAL foo
IF tag1 > tag2 THEN --IF and THEN would be formatted
foo = tag1
END IF --end if would be formatted
I've managed to do this by creating a FlowDocument from the original code text. Then I search the document for keywords and change the color of the matching text with the following method:
private FlowDocument FormatDocument(FlowDocument flowDocument, List<string> keyWordList, Brush brush)
{
TextPointer position = flowDocument.ContentStart;
while (position != null)
{
if (position.CompareTo(flowDocument.ContentEnd) == 0)
break;
if (position.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text) //checks to see if textpointer is actually text
{
foreach (string keyword in keyWordList)
{
string textRun = position.GetTextInRun(LogicalDirection.Forward);
string pattern = @"\b" + Regex.Escape(keyword) + @"\b";
Match match = Regex.Match(textRun, pattern, RegexOptions.IgnoreCase);
if (match.Success)
{
int indexInRun = match.Index;
int indexOfComment = textRun.IndexOf("--");
TextPointer startPosition = position.GetPositionAtOffset(indexInRun);
TextPointer endPosition = startPosition.GetPositionAtOffset(keyword.Length);
TextRange keywordRange = new TextRange(startPosition, endPosition);
string test = keywordRange.Text;
if (indexOfComment == -1 || indexInRun < indexOfComment)
keywordRange.ApplyPropertyValue(TextElement.ForegroundProperty, brush);
}
}
position = position.GetNextContextPosition(LogicalDirection.Forward);
}
else //If the current position doesn't represent a text context position, go to the next context position.
position = position.GetNextContextPosition(LogicalDirection.Forward); // This can effectively ignore the formatting or embed element symbols.
}
return flowDocument;
}
The code is slow when the files are large, so I'm wondering: is there a better way to go about this?
Your code seems okay, except that you're creating a bunch of objects on every iteration of each loop, which will be slow, especially for Regex objects. Regexes are also usually faster if you compile them (RegexOptions.Compiled) and reuse them. Create your Regex objects outside of both loops and compile them, and I'll bet you see some improvement.
If that's not enough improvement, try building a single Regex that will match any word in the keyword list (\b(?:keyword1|keyword2|keyword3|...)\b).
public static FlowDocument FormatDocument(FlowDocument flowDocument,
List<string> keyWordList,
Brush brush)
{
var regexForKeyword = keyWordList.ToDictionary(k => k,
k => new Regex(@"\b" + Regex.Escape(k) + @"\b",
RegexOptions.Compiled | RegexOptions.IgnoreCase));
var position = flowDocument.ContentStart;
while (position != null)
{
if (position.CompareTo(flowDocument.ContentEnd) == 0)
break;
if (position.GetPointerContext(LogicalDirection.Forward) == TextPointerContext.Text) //checks to see if textpointer is actually text
{
foreach (string keyword in keyWordList)
{
var textRun = position.GetTextInRun(LogicalDirection.Forward);
var match = regexForKeyword[keyword].Match(textRun);
if (match.Success)
{
var indexInRun = match.Index;
var indexOfComment = textRun.IndexOf("--");
var startPosition = position.GetPositionAtOffset(indexInRun);
var endPosition = startPosition.GetPositionAtOffset(keyword.Length);
var keywordRange = new TextRange(startPosition, endPosition);
var test = keywordRange.Text;
if (indexOfComment == -1 || indexInRun < indexOfComment)
keywordRange.ApplyPropertyValue(TextElement.ForegroundProperty, brush);
}
}
position = position.GetNextContextPosition(LogicalDirection.Forward);
}
else //If the current position doesn't represent a text context position, go to the next context position.
position = position.GetNextContextPosition(LogicalDirection.Forward); // This can effectively ignore the formatting or embed element symbols.
}
return flowDocument;
}
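The single-Regex suggestion is not implemented in the code above. Here is a minimal sketch of it, under a few assumptions: it works on plain strings one line at a time (so the returned indexes would still have to be mapped back to TextPointers), it treats everything after -- on a line as a comment, and KeywordScanner / FindKeywordMatches are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds ONE compiled regex for all keywords and reports
// keyword hits that occur before a "--" comment on a line.
public sealed class KeywordScanner
{
    private readonly Regex _keywords;

    public KeywordScanner(IEnumerable<string> keywords)
    {
        // Longest keywords first, so a keyword like "END IF" is tried before
        // a shorter keyword that is a prefix of it (e.g. "END").
        string alternation = string.Join("|",
            keywords.OrderByDescending(k => k.Length).Select(Regex.Escape));
        _keywords = new Regex(@"\b(?:" + alternation + @")\b",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    // Returns (index, length) pairs for keywords that are not inside a comment.
    public List<(int Index, int Length)> FindKeywordMatches(string line)
    {
        int commentStart = line.IndexOf("--", StringComparison.Ordinal);
        int limit = commentStart >= 0 ? commentStart : line.Length;
        var result = new List<(int Index, int Length)>();
        foreach (Match m in _keywords.Matches(line.Substring(0, limit)))
            result.Add((m.Index, m.Length));
        return result;
    }
}
```

Because all keywords share one pattern, each line is scanned once instead of once per keyword.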
Related
I have a RichTextBox and a Regex with some words. Once I find all the words, I want to change their color to blue. I can use SelectionColor = Blue, but when it comes to coloring thousands of words it becomes quite slow.
After some searching, I read that changing the RTF of the RichTextBox is a faster way to change the formatting of the text (e.g. its size and/or color).
Here is my unfinished code:
MatchCollection matches = myRegex.Matches(richTextBox.Text);
foreach (Match match in matches)
{
richTextBox.Select(match.Index, match.Length);
string addColor = @"{\colortbl ;\red0\green0\blue255;}" + Environment.NewLine;
richTextBox.SelectionColor = Color.Blue; //Must be replaced
}
I also found out that in every case (in my case the entire text uses the same font and size; only the color of some words changes) the SelectedRtf is:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Consolas;}}
\uc1\pard\lang1033\f0\fs18 word} // richTextBox.SelectedRtf
Moreover, setting SelectionColor = Color.Blue changes the SelectedRtf to:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Consolas;}}
{\colortbl ;\red0\green0\blue255;} // The addColor string!
\uc1\pard\lang1033\f0\fs18 word}
To get the above string, I use this: richTextBox.SelectedRtf.Insert(59, addColor), so what I need to do is to replace SelectedRtf with that. However, after some attempts, nothing seems to happen. The color of the words remains the same. Any ideas?
Yes, it is possible, and about twice as fast as the 'regular' way:
Coloring 30k words in a 3M text takes 28 seconds instead of 60 seconds before.
Here are the steps I would recommend, assuming your words are identifiable in the richTextBox.Rtf (*):
You could create your own color table, but it seems safer to let the system do it for you: I cheat by coloring the first letter before the replacement and resetting it after coloring the matches.
I prefix and postfix the search word with the RTF code for a foreground color index into the table. My code assumes that there is only one extra color in addition to the default one.
If you have more, you should keep track of and/or analyze the color table.
Here is an RTF reference, by the way.
I do the replacement with the RegEx in my RichTextBox RTB like this:
string search = "find me!";
RTB.SelectionStart = 0;
RTB.SelectionLength = 1;
RTB.SelectionColor = Color.HotPink;
Regex RX = new Regex(Regex.Escape(search)); // escape, so characters like . or ? are matched literally
MatchCollection matches = RX.Matches(RTB.Rtf);
RTB.Rtf = RX.Replace(RTB.Rtf, "\\cf1 " + search + "\\cf0 ");
RTB.SelectionStart = 0;
RTB.SelectionLength = 1;
RTB.SelectionColor = RTB.ForeColor;
(*) Note that modifying the Rtf property like this assumes that your search texts are identifiable in the Rtf. You can and should check this by comparing the match counts when searching the Rtf and the Text! When they don't agree, you probably need to use the 'regular' way.
Note that this only deals with colors. For fonts and font sizes you will have to add \fN (an index into the font table) and \fsN (the size in half-points) commands in a similar way.
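The (*) check and the \cf prefix/postfix can be sketched as one pure string function, which makes them easy to test outside a RichTextBox. This is a minimal sketch, assuming colorIndex already points at the right \colortbl entry; RtfColorizer and TryColorize are hypothetical names, and it follows the stricter rule above (the counts must agree):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: only rewrite the Rtf when the search text occurs
// exactly as often in the Rtf as in the plain Text.
public static class RtfColorizer
{
    // colorIndex is a 1-based index into the \colortbl (0 = default color).
    public static bool TryColorize(string text, string rtf, string search,
                                   int colorIndex, out string newRtf)
    {
        var rx = new Regex(Regex.Escape(search));
        newRtf = rtf;
        int textCount = rx.Matches(text).Count;
        int rtfCount = rx.Matches(rtf).Count;
        if (textCount == 0 || textCount != rtfCount)
            return false; // not found, or not safely identifiable: use the 'regular' way

        // $0 re-inserts the match; \cfN before it, \cf0 resets to the default color.
        newRtf = rx.Replace(rtf, @"\cf" + colorIndex + " $0" + @"\cf0 ");
        return true;
    }
}
```

A word like "red" is correctly rejected, because it also occurs inside the \red control words of the color table.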
Update: I have wrapped the code above in an expanded function that also takes care of more colors, word boundaries and some checks:
int colorWords(RichTextBox RTB, String searchWord, Color color)
{
string wordChar = @"\w*"; // or @"\b" for a stricter, whole-word search
Regex RX = new Regex(wordChar + Regex.Escape(searchWord) + wordChar);
RTB.SelectionStart = 0;
RTB.SelectionLength = 0;
RTB.SelectedText = "~"; // insert a dummy character
RTB.SelectionStart = 0;
RTB.SelectionLength = 1;
RTB.SelectionColor = color; // and color it, so the color is added to the Rtf color table
int textCount = RX.Matches(RTB.Text).Count;
MatchCollection matches = RX.Matches(RTB.Rtf);
int result = matches.Count;
// we should not find more in the rtf code, less is ok
if (textCount < matches.Count) result = -1;
else if (matches.Count > 0)
{
List<Color> colors = getRtfColorTable(RTB);
int cIndex = 1;
Color cRGB = Color.FromArgb(255, color);
if (colors.Contains(cRGB))
cIndex = colors.FindIndex(x => x == cRGB) + 1;
// $0 re-inserts the whole match, so characters matched by \w* are not lost
RTB.Rtf = RX.Replace(RTB.Rtf, "\\cf" + cIndex + " $0\\cf0 ");
}
RTB.SelectionStart = 0;
RTB.SelectionLength = 1;
RTB.SelectedText = ""; // remove the dummy on every path (unlike Cut(), this leaves the clipboard alone)
return result;
}
Here is a function that pulls out the current colors from the Rtf color table. (Admittedly, the full spec is not exactly small, and tackling it with two simple IndexOf calls is a little optimistic. ;-)
List<Color> getRtfColorTable(RichTextBox RTB)
{ // \red255\green0\blue0;
List<Color> colors = new List<Color>();
string tabString = @"\colortbl ;";
int ct0 = RTB.Rtf.IndexOf(tabString);
if (ct0 >= 0)
{
ct0 += tabString.Length;
int ct1 = RTB.Rtf.IndexOf(@"}", ct0);
var table = RTB.Rtf.Substring(ct0, ct1 - ct0).Split(';');
foreach(string t in table)
{
var ch = t.Split('\\');
if (ch.Length == 4)
{
int r = Convert.ToInt16(ch[1].Replace("red", ""));
int g = Convert.ToInt16(ch[2].Replace("green", ""));
int b = Convert.ToInt16(ch[3].Replace("blue", ""));
colors.Add(Color.FromArgb(255, r, g, b));
}
}
}
return colors;
}
The example was called like this:
colorWords(RTB, "<DIR>", Color.SaddleBrown);
colorWords(RTB, "Verzeichnis", Color.BlueViolet);
colorWords(RTB, "2012", Color.OrangeRed);
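If the color table is not laid out exactly the way the two IndexOf calls and the Split expect (for example, with line breaks between entries), a slightly more tolerant, regex-based variant of getRtfColorTable could look like the sketch below. It is an assumption-laden sketch: it works on the raw Rtf string, returns (R, G, B) tuples instead of System.Drawing.Color, assumes the table starts with the empty default entry as RichTextBox writes it, and RtfColorTable.Parse is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical regex-based variant of getRtfColorTable.
// List index i corresponds to \cf(i + 1), as in the original function.
public static class RtfColorTable
{
    // "{\colortbl ;" (empty default entry) followed by the entries up to "}"
    private static readonly Regex TableRx =
        new Regex(@"\{\\colortbl\s*;(?<body>[^}]*)\}");

    // one "\redR\greenG\blueB;" entry, tolerating whitespace/line breaks
    private static readonly Regex EntryRx =
        new Regex(@"\\red(?<r>\d+)\s*\\green(?<g>\d+)\s*\\blue(?<b>\d+)\s*;");

    public static List<(int R, int G, int B)> Parse(string rtf)
    {
        var colors = new List<(int R, int G, int B)>();
        Match table = TableRx.Match(rtf);
        if (!table.Success)
            return colors; // no color table: only the default color is in use

        foreach (Match e in EntryRx.Matches(table.Groups["body"].Value))
            colors.Add((int.Parse(e.Groups["r"].Value),
                        int.Parse(e.Groups["g"].Value),
                        int.Parse(e.Groups["b"].Value)));
        return colors;
    }
}
```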
I'm trying to select and color a specific word in a WPF RichTextBox, but my method selects just the first 5 letters of the word. Indexes 0, 1 and 2 seem to be empty strings, although the first word in my RichTextBox is "private" and there is no empty string before it.
What can be the cause of this problem?
public void FormatRtbText(RichTextBox rtb)
{
int x, y;
string str = "private";
var text = new TextRange(rtb.Document.ContentStart, rtb.Document.ContentEnd).Text;
x = text.IndexOf(str);
y = x + str.Length;
var range = new TextRange(rtb.Document.ContentStart.GetPositionAtOffset(x), rtb.Document.ContentStart.GetPositionAtOffset(y));
range.ApplyPropertyValue(TextElement.ForegroundProperty, Brushes.Red);
}
GetPositionAtOffset considers 3 things as symbols while calculating the offset:
An opening or closing tag for the TextElement element.
A UIElement element contained in an InlineUIContainer or BlockUIContainer. Note that such a UIElement is always counted as exactly one symbol; any additional content or elements contained by the UIElement are not counted as symbols.
A 16-bit Unicode character inside of a text Run element.
Here the first two symbols are the opening tags of the Paragraph and Run elements, so your TextRange is two symbols behind what you want. This code should do the work: it simply skips symbols until the next symbol is text.
TextPointer start = rtb.Document.ContentStart;
while (start.GetPointerContext(LogicalDirection.Forward) != TextPointerContext.Text)
{
start = start.GetNextContextPosition(LogicalDirection.Forward);
if (start == null) return;
}
...
var range = new TextRange(start, start.GetPositionAtOffset(y));
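To see where the extra symbols come from, here is a toy model of the counting rules quoted above. It is not a WPF API: it assumes a FlowDocument made only of Paragraphs containing Runs (no Sections, lists or UI containers), and SymbolModel.SymbolOffset is a hypothetical name. It maps an index in the plain text (paragraphs joined by "\r\n", as TextRange.Text returns them) to the number of symbols before that character:

```csharp
using System;

// Toy model of GetPositionAtOffset symbol counting for Paragraph/Run-only documents.
public static class SymbolModel
{
    // paragraphs[p][r] is the text of Run r in Paragraph p.
    public static int SymbolOffset(string[][] paragraphs, int charIndex)
    {
        if (charIndex < 0) throw new ArgumentOutOfRangeException(nameof(charIndex));
        int symbols = 0; // symbols walked so far
        int chars = 0;   // plain-text characters walked so far
        foreach (string[] runs in paragraphs)
        {
            symbols++;                   // <Paragraph> start tag
            foreach (string run in runs)
            {
                symbols++;               // <Run> start tag
                if (charIndex < chars + run.Length)
                    return symbols + (charIndex - chars);
                symbols += run.Length;   // one symbol per UTF-16 char
                chars += run.Length;
                symbols++;               // </Run> end tag
            }
            symbols++;                   // </Paragraph> end tag
            if (charIndex < chars + 2)
                throw new ArgumentException("charIndex points at a paragraph break", nameof(charIndex));
            chars += 2;                  // "\r\n" exists only in the plain text
        }
        throw new ArgumentOutOfRangeException(nameof(charIndex));
    }
}
```

For a single paragraph starting with "private", character 0 sits at symbol offset 2, which is exactly the two-symbol shift described in the answer; every further paragraph break adds four symbols but only two text characters.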
I found that the offsets returned by the WPF RichTextBox are practically worthless for this. They don't take into account the hidden symbols that the text box requires: each new paragraph, image, etc. in the box adds even more hidden symbols that skew the offset.
Here's what I came up with to search for the match closest to the caret position.
private TextRange FindText(string findText)
{
var fullText = DoGetAllText();
if (string.IsNullOrEmpty(findText) || string.IsNullOrEmpty(fullText) || findText.Length > fullText.Length)
return null;
var textbox = GetTextbox();
var leftPos = textbox.CaretPosition;
var rightPos = textbox.CaretPosition;
while (true)
{
var previous = leftPos.GetNextInsertionPosition(LogicalDirection.Backward);
var next = rightPos.GetNextInsertionPosition(LogicalDirection.Forward);
if (previous == null && next == null)
return null; //can no longer move outward in either direction and text wasn't found
if (previous != null)
leftPos = previous;
if (next != null)
rightPos = next;
var range = new TextRange(leftPos, rightPos);
var offset = range.Text.IndexOf(findText, StringComparison.InvariantCultureIgnoreCase);
if (offset < 0)
continue; //text not found, continue to move outward
//rtf has broken text indexes that often come up too low due to not considering hidden chars. Increment up until we find the real position
var findTextLower = findText.ToLower();
var endOfDoc = textbox.Document.ContentEnd.GetNextInsertionPosition(LogicalDirection.Backward);
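// TextPointer has no == overload, so compare positions with CompareTo and stop on null
for (var start = range.Start.GetPositionAtOffset(offset);
start != null && start.CompareTo(endOfDoc) < 0;
start = start.GetPositionAtOffset(1))
{
var end = start.GetPositionAtOffset(findText.Length);
if (end == null)
break; //ran past the end of the document
var result = new TextRange(start, end);
if (result.Text?.ToLower() == findTextLower)
{
return result;
}
}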
}
}
If you want to highlight the match, it is as simple as changing this method to return void and doing this when you find the match:
textbox.Selection.Select(result.Start, result.End);
I have a string something like this:
"2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112"
I would like to split on pipes, except for pipes inside double quotes, so that I get something like this (similar to how CSV is handled):
[0] => 2014-01-23 09:13:45
[1] => 10002112|TR0859657|25-DEC-2013>0000000000000001
[2] => 10002112
Is there a regular expression that can do this?
I think you may need to write your own parser.
You will need:
a collection to keep the results
a boolean flag to decide whether a pipe is inside or outside quotation marks
a string (or StringBuilder) to keep the current word
The idea is that you read the string char by char. Each ordinary char is appended to the word. If there is a pipe outside quotation marks, you add the word to your result collection. If there is a quote, you switch the flag, so you no longer treat the pipe as a divider but append it as part of the word (the quote itself is dropped). When another quote comes, you switch the flag back, so the next pipe adds the whole word (including the pipes within the quotation marks) to the collection. I tested the code below on your example and it worked.
private static List<string> ParseLine(string yourString)
{
bool ignorePipe = false;
string word = string.Empty;
List<string> divided = new List<string>();
foreach (char c in yourString)
{
if (c == '|' &&
!ignorePipe)
{
divided.Add(word);
word = string.Empty;
}
else if (c == '"')
{
ignorePipe = !ignorePipe;
}
else
{
word += c;
}
}
divided.Add(word);
return divided;
}
How about this Regular Expression:
/((["|]).*\2)/g
It looks like it could be used as a valid split expression.
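For comparison, another regex approach is to split only on pipes that are followed by an even number of double quotes, i.e. pipes that are outside a quoted section. A minimal sketch, assuming quotes are never escaped or nested inside a field; PipeSplitter is a hypothetical name:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeSplitter
{
    // Match '|' only if the rest of the line contains an even number of '"',
    // which means the pipe is not inside a quoted section.
    private static readonly Regex PipeOutsideQuotes =
        new Regex("\\|(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string line) =>
        PipeOutsideQuotes.Split(line)
            .Select(part => part.Trim('"')) // drop the surrounding quotes
            .ToArray();
}
```

The lookahead rescans the rest of the line for every pipe, so a hand-written parser like the ones in the other answers scales better on very long lines.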
I'm going to blatantly ignore the fact that you want a RegEx, because I think that making your own IEnumerable will be easier. Plus, you get instant access to Linq.
var line = "2014-01-23 09:13:45|\"10002112|TR0859657|25-DEC-2013>0000000000000001\"|10002112";
var data = GetPartsFromLine(line).ToList();
private static IEnumerable<string> GetPartsFromLine(string line)
{
int position = 0;
while (position < line.Length)
{
if (line[position] == '"')
{
//go find the closing "
int endQuote = line.IndexOf('"', position + 1);
if (endQuote == -1)
throw new FormatException("Unterminated quote starting at index " + position);
yield return line.Substring(position + 1, endQuote - position - 1);
position = endQuote + 1;
//skip the | that follows the closing quote, if there is one
if (position < line.Length && line[position] == '|')
position++;
}
else
{
//go find the next |, starting at the current char so empty fields are kept
int pipe = line.IndexOf('|', position);
if (pipe == -1)
{
//hit the end of the line
yield return line.Substring(position);
position = line.Length;
}
else
{
yield return line.Substring(position, pipe - position);
position = pipe + 1;
}
}
}
}
This hasn't been fully tested, but it works with your example.
I would like to know how I can get the word that the cursor is currently on in a WPF RichTextBox. I am aware that RichTextBox has a Selection property. However, this only gives me the text that is highlighted in the RichTextBox. Instead, I would like to know the word the cursor is on even if the whole word is not highlighted.
Any tips are appreciated.
Attach this function to an arbitrary RichTextBox (here called testRTB) and see the Output window for the results:
private void testRTB_MouseUp(object sender, MouseButtonEventArgs e)
{
TextPointer start = testRTB.CaretPosition; // this is the variable we will advance to the left until a non-letter character is found
TextPointer end = testRTB.CaretPosition; // this is the variable we will advance to the right until a non-letter character is found
String stringBeforeCaret = start.GetTextInRun(LogicalDirection.Backward); // extract the text in the current run from the caret to the left
String stringAfterCaret = start.GetTextInRun(LogicalDirection.Forward); // extract the text in the current run from the caret to the right
Int32 countToMoveLeft = 0; // we record how many positions we move to the left until a non-letter character is found
Int32 countToMoveRight = 0; // we record how many positions we move to the right until a non-letter character is found
for (Int32 i = stringBeforeCaret.Length - 1; i >= 0; --i)
{
// if the character at the location CaretPosition-LeftOffset is a letter, we move more to the left
if (Char.IsLetter(stringBeforeCaret[i]))
++countToMoveLeft;
else break; // otherwise we have found the beginning of the word
}
for (Int32 i = 0; i < stringAfterCaret.Length; ++i)
{
// if the character at the location CaretPosition+RightOffset is a letter, we move more to the right
if (Char.IsLetter(stringAfterCaret[i]))
++countToMoveRight;
else break; // otherwise we have found the end of the word
}
start = start.GetPositionAtOffset(-countToMoveLeft); // modify the start pointer by the offset we have calculated
end = end.GetPositionAtOffset(countToMoveRight); // modify the end pointer by the offset we have calculated
// extract the text between those two pointers
TextRange r = new TextRange(start, end);
String text = r.Text;
// check the result
System.Diagnostics.Debug.WriteLine("[" + text + "]");
}
Change Char.IsLetter(...) to Char.IsLetterOrDigit(...), or to whatever else is appropriate, depending on whether you wish to keep digits as well.
Tip: extract this into an extension method in a separate assembly to access it whenever needed.
OK, so in order to solve this I brute-forced it.
I used
curCaret.GetTextInRun(LogicalDirection.Backward)
and
curCaret.GetTextInRun(LogicalDirection.Forward)
along with preCaretString.LastIndexOf(" ") and postCaretString.IndexOf(" "), plus the other dividers that separate words, and took the substrings.
Finally, I concatenated the first half and the second half of the string to obtain the word under the cursor.
I bet there are cleverer ways of doing this, but at least this solved the problem.
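The substring approach described above can be written as a pure function over the two run strings, which keeps the WPF part down to the two GetTextInRun calls. A minimal sketch: the divider set and the CaretWord / WordAt names are hypothetical, and like the approach above it only looks inside the current run:

```csharp
using System;

public static class CaretWord
{
    // Hypothetical set of word dividers; extend it to taste.
    private static readonly char[] Dividers =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '(', ')', '"' };

    // before: caret.GetTextInRun(LogicalDirection.Backward)
    // after:  caret.GetTextInRun(LogicalDirection.Forward)
    public static string WordAt(string before, string after)
    {
        int start = before.LastIndexOfAny(Dividers) + 1; // 0 when no divider is found
        int end = after.IndexOfAny(Dividers);
        if (end < 0) end = after.Length;                 // word runs to the end of the run
        return before.Substring(start) + after.Substring(0, end);
    }
}
```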
You can get the current position of the cursor via CaretPosition.
Unfortunately there is no easy way to get the characters to the left/right of the caret position. The only way I know of to get text out of a RichTextBox is in this answer, which is a bit convoluted. But it will accomplish what is necessary.
Since words are divided by spaces, you can iterate through the runs around the caret until a space is found. This function should work even when your RichTextBox contains different fonts and font sizes.
public string GetWordByCaret(LogicalDirection direction)
{
// Get the CaretPosition
TextPointer position = this.CaretPosition;
TextPointerContext context = position.GetPointerContext(direction);
string text = string.Empty;
// Iterate through the RichTextBox based on the Start, Text and End of nearby inlines
while (context != TextPointerContext.None)
{
// We are only interested in the text here
//, so ignore everything that is not text
if (context == TextPointerContext.Text)
{
string current = position.GetTextInRun(direction);
// The strings appended based on whether they are before the caret or after it...
// And well...I love switches :)
switch (direction)
{
case LogicalDirection.Backward:
{
int spaceIndex = current.LastIndexOf(' ');
// If space is found, we've reached the end
if (spaceIndex >= 0)
{
int length = current.Length - 1;
if (spaceIndex + 1 <= length)
{
text = current.Substring(spaceIndex + 1, length - spaceIndex) + text;
}
return text;
}
else
text = current + text;
}
break;
default:
{
int spaceIndex = current.IndexOf(' ');
// If space is found, we've reached the end
if (spaceIndex >= 0)
{
int length = current.Length;
if (spaceIndex <= length)
{
text += current.Substring(0, spaceIndex);
}
return text;
}
else
text += current;
}
break;
}
}
// Move to the next position
position = position.GetNextContextPosition(direction);
// Get the next context
if (position != null)
context = position.GetPointerContext(direction);
else
context = TextPointerContext.None;
}
return text;
}
Now you can get the word your caret is on like this:
string before = GetWordByCaret(LogicalDirection.Backward);
string after = GetWordByCaret(LogicalDirection.Forward);
string word = before + after; // :)
Here is my alternative solution using LINQ and a dependency property:
public class SelectionRichTextBox : RichTextBox
{
public SelectionRichTextBox()
{
// Use base class style
SetResourceReference(StyleProperty, typeof(RichTextBox));
}
public static readonly DependencyProperty SelectedWordProperty =
DependencyProperty.Register(
"SelectedWord",
typeof(string),
typeof(SelectionRichTextBox),
new PropertyMetadata("")
);
public string SelectedWord
{
get
{
return (string)GetValue(SelectedWordProperty);
}
set
{
SetValue(SelectedWordProperty, value);
}
}
protected override void OnMouseUp(MouseButtonEventArgs e)
{
TextPointer cursorPosition = CaretPosition;
string strBeforeCursor = cursorPosition.GetTextInRun(LogicalDirection.Backward);
string strAfterCursor = cursorPosition.GetTextInRun(LogicalDirection.Forward);
string wordBeforeCursor = strBeforeCursor.Split().Last();
string wordAfterCursor = strAfterCursor.Split().First();
string text = wordBeforeCursor + wordAfterCursor;
SelectedWord = string.Join("", text
.Where(c => char.IsLetter(c))
.ToArray());
base.OnMouseUp(e);
}
}
After that, you can use it in binding like this:
<custom:SelectionRichTextBox
SelectedWord="{Binding SelectedWord, Mode=OneWayToSource}"/>
Can someone please tell me what is wrong with this? I am trying to get the text between a position several characters before the caret and the caret itself. "comparable" is never longer than the actual text in the RichTextBox.
This is the code that I have:
int coLen = comparable.Length;
TextPointer caretBack = rtb.CaretPosition.GetPositionAtOffset(coLen,
LogicalDirection.Backward);
TextRange rtbText = new TextRange(caretBack, rtb.CaretPosition);
string text = rtbText.Text;
This returns text = ""
Please help!
This works as expected; I get "I a".
Piece of code:
RichTextBox rtb = new RichTextBox();
rtb.AppendText("I am adding some texts to the richTextBox");
rtb.CaretPosition = rtb.CaretPosition.DocumentEnd;
int coLen = 3;
TextPointer caretBack = rtb.CaretPosition.GetPositionAtOffset(-coLen);
TextRange rtbText = new TextRange(caretBack, rtb.CaretPosition);
string ttt = rtbText.Text;
EDIT
Here is an MSTest method that demonstrates how the caret and reading behave:
[TestMethod]
public void TestRichtTextBox()
{
RichTextBox rtb = new RichTextBox();
rtb.AppendText("I am adding some texts to the richTextBox");
int offset = 3;
TextPointer beginningPointer = rtb.CaretPosition.GetPositionAtOffset(offset);
TextPointer endPointer = rtb.CaretPosition.DocumentEnd;
TextRange rtbText = new TextRange(beginningPointer, endPointer);
Assert.IsTrue(rtbText.Text == "m adding some texts to the richTextBox\r\n");
// Now we keep the same beginning offset but change the end pointer to go backwards.
beginningPointer = rtb.CaretPosition.GetPositionAtOffset(3);
endPointer = rtb.CaretPosition; // this one is the beginning of the text
rtbText = new TextRange(beginningPointer, endPointer);
Assert.IsTrue(rtbText.Text == "I a");
// Now we want to read the last three characters,
// so we move the caret to DocumentEnd.
rtb.CaretPosition = rtb.CaretPosition.DocumentEnd;
beginningPointer = rtb.CaretPosition.GetPositionAtOffset(-offset);
endPointer = rtb.CaretPosition; // we already set this one to the end document
rtbText = new TextRange(beginningPointer, endPointer);
Assert.IsTrue(rtbText.Text == "Box");
}
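Plus, here is a comment from MSDN about negative offsets. Note that in the two-argument overload GetPositionAtOffset(offset, direction), the LogicalDirection argument only specifies the logical direction of the returned pointer; a positive offset still moves forward from the caret, not backwards.
offset (Type: System.Int32): An offset, in symbols, for which to calculate and return the position. If the offset is negative, the position is calculated in the logical direction opposite of that indicated by the LogicalDirection property.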