I use this function to update a RichTextBox in cross-thread situations:
public void AddRtf(string text)
{
// cross thread allowed
if (rtb.InvokeRequired)
{
rtb.Invoke((MethodInvoker)delegate()
{
AddRtf(text);
});
}
else
{
rtb.Rtf = @"{\rtf1\ansi This is in \b bold\b0.}"; // this works
rtb.Rtf = @"{\rtf1\ansi This "+text+"is in \b bold\b0.}"; // this does not
}
}
However, it is not working: the RTF formatting does not show up when I pass the "text" argument.
What could be the problem?
In fact, I need a simple way to update a RichTextBox with color, bold, underline and some URLs inside the text. I wrote some functions for that, such as rtb.AddLink(), .AddBold() and so on, including a nice extension for adding URLs, but it seems more logical to pass RTF and let the control apply the formatting. However, this forces me to break the text at each point where I need something in bold or whatever.
I think HTML would be more convenient, but I need a simple parser, at least simpler than HtmlAgilityPack.
Something simple enough to write in one line:
log.write("<font color="red">This is error</font> and this is the link... etc")
Does anyone have a simple solution for this?
You need to escape the \ in the second part of the string:
@"{\rtf1\ansi This "+text+"is in \\b bold\\b0.}"
                                 ^^      ^^
or use an @ again:
@"{\rtf1\ansi This "+text+@"is in \b bold\b0.}"
                          ^
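To see why the second literal fails, here is a minimal sketch. The names RtfStringDemo, EscapeRtf and BuildRtf are made up for illustration and are not part of the original code. In a regular string literal, \b is the backspace character U+0008, while in a verbatim @"..." literal it stays a backslash followed by b. The sketch also escapes \, { and } in the inserted text, on the assumption that the text is plain text rather than RTF.

```csharp
using System;

static class RtfStringDemo
{
    // Escapes the three characters that have special meaning in RTF.
    // Assumption: the incoming text is plain text, not RTF markup.
    public static string EscapeRtf(string text)
    {
        return text.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
    }

    // Both halves are verbatim literals, so \b and \b0 stay intact.
    public static string BuildRtf(string text)
    {
        return @"{\rtf1\ansi This " + EscapeRtf(text) + @"is in \b bold\b0.}";
    }

    static void Main()
    {
        // Regular literal: \b is one character, the backspace U+0008.
        Console.WriteLine("\b".Length);        // 1
        Console.WriteLine((int)"\b"[0]);       // 8

        // Verbatim literal: \b is two characters, a backslash and 'b'.
        Console.WriteLine(@"\b".Length);       // 2

        Console.WriteLine(BuildRtf("text "));  // {\rtf1\ansi This text is in \b bold\b0.}
    }
}
```

The resulting string could then be assigned to rtb.Rtf inside the Invoke callback of the original method.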
I am using a StringBuilder in C# to append some text, which can be English (left to right) or Arabic (right to left)
stringBuilder.Append("(");
stringBuilder.Append(text);
stringBuilder.Append(") ");
stringBuilder.Append(text);
If text = "A", then output is "(A) A"
But if text = "بتث", then output is "(بتث) بتث"
Any ideas?
This is a well-known flaw in the Windows text rendering engine when it is asked to render right-to-left text such as Arabic or Hebrew. It has a difficult problem to solve: people often fall back to Western words and punctuation when there is no good alternative available in the language, brand and company names for example. The renderer tries to guess the proper render order by looking at the code points, with characters in the Latin character set clearly having to be rendered left-to-right.
But it fumbles at punctuation, with brackets being the most visible case. You have to be explicit so it knows what to do: use the Unicode right-to-left mark, U+200F, written \u200f in C# code. Conversely, use the left-to-right mark, U+200E, if you know you need LTR rendering.
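As a rough sketch of the advice above, the StringBuilder code from the question could append a left-to-right mark right after the closing bracket. The placement of the mark is an assumption: it presumes the surrounding text is laid out left-to-right, and in a right-to-left context U+200F may be the right choice instead. The names BidiMarkDemo and Build are hypothetical. The demo only prints the code points, since the visual result depends on the renderer.

```csharp
using System;
using System.Linq;
using System.Text;

static class BidiMarkDemo
{
    // U+200E LEFT-TO-RIGHT MARK: invisible, but counts as a strong LTR character.
    const char LeftToRightMark = '\u200E';

    // Builds "(text) text" with an LRM after the closing bracket.
    // Assumption: the output is displayed in a left-to-right paragraph.
    public static string Build(string text)
    {
        var sb = new StringBuilder();
        sb.Append('(');
        sb.Append(text);
        sb.Append(')');
        sb.Append(LeftToRightMark);
        sb.Append(' ');
        sb.Append(text);
        return sb.ToString();
    }

    static void Main()
    {
        // The Arabic sample from the question, written as escapes.
        string result = Build("\u0628\u062A\u062B");

        // Prints: 0028 0628 062A 062B 0029 200E 0020 0628 062A 062B
        Console.WriteLine(string.Join(" ", result.Select(c => ((int)c).ToString("X4"))));
    }
}
```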
Use AppendFormat instead of just Append:
stringBuilder.AppendFormat("({0}) {0}", text);
This may fix the issue, but it may not. You need to look at the text value: it probably has LTR/RTL marker characters embedded. These need to be either removed or corrected in the value.
I had a similar issue and solved it by writing a function that checks each char. If it is from Unicode page FE, I add U+202C after it, as shown below. Without this, RTL and LTR got mixed up in a way I did not want.
string us = string.Format("\uFE9E\u202C\uFE98\u202C\uFEB8\u202C\uFEC6\u202C\uFEEB\u202C\u0020\u0660\u0662\u0664\u0668 Aa1");
I have the following GetText() function, which is called in the following place:
myGridView.DataSource = stuff.Select(s => new
{
//...some stuff here
f.Text = GetText();
}
myGridView.DataBind();
GetText looks like the following:
private void string GetText()
{
StringBuilder sb = new StringBuilder();
sb.Append("<abbr title=\"Testing\">");
sb.Append("This is the Text that I want to display");
sb.Append("<\abbr>");
}
So essentially, all I want to do is be able to have the following HTML on my webpage:
<abbr title="Testing">This is the text that I want to display</abbr>
However, a mysterious tag shows up. In Google Chrome, I looked at the console and saw this:
<abbr title="Testing">This is the text that I want to display<bbr></abbr>
There is an extraneous tag that is generated when I add in the line sb.Append("<\abbr>");
This is fixed when I remove that line, but I would like to find a better solution since this makes the code look awkward.
I also tried doing the following instead of the multi-lined sb.Appends() but the extra text is still shown.
sb.Append(string.Format("<abbr title=\"testing\">{0}<\abbr>",Text));
NOTE: Assume that Text is a string which equals the text that I want to display.
Your end tag is wrong. It should be </abbr> not <\abbr>.
<\abbr> will include an escaped a (which means nothing), inside the <bbr>. Chrome apparently closes the <abbr> tag. So the superfluous tag is actually </abbr> not <bbr>.
Use
sb.Append("</abbr>");
Probably the \abbr is interpreted as the escape character \a followed by the text bbr>.
By the way, looking at the escape sequences on MSDN it seems that \a is the escape sequence for the BELL character. (No, I don't think that you should hear a beep from your PC)
Here's how your method should look:
private string GetText()
{
StringBuilder sb = new StringBuilder();
sb.Append("<abbr title=\"Testing\">");
sb.Append("This is the Text that I want to display");
sb.Append("</abbr>");
return sb.ToString();
}
"\" is an escape character :)
Just a guess but is it because you have a backslash instead of slash in your closing tag?
sb.Append("<\abbr>");
vs
sb.Append("</abbr>");
As others have said, this is the issue:
sb.Append("<\abbr>");
... but it's worth looking at exactly what's happening.
That's appending <, then a U+0007 (the "alert" character, or bell), then bbr>. If you'd done this with a character which wasn't a valid escape character (e.g. "<\zfoo>") then you'd have received a compile-time error. In some other cases you might have been able to see it in the HTML. It's only because you picked a completely invisible control character that it was harder to see.
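A small sketch makes the invisible character visible by printing each code unit of the two literals. The names EscapeDemo and CodePoints are made up for this illustration.

```csharp
using System;
using System.Linq;

static class EscapeDemo
{
    // Formats every UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static string CodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // Backslash: \a is compiled into the single character U+0007.
        Console.WriteLine(CodePoints("<\abbr>"));
        // U+003C U+0007 U+0062 U+0062 U+0072 U+003E

        // Forward slash: the closing tag as intended.
        Console.WriteLine(CodePoints("</abbr>"));
        // U+003C U+002F U+0061 U+0062 U+0062 U+0072 U+003E
    }
}
```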
As an aside, I don't think I've ever seen a C# program which needed \a. I wish it wasn't a valid escape character - along with the \x hex sequence...
It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples above, I need to match the string no matter how many </div> tags occur. If any other string occurs between </div> and <br>, for example <div>abc</div></div></div>DEF</div></div><br> or <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.
Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are no tags in the abc part (or anything that contains a < symbol).
You might want to use start- and end-of-string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +.
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
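For a quick check in C#, the anchored pattern can be run against the four strings from the question. This is only a sketch: the names DivBrDemo and Matches are invented here, and the \/ escape is written as a plain / because .NET does not need it.

```csharp
using System;
using System.Text.RegularExpressions;

static class DivBrDemo
{
    // Anchored version of the pattern: text without '<', any number of </div>, then <br>.
    static readonly Regex Pattern = new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

    public static bool Matches(string input)
    {
        return Pattern.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(Matches("<div>abc</div><br>"));                                  // True
        Console.WriteLine(Matches("<div>abc</div></div></div></div></div><br>"));          // True
        Console.WriteLine(Matches("<div>abc</div></div></div>DEF</div></div><br>"));       // False
        Console.WriteLine(Matches("<div>abc</div></div></div></div></div>DEF<br>"));       // False
    }
}
```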
You need to use a real parser. Things like infinitely nested tags can't be handled via regex.
You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));
NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.
I think this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.
I'm currently beating my head against a wall trying to figure this out. Long story short, I'd like to convert a string between two UTF-8 '\u0002' characters to bold formatting. This is for an IRC client that I'm working on, so I've been running into these quite a bit. I've tried regex and found that matching on the RTF as (\'02) works to catch it, but I'm not sure how to match the last character and change it to \bclear or whatever the RTF formatting close is.
I can't exactly paste the text I'm trying to parse because the characters get filtered out of the post. But when looking at the char value, it's an int of 2.
Here's an attempt to paste the offending text:
[02:34] test test
You could use either
rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");
or
rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\'02 \b $1 \b0 \'02");
depending on whether you want to keep the \u0002s in there.
The \b and \b0 turn the bold on and off in RTF.
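The first replacement can be tried on a plain string without a RichTextBox. In this sketch the class IrcBold, the method BoldToRtf and the sample input are assumptions made for illustration. The input imitates how the control stores \u0002 in its Rtf property, namely as the four characters \'02.

```csharp
using System;
using System.Text.RegularExpressions;

static class IrcBold
{
    // Replaces each \'02 ... \'02 pair with RTF bold on/off control words.
    public static string BoldToRtf(string rtf)
    {
        return Regex.Replace(rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");
    }

    static void Main()
    {
        // Hypothetical fragment of an Rtf value with one bold-delimited word.
        string input = @"[02:34] \'02test\'02 test";

        Console.WriteLine(BoldToRtf(input));  // [02:34] \b test \b0 test
    }
}
```

A backslash has no special meaning in a .NET replacement string, so \b and \b0 are inserted literally; only $1 is substituted.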
I don't have a test case, but you could also probably use the Clipboard class's GetText method with the Unicode TextDataFormat. Basically, I think you could place the input in the clipboard and get it out in a different format (works for RTF and the like). Here's MS's demo code (not applicable directly, but demonstrates the API):
// Demonstrates SetText, ContainsText, and GetText.
public String SwapClipboardHtmlText(String replacementHtmlText)
{
String returnHtmlText = null;
if (Clipboard.ContainsText(TextDataFormat.Html))
{
returnHtmlText = Clipboard.GetText(TextDataFormat.Html);
Clipboard.SetText(replacementHtmlText, TextDataFormat.Html);
}
return returnHtmlText;
}
Of course, if you do that, you probably want to save and restore what was in the clipboard, or else you may upset your users!
I'm using an HTML sanitizing whitelist code found here:
http://refactormycode.com/codes/333-sanitize-html
I needed to add the "font" tag as an additional tag to match, so I tried adding this condition after the <img tag check
if (tagname.StartsWith("<font"))
{
// detailed <font> tag checking
// Non-escaped expression (for testing in a Regex editor app)
// ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
if (!IsMatch(tagname, @"<font
(\s*size=""\d{1}"")?
(\s*color=""((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)"")?
(\s*face=""(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)"")?
\s*?>"))
{
html = html.Remove(tag.Index, tag.Length);
}
}
Aside from the condition above, my code is almost identical to the code on the page I linked to. When I test this in C#, it throws an exception saying "Not enough )'s". I've counted the parentheses several times and run the expression through a few online JavaScript-based regex testers, and none of them report any problems.
Am I missing something in my regex that causes a parenthesis to be escaped? What do I need to do to fix this?
UPDATE
After a lot of trial and error, I remembered that the # sign starts a comment in regexes. The key to fixing this is to escape the # character. In case anyone else comes across the same problem, here is my fix (just escaping the # sign):
if (tagname.StartsWith("<font"))
{
// detailed <font> tag checking
// Non-escaped expression (for testing in a Regex editor app)
// ^<font(\s*size="\d{1}")?(\s*color="((#[0-9a-f]{6})|(#[0-9a-f]{3})|red|green|blue|black|white)")?(\s*face="(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)")?\s*?>$
if (!IsMatch(tagname, @"<font
(\s*size=""\d{1}"")?
(\s*color=""((\#[0-9a-f]{6})|(\#[0-9a-f]{3})|red|green|blue|black|white)"")?
(\s*face=""(Arial|Courier\sNew|Garamond|Georgia|Tahoma|Verdana)"")?
\s*?>"))
{
html = html.Remove(tag.Index, tag.Length);
}
}
Your IsMatch method uses the option RegexOptions.IgnorePatternWhitespace, which allows comments inside the regular expression, so you have to escape the # character; otherwise it is interpreted as the start of a comment.
if (!IsMatch(tagname,@"<font(\s*size=""\d{1}"")?
(\s*color=""((\#[0-9a-f]{6})|(\#[0-9a-f]{3})|red|green|blue|black|white)"")?
(\s*face=""(Arial|Courier New|Garamond|Georgia|Tahoma|Verdana)"")?
\s?>"))
{
html = html.Remove(tag.Index, tag.Length);
}
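The effect is easy to reproduce in isolation. The sketch below uses a shortened pattern and the invented names HashCommentDemo and TryCompile. With RegexOptions.IgnorePatternWhitespace, an unescaped # outside a character class comments out the rest of the line, including the closing parenthesis, so constructing the Regex throws.

```csharp
using System;
using System.Text.RegularExpressions;

static class HashCommentDemo
{
    const RegexOptions Options =
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;

    // Returns true if the pattern can be constructed with the options above.
    public static bool TryCompile(string pattern)
    {
        try
        {
            new Regex(pattern, Options);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid patterns are reported as ArgumentException (or a subclass).
            return false;
        }
    }

    static void Main()
    {
        // Unescaped #: everything after it, including ')', becomes a comment.
        Console.WriteLine(TryCompile(@"(#[0-9a-f]{6})"));   // False

        // Escaped #: matched as a literal character.
        Console.WriteLine(TryCompile(@"(\#[0-9a-f]{6})"));  // True

        Console.WriteLine(Regex.IsMatch("#FF00aa", @"^(\#[0-9a-f]{6})$", Options));  // True
    }
}
```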
I don't see anything obviously wrong with the regex. I would try isolating the problem by removing pieces of the regex until the problem goes away and then focus on the part that causes the issue.
It works fine for me... what version of the .NET framework are you using, and what is the exact exception?
Also, what does your IsMatch method look like? Is it just a pass-through to Regex.IsMatch?
[update] The problem is that the OP's example code didn't show they are using the IgnorePatternWhitespace regex option; with this option it doesn't work; without this option (i.e. as presented) the code is fine.
Download Chris Sells' Regex Designer. It's a great free tool for testing .NET regexes.
I'm not sure this regex is going to do what you want, because it depends on the attributes appearing in the same order as in the regex. If, for example, face="Arial" preceded size="5", then face= wouldn't match.
There are some escaping problems in your regex. You need to escape your " with \. You need to escape your # with \. You need to use \s in Courier New instead of a plain space. You also need to use the RegexOptions.IgnorePatternWhitespace and RegexOptions.IgnoreCase options.
<font
(\s+size=\"\d{1}\")?
(\s+color=\"((\#[0-9a-f]{6})|(\#[0-9a-f]{3})|red|green|blue|black|white)\")?
(\s+face=\"(Arial|Courier\sNew|Garamond|Georgia|Tahoma|Verdana)\")?
The # characters are what was causing the exception with the somewhat misleading missing ) message.