Strange tag being generated in C#

I have a GetText() function, shown further down, which is called in the following place:
myGridView.DataSource = stuff.Select(s => new
{
//...some stuff here
f.Text = GetText();
}
myGridView.DataBind();
GetText looks like the following:
private void string GetText()
{
StringBuilder sb = new StringBuilder();
sb.Append("<abbr title=\"Testing\">");
sp.Append("This is the Text that I want to display");
sb.Append("<\abbr>");
}
So essentially, all I want to do is be able to have the following HTML on my webpage:
<abbr title="Testing">This is the text that I want to display</abbr>
However, a mysterious tag shows up. In Google Chrome, I looked at the console and saw that the markup looked like this:
<abbr title="Testing">This is the text that I want to display<bbr></abbr>
There is an extraneous tag that is generated when I add in the line sb.Append("<\abbr>");
This is fixed when I remove that line, but I would like to find a better solution since this makes the code look awkward.
I also tried doing the following instead of the multi-lined sb.Appends() but the extra text is still shown.
sb.Append(string.Format("<abbr title=\"testing\">{0}<\abbr>",Text));
NOTE: Assume that Text is a string which equals the text that I want to display.

Your end tag is wrong. It should be </abbr> not <\abbr>.
<\abbr> contains an escaped a (\a, which has no visible rendering), leaving what looks like <bbr>. Chrome apparently closes the <abbr> tag itself, so the superfluous tag is actually </abbr>, not <bbr>.

Use
sb.Append("</abbr>");
The \abbr is probably interpreted as the escape sequence \a followed by the text bbr>.
By the way, looking at the escape sequences on MSDN it seems that \a is the escape sequence for the BELL character. (No, I don't think that you should hear a beep from your PC)

Here's how your method should look:
private string GetText()
{
StringBuilder sb = new StringBuilder();
sb.Append("<abbr title=\"Testing\">");
sb.Append("This is the Text that I want to display");
sb.Append("</abbr>");
return sb.ToString();
}
"\" is an escape character :)

Just a guess but is it because you have a backslash instead of slash in your closing tag?
sb.Append("<\abbr>");
vs
sb.Append("</abbr>");

As others have said, this is the issue:
sb.Append("<\abbr>");
... but it's worth looking at exactly what's happening.
That's appending <, then a U+0007 (the "alert" character, or bell), then bbr>. If you'd done this with a character which wasn't a valid escape character (e.g. "<\zfoo>") then you'd have received a compile-time error. In some other cases you might have been able to see it in the HTML. It's only because you picked a completely invisible control character that it was harder to see.
As an aside, I don't think I've ever seen a C# program which needed \a. I wish it wasn't a valid escape character - along with the \x hex sequence...
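To make the invisible character visible, a minimal sketch along these lines may help. It assumes a C# 9 or later console project (top-level statements), and the variable names are only for illustration.

```csharp
using System;

// "\a" is the escape sequence for U+0007 (alert/bell), so this literal
// holds 6 characters: '<', U+0007, 'b', 'b', 'r', '>'.
string broken = "<\abbr>";
string correct = "</abbr>";

Console.WriteLine(broken.Length);    // 6
Console.WriteLine((int)broken[1]);   // 7
Console.WriteLine(correct.Length);   // 7

// Dump every character code to expose the control character.
foreach (char c in broken)
{
    Console.Write("U+{0:X4} ", (int)c);
}
Console.WriteLine();
```

Dumping the character codes like this is a quick way to spot any unintended escape sequence in a literal.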

Related

How to produce a soft return using C#.net

I know this is kind of an easy question, but I can't seem to find the answer anywhere. Does anyone know how to create a soft return inside a block of text using C#.NET?
I need to write a soft return to a text/XML file that will be generated using C#.NET. You can verify whether the answer is correct by opening the file in Notepad++ and enabling "View > Show Symbol > Show End of Line"; you will then see a symbol like this:
Thanks in advance :)

Not sure what you mean by a soft return. A quick Google search says it's a non-stored line break typically due to word wrapping in which case you wouldn't actually put this in a string, it would only be relevant when the string was rendered for display.
To put a carriage return and/or line feed in the string you would use:
string s = "line one\r\nline two";
And for further reference, here are the other escape codes that you can use.
Link (MSDN Blogs)
In response to your edit
The LF that you see can be represented with \n in a string. Obviously you have a specific line ending sequence that you need to represent. If you were to use Environment.NewLine that is going to give you different results on different platforms.

var message = $"Tom{Convert.ToChar(10)}Harry";
Results in:
Tom
Harry
With just a line feed between.
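As a rough check that the two spellings produce the same LF-only line break, here is a small sketch. It assumes a C# 9 or later console project (top-level statements); the variable names are made up for the example.

```csharp
using System;
using System.Text;

// Both spellings produce a single line feed (character code 10).
string viaEscape = "Tom\nHarry";
string viaConvert = $"Tom{Convert.ToChar(10)}Harry";

Console.WriteLine(viaEscape == viaConvert);   // True

// Inspect the bytes: 0A is the line feed; there is no 0D (carriage return).
byte[] bytes = Encoding.UTF8.GetBytes(viaEscape);
Console.WriteLine(BitConverter.ToString(bytes));   // 54-6F-6D-0A-48-61-72-72-79

// Environment.NewLine, by contrast, depends on the platform the code runs on.
Console.WriteLine(Environment.NewLine.Length);
```

Writing such a string to a file with something like File.WriteAllText should keep the line feed as is, which is what Notepad++ would then show as LF.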

As already mentioned, you can use Environment.NewLine, but I am not sure if that is what you want or if you are actually trying to append an ASCII 141 to your string, as mentioned in the comments.
You can add characters to your string by their character code like this:
var myString = new StringBuilder("Foo");
myString.Append((char)141);

Padding a string using PadRight method

I am trying to add spaces to end of a string in C#:
Trip_Name1.PadRight(20);
Also tried:
Trip_Name1.PadRight(20,' ');
None of this seems to work. However I can pad the string with any other character. Why?
I should have been more specific, here is full code:
lnk_showmatch_1.Text = u_trip.Trip_Name1.PadRight(20,' ');

Strings are immutable; they cannot be changed. PadRight returns a new, padded instance of the string rather than changing the one it was called on. What you want is this:
Trip_Name1 = Trip_Name1.PadRight(20,' ');
There is a great discussion on this StackOverflow question as to why strings are immutable.
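A minimal sketch of the difference, assuming a C# 9 or later console project (top-level statements) and a made-up value for the trip name:

```csharp
using System;

string tripName = "Paris";   // hypothetical value, for illustration only

// The return value is discarded, so tripName is unchanged.
tripName.PadRight(20);
Console.WriteLine(tripName.Length);   // 5

// Keeping the returned string gives the padded result.
string padded = tripName.PadRight(20, ' ');
Console.WriteLine(padded.Length);     // 20
Console.WriteLine("[" + padded + "]");
```

The brackets in the last line are only there to make the trailing spaces visible in console output.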
EDIT:
None of this seems to work. However I can pad the string with any other character.
Are you actually re-assigning it like the example above? If that is the case - then without more detail I can only think of the following:
If you are storing this in a database and retrieving it, some databases with the correct settings may "Trim" for you.
You have logic somewhere else that is trimming the white-spaces. This is common when dealing with user input.
EDIT 2:
I should have been more specific
I'm going to take a wild guess based on your naming conventions that you are dealing with HTML / ASP.NET. In most cases, in HTML - white space is collapsed. For example:
<div><a>Hello World</a></div>
<div><a>Hello        World</a></div>
Both of the a tags will render the same because the white-space is being collapsed. If you are indeed working with HTML - that is likely your reason and why the padding works for all other characters. If you do a view-source of the markup rendered - does it contain the additional white spaces?
If you wanted to keep the whitespaces, try applying a CSS style on your element called white-space and set it to pre. For example:
<a style="white-space:pre">hello     world     </a>
That will cause the white-space to be preserved. Keep in mind that using white space like this has disadvantages. Browsers don't render them identically, etc. I wouldn't use this for layout purposes. Consider using CSS and something like min-width instead.

Keep in mind that calling the method without using its result won't work for any string manipulation method, because strings are immutable. These methods return a new string rather than updating the existing instance.
PadRight returns a new string that left-aligns the characters in this string by padding them on the right with a specified Unicode character, for a specified total length.
Trip_Name1 = Trip_Name1.PadRight(20,' ');
EDIT:
Your control seems to be trimming the ending spaces. So, try to set the padding for the control rather than for the text.

Regex for a string

It would be great if someone could provide me the Regular expression for the following string.
Sample 1: <div>abc</div><br>
Sample 2: <div>abc</div></div></div></div></div><br>
As you can see in the samples above, I need to match the string no matter how many </div> tags occur. If any other text occurs between </div> and <br>, for example <div>abc</div></div></div>DEF</div></div><br> or <div>abc</div></div></div></div></div>DEF<br>, then the regex should not match.
Thanks in advance.

Try this:
<div>([^<]+)(?:<\/div>)*<br>
As seen on rubular
Notes:
This only works if there are not tags in the abc part (or anything that has a < symbol).
You might want to use start and end of string anchors (^<div>([^<]+)(?:<\/div>)*<br>$) if you want your string to match the pattern exactly.
If you want to allow the abc part to be empty, use * instead of +
That being said, you should be wary of using regex to parse HTML.
In this example, you can use regex because you are parsing a (hopefully) known, regular subset of HTML. But a more robust solution (ie: an [X]HTML parser like HtmlAgilityPack) is preferred when it comes to parsing HTML.
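For what it's worth, here is a sketch of the anchored pattern run against the four strings from the question. It assumes .NET's Regex class and a C# 9 or later console project (top-level statements); the forward slash is left unescaped because .NET does not require escaping it.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored variant of the pattern above.
var pattern = new Regex(@"^<div>([^<]+)(?:</div>)*<br>$");

string[] samples =
{
    "<div>abc</div><br>",
    "<div>abc</div></div></div></div></div><br>",
    "<div>abc</div></div></div>DEF</div></div><br>",
    "<div>abc</div></div></div></div></div>DEF<br>",
};

foreach (string sample in samples)
{
    Match m = pattern.Match(sample);
    // Group 1 holds the text between <div> and the first </div>.
    Console.WriteLine("{0} => {1}", sample, m.Success ? m.Groups[1].Value : "(no match)");
}
```

The first two samples should match with abc captured, and the two containing DEF should not.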

You need to use a real parser. Things like infinitely nested tags can't be handled via regex.

You could also include a named group in the expression, e.g.:
<div>(?<text>[^<]*)(?:<\/div>)*<br>
Implemented in C#:
var regex = new Regex(@"<div>(?<text>[^<]*)(?:<\/div>)*<br>");
Func<Match, string> getGroupText = m => (m.Success && m.Groups["text"] != null) ? m.Groups["text"].Value : null;
Func<string, string> getText = s => getGroupText(regex.Match(s));
Console.WriteLine(getText("<div>abc</div><br>"));
Console.WriteLine(getText("<div>123</div></div></div></div></div><br>"));

NullUserException's answer is good. Here are a couple of questions, and variations, depending on what you want.
Do you want to prevent anything from occurring before the open div tag? If so, keep the ^ at the beginning of the regex. If not, drop it.
The rest of this post refers to the following section of the regex:
([^<]+?)
Do you want to capture the contents of the div, or just know that it matches your form? To capture, leave it as is. If you don't need to capture, drop the parentheses from the above.
Do you want to match if there is nothing inside the div? If so change the + in the above to *
Finally, although it will work fine, you don't need the ? in the above.

I think, this regex is more flexible:
<div\b[^><]*+>(?>.*?</div>)(?:\s*+</div>)*+\s*+<br(?:\s*+/)?>
I don't include the ^ and $ at the beginning and end of my regex because we cannot be sure that your sample will always be on a single line.

.NET string IndexOf unexpected result

A string variable str contains the following somewhere inside it: se\">
I'm trying to find the beginning of it using:
str.IndexOf("se\\\">")
which returns -1
Why isn't it finding the substring?
Note: due to editing the snippet showed 5x \ for a while, the original had 3 in a row.

Your code is in fact searching for 'se\\">'. When searching for strings including backslashes I usually find it easier to use verbatim strings:
str.IndexOf(@"se\"">")
In this case you also have a quote in the search string, so there is still some escaping, but I personally find it easier to read.
Update: my answer was based on the edit that introduced extra slashes in the parameter to the IndexOf call. Based on current version, I would place my bet on str simply not containing the expected character sequence.
Update 2:
Based on the comments on this answer, there seems to be some confusion regarding the role of the '\' character in the strings. When you inspect a string in the Visual Studio debugger, it will be displayed with escape characters.
So, if you have a text box and type 'c:\' in it, inspecting the Text property in the debugger will show 'c:\\'. An extra backslash is added for escaping purposes. The actual string content is still 'c:\' (which can be verified by checking the Length property of the string; it will be 3, not 4).
If we take the following string (taken from the comment below)
" '<em
class=\"correct_response\">a
night light</em><br
/><br /><table
width=\"100%\"><tr><td
class=\"right\">Ingrid</td></tr></table>')"
...the \" sequences are simply escaped quotation marks; the backslashes are not part of the string content. So, you are in fact looking for 'se">', not 'se\">'. Either of these will work:
str.IndexOf(@"se"">"); // verbatim string; escape quotation mark by doubling it
str.IndexOf("se\">"); // regular string; escape quotation mark using backslash
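A small sketch of these points, using a shortened version of the string from the comment. It assumes a C# 9 or later console project (top-level statements); StringComparison.Ordinal is my addition so that the search is a plain character-by-character comparison.

```csharp
using System;

// The source literal contains \" escapes, but the string content has only plain quotes.
string str = "<em class=\"correct_response\">a night light</em>";

int verbatim = str.IndexOf(@"se"">", StringComparison.Ordinal);       // verbatim literal
int regular = str.IndexOf("se\">", StringComparison.Ordinal);         // regular literal
int withBackslash = str.IndexOf("se\\\">", StringComparison.Ordinal); // looks for se\">

Console.WriteLine(verbatim);        // 25
Console.WriteLine(regular);         // 25
Console.WriteLine(withBackslash);   // -1

// The debugger shows c:\\ for this string, but its content is c:\ (3 characters).
Console.WriteLine("c:\\".Length);   // 3
```

The third search fails because the string contains no backslash at all, which matches the behaviour described in the question.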

This works:
string str = "<case\\\">";
int i = str.IndexOf("se\\\">"); // i = 3
Maybe you're not correctly escaping one of the two strings?
EDIT there's an extra couple of \ in the string you are searching for.

Maybe the str variable does not actually contain the backslash.
It may be just that when you mouse over the variable while debugging, the debugger tooltip will show the escape character.
e.g. If you put a breakpoint after this assignment
string str = "123\"456";
the tooltip will show 123\"456 and not 123"456.
However if you click on the visualize icon, you will get the correct string 123"456

Following code:
public static void RunSnippet()
{
string s = File.ReadAllText (@"D:\txt.txt");
Console.WriteLine (s);
int i = s.IndexOf("se\\\">");
Console.WriteLine (i);
}
Gives following output:
some text before se\"> some text after
17
Seems like working to me...

TextBox2.Text = TextBox1.Text.IndexOf("se\"">")
seems to work in VB.

Double quotes within a verbatim string need to be written as "". Verbatim strings are worth considering here; an example would be:
var source = @"abdefghise\"">jklmon";
Console.WriteLine(source.IndexOf(@"se\"">")); // returns 8

If you are looking for se\">
then
str.IndexOf(@"se\"">")
is less error-prone. Note the double "" and single \
Edit, after the comment: it seems like the string may contain escaping itself, in which case the \" in se\"> was an escaped quote, so the literal text is simply se"> and the call to use is IndexOf("se\">")

Easiest way to format rtf/unicode/utf-8 in a RichTextBox?

I'm currently beating my head against a wall trying to figure this out. Long story short, I'd like to render the text between two '\u0002' characters as bold. This is for an IRC client that I'm working on, so I've been running into these quite a bit. I've tried regex and found that matching on the RTF as ((\'02) works to catch it, but I'm not sure how to match the last character and change it to \bclear or whatever the RTF formatting close is.
I can't exactly paste the text I'm trying to parse because the characters get filtered out of the post. But when looking at the char value its an int of 2.
Here's an attempt to paste the offending text:
[02:34] test test

You could use either
rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");
or
rtb.Rtf = Regex.Replace(rtb.Rtf, @"\\'02\s*(.*?)\s*\\'02", @"\'02 \b $1 \b0 \'02");
depending on whether you want to keep the \u0002s in there.
The \b and \b0 turn the bold on and off in RTF.
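To try the first replacement without a RichTextBox, here is a sketch on a plain string. It assumes the control has already escaped the character as \'02 in its RTF, as the question describes; the sample text is made up, and the code uses C# 9 or later top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical RTF fragment: \'02 marks where bold starts and where it ends.
string rtf = @"[02:34] \'02test\'02 test";

// In a .NET replacement string only $ is special, so \b and \b0 are inserted literally.
string result = Regex.Replace(rtf, @"\\'02\s*(.*?)\s*\\'02", @"\b $1 \b0");

Console.WriteLine(result);   // [02:34] \b test \b0 test
```

The same call applied to rtb.Rtf should behave the same way, though I have not tested it against a real control.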

I don't have a test case, but you could also probably use the Clipboard class's GetText method with the Unicode TextDataFormat. Basically, I think you could place the input in the clipboard and get it out in a different format (works for RTF and the like). Here's MS's demo code (not applicable directly, but demonstrates the API):
// Demonstrates SetText, ContainsText, and GetText.
public String SwapClipboardHtmlText(String replacementHtmlText)
{
String returnHtmlText = null;
if (Clipboard.ContainsText(TextDataFormat.Html))
{
returnHtmlText = Clipboard.GetText(TextDataFormat.Html);
Clipboard.SetText(replacementHtmlText, TextDataFormat.Html);
}
return returnHtmlText;
}
Of course, if you do that, you probably want to save and restore what was in the clipboard, or else you may upset your users!
