HtmlDecode of html encoded space is not space

Until now I thought HttpUtility.HtmlDecode("&nbsp;") returned a space, but the code below always returns false.
string text = "&nbsp;";
text = HttpUtility.HtmlDecode(text);
string space = " ";
if (String.Compare(space, text) == 0)
    return true;
else
    return false;
Same when I try with Server.HtmlDecode()
Why is it so?
Any help would be much appreciated
Thanks,
N

The HTML entity &nbsp; doesn't represent a space, it represents a non-breaking space.
The non-breaking space has character code 160:
string nbspace = "\u00A0";
Also, as Marc Gravell noticed, you have double encoded the code, so you would need to decode it twice to get the character:
string text = "&amp;nbsp;";
text = HttpUtility.HtmlDecode(HttpUtility.HtmlDecode(text));
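A minimal sketch of the point above, assuming System.Net.WebUtility is available (it decodes the same entities as HttpUtility and needs no System.Web reference). The class name NbspDemo is only illustrative.

```csharp
using System;
using System.Net;

class NbspDemo
{
    static void Main()
    {
        // A single decode of the entity yields U+00A0, not U+0020.
        string decoded = WebUtility.HtmlDecode("&nbsp;");
        Console.WriteLine((int)decoded[0]);      // 160
        Console.WriteLine(decoded == " ");       // False
        Console.WriteLine(decoded == "\u00A0");  // True

        // A double-encoded entity needs two decodes to reach the character.
        string once = WebUtility.HtmlDecode("&amp;nbsp;");
        Console.WriteLine(once);                 // &nbsp;
        string twice = WebUtility.HtmlDecode(once);
        Console.WriteLine(twice == "\u00A0");    // True
    }
}
```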

I'm cleaning the html like this:
var text = WebUtility.HtmlDecode(html)
    .Replace("\u00A0", " ") // Replace non-breaking space with space.
    .Replace("  ", " ") // Replace each pair of spaces with one space (single pass).
    .Trim();
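One caveat with this cleanup: a single Replace of two spaces with one only halves each run, so three or more consecutive spaces are not fully collapsed. A possible variant, sketched here with a regular expression, collapses runs of any length; the HtmlText class and Clean method are hypothetical names, not part of any library.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Decode entities, turn non-breaking spaces into ordinary spaces,
    // collapse every run of two or more spaces into one, then trim.
    public static string Clean(string html)
    {
        string text = WebUtility.HtmlDecode(html).Replace('\u00A0', ' ');
        return Regex.Replace(text, " {2,}", " ").Trim();
    }

    static void Main()
    {
        Console.WriteLine("[" + Clean("a&nbsp;&nbsp;&nbsp;  b ") + "]"); // [a b]
    }
}
```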

The HTML of &amp;nbsp; doesn't mean any kind of space. It means, literally, the text &nbsp; - for example, if you were writing HTML that was talking about HTML, you may need to include the text &nbsp;, which you would do by writing the HTML &amp;nbsp;.
If you had:
string text = "&nbsp;";
then that would decode to a non-breaking space.

Hello, I faced the same issue a few minutes ago.
I solved it this way:
string text = "&nbsp;";
text = Server.HtmlDecode(text).Trim();
So now:
text == "" is true (the Trim at the end removes the non-breaking space)

Related

Misplaced spaces between RTL and LTR strings

I have a web site built with DevExpress controls including Reports. The main language used is Hebrew so the basic direction is RTL. However there is often a need to include English text, LTR, mixed in the Hebrew text. Their web controls support RTL and usually there isn't a problem with mixed texts.
The problem is with their reports, which until recently did not support RTL. Creating a report entirely in Hebrew was not too much of a problem. Trouble starts when Hebrew and English are mixed: then the text becomes messed up.
I succeeded in fixing that with the following code:
private string FixBiDirectionalString(string textToFix)
{
    try
    {
        char RLE = '\u202B'; // right-to-left embedding
        char PDF = '\u202C'; // pop directional formatting
        char LRM = '\u200E'; // left-to-right mark
        char RLM = '\u200F'; // right-to-left mark
        StringBuilder sb = new StringBuilder(textToFix.Replace("\r", "").Replace("\n", string.Format("{0}", '\u000A')));
        System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex("[A-Za-z0-9-+ ]+");
        System.Text.RegularExpressions.MatchCollection mc = r.Matches(sb.ToString());
        foreach (System.Text.RegularExpressions.Match m in mc)
        {
            double tmp;
            // Skip lone spaces and plain numbers; only wrap Latin-script runs.
            if (m.Value == " ")
                continue;
            if (double.TryParse(RemoveAcceptedChars(m.Value), out tmp))
                continue;
            sb.Replace(m.Value, LRM + m.Value + RLM);
        }
        return RLE + sb.ToString() + PDF;
    }
    catch { return textToFix; } // fall back to the unmodified input
}
private string RemoveAcceptedChars(string p)
{
    return p.Replace("+", "").Replace("-", "").Replace("*", "").Replace("/", "");
}
This code is based on code I found in one of the comments on the article "XtraReports RTL: bidirectional text drawing".
However, I still had a problem with spaces between Hebrew and English words disappearing or being misplaced.
How can that be fixed? (I'm still using an older version of reports that doesn't support RTL).

I fixed it by first trimming the leading and trailing spaces from each string matched by the English-alphabet Regex, and then adding spaces back at fixed positions relative to the Unicode direction marks.
string mTrim = m.Value.Trim();
sb.Replace(m.Value, " " + LRM + mTrim + " " + RLM);
The problem arises because spaces are neutral (weakly directional) characters: their direction depends on the text around them, and in mixed text that can put a space in the wrong place. This code forces one space to be part of the general RTL direction and one to be part of the LTR segment, so the words are displayed properly separated.

Is there a way to split a literal string to multiple lines with out enter to it \r

In my code I have a very long and complicated string that I want to keep as a literal (verbatim) string.
At the moment the line is 1200 characters long.
I want to break it up so that no line is longer than 200 characters.
string comlexLine = @"{""A"": {""B"": [{""C"": {""D"": {""E"":""+"" ""/F;
I would like to split it into shorter lines so the code is more readable.
At the moment, when I insert a line break, a \r ends up in the string because it is a literal string.
For example:
string comlexLine = @"{""A"": {""B"": "
+ "[{""C"": {""D"": {""E"":""+"" ""/F;
Console.WriteLine(comlexLine);
would print:
@"{""A"": {""B"": //r "[{""C"": {""D"": {""E"":""+"" ""/F
I would prefer not to split it into separate constants, and I would like to keep using a literal string.
Is there any solution?

Use Environment.NewLine when you want to include a new line in your string. I would rather write it as below, using a backslash \ to escape the extra double quotes:
string comlexLine = "{\"A\": {\"B\": " + Environment.NewLine + "[{\"C\": {\"D\": {\"E\":\"+\" \"/F";
Console.WriteLine(comlexLine);

Try not using the verbatim literal, and escape the double quotes with a backslash instead.
string comlexLine = "{\"A\": {\"B\": [{\"C\": "
+ "{\"D\": {\"E\":\"+\" \"/F";
If I use that, it doesn't introduce the //r.
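If keeping the verbatim syntax matters, one option is to concatenate several verbatim literals, putting the line breaks between the literals rather than inside them. This is only a sketch: the question's string is truncated, so the JSON below is a made-up stand-in, and complexLine and VerbatimSplit are illustrative names.

```csharp
using System;

class VerbatimSplit
{
    static void Main()
    {
        // Each piece is its own verbatim literal; the source line breaks sit
        // outside the quotes, so no \r or \n ends up in the value.
        string complexLine =
            @"{""A"": {""B"": " +
            @"[{""C"": {""D"": {""E"": ""+""}}}]}}";

        Console.WriteLine(complexLine);
        // {"A": {"B": [{"C": {"D": {"E": "+"}}}]}}
        Console.WriteLine(complexLine.Contains("\r") || complexLine.Contains("\n")); // False
    }
}
```

Because every operand is a constant, the compiler folds the concatenation into a single string, so there is no runtime cost compared with one long literal.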

Search for a newline Character C#.net

How do I search a string for a newline character? Both of the calls below seem to return -1:
theJ = line.IndexOf(Environment.NewLine);
OR
theJ = line.IndexOf('\n');
The string it's searching is "yo\n".
The string I'm parsing contains this: "printf("yo\n");"
The string I see during the comparison is this: "\tprintf(\"yo\n\");"

"yo\n" // output as "yo" + newline
"yo\n".IndexOf('\n') // returns 2
"yo\\n" // output as "yo\n"
"yo\\n".IndexOf('\n') // returns -1
Are you sure you're searching yo\n and not yo\\n?
Edit
Based on your update, I can see that I guessed correctly. If your string says:
printf("yo\n");
... then this does not contain a newline character. If it did, it would look like this:
printf("yo
");
What it actually has is an escaped newline character, or in other words, a backslash character followed by an 'n'. That's why the string you're seeing when you debug is "\tprintf(\"yo\\n\");". If you want to find this character combination, you can use:
line.IndexOf("\\n")
For example:
"\tprintf(\"yo\\n\");" // output as " printf("yo\n");"
"\tprintf(\"yo\\n\");".IndexOf("\\n") // returns 11

Looks like your line does not contain a newline.
If you are using File.ReadAllLines or string.Split on newline, then each line in the returned array will not contain the newline. If you are using StreamReader or one of the classes inheriting from it, the ReadLine method will return the string without the newline.
string lotsOfLines = @"one
two
three";
string[] lines = lotsOfLines.Split('\n');
foreach (string line in lines)
{
    Console.WriteLine(line.IndexOf('\n')); // prints -1 three times
}

That should work, although on Windows you'll have to search for "\r\n".
-1 simply means that no newline was found.

It depends on what you are trying to do. The two may not be identical on some platforms.
Environment.NewLine returns:
A string containing "\r\n" for non-Unix platforms, or a string
containing "\n" for Unix platforms.
Also:
If you want to search for the \n char (new line on Unix), use \n
If you want to search for the \r\n chars (new line on Windows), use \r\n
If your search depends on the current platform, use Environment.NewLine
If it returns -1 in both cases you mentioned, then you don't have a new line in your string.
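When the line-ending style is not known in advance, one option is to look for either character with IndexOfAny. The sketch below also contrasts a real newline with the two-character sequence backslash + n from the question; the variable names are illustrative, and StringComparison.Ordinal is passed so the string search does not depend on culture rules.

```csharp
using System;

class NewlineSearch
{
    static void Main()
    {
        string real = "yo\nthere";               // contains an actual line feed
        string escaped = @"printf(""yo\n"");";   // contains a backslash followed by 'n'

        // Finds the first '\r' or '\n', whichever line-ending style is used.
        Console.WriteLine(real.IndexOfAny(new[] { '\r', '\n' }));             // 2

        // No line feed character in the escaped text.
        Console.WriteLine(escaped.IndexOf('\n'));                             // -1

        // Searching for the literal two-character sequence does find it.
        Console.WriteLine(escaped.IndexOf(@"\n", StringComparison.Ordinal));  // 10
    }
}
```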

When I was in college I wrote a WebForms application to sort references, and I used the line break/carriage return to separate one reference from the next.
// Text from the TextBox
var text = textBox1.Text;
// Create an array with the text between the line breaks
var references = text.Split(new string[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
// Simple alphabetical OrderBy
var ordered = references.ToList<string>().OrderBy(ff => ff);
// Return the entries in alphabetical order, with a line break between each result
var valueToReturn = String.Join(Environment.NewLine, ordered);
textBox1.Text = valueToReturn;

Environment.NewLine is not the same as \n. On Windows it is CRLF (\r\n). However, I did try searching for \n with IndexOf and my test did find the value. Are you sure what you're searching for is a \n rather than a \r? View your text in hexadecimal format and see what the hex value is.

How to remove leading and trailing spaces from a string

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and end of the string.
The result should be: "i am a string"
How can I do this in C#?

String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
Source
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
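A short sketch of the difference between the two overloads, using a made-up input that starts with a non-breaking space and a tab and ends with CR/LF:

```csharp
using System;

class TrimDemo
{
    static void Main()
    {
        // Hypothetical input: U+00A0, tab and spaces in front, space and CR/LF behind.
        string txt = "\u00A0\t  i am a string \r\n";

        // Parameterless Trim() strips every character for which
        // Char.IsWhiteSpace returns true.
        Console.WriteLine("[" + txt.Trim() + "]");              // [i am a string]

        // Trim(' ') strips only U+0020. The outermost characters here are
        // U+00A0 and '\n', so nothing is removed at all.
        Console.WriteLine(txt.Trim(' ').Length == txt.Length);  // True
    }
}
```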

You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"

I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.

txt = txt.Trim();

Or you can split your string into a string array, splitting on spaces, and then append every item of the array to an empty string.
Maybe this is not the best or fastest method, but you can try it if the other answers aren't what you want.

text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();

Use the Trim method.

static void Main()
{
    // A.
    // Example strings with multiple whitespaces.
    string s1 = "He saw a cute\tdog.";
    string s2 = "There\n\twas another sentence.";
    // B.
    // Create the Regex.
    Regex r = new Regex(@"\s+");
    // C.
    // Strip multiple spaces.
    string s3 = r.Replace(s1, @" ");
    Console.WriteLine(s3);
    // D.
    // Strip multiple spaces.
    string s4 = r.Replace(s2, @" ");
    Console.WriteLine(s4);
    Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.

You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

How Can I Put Escape Character before Special Characters using C#?

buildLetter.Append("</head>").AppendLine();
buildLetter.Append("").AppendLine();
buildLetter.Append("<style type="text/css">").AppendLine();
Assume the above contents reside in a file. I want to write a snippet that removes any line containing the empty string "" and puts an escape character before the inner quotation marks. The final output would be:
buildLetter.Append("</head>").AppendLine();
buildLetter.Append("<style type=\"text/css\">").AppendLine();
The outer " .... " are not considered special characters. The special characters may be single or double quotation marks.
I could do this with the find and replace feature of Visual Studio. However, in my case I want it to be written in C# or VB.NET.
Any help will be appreciated.

Perhaps this does what you want:
string s = File.ReadAllText("input.txt");
string empty = "buildLetter.Append(\"\").AppendLine();" + Environment.NewLine;
s = s.Replace(empty, "");
s = Regex.Replace(s, @"(?<="").*(?="")",
    match => { return match.Value.Replace("\"", "\\\""); }
);
Result:
buildLetter.Append("</head>").AppendLine();
buildLetter.Append("<style type=\"text/css\">").AppendLine();
