WebUtility.HtmlDecode vs HttpUtilty.HtmlDecode

WebUtility.HtmlDecode vs HttpUtilty.HtmlDecode - c#

I was using WebUtilty.HtmlDecode to decode HTML. It turns out that it doesn't decode properly, for example, – is supposed to decode to a "–" character, but WebUtilty.HtmlDecode does not decode it. HttpUtilty.HtmlDecode, however, does.
Debug.WriteLine(WebUtility.HtmlDecode("–"));
Debug.WriteLine(HttpUtility.HtmlDecode("–"));
> –
> –
The documentation for both of these is the same:
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
Why are they different, which one should I be using, and what will change if I switch to WebUtility.HtmlDecode to get "–" to decode correctly?

The implementation of the two methods are indeed different on Windows Phone.
WebUtility.HtmlDecode:
public static void HtmlDecode(string value, TextWriter output)
{
if (value != null)
{
if (output == null)
{
throw new ArgumentNullException("output");
}
if (!StringRequiresHtmlDecoding(value))
{
output.Write(value);
}
else
{
int length = value.Length;
for (int i = 0; i < length; i++)
{
bool flag;
uint num4;
char ch = value[i];
if (ch != '&')
{
goto Label_01B6;
}
int num3 = value.IndexOfAny(_htmlEntityEndingChars, i + 1);
if ((num3 <= 0) || (value[num3] != ';'))
{
goto Label_01B6;
}
string entity = value.Substring(i + 1, (num3 - i) - 1);
if ((entity.Length <= 1) || (entity[0] != '#'))
{
goto Label_0188;
}
if ((entity[1] == 'x') || (entity[1] == 'X'))
{
flag = uint.TryParse(entity.Substring(2), NumberStyles.AllowHexSpecifier, NumberFormatInfo.InvariantInfo, out num4);
}
else
{
flag = uint.TryParse(entity.Substring(1), NumberStyles.Integer, NumberFormatInfo.InvariantInfo, out num4);
}
if (flag)
{
switch (_htmlDecodeConformance)
{
case UnicodeDecodingConformance.Strict:
flag = (num4 < 0xd800) || ((0xdfff < num4) && (num4 <= 0x10ffff));
goto Label_0151;
case UnicodeDecodingConformance.Compat:
flag = (0 < num4) && (num4 <= 0xffff);
goto Label_0151;
case UnicodeDecodingConformance.Loose:
flag = num4 <= 0x10ffff;
goto Label_0151;
}
flag = false;
}
Label_0151:
if (!flag)
{
goto Label_01B6;
}
if (num4 <= 0xffff)
{
output.Write((char) num4);
}
else
{
char ch2;
char ch3;
ConvertSmpToUtf16(num4, out ch2, out ch3);
output.Write(ch2);
output.Write(ch3);
}
i = num3;
goto Label_01BD;
Label_0188:
i = num3;
char ch4 = HtmlEntities.Lookup(entity);
if (ch4 != '\0')
{
ch = ch4;
}
else
{
output.Write('&');
output.Write(entity);
output.Write(';');
goto Label_01BD;
}
Label_01B6:
output.Write(ch);
Label_01BD:;
}
}
}
}
HttpUtility.HtmlDecode:
public static string HtmlDecode(string html)
{
if (html == null)
{
return null;
}
if (html.IndexOf('&') < 0)
{
return html;
}
StringBuilder sb = new StringBuilder();
StringWriter writer = new StringWriter(sb, CultureInfo.InvariantCulture);
int length = html.Length;
for (int i = 0; i < length; i++)
{
char ch = html[i];
if (ch == '&')
{
int num3 = html.IndexOfAny(s_entityEndingChars, i + 1);
if ((num3 > 0) && (html[num3] == ';'))
{
string entity = html.Substring(i + 1, (num3 - i) - 1);
if ((entity.Length > 1) && (entity[0] == '#'))
{
try
{
if ((entity[1] == 'x') || (entity[1] == 'X'))
{
ch = (char) int.Parse(entity.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
}
else
{
ch = (char) int.Parse(entity.Substring(1), CultureInfo.InvariantCulture);
}
i = num3;
}
catch (FormatException)
{
i++;
}
catch (ArgumentException)
{
i++;
}
}
else
{
i = num3;
char ch2 = HtmlEntities.Lookup(entity);
if (ch2 != '\0')
{
ch = ch2;
}
else
{
writer.Write('&');
writer.Write(entity);
writer.Write(';');
continue;
}
}
}
}
writer.Write(ch);
}
return sb.ToString();
}
Interestingly, WebUtility doesn't exist on WP7. Also, the WP8 implementation of WebUtility is identical to the desktop one. The desktop implementation of HttpUtility.HtmlDecode is just a wrapper around WebUtility.HtmlDecode. Last but not least, Silverlight 5 has the same implementation of HttpUtility.HtmlDecode as Windows Phone, and does not implement WebUtility.
From there, I can venture a guess: since the Windows Phone 7 runtime is based on Silverlight, WP7 inherited of the Silverlight version of HttpUtility.HtmlDecode, and WebUtility wasn't present. Then came WP8, whose runtime is based on WinRT. WinRT brought WebUtility, and the old version of HttpUtility.HtmlDecode was kept to ensure the compatibility with the legacy WP7 apps.
As to know which one you should use... If you want to target WP7 then you have no choice but to use HttpUtility.HtmlDecode. If you're targeting WP8, then just pick the method whose behavior suits your needs the best. WebUtility is probably the future-proof choice, just in case Microsoft decides to ditch the Silverlight runtime in an upcoming version of Windows Phone. But I'd just go with the practical choice of picking HttpUtility to not have to worry about manually supporting the example you've put in your question.

The methods do exactly the same. Moreover if you try to decompile them the implementations look like one was just copied from another.
The difference is only intended use. HttpUtility is contained in the System.Web assembly and is expected to be used in ASP.net applications which are built over this assembly. WebUtility is contained in the System assembly referenced by nearly all applications and is provided for more general purpose or client use.

Just to notify others who will find this in search. Use any function that mentioned in the question, but never use Windows.Data.Html.HtmlUtilities.ConvertToText(string input). It's 70 times slower than WebUtilty.HtmlDecode and produce crashes! Crash will be named as mshtml!IEPeekMessage in the DevCenter. It looks like this function call InternetExplorer to convert the string. Just avoid it.

Related

C# for case in string(easy)

so I have this code. I need to generate a for loop that checks all the characters in the string and checks if they are all valid(So numbers from 0->7). But I don't know how to write it, I tried something but it didn't work. Here are the examples:user enters: 77, code works, user enters 99, code doesn't work, user enters 5., code doesn't work, etc..
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace NALOGA1
{
class Program
{
static string decToOct(int stevilo)//v mojon primere 7
{
string izhod = "";
//7>0 DRŽI
while (stevilo > 0)
{
//izhodi se dodeli ostanek deljenja z 8 keri se spremeni v string
izhod = (stevilo % 8) + izhod;
//7/8;
stevilo /= 8;
}
return izhod;
}
static int Octtodesetisko(string stevilo)
{
double vsota = 0;
for (int i = stevilo.Length - 1; i >= 0; i--)
{
int stevka = stevilo[i] - '0';
vsota += (stevka * Math.Pow(8, i));
}
return (int)vsota;
}
static void Main(string[] args)
{
//3 podprogram-in progress
string prvastevilka = Console.ReadLine();
int prvasprememba = Int32.Parse(prvastevilka);
if (prvasprememba > 0)
{
Console.WriteLine(decToOct(prvasprememba));
}
else
{
Console.WriteLine("Napaka");
}
string drugastevilka = Console.ReadLine();
int drugasprememba = Octtodesetisko(drugastevilka);
foreach (char znak in drugastevilka)
{
if(znak!=1 || znak!=2 || znak!=3 || znak!=4 || znak!=5 || znak!=6 || znak!=7)
{
Console.WriteLine("Napaka");
}
else
{
Console.WriteLine("dela :D");
}
}
Console.ReadKey();
}
}
}

Personally, I would take advantage of the LINQ Enumerable.All method to express this in a very concise and readable way:
if (str.Any() && str.All(c => c >= '0' && c <= '7'))
{
Console.WriteLine("good");
}
else
{
Console.WriteLine("bad");
}
EDIT: No LINQ
It's not hard to translate what the LINQ Enumerable.All method does to a normal loop. It's just more verbose:
bool isValid = true;
foreach (char c in str)
{
if (c < '0' || c > '7')
{
isValid = false;
break;
}
}
if (str.Length != 0 && isValid)
{
Console.WriteLine("good");
}
else
{
Console.WriteLine("bad");
}

Firstly, there seems to be a mistake in the line
if(znak!=1 || znak!=2 || znak!=3 || znak!=4 || znak!=5 || znak!=6 || znak!=7)
I guess it should read
if(znak!='1' || znak!='2' || znak!='3' || znak!='4' || znak!='5' || znak!='6' || znak!='7')
which should be compressed to
if (znak >= '0' && znak <= '7')
You can use linq instead of the for loop here like this:
if (drugastevilka.All(c => c >= '0' && c <= '7')
Console.WriteLine("dela :D");
else
Console.WriteLine("Napaka");
But the best solution is probably to use a regular expression:
Regex regex = new Regex("^[0-7]+$");
if (regex.IsMatch(drugastevilka))
Console.WriteLine("dela :D");
else
Console.WriteLine("Napaka");
Edit: the linq solution shown accepts empty strings, the regex (as shown) needs at least 1 character. Exchange the + with a * and it will accept empty strings, too. But I don't think you want to accept empty strings.

You are messing up with the datatype
Can you try with below code
static string decToOct(int stevilo)//v mojon primere 7
{
int izhod = 0;
//7>0 DRŽI
while (stevilo > 0)
{
//izhodi se dodeli ostanek deljenja z 8 keri se spremeni v string
izhod = (stevilo % 8) + izhod;
//7/8;
stevilo /= 8;
}
return (izhod.ToString());
}

What about something like this?
class Program
{
static void Main(string[] args)
{
string someString = "1234567";
string someOtherString = "1287631";
string anotherString = "123A6F2";
Console.WriteLine(IsValidString(someString));
Console.WriteLine(IsValidString(someOtherString));
Console.WriteLine(IsValidString(anotherString));
Console.ReadLine();
}
public static bool IsValidString(string str)
{
bool isValid = true;
char[] splitString = str.ToCharArray(); //get an array of each character
for (int i = 0; i < splitString.Length; i++)
{
try
{
double number = Char.GetNumericValue(splitString[i]); //try to convert the character to a double (GetNumericValue returns a double)
if (number < 0 || number > 7) //we get here if the character is an int, then we check for 0-7
{
isValid = false; //if the character is invalid, we're done.
break;
}
}
catch (Exception) //this will hit if we try to convert a non-integer character.
{
isValid = false;
break;
}
}
return isValid;
}
}
IsValidString() takes a string, converts it to a Char array, then checks each value as such:
Get the numeric value
Check if the value is between 0-7
GetNumericValue will fail on a non-integer character, so we wrap it in a try/catch - if we hit an exception we know that isValid = false, so we break.
If we get a valid number, and it's not between 0-7 we also know that isValid = false, so we break.
If we make it all the way through the list, the string is valid.
The sample given above returns:
IsValidString(someString) == true
IsValidString(someOtherString) == false
IsValidString(anotherString) == false

Check the difference

Hi I'm building a program that will look at files at have been placed into SVN and show what files have been changed in each commit.
As i'm only wanting to show the file path. if the path is that same I only want to show the difference.
example:
First file path is:
/GEM4/trunk/src/Tools/TaxMarkerUpdateTool/Tax Marker Ripper v1/DataModifier.cs
Second file path is:
/GEM4/trunk/src/Tools/TaxMarkerUpdateTool/Tax Marker Ripper v1/Tax Marker Ripper v1.csproj
What I'd like to do is substring at the point of difference.
So in this case:
/GEM4/trunk/src/Tools/TaxMarkerUpdateTool/Tax Marker Ripper v1/
would be substringed

I hope this helps:
public string GetString(string Path1, string Path2)
{
//Split and Put everything between / in the arrays
string[] Arr_String1 = Path1.Split('/');
string[] Arr_String2 = Path2.Split('/');
string Result = "";
for (int i = 0; i <= Arr_String1.Length; i++)
{
if (Arr_String1[i] == Arr_String2[i])
{
//Puts the Content that is the same in an Result string with /
Result += Arr_String1[i] + '/';
}
else
break;
}
// If Path is identical he would add a / which we dont want
if (Result.Contains('.'))
{
Result = Result.TrimEnd('/');
}
return Result;
}

You can do this pretty easily with a loop. Basically:
public String FindCommonStart(string a, string b)
{
int length = Math.Min(a.Length, b.Length);
var common = String.Empty;
for (int i = 0; i < length; i++)
{
if (a[i] == b[i])
{
common += a[i];
}
else
{
break;
}
}
return common;
}

Something like this:
public string GetCommonStart(string a, string b)
{
if ((a == null) || (b == null))
throw new ArgumentNullException();
int Delim = 0;
int I = 0;
while ((I < a.Length) && (I < b.Length) && (a[I] == b[I]))
{
if (a[I++] == Path.AltDirectorySeparatorChar) // or Path.DirectorySeparatorChar
Delim = I;
}
return a.Substring(0, Delim);
}
Keep in mind, that this code is case-sensitive (and paths in windows in general are not).

Reliably checking if a string is base64 encoded in .Net

Before I start: Yes, I have checked the other questions and answers on this topic both here and elsewhere.
I have found an example string that the .Net will base64 decode even though it isn't actually base64 encoded. Here is the example:
Rhinocort Aqueous 64mcg/dose Nasal Spray
The .Net method Convert.FromBase64String does not throw an exception when decoding this string so my IsBase64Encoded method happily returns true for this string.
Interestingly, if I use the cygwin base64 -d command using this string as input, it fails with the message invalid input.
Even more interestingly, the source that I thought that belongs to this executable (http://libb64.sourceforge.net/) "decodes" this same string with the same result as I am getting from the .Net Convert.FromBase64String. I will keep looking hoping to find a clue elsewhere but right now I'm stumped.
Any ideas?

There's a slightly better solution which also checks the input string length.
I recommend you do a check at the beginning. If the input is null or empty then return false.
http://www.codeproject.com/Questions/177808/How-to-determine-if-a-string-is-Base-decoded-or

When strings do pass Base64 decoding and the decoded data has special characters, then perhaps we can conclude that it was not valid Base64 (this depends on the encoding). Also, sometimes we're expecting the data being passed to be Base64, but sometimes it may not be properly padded with '='. Therefore, one method uses "strict" rules for Base64 and the other is "forgiving".
[TestMethod]
public void CheckForBase64()
{
Assert.IsFalse(IsBase64DataStrict("eyJhIjoiMSIsImIiOiI2N2NiZjA5MC00ZGRiLTQ3OTktOTlmZi1hMjhhYmUyNzQwYjEiLCJmIjoiMSIsImciOiIxIn0"));
Assert.IsTrue(IsBase64DataForgiving("eyJhIjoiMSIsImIiOiI2N2NiZjA5MC00ZGRiLTQ3OTktOTlmZi1hMjhhYmUyNzQwYjEiLCJmIjoiMSIsImciOiIxIn0"));
Assert.IsFalse(IsBase64DataForgiving("testing123"));
Assert.IsFalse(IsBase64DataStrict("ABBA"));
Assert.IsFalse(IsBase64DataForgiving("6AC648C9-C08F-4F9D-A0A5-3904CF15ED3E"));
}
public bool IsBase64DataStrict(string data)
{
if (string.IsNullOrWhiteSpace(data)) return false;
if ((new Regex(#"[^A-Z0-9+\/=]", RegexOptions.IgnoreCase)).IsMatch(data)) return false;
if (data.Length % 4 != 0) return false;
var e = data.IndexOf('=');
var l = data.Length;
if (!(e == -1 || e == l - 1 || (e == l - 2 && data[l - 1] == '='))) return false;
var decoded = string.Empty;
try
{
byte[] decodedData = Convert.FromBase64String(data);
decoded = Encoding.UTF8.GetString(decodedData);
}
catch(Exception)
{
return false;
}
//check for special chars that you know should not be there
char current;
for (int i = 0; i < decoded.Length; i++)
{
current = decoded[i];
if (current == 65533) return false;
if (!((current == 0x9 || current == 0xA || current == 0xD) ||
((current >= 0x20) && (current <= 0xD7FF)) ||
((current >= 0xE000) && (current <= 0xFFFD)) ||
((current >= 0x10000) && (current <= 0x10FFFF))))
{
return false;
}
}
return true;
}
public bool IsBase64DataForgiving(string data)
{
if (string.IsNullOrWhiteSpace(data)) return false;
//it could be made more forgiving by replacing any spaces with '+' here
if ((new Regex(#"[^A-Z0-9+\/=]", RegexOptions.IgnoreCase)).IsMatch(data)) return false;
//this is the forgiving part
if (data.Length % 4 > 0)
data = data.PadRight(data.Length + 4 - data.Length % 4, '=');
var e = data.IndexOf('=');
var l = data.Length;
if (!(e == -1 || e == l - 1 || (e == l - 2 && data[l - 1] == '='))) return false;
var decoded = string.Empty;
try
{
byte[] decodedData = Convert.FromBase64String(data);
decoded = Encoding.UTF8.GetString(decodedData);
}
catch (Exception)
{
return false;
}
//check for special chars that you know should not be there
char current;
for (int i = 0; i < decoded.Length; i++)
{
current = decoded[i];
if (current == 65533) return false;
if (!((current == 0x9 || current == 0xA || current == 0xD) ||
((current >= 0x20) && (current <= 0xD7FF)) ||
((current >= 0xE000) && (current <= 0xFFFD)) ||
((current >= 0x10000) && (current <= 0x10FFFF))))
{
return false;
}
}
return true;
}

C# console random number guess game

I'm working on a random number guessing game as a c# console program. It's done with the code and working. However, there is a part that I want to make better:
I declared an instance of a Guess class I created, now how to make this part more efficient?
int counter = 0;
do
{
myGuess.UserGuess = GetUserGuess(); //read user guess
if (myGuess.Compair() == "match")
{
Console.WriteLine("\n\t Correct!You WIN !");
}
else if (myGuess.Compair() == "high")
{
if (counter < 3)
Console.WriteLine("\n\tTry a lower number,");
else
Console.WriteLine("\n\tSorry you LOSE !, The right number is " + myGuess.RndNum);
counter++;
}
else if (myGuess.Compair() == "low")
{
if (counter < 3)
Console.WriteLine("\n\tTry a higher number,");
else
Console.WriteLine("\n\tSorry you LOSE !, The right number is " + myGuess.RndNum);
counter++;
}
} while (myGuess.Compair() != "match" && counter < 4);
Thanks in advance.

What does "Compair()" function look like? It seems like that could return an integer rather than a string for a simpler function. An example of that looks like:
// just an example implementation
public int Compair() {
if (UserGuess < actualValue) return -1;
if (UserGuess > actualValue) return 1;
return 0;
}
And then your routine becomes:
int counter = 0;
bool success = false;
do
{
myGuess.UserGuess = GetUserGuess();
int compair= myGuess.Compair()
switch (compair) {
case 0:
Console.WriteLine("\n\t Correct!You WIN !");
success = true;
break;
case 1:
case -1:
if (counter < 3) Console.WriteLine("\n\tTry a {0} number,", compair == -1 ? "lower" : "higher");
break;
}
counter++;
if (counter >= 3 && !success)
Console.WriteLine("\n\tSorry you LOSE !, The right number is " + myGuess.RndNum);
} while (!success && counter < 4);
That should do it! This should be faster because it isn't using string comparisons, it might be a bit easier to read and it should have fixed a few logical issues.
Note - I made a few assumptions about the use of properties so this example might not compile out of the get but it should get you most of the way there. Best of luck!

Display special (non printable) characters in WPF control

I have raw binary data received from device. I would like to display that data something like HEX editors do - display hex values, but also display corresponding characters.
I found fonts that have characters for ASCII codes 0 - 32, but I cannot get them to show on screen.
I tried this with WPF listbox, itemscontrol and textbox.
Is there some setting that can make this work?
Or maybe some WPF control that will show this characters?
Edit:
After some thinking and testing, only characters that make problems are line feed, form feed, carriage return, backspace, horizontal and vertical tab. As quick solution I decided to replace those characters with ASCII 16 (10HEX) character. I tested this with ASCII, UTF-8 and Unicode files and it works with those three formats.
Here is regex that I am using for this:
rawLine = Regex.Replace(inputLine, "[\t\n\r\f\b\v]", '\x0010'.ToString());
It replaces all occurrences of this 6 problematic characters with some boxy sign. It shows that this is not "regular printable" character and it works for me.

Not sure if that's excatly what you want, but I would recommend you to have a look in the #develop project. Their editor can display spaces, tabs and end-of-line markers.
I had a quick look at the source code and in the namespace ICSharpCode.AvalonEdit.Rendering the SingleCharacterElementGenerator class, seems to do what you want.

This should help you can expand it
private static string GetPrintableCharacter(char character)
{
switch (character)
{
case '\a':
{
return "\\a";
}
case '\b':
{
return "\\b";
}
case '\t':
{
return "\\t";
}
case '\n':
{
return "\\n";
}
case '\v':
{
return "\\v";
}
case '\f':
{
return "\\f";
}
case '\r':
{
return "\\r";
}
default:
{
if (character == ' ')
{
break;
}
else
{
throw new InvalidArgumentException(Resources.NOTSUPPORTCHAR, new object[] { character });
}
}
}
return "\\x20";
}
public static string GetPrintableText(string text)
{
StringBuilder stringBuilder = new StringBuilder(1024);
if (text == null)
{
return "[~NULL~]";
}
if (text.Length == 0)
{
return "[~EMPTY~]";
}
stringBuilder.Remove(0, stringBuilder.Length);
int num = 0;
for (int i = 0; i < text.Length; i++)
{
if (text[i] == '\a' || text[i] == '\b' || text[i] == '\f' || text[i] == '\v' || text[i] == '\t' || text[i] == '\n' || text[i] == '\r' || text[i] == ' ')
{
num += 3;
}
}
int length = text.Length + num;
if (stringBuilder.Capacity < length)
{
stringBuilder = new StringBuilder(length);
}
string str = text;
for (int j = 0; j < str.Length; j++)
{
char chr = str[j];
if (chr > ' ')
{
stringBuilder.Append(chr);
}
else
{
stringBuilder.Append(StringHelper.GetPrintableCharacter(chr));
}
}
return stringBuilder.ToString();
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

WebUtility.HtmlDecode vs HttpUtilty.HtmlDecode - c#

Related

C# for case in string(easy)

Check the difference

Reliably checking if a string is base64 encoded in .Net

C# console random number guess game

Display special (non printable) characters in WPF control

Categories

Resources