Method to encode to HTML character entity

Any C# method to convert into HTML character entity strings?
Basically, I need to encode a URL. Using HttpUtility.UrlEncode() I get:
http://server/sites/blank/_vti_bin/UploadService/UploadService.svc/Upload/http%3a%2f%2fserver%2fsites%2fblank%2fdoclib1%2ffile.pdf
The problem is that this causes a "400 Bad Request" from my service.
The fix is to replace %3A with : and so on. Does that make sense?

I believe the best way to encode a URL for transfer inside another URL is Uri.EscapeDataString().
The problem might be the lowercase hex digits (%3a instead of %3A) in your encoded string.
var escdata = Uri.EscapeDataString(@"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http%3A%2F%2Fserver%2Fsites%2Fblank%2Fdoclib1%2Ffile.pdf%3Ftest%3Da%2Bb%20c
var escuris = Uri.EscapeUriString(@"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http://server/sites/blank/doclib1/file.pdf?test=a+b%20c
var urlencd = HttpUtility.UrlEncode(@"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http%3a%2f%2fserver%2fsites%2fblank%2fdoclib1%2ffile.pdf%3ftest%3da%2bb+c
var urlpenc = HttpUtility.UrlPathEncode(@"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http://server/sites/blank/doclib1/file.pdf?test=a+b c
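To make the suggestion concrete, here is a minimal sketch that appends a URL as the last path segment of the service URL from the question. The helper name BuildUploadUrl is hypothetical, and whether the server accepts an encoded slash (%2F) inside a path segment depends on its configuration, so treat this as a starting point.

```csharp
using System;

public static class UploadUrlSketch
{
    // Hypothetical helper: appends an arbitrary string (here, a full URL)
    // as a single, fully escaped path segment.
    public static string BuildUploadUrl(string serviceBase, string fileUrl)
    {
        // Uri.EscapeDataString escapes ':' and '/' with uppercase hex (%3A, %2F),
        // so the embedded URL is not mistaken for extra path segments.
        return serviceBase.TrimEnd('/') + "/" + Uri.EscapeDataString(fileUrl);
    }
}
```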

I'm not aware of a built-in function that will do this, but here is a quick-and-dirty solution:
string s = "http://myurl.com/whatever";
StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
    // Emit each character as a hexadecimal character reference, e.g. ':' -> &#x3A;
    sb.Append(String.Format("&#x{0:X2};", (uint)c));
}
var result = sb.ToString();
And as a one-liner using LINQ (requires using System.Linq;):
string s = "http://myurl.com/whatever";
string result = String.Concat(s.Select(c => String.Format("&#x{0:X2};", (uint)c)));
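One caveat with encoding char by char: characters outside the Basic Multilingual Plane (emoji, for example) are stored as two UTF-16 surrogates, and encoding each half separately produces two invalid references. A sketch that combines surrogate pairs into one code point first (the class and method names are illustrative):

```csharp
using System.Text;

public static class HtmlEntitySketch
{
    // Encodes every character as a hexadecimal HTML character reference (&#xHH;).
    // A valid surrogate pair is combined into one code point, so characters
    // outside the BMP produce a single reference. Unpaired surrogates are
    // encoded as-is.
    public static string ToHexEntities(string s)
    {
        var sb = new StringBuilder(s.Length * 6);
        for (int i = 0; i < s.Length; i++)
        {
            int codePoint;
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                codePoint = char.ConvertToUtf32(s[i], s[i + 1]);
                i++; // skip the low surrogate consumed above
            }
            else
            {
                codePoint = s[i];
            }
            sb.Append("&#x").Append(codePoint.ToString("X2")).Append(';');
        }
        return sb.ToString();
    }
}
```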

Related

UriBuilder().Query will wrongly encode non-ASCII characters

I am working on an ASP.NET MVC 4 web application using .NET 4.5. I have the following WebClient() code:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = query.ToString();
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}
I have noticed a problem when one of the parameters contains non-ASCII characters such as £, ¬, etc.
The final query has any non-ASCII characters (such as £) encoded wrongly (as %u00a3). I read about this problem, and it seems I can replace:
url.Query = query.ToString();
with
url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
Using the latter approach encodes £ as %C2%A3, which is the correct encoded value.
But there is a problem with url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));: if one of the parameters contains &, the URL will have the format &operation=AddAsset&assetName=&..., so the API will assume I am passing an empty assetName parameter rather than the value &.
EDIT
Let me summarize my problem again. I want to be able to pass the following three kinds of characters inside my URL to a third-party API:
Standard characters such as A, B, a, b, 1, 2, 3, ...
Non-ASCII characters such as £ and ¬.
Special characters that have a meaning in URL encoding, such as & and +.
I have tried the following two approaches:
Approach A:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = query.ToString();
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}
In this approach I can pass values such as & and + because they are URL-encoded, but non-ASCII characters are not encoded as UTF-8: a £ value is encoded as %u00a3, and it is saved inside the third-party API as %u00a3 instead of £.
Approach B:
I use:
url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
instead of
url.Query = query.ToString();
Now I can pass non-ASCII characters such as £, since they are correctly encoded as UTF-8. But I cannot pass values such as &, because the third-party API reads my URL incorrectly. For example, if I want to pass assetName=&, my URL looks as follows:
&operation=Add&assetName=&
so the third-party API assumes I am passing an empty assetName, while I am trying to pass the value &.
So I am not sure how I can pass both non-ASCII characters and characters such as & and +.

You could use System.Net.Http.FormUrlEncodedContent instead.
It can take the name/value pairs as a Dictionary and, unlike the NameValueCollection returned by HttpUtility.ParseQueryString, it does not "incorrectly" map characters such as £ to an unhelpful escape (%u00a3, in your case).
You pass the dictionary to the FormUrlEncodedContent constructor; when you read the string back out, the names and values have been properly URL-encoded.
It correctly and uniformly handles both of the cases you were having trouble with:
£ (a non-ASCII character, which has to be sent as percent-encoded UTF-8 bytes)
& (which, as you say, has meaning in the URL as a parameter separator, so it must be encoded when it appears inside a value).
Here is a code example showing that the kinds of items you mentioned (represented by item1, item2 and item3) now end up correctly URL-encoded:
String item1 = "£";
String item2 = "&";
String item3 = "xyz";
Dictionary<string,string> queryDictionary = new Dictionary<string, string>()
{
    {"item1", item1},
    {"item2", item2},
    {"item3", item3}
};
var queryString = new System.Net.Http.FormUrlEncodedContent(queryDictionary)
    .ReadAsStringAsync().Result;
queryString will contain item1=%C2%A3&item2=%26&item3=xyz.
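Plugged into the UriBuilder code from the question, this might look roughly like the sketch below. BuildApiUrl is a hypothetical helper, apiUrl stands in for the ApiURL app setting, and the synchronous .Result mirrors the WebClient code rather than being a recommendation:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

public static class QueryBuilderSketch
{
    // Builds the query with FormUrlEncodedContent (UTF-8, so £ -> %C2%A3 and
    // & -> %26), then lets UriBuilder assemble the full URL.
    public static string BuildApiUrl(string apiUrl, IDictionary<string, string> parameters)
    {
        string query;
        using (var content = new FormUrlEncodedContent(parameters))
        {
            // Blocking wait, to match the synchronous WebClient code.
            query = content.ReadAsStringAsync().Result;
        }
        var builder = new UriBuilder(apiUrl) { Query = query };
        // AbsoluteUri returns the escaped form rather than a display string.
        return builder.Uri.AbsoluteUri;
    }
}
```

The resulting string can then be passed to client.DownloadString in place of url.ToString().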

You could try an extension method on the NameValueCollection class, something like this:
using System.Collections.Specialized;
using System.Text;
using System.Web;

namespace Testing
{
    public static class NameValueCollectionExtension
    {
        public static string ToUtf8UrlEncodedQuery(this NameValueCollection nv)
        {
            StringBuilder sb = new StringBuilder();
            bool firstIteration = true;
            foreach (var key in nv.AllKeys)
            {
                if (!firstIteration)
                    sb.Append("&");
                // Encode both the name and the value as UTF-8 percent escapes
                sb.Append(HttpUtility.UrlEncode(key, Encoding.UTF8))
                    .Append("=")
                    .Append(HttpUtility.UrlEncode(nv[key], Encoding.UTF8));
                firstIteration = false;
            }
            return sb.ToString();
        }
    }
}
Then, in your code you can do this:
url.Query = query.ToUtf8UrlEncodedQuery();
Remember to add a using directive for the namespace where you put the NameValueCollectionExtension class.
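Note that nv[key] joins multiple values for the same key with commas. If that matters, a variant along these lines emits one pair per value. It uses Uri.EscapeDataString instead of HttpUtility.UrlEncode (a swap on my part, so hex digits come out uppercase and spaces become %20), and the class and method names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class NameValueCollectionQuerySketch
{
    // Emits one name=value pair per value instead of comma-joining them.
    public static string ToUtf8Query(this NameValueCollection nv)
    {
        var pairs = new List<string>();
        foreach (string key in nv.AllKeys)
        {
            if (key == null) continue; // entries added with a null name are skipped
            // GetValues returns null when a key has no non-null values; skip it.
            string[] values = nv.GetValues(key);
            if (values == null) continue;
            foreach (string value in values)
                pairs.Add(Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(value));
        }
        return string.Join("&", pairs);
    }
}
```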

The problem here isn't UriBuilder.Query, it's UriBuilder.ToString(). Read the documentation here: https://msdn.microsoft.com/en-us/library/system.uribuilder.tostring(v=vs.110).aspx. The property is defined as returning the "display string" of the builder, not a validly encoded string. Uri.ToString() has a similar problem, in that it doesn't perform proper encoding.
Use url.Uri.AbsoluteUri instead; it always returns a properly encoded string. You shouldn't have to do any encoding on the way into the builder (that's part of its purpose, after all: to encode things properly).
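A small sketch of that idea, assuming the raw (unescaped) query is handed to the builder: the non-ASCII £ comes back percent-encoded as UTF-8 from AbsoluteUri. As far as I can tell this does not help with a literal & inside a value, since & is a legal query delimiter and is left alone. WithQuery is a hypothetical helper name.

```csharp
using System;

public static class UriBuilderSketch
{
    // Lets Uri do the escaping of non-ASCII characters and reads the result
    // from Uri.AbsoluteUri rather than UriBuilder.ToString().
    public static string WithQuery(string baseUrl, string rawQuery)
    {
        var builder = new UriBuilder(baseUrl) { Query = rawQuery };
        return builder.Uri.AbsoluteUri;
    }
}
```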

You need to use:
System.Web.HttpUtility.UrlEncode(key)
Change your code to this:
using (var client = new WebClient())
{
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["model"] = Model;
    //code goes here for other parameters....
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = HttpUtility.UrlEncode(query.ToString());
    string xml = client.DownloadString(url.ToString());
    XmlDocument doc = new XmlDocument();
    //code goes here ....
}

If nothing else helps, manually convert the problematic characters inside parameter values:
& to %26
+ to %2B
? to %3F
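A sketch of that manual conversion. EscapeQueryValue is a hypothetical helper; escaping % first is my addition to the list above, so that a literal percent sign in a value is not later read as the start of an escape:

```csharp
using System.Text;

public static class ManualEscapeSketch
{
    // Escapes only the characters that break a query string value.
    // '%' must go first, otherwise the escapes added below would be re-escaped.
    public static string EscapeQueryValue(string value)
    {
        return new StringBuilder(value)
            .Replace("%", "%25")
            .Replace("&", "%26")
            .Replace("+", "%2B")
            .Replace("?", "%3F")
            .ToString();
    }
}
```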

C# ByteString to ASCII String

I am looking for a smart way to convert a string of hex-byte-values into a string of 'real text' (ASCII Characters).
For example, I have the word "HELLO" written in hexadecimal ASCII: 48 45 4C 4C 4F. Using some method, I want to get the ASCII text back (in this case "HELLO").
// The hex encoding of "HELLO"; I want to convert it back to "HELLO".
string strHexa = "48454C4C4F";
// I want to convert the strHexa to an ASCII string.
string strResult = ConvertToASCII(strHexa);
I am sure there is a framework method. If this is not the case of course I could implement my own method.
Thanks!

var str = Encoding.UTF8.GetString(SoapHexBinary.Parse("48454C4C4F").Value); //HELLO
PS: SoapHexBinary is in the System.Runtime.Remoting.Metadata.W3cXsd2001 namespace (.NET Framework only).
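SoapHexBinary only exists on .NET Framework. On .NET 5 or later, Convert.FromHexString covers the hex-to-bytes step; a minimal sketch of the ConvertToASCII method from the question under that assumption:

```csharp
using System;
using System.Text;

public static class HexToAsciiSketch
{
    // Hex digits -> bytes (Convert.FromHexString, .NET 5+) -> ASCII text.
    public static string ConvertToASCII(string hex)
    {
        // Throws FormatException on odd length or non-hex characters.
        byte[] bytes = Convert.FromHexString(hex);
        return Encoding.ASCII.GetString(bytes);
    }
}
```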

I am sure there is a framework method.
A single framework method: no.
However, the second part of this, converting a byte array containing ASCII-encoded text into a .NET string (which is UTF-16 encoded Unicode), does exist: System.Text.ASCIIEncoding, and specifically its GetString method:
string result = Encoding.ASCII.GetString(byteArray);
The first part is easy enough to do yourself: take two hex digits at a time, parse them as hex and cast to a byte to store in the array. Something like:
byte[] HexStringToByteArray(string input) {
    Debug.Assert(input.Length % 2 == 0, "Must have two digits per byte");
    var res = new byte[input.Length/2];
    for (var i = 0; i < input.Length/2; i++) {
        var h = input.Substring(i*2, 2);
        res[i] = Convert.ToByte(h, 16);
    }
    return res;
}
Edit: Note that L.B.'s answer identifies a method in .NET that does the first part more easily. That is a better approach than writing it yourself: although it lives in a somewhat obscure namespace, it is implemented in mscorlib, so no additional reference is needed.
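Putting both parts of this answer together gives the ConvertToASCII method the question asks for. This sketch throws an exception on odd-length input instead of using Debug.Assert (which is compiled out of release builds); the class name is illustrative:

```csharp
using System;
using System.Text;

public static class HexAsciiSketch
{
    // Combines the two steps described above: hex digits -> bytes -> ASCII text.
    public static string ConvertToASCII(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Must have two hex digits per byte.", nameof(hex));

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Parse each pair of hex digits as one byte.
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        // Bytes above 0x7F are not ASCII; Encoding.ASCII replaces them with '?'.
        return Encoding.ASCII.GetString(bytes);
    }
}
```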

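StringBuilder sb = new StringBuilder();
for (int i = 0; i < hexStr.Length; i += 2)
{
    string hs = hexStr.Substring(i, 2);
    // Cast to char; appending the byte itself would add its decimal value (e.g. "72")
    sb.Append((char)Convert.ToByte(hs, 16));
}
string result = sb.ToString();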

How to convert a string containing escape characters to a string

I have a string that is returned to me which contains escape characters.
Here is a sample string
"test\40gmail.com"
As you can see it contains escape characters. I need it to be converted to its real value which is
"test@gmail.com"
How can I do this?

If you are looking to replace all escaped character codes, not only the code for @, you can use this snippet to do the conversion:
public static string UnescapeCodes(string src) {
    var rx = new Regex("\\\\([0-9A-Fa-f]+)");
    var res = new StringBuilder();
    var pos = 0;
    foreach (Match m in rx.Matches(src)) {
        res.Append(src.Substring(pos, m.Index - pos));
        pos = m.Index + m.Length;
        res.Append((char)Convert.ToInt32(m.Groups[1].ToString(), 16));
    }
    res.Append(src.Substring(pos));
    return res.ToString();
}
The code uses a regular expression to find every backslash followed by a sequence of hex digits, converts each sequence to an int, and casts the resulting value to a char.

string test = @"test\40gmail.com";
test = test.Replace(@"\40", "@");
If you want a more general approach, see HTML Decode.

The sample string provided ("test\40gmail.com") is JID escaped. It is not malformed, and HttpUtility/WebUtility will not correctly handle this escaping scheme.
You can certainly do it with string or regex functions, as suggested in the answers from dasblinkenlight and C.Barlow. This is probably the cleanest way to achieve the desired result. I'm not aware of any .NET libraries for decoding JID escaping, and a brief search hasn't turned up much. Here is a link to some source which may be useful, though.
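For reference, XEP-0106 (JID Escaping) defines exactly ten escape sequences. A sketch that unescapes only those, in one pass, and leaves any other backslash sequence untouched; matching the hex digits case-insensitively is an assumption on my part, and the names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JidUnescapeSketch
{
    // The ten escape sequences defined by XEP-0106.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "20", " " }, { "22", "\"" }, { "26", "&" }, { "27", "'" },
            { "2f", "/" }, { "3a", ":" }, { "3c", "<" }, { "3e", ">" },
            { "40", "@" }, { "5c", "\\" },
        };

    private static readonly Regex Escape =
        new Regex(@"\\(20|22|26|27|2f|3a|3c|3e|40|5c)", RegexOptions.IgnoreCase);

    // Replaces only the recognised sequences; other sequences such as \41 stay as-is.
    public static string Unescape(string localpart)
    {
        return Escape.Replace(localpart, m => Map[m.Groups[1].Value]);
    }
}
```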

I just wrote this piece of code and it seems to work beautifully. It requires the escape sequence to be in hex, and it is valid for values 0x00 to 0xFF.
// Example
str = remEscChars(@"Test\x0D"); // str = "Test\r"
Here is the code.
private string remEscChars(string str)
{
    int pos;
    // Replace each \xHH sequence (exactly two hex digits) with the character it encodes.
    while ((pos = str.IndexOf(@"\x", StringComparison.Ordinal)) >= 0)
    {
        string subStr = str.Substring(pos + 2, 2);
        string escStr = ((char)Convert.ToInt32(subStr, 16)).ToString();
        str = str.Replace(@"\x" + subStr, escStr);
    }
    return str;
}
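The loop above rescans the whole string after each replacement, so a character produced by one replacement (for example a backslash from \x5C) can be treated as the start of another escape. A single-pass variant using Regex.Replace, swapped in here as an alternative technique, avoids that and leaves malformed sequences such as \xZZ as they are instead of throwing:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexEscapeSketch
{
    // Matches a backslash, 'x', and exactly two hex digits.
    private static readonly Regex HexEscape = new Regex(@"\\x([0-9A-Fa-f]{2})");

    // Each \xHH is replaced once; replacement output is never re-scanned.
    public static string RemoveEscapeChars(string str)
    {
        return HexEscape.Replace(str, m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }
}
```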

.NET provides the static methods Regex.Unescape and Regex.Escape to perform this task and back again. Regex.Unescape will do what you need.
https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

How to decode "\u0026" in a URL?

I want to decode URL A into B:
A) http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123
B) http://example.com/xyz?params=id,expire&abc=123
This is just a sample URL; I am looking for a general solution, not A.Replace("\/", "/")...
Currently I use HttpUtility.UrlDecode(A, Encoding.UTF8) and other encodings, but I cannot generate URL B.

You only need this function
System.Text.RegularExpressions.Regex.Unescape(str);
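Regex.Unescape handles the \/ and \u0026 parts, but %2C is a percent-escape, so the result still needs URL decoding to match URL B. A minimal sketch, assuming the input always has this JSON-style escaping (a JSON parser would be stricter about which escapes it accepts):

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonEscapedUrlSketch
{
    // 1. Regex.Unescape turns \/ into / and \u0026 into &.
    // 2. Uri.UnescapeDataString then decodes percent-escapes such as %2C.
    public static string Decode(string escaped)
    {
        string unescaped = Regex.Unescape(escaped);
        return Uri.UnescapeDataString(unescaped);
    }
}
```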

This is a basic example I was able to come up with:
static void Sample()
{
    var str = @"http:\/\/example.com\/xyz?params=id%2Cexpire\u0026abc=123";
    str = str.Replace("\\/", "/");
    str = HttpUtility.UrlDecode(str);
    // \uXXXX escapes use hex digits, so match [0-9A-Fa-f] rather than \d
    str = Regex.Replace(str, @"\\u(?<code>[0-9A-Fa-f]{4})", CharMatch);
    Console.Out.WriteLine("value = {0}", str);
}
private static string CharMatch(Match match)
{
    var code = match.Groups["code"].Value;
    int value = Convert.ToInt32(code, 16);
    return ((char) value).ToString();
}
This probably misses a lot, depending on the types of URLs you are going to get. It doesn't do any error checking, and it doesn't handle escaped literals (for example, \\u0026 should become \u0026). I'd recommend writing a few unit tests with various inputs to get started.

C#/.NET: Reformatting a very long string

I need to read a string, character by character, and build a new string as the output.
What's the best approach to do this in C#?
Use a StringBuilder? Use some writer/stream?
Note that there will be no I/O operations--this is strictly an in-memory transformation.

If the size of the string cannot be determined at compile time and it may also be relatively large, you should use a StringBuilder for concatenation as it acts like a mutable string.
var input = SomeLongString;
// Initialize the capacity too, since the output length
// will be 1:1 with the unprocessed input.
var sb = new StringBuilder( input.Length );
foreach( char c in input )
{
    sb.Append( Process( c ) );
}
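A self-contained version of that loop, with the hypothetical Process method passed in as a delegate so that each input character can map to zero or more output characters:

```csharp
using System;
using System.Text;

public static class ReformatSketch
{
    // "process" stands in for the hypothetical Process(c) method above.
    public static string Reformat(string input, Func<char, string> process)
    {
        // Pre-size for the common 1:1 case; StringBuilder grows if needed.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(process(c));
        }
        return sb.ToString();
    }
}
```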

If it's just one string, you can use a collection to hold your characters and then create the string with the constructor that takes a char array:
IEnumerable<char> myChars = ...;
string result = new string(myChars.ToArray());
Using LINQ, and with the help of a method ProcessChar(char c) that transforms each character into its output value, this can be a simple query transformation (there is no string constructor that takes an IEnumerable<char>, so the sequence is materialized with ToArray() first):
string result = new string(sourceString.Select(c => ProcessChar(c)).ToArray());
This should be roughly as efficient as using a StringBuilder (ToArray() fills a buffer and the constructor copies it once), but much more readable in my opinion.

StringBuilder is usually a pretty good bet; I have generated lots of JavaScript for web pages with it.

A StringBuilder is good idea for building your new string, because you can efficiently append new values to it. As for reading the characters from the input string, a StringReader would be a sufficient choice.
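A sketch combining the two: StringReader.Read() returns the next character as an int, or -1 at the end of the input. Dropping ';' is just an example transformation, and the names are illustrative:

```csharp
using System.IO;
using System.Text;

public static class StringReaderSketch
{
    // Reads the input one character at a time and writes the transformed
    // output to a StringBuilder.
    public static string RemoveSemicolons(string input)
    {
        var sb = new StringBuilder(input.Length);
        using (var reader = new StringReader(input))
        {
            int next;
            while ((next = reader.Read()) != -1)
            {
                char c = (char)next;
                if (c != ';')
                    sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```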

void Main()
{
    string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
    var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(x => x != ';'));
    transformedTString.Dump();
}
If you have more complicated logic, you can move the filtering into a separate predicate method (Dump() comes from LINQPad):
void Main()
{
    string myLongString = "lf;kajsd;lfkjal;dfkja;lkdfja;lkdjf;alkjdfa";
    var transformedTString = string.Join(string.Empty, myLongString.ToCharArray().Where(MyPredicate));
    transformedTString.Dump();
}
public bool MyPredicate(char c)
{
    return c != ';';
}

What's the difference between the input string and the output string? I mean, why do you have to read it char by char?
I use this method to transform a string:
string str = "some stuff";
string newStr = ToNewString(str);
string ToNewString(string arg)
{
    string r = string.Empty;
    foreach (char c in arg)
        r += DoWork(c);
    return r;
}
char DoWork(char arg)
{
    // What do you want to do here? (returns the character unchanged as a placeholder)
    return arg;
}
