UriBuilder().Query will wrongly encode non-ASCII characters

UriBuilder().Query will wrongly encode non-ASCII characters - c#

I am working on an asp.net mvc 4 web application. and i am using .net 4.5. now i have the following WebClient() class:
using (var client = new WebClient())
{
var query = HttpUtility.ParseQueryString(string.Empty);
query["model"] = Model;
//code goes here for other parameters....
string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
var url = new UriBuilder(apiurl);
url.Query = query.ToString();
string xml = client.DownloadString(url.ToString());
XmlDocument doc = new XmlDocument();
//code goes here ....
}
now i have noted a problem when one of the parameters contain non-ASCII charterers such as £, ¬, etc....
now the final query will have any non-ASCII characters (such as £) encoded wrongly (as %u00a3). i read about this problem and seems i can replace :-
url.Query = query.ToString();
with
url.Query = ri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
now using the later approach will encode £ as %C2%A3 which is the correct encoded value.
but the problem i am facing with url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString())); in that case one of the parameters contains & then the url will have the following format &operation=AddAsset&assetName=&.... so it will assume that I am passing empty assetName parameter not value =&??
EDIT
Let me summarize my problem again. I want to be able to pass the following 3 things inside my URL to a third part API :
Standard characters such as A,B ,a ,b ,1, 2, 3 ...
Non-ASCII characters such as £,¬ .
and also special characters that are used in url encoding such as & , + .
now i tried the following 2 approaches :
Approach A:
using (var client = new WebClient())
{
var query = HttpUtility.ParseQueryString(string.Empty);
query["model"] = Model;
//code goes here for other parameters....
string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
var url = new UriBuilder(apiurl);
url.Query = query.ToString();
string xml = client.DownloadString(url.ToString());
XmlDocument doc = new XmlDocument();
//code goes here ....
}
In this approach i can pass values such as & ,+ since they are going to be url encoded ,,but if i want to pass non-ASCII characters they will be encoded using ISO-8859-1 ... so if i have £ value , my above code will encoded as %u00a3 and it will be saved inside the 3rd party API as %u00a3 instead of £.
Approach B :
I use :
url.Query = Uri.EscapeUriString(HttpUtility.UrlDecode(query.ToString()));
instead of
url.Query = query.ToString();
now I can pass non-ASCII characters such as £ since they will be encoded correctly using UTF8 instead of ISO-8859-1. but i can not pass values such as & because my url will be read wrongly by the 3rd party API.. for example if I want to pass assetName=& my url will look as follow:
&operation=Add&assetName=&
so the third part API will assume I am passing empty assetName, while I am trying to pass its value as &...
so not sure how I can pass both non-ASCII characters + characters such as &, + ????

You could use System.Net.Http.FormUrlEncodedContent instead.
This works with a Dictionary for the Name/Value pairing and the Dictionary, unlike the NameValueCollection, does not "incorrectly" map characters such as £ to an unhelpful escaping (%u00a3, in your case).
Instead, the FormUrlEncodedContent can take a dictionary in its constructor. When you read the string out of it, it will have properly urlencoded the dictionary values.
It will correctly and uniformly handle both of the cases you were having trouble with:
£ (which exceeds the character value range of urlencoding and needs to be encoded into a hexadecimal value in order to transport)
& (which, as you say, has meaning in the url as a parameter separator, so that values cannot contain it--so that it has to be encoded as well).
Here's a code example, that shows that the various kinds of example items you mentioned (represented by item1, item2 and item3) now end up correctly urlencoded:
String item1 = "£";
String item2 = "&";
String item3 = "xyz";
Dictionary<string,string> queryDictionary = new Dictionary<string, string>()
{
{"item1", item1},
{"item2", item2},
{"item3", item3}
};
var queryString = new System.Net.Http.FormUrlEncodedContent(queryDictionary)
.ReadAsStringAsync().Result;
queryString will contain item1=%C2%A3&item2=%26&item3=xyz.

Maybe you could try to use an Extension method on the NameValueCollection class. Something like this:
using System.Collections.Specialized;
using System.Text;
using System.Web;
namespace Testing
{
public static class NameValueCollectionExtension
{
public static string ToUtf8UrlEncodedQuery(this NameValueCollection nv)
{
StringBuilder sb = new StringBuilder();
bool firstIteration = true;
foreach (var key in nv.AllKeys)
{
if (!firstIteration)
sb.Append("&");
sb.Append(HttpUtility.UrlEncode(key, Encoding.UTF8))
.Append("=")
.Append(HttpUtility.UrlEncode(nv[key], Encoding.UTF8));
firstIteration = false;
}
return sb.ToString();
}
}
}
Then, in your code you can do this:
url.Query = query.ToUtf8UrlEncodedQuery();
Remember to add a using directive for the namespace where you put the NameValueCollectionExtension class.

The problem here isn't UriBuilder.Query, it's UriBuilder.ToString(). Read the documentation here: https://msdn.microsoft.com/en-us/library/system.uribuilder.tostring(v=vs.110).aspx. The property is defined as returning the "display string" of the builder, not a validly encoded string. Uri.ToString() has a similar problem, in that it doesn't perform proper encoding.
Use the following instead: url.Uri.AbsoluteUri, that will always be a properly encoded string. You shouldn't have to do any encoding on the way into the builder (that's part of it's purpose, after all, to properly encode things).

You need to use:
System.Web.HttpUtility.UrlEncode(key)
Change your code to this:
using (var client = new WebClient())
{
var query = HttpUtility.ParseQueryString(string.Empty);
query["model"] = Model;
//code goes here for other parameters....
string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
var url = new UriBuilder(apiurl);
url.Query = HttpUtility.UrlEncode(query.ToString());
string xml = client.DownloadString(url.ToString());
XmlDocument doc = new XmlDocument();
//code goes here ....
}

If nothing helps, then just manually convert those problematic chars inside values of parameters
& to %26
+ to %2B
? to %3F

Related

How can I use C# to search an XML file for specific words?

I'm very new to C# and XML files in general, but currently I have an XML file that still has some html markup in it (&amp, ;quot;, etc.) and I want to read through the XML file and remove all of those so it becomes easily readable. I can open and print the file to the console with no issue, but I'm stumped trying to search for those specific strings and remove them.

One way to do this would be to put all the words you want to remove into an array, and then use the Replace method to replace them with empty strings:
var xmlFilePath = #"c:\temp\original.xml";
var newFilePath = #"c:\temp\modified.xml";
var wordsToRemove = new[] {"&amp", ";quot;"};
// Read existing xml file
var fileContents = File.ReadAllText(xmlFilePath);
// Remove words
foreach (var word in wordsToRemove)
{
fileContents = fileContents.Replace(word, "");
}
// Create new file with words removed
File.WriteAllText(newFilePath, fileContents);

I suppose you are looking for this: https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?view=netcore-3.1
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
// Encode the string.
string myEncodedString = HttpUtility.HtmlEncode(myString);
Console.WriteLine($"HTML Encoded string is: {myEncodedString}");
StringWriter myWriter = new StringWriter();
// Decode the encoded string.
HttpUtility.HtmlDecode(myEncodedString, myWriter);
string myDecodedString = myWriter.ToString();
Console.Write($"Decoded string of the above encoded string is: {myDecodedString}");
Your string is html encoded, probably for transmission over network. So there is a built in method to decode it.

How to convert a utf8 like string to a real utf8?

I'm working on a client that should communicate with an MMO game server.
The client is using unity3d.
I get the data from the server with JSON format and I try to get the data in UTF8 encoding:
string responseString = new System.IO.StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8).ReadToEnd()
JSONObject JOBJ = new JSONObject(responseString);
and what is inside the response string looks like:
"\u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645"
Then I try to get the required utf8 string data out of the JSON:
string xy = JOBJ["name"].ToString();
byte[] utf = System.Text.Encoding.UTF8.GetBytes(xy);
string s2= System.Text.Encoding.UTF8.GetString(utf);
The Problem is when I Log the string:
Debug.Log("Jproperty :" + s2);
All I get is the \u secuences like this:
"\u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645"
While if I put the same result in the xy in the first place I'll get the fine result.
Also I should mention that while I think that the s2.length should be 11 it is 66.
Any one can tell me what's wrong with my code?

Strings that contain unicode escape sequences are perfectly valid. Your data might be getting escaped before it is sent to the server.
Try Regex.Unescape:
var nameEscaped = JOBJ["name"].ToString();
// nameEscaped =
// \u0645\u0639\u062f\u0646 \u062a\u06cc\u062a\u0627\u0646\u06cc\u0648\u0645
var name = Regex.Unescape(nameEscaped);
// name =
// معدن تیتانیوم

Method to encode to HTML character entity

Any C# method to convert into HTML character entity strings?
Basically, I need to encode a URL. Using HttpUtility.UrlEncode() I get to
http://server/sites/blank/_vti_bin/UploadService/UploadService.svc/Upload/http%3a%2f%2fserver%2fsites%2fblank%2fdoclib1%2ffile.pdf
Problem is, that causes a "400 Bad Request" for my service.
Fix is to replace %3A with : and so on, makes sense?

I believe the best method to encode URLs for transfer over another URL is by using
Uri.EscapeDataString(). The problem might be lowercase letters (%3a instead of %3A) in your encoded string.
var escdata = Uri.EscapeDataString(#"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http%3A%2F%2Fserver%2Fsites%2Fblank%2Fdoclib1%2Ffile.pdf%3Ftest%3Da%2Bb%20c
var escuris = Uri.EscapeUriString(#"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http://server/sites/blank/doclib1/file.pdf?test=a+b%20c
var urlencd = HttpUtility.UrlEncode(#"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http%3a%2f%2fserver%2fsites%2fblank%2fdoclib1%2ffile.pdf%3ftest%3da%2bb+c
var urlpenc = HttpUtility.UrlPathEncode(#"http://server/sites/blank/doclib1/file.pdf?test=a+b c");
// http://server/sites/blank/doclib1/file.pdf?test=a+b c

I'm not aware of a built-in function that will do this, but here is a quick-and-dirty solution:
string s = "http://myurl.com/whatever";
StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
sb.Append(String.Format("&#x{0:X2};", (uint)c));
}
var result = sb.ToString();
And as a one-liner using LINQ:
string s = "http://myurl.com/whatever";
string result = String.Join("", s.SelectMany(c=> String.Format("&#x{0:X2};", (uint)c)).ToArray());

Is there a simpler way to calculate this MD5 hash?

I'm working on implementing a hosted checkout, and the hosted checkout is supposed to redirect the user back to my website so that I can show a custom receipt page.
This is a sample querystring that I'd get back:
trnApproved=0&trnId=10000000&messageId=71&messageText=Declined&authCode=000000&responseType=T&trnAmount=20.00&trnDate=9%2f23%2f2011+9%3a30%3a56+AM&trnOrderNumber=1000000&trnLanguage=eng&trnCustomerName=FirstName+LastName&trnEmailAddress=something_something%40gmail.com&trnPhoneNumber=1235550123&avsProcessed=0&avsId=0&avsResult=0&avsAddrMatch=0&avsPostalMatch=0&avsMessage=Address+Verification+not+performed+for+this+transaction.&cvdId=3&cardType=VI&trnType=P&paymentMethod=CC&ref1=9dae6af7-7c22-4697-b23a-413d8a129a75&ref2=&ref3=&ref4=&ref5=&hashValue=33dacf84682470f267b2cc6d528b1594
To validate the request, I'm supposed to remove &hashValue=f3cf58ef0fd363e0c2241938b04f1068 from the end of the querystring, and then append a key. I then perform an MD5 hash of the entire string, and the result should be 33dacf84682470f267b2cc6d528b1594, same as the original.
This is easy, except that a few of the fields are causing a problem for me. This is the code I use (taken from a dummy application, so you can ignore some of the bad coding):
// Split up the query string parameters
string[] parameters = GetQueryString().Split(new[] { "&" }, StringSplitOptions.None);
var querySet = new List<string>();
// Rebuild the query string, encoding the values.
foreach (string s in parameters)
{
// Every field that contains a "." will need to be encoded except for trnAmount
querySet.Add(param.Contains("trnAmount") ? param : UrlEncodeToUpper(param));
}
// Create the querystring without the hashValue, we need to calculate our hash without it.
string qs = string.Join("&", querySet.ToArray());
qs = qs.Substring(0, qs.IndexOf("&hashValue"));
qs = qs + "fb76124fea73488fa11995dfa4cbe89b";
var encoding = new UTF8Encoding();
var md5 = new MD5CryptoServiceProvider();
var hash = md5.ComputeHash(encoding.GetBytes(qs));
var calculatedHash = BitConverter.ToString(hash).Replace("-", String.Empty).ToLower();
This is the UrlEncode method I use.
private static string UrlEncodeToUpper(string value)
{
// Convert their encoding into uppercase so we can do our hash
value = Regex.Replace(value, "(%[0-9af][0-9a-f])", c => c.Value.ToUpper());
// Encode the characters that they missed
value = value.Replace("-", "%2D").Replace(".", "%2E").Replace("_", "%5F");
return value;
}
This all works (until someone enters a character I haven't accounted for), except this seems more complicated than it should be. I know I'm not the only one who has to implement this HCO into an ASP.NET application, so I don't think the simple validation should be so complicated.
Am I missing an easier way to do this? Having to loop through the fields, encoding some of them while skipping others, converting their encoding to uppercase and then selectively replacing characters seems a little... odd.

Here's a better way to work with query strings:
var queryString = "trnApproved=0&trnId=10000000&messageId=71&messageText=Declined&authCode=000000&responseType=T&trnAmount=20.00&trnDate=9%2f23%2f2011+9%3a30%3a56+AM&trnOrderNumber=1000000&trnLanguage=eng&trnCustomerName=FirstName+LastName&trnEmailAddress=something_something%40gmail.com&trnPhoneNumber=1235550123&avsProcessed=0&avsId=0&avsResult=0&avsAddrMatch=0&avsPostalMatch=0&avsMessage=Address+Verification+not+performed+for+this+transaction.&cvdId=3&cardType=VI&trnType=P&paymentMethod=CC&ref1=9dae6af7-7c22-4697-b23a-413d8a129a75&ref2=&ref3=&ref4=&ref5=&hashValue=33dacf84682470f267b2cc6d528b1594";
var values = HttpUtility.ParseQueryString(queryString);
// remove the hashValue parameter
values.Remove("hashValue");
var result = values.ToString();
// At this stage result = trnApproved=0&trnId=10000000&messageId=71&messageText=Declined&authCode=000000&responseType=T&trnAmount=20.00&trnDate=9%2f23%2f2011+9%3a30%3a56+AM&trnOrderNumber=1000000&trnLanguage=eng&trnCustomerName=FirstName+LastName&trnEmailAddress=something_something%40gmail.com&trnPhoneNumber=1235550123&avsProcessed=0&avsId=0&avsResult=0&avsAddrMatch=0&avsPostalMatch=0&avsMessage=Address+Verification+not+performed+for+this+transaction.&cvdId=3&cardType=VI&trnType=P&paymentMethod=CC&ref1=9dae6af7-7c22-4697-b23a-413d8a129a75&ref2=&ref3=&ref4=&ref5=
// now add some other query string value
values["foo"] = "bar"; // you can stuff whatever you want it will be properly url encoded
Then I didn't quite understand what you wanted to do. You want to calculate an MD5 on the result? You could do that and then append to the query string.

Get a value from a webpage c#

I a have a string that contains the code of a webpage.
This is an example:
<input type="text" name="x4B07" value="650"
onchange="this.form.x8000.value=this.name;this.form.submit();"/>
<input type="text" name="x4B08" value="250"
onchange="this.form.x8000.value=this.name;this.form.submit();"/>
In that string I want to get the 650 and 250 (these are variables and they change value).
How can I do so?
Example:
name
value
x4b08
254
x4b07
253
x4b06
252
x4b05
251

If you were confident that the markup would never change (and you have a simple snippet like your example line) a regex could get you those values, for example:
Regex re = new Regex("name=\"(.*?)\" value=\"(.*?)\"");
Match match = re.Match(yourString);
if(match.Success && match.Groups.Count == 3){
String name = match.Groups[1];
String value = match.Groups[2];
}
Alternatively you could parse the page content and query the resulting document for the elements, and then extract the values. (C# HTML Parser: Looking for C# HTML parser )

You can use regular expressions to match value="([0-9]*)"
Or you can look for the string "value" using string.IndexOf and then take the following few characters.

This should work for you (assuming that s contains the string you want to parse):
string value = s.Substring(s.IndexOf("value=")+7);
value = value.Substring(0, value.IndexOf("\""));

How specific are your examples? Could you also want to extract varying length alphabetic strings? Will the strings you want to extract always be properties?
While the regex/substring way works for the specified examples I think they will scale quite badly.
I'd parse the HTML using a parser (see ndtreviv's answer) or possibly with an XML parser (if the HTML is valid XHTML). That way you will get better control and don't have to bleed your eyes out from fidgeting with a bucketload of regex.

If you have multiple such controls in the form of string you can create and XmlDocument and iterate through it.

just solved with this
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
Stream st = resp.GetResponseStream();
StreamReader sr = new StreamReader(st);
string buffer = sr.ReadToEnd();
ArrayList uniqueMatches = new ArrayList();
Match[] retArray = null;
Regex RE = new Regex("name=\"(.*?)\" value=\"(.*?)\"", RegexOptions.Multiline);
MatchCollection theMatches = RE.Matches(buffer);
for (int counter = 0; counter < theMatches.Count; counter++)
{
//string[] tempSplit = theMatches[counter].Value.Split('"');
Regex reName = new Regex("name=\"(.*?)\"");
Match matchName = reName.Match(theMatches[counter].Value);
Regex reValue = new Regex("value=\"(.*?)\"");
Match matchValue = reValue.Match(theMatches[counter].Value);
string[] dados = new string[2];
dados[0] = matchName.Groups[1].ToString();
dados[1] = matchValue.Groups[1].ToString();
uniqueMatches.Add(dados);
}
Tks all for the help

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

UriBuilder().Query will wrongly encode non-ASCII characters - c#

If nothing helps, then just manually convert those problematic chars inside values of parameters & to %26 + to %2B ? to %3F

Related

How can I use C# to search an XML file for specific words?

How to convert a utf8 like string to a real utf8?

Method to encode to HTML character entity

Is there a simpler way to calculate this MD5 hash?

Get a value from a webpage c#

Categories

Resources