Url encoded characters, ViewState and byte[] arrays oh my - c#

I am trying to throw together a screen scraper and keep getting invalid ViewState errors. It appears that URL-encoded character sequences are being decoded somewhere: either during the System.Net.WebClient download (DownloadData) or during the System.Text.UTF8Encoding.Default.GetString call that converts the returned byte array to a string.
i.e.
URL-encoded sequences like %2B are being converted to their literal characters (+ for %2B).
Is this happening in the WebClient class? Is it the way I am converting the byte array to a string?
EDIT:
Based on suggestions, I tried switching to WebClient.DownloadString, and the resulting string still has the character codes converted to the literal characters, so it appears WebClient is the culprit.
EDIT 2:
Solved. By calling System.Web.HttpUtility.UrlEncode, I was able to convert the + back to %2B before sending the ViewState string back to the server in subsequent requests. I am still at a loss as to where and why the problem was occurring, but the server was expecting a ViewState string containing ...%2B... and was getting ...+..., so it judged the ViewState invalid and threw the exception. Kudos to Jon & Henk for forcing me to rethink my assumptions.
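The thread never pins down the cause. One likely explanation (an assumption, not confirmed by the poster): ViewState is base64, which uses '+' and '/', and in an application/x-www-form-urlencoded body an unescaped '+' is decoded as a space. The page most likely contains a literal '+' in the hidden field, and it is the browser that turns it into %2B when posting the form, so a scraper has to do the same. A minimal sketch that simulates the server side with HttpUtility.ParseQueryString; the ViewState value is made up:

```csharp
using System;
using System.Web; // HttpUtility: System.Web on .NET Framework, shared framework on .NET Core 2.0+

// Hypothetical ViewState value containing '+' and '/', as base64 often does.
string viewState = "dDwtMTA4+NzQ2/Mjs=";

// Posting the value as-is: the form decoder reads the unescaped '+' as a space.
string rawBody = "__VIEWSTATE=" + viewState;
string serverSeesRaw = HttpUtility.ParseQueryString(rawBody)["__VIEWSTATE"];

// URL-encoding the value first: '+', '/' and '=' become percent-escapes
// (HttpUtility emits lowercase hex; percent-escape hex is case-insensitive).
string encodedBody = "__VIEWSTATE=" + HttpUtility.UrlEncode(viewState);
string serverSeesEncoded = HttpUtility.ParseQueryString(encodedBody)["__VIEWSTATE"];

Console.WriteLine(serverSeesRaw == viewState);     // False: '+' became ' '
Console.WriteLine(serverSeesEncoded == viewState); // True
```

Encoding only the value, not the whole `name=value` pair, matters here: the '=' and '&' separators must stay literal.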

If you use System.Text.UTF8Encoding.Default then you're not using UTF-8 - you're using the default encoding for the system. It's equivalent to Encoding.Default, but in a more confusing form. Use Encoding.UTF8 to get a UTF-8 encoding... or use WebClient.DownloadString as Henk suggested.
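To make the naming trap concrete: Default is a static property declared on Encoding, so UTF8Encoding.Default is the very same object as Encoding.Default, whatever that happens to be on the current platform. A minimal sketch, assuming the downloaded bytes are UTF-8; the two bytes are a made-up sample standing in for WebClient.DownloadData output:

```csharp
using System;
using System.Text;

// UTF8Encoding.Default binds to the inherited static Encoding.Default.
bool sameObject = ReferenceEquals(UTF8Encoding.Default, Encoding.Default);

// Hypothetical downloaded bytes: U+00E9 ("é") encoded as UTF-8.
byte[] data = { 0xC3, 0xA9 };

// Asking for UTF-8 explicitly decodes correctly on every platform.
string text = Encoding.UTF8.GetString(data);

Console.WriteLine(sameObject); // True
// On .NET Framework with a Windows-1252 system code page,
// Encoding.Default.GetString(data) would instead produce "Ã©".
```

On .NET Core and .NET 5+, Encoding.Default is always UTF-8, which can hide this bug until the code runs on .NET Framework.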
On the other hand, it's not clear what you're trying to download. If you're trying to download genuinely binary data, then you shouldn't be trying to convert it to a string at all.
It would help if you would clarify your question: try to provide a lot more context about what's making the requests, what's having problems, etc.

And what happens if you just use WebClient.DownloadString() instead of opening a binary stream?

Related

Decoding querystring values on the server side

I'm running a web service on my server using WCF and .NET 4. The contract type is WebGet. Here is my problem: at one point, someone was sending URL-encoded data through the query string, so I added HttpUtility.UrlDecode to decode the parameters. I think that fixed my issue at the time. Now I've sent a URL-encoded string to it, and I see that the string is already URL-decoded when it comes into the method (before it even reaches the HttpUtility.UrlDecode call).
So now I'm confused: if the .NET code is decoding it before it gets to my method, why would I need to call decode explicitly? But for a time it wasn't, so is this a recent change to the underlying .NET Framework?
My problem now is that my users are sending unencoded data that looks like "abc%1234", and I'm getting "abc34": the decoding is eating 3 characters. However, if I URL-encode the % sign so the data is "abc%251234", the value coming into the method is "abc%1234" (what I expected), and then the call to HttpUtility.UrlDecode changes it to "abc34" (which is not what I expected).
I'm not sure how to proceed. Do I rip out the explicit call to UrlDecode until the data starts coming across encoded again, or is there a better way to handle this?
It's a subtle thing in documentation, easily missed:
HttpRequest.QueryString Property
Property Value: NameValueCollection
The query string variables sent by the client. Keys and values are URL-decoded.
So if you access the query string via the HttpRequest.QueryString (or Params) collection, the keys and values are already decoded.
You can get to the raw string via RawUrl or QueryString.ToString(), but then you have to handle it manually (string manipulation, splitting, etc.).
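To illustrate the difference, a minimal sketch that parses the same query string two ways: HttpUtility.ParseQueryString, which URL-decodes values much as HttpRequest.QueryString does, and a hand-rolled split that leaves them untouched. The query string is hypothetical, and the manual parser deliberately ignores edge cases such as repeated keys:

```csharp
using System;
using System.Collections.Generic;
using System.Web;

// Hypothetical query string, as it might appear after the '?' in RawUrl.
string rawQuery = "q=abc%251234&lang=en";

// Decoded view: values arrive URL-decoded.
var decoded = HttpUtility.ParseQueryString(rawQuery);

// Manual split: values stay exactly as the client sent them.
// (Simplified: no handling of repeated keys or a leading '?'.)
var raw = new Dictionary<string, string>();
foreach (string pair in rawQuery.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
{
    int eq = pair.IndexOf('=');
    if (eq < 0)
        raw[pair] = "";
    else
        raw[pair.Substring(0, eq)] = pair.Substring(eq + 1);
}

Console.WriteLine(decoded["q"]); // abc%1234
Console.WriteLine(raw["q"]);     // abc%251234
```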
At the end of the day, regarding %:
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.
REF: RFC 3986
HTH
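The asker's "abc34" fits this: %12 is a valid escape for the invisible control character U+0012, so the three characters "%12" collapse into one character that doesn't display. A minimal sketch of the double-decoding trap, using HttpUtility.UrlDecode to stand in for the framework's own decoding step (an assumption; the WCF pipeline itself isn't shown in the thread):

```csharp
using System;
using System.Web;

// Hypothetical user input, sent without encoding.
string typed = "abc%1234";
string decodedOnce = HttpUtility.UrlDecode(typed); // "abc" + U+0012 + "34", displays as "abc34"

// Correctly encoded by the client: '%' becomes "%25".
string sent = HttpUtility.UrlEncode(typed);                   // "abc%251234"
string fromQueryString = HttpUtility.UrlDecode(sent);         // the single framework decode
string decodedTwice = HttpUtility.UrlDecode(fromQueryString); // the extra explicit decode

Console.WriteLine(fromQueryString == typed);    // True: one decode restores the input
Console.WriteLine(decodedTwice == decodedOnce); // True: the second decode mangles it again
```

In short, decode exactly once: if the values come from a collection that is already decoded, an extra UrlDecode call corrupts any data that legitimately contains '%'.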

C# Converting a string to bytes and then back to a string with default encoder mangles the string

I am troubleshooting a strange issue reported by a client, caused by the application trying to parse invalid XML. I believe the root cause is related to how the XML string is encoded and then decoded. I have an internal API that gets the XML string (which I know to be valid to begin with), converts it to a byte array, and wraps it in a read-only MemoryStream. On the other side, the stream is converted back to a string and passed to XDocument.Parse(string). That call fails with "Data at the root level is invalid. Line 1, position 1." In fact, the following line of debugging code returns a different string than the one passed in:
Encoding.Default.GetString(Encoding.Default.GetBytes(GetMeAnXmlString()));
Using Encoding.Default on the way in and then back out yields a different string than what I started with. That's craaaazy. Any ideas?
Note:
I am using an API, which I cannot change, that retrieves the stream containing the XML, so I cannot alter its use of Encoding.Default. Doing so would risk production issues (a.k.a. showstoppers) for clients where everything is working fine.
The long and short of it is that Encoding.Default is sketchy because of the code page aspect that Weeble mentioned: on .NET Framework it is the machine's current ANSI code page, so it varies from machine to machine and cannot represent every character.
http://msdn.microsoft.com/en-us/library/system.text.encoding.default%28v=vs.110%29.aspx
and http://blogs.msdn.com/b/shawnste/archive/2005/03/15/don-t-use-encoding-default.aspx
You'd likely be better off just deciding to use Encoding.Unicode or Encoding.UTF8.
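This is how a round trip through the same encoding can still change the string: if the encoding cannot represent a character, GetBytes substitutes a fallback (typically '?'), and GetString has no way to recover the original. A minimal sketch, assuming XML with a non-ASCII character; Encoding.ASCII is used as a portable stand-in for a legacy code page, because Encoding.Default no longer varies on .NET Core:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical XML containing a character outside the ASCII range.
string xml = "<root><name>Caf\u00E9</name></root>";

// Encoding.ASCII stands in for a lossy legacy code page. On .NET Framework,
// Encoding.Default (the ANSI code page) likewise substitutes characters it
// cannot represent. On .NET Core / .NET 5+, Encoding.Default is always UTF-8.
Encoding lossy = Encoding.ASCII;
string lossyRoundTrip = lossy.GetString(lossy.GetBytes(xml));

// UTF-8 can encode any well-formed string, so this round trip is lossless.
string utf8RoundTrip = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(xml));

Console.WriteLine(lossyRoundTrip);       // <root><name>Caf?</name></root>
Console.WriteLine(utf8RoundTrip == xml); // True

// The lossless copy parses back to the original value.
string name = XDocument.Parse(utf8RoundTrip).Root.Element("name").Value;
```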

Proper way to handle the ampersand character in JSON string send to REST web service

OK,
I am using System.Runtime.Serialization and the DataContractJsonSerializer.
The problem is that in the request I send a property value containing the & character, say AT&T, and I get a response with the error: Invalid JSON Data.
I thought the escaping would be done inside the library, but now I see that the serializer leaves the ampersand (&) character untouched.
Yes, that is valid JSON.
But it is a problem for my POST request, since I need to send this to a server that responds with an error if the data contains an ampersand, hence here I am.
HttpUtility.HtmlEncode is in the System.Web library, so the way to go is to use Uri.EscapeUriString. I tried that, but with or without it, all requests work fine except when a value contains an ampersand.
EDIT: The HttpUtility class has been ported to the Windows Phone SDK, but the preferred way to encode a string should still be Uri.EscapeUriString.
My first thought was to get my hands dirty and start replacing the special characters that cause problems on the server, but I wonder: is there another solution that would be efficient and 'proper'?
I should mention that I use
// Convert the string into a byte array.
byte[] postBytes = Encoding.UTF8.GetBytes(data);
to convert the JSON to a byte[] and write it to the request stream,
and
request.ContentType = "application/x-www-form-urlencoded";
as the WebRequest.ContentType.
So, am I doing something wrong, or is there something I'm missing?
Thank you.
The problem was that I was encoding the whole request string including the key.
I had the request data={JSON} and I was encoding all of it, but only the {JSON} part should be encoded.
string requestData = "data=" + Uri.EscapeDataString(json); // worked perfectly!
Stupid hole to step into.
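Why this works, as far as can be inferred from the thread: in an application/x-www-form-urlencoded body, a literal & separates fields, so an unescaped & inside the JSON cuts the value short, which would explain the server's "Invalid JSON Data" error. Uri.EscapeUriString leaves reserved characters such as & alone, which is likely why it didn't help (it is also marked obsolete as of .NET 6). Uri.EscapeDataString escapes everything except RFC 3986 unreserved characters. A minimal sketch; the JSON payload is hypothetical and the server side is simulated with HttpUtility.ParseQueryString:

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical JSON as produced by DataContractJsonSerializer.
string json = "{\"company\":\"AT&T\"}";

// Escape only the value; the "data=" key and separator must stay literal.
string requestData = "data=" + Uri.EscapeDataString(json);
byte[] postBytes = Encoding.UTF8.GetBytes(requestData); // what gets written to the request stream

Console.WriteLine(requestData); // data=%7B%22company%22%3A%22AT%26T%22%7D

// Simulated server side: the form decoder splits on '&' before decoding,
// so the escaped %26 survives as part of the value.
string received = HttpUtility.ParseQueryString(requestData)["data"];
Console.WriteLine(received == json); // True
```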
Have you tried replacing the ampersand with &amp; for the POST?

ASP.NET MVC Override POST parse of fields

I wrote an MVC action to receive a post from a service. My problem is that the service is posting multipart data with wrong encoding.
Let me give an example:
The service posts "á" for the form field "text".
I can see (using Wireshark) that the byte written in the packet is 225, which is the correct byte for "á" in ISO-8859-1.
When I read Request.Form["text"], I actually get a strange (different) character.
I believe this is caused by .NET attempting to convert the value 225 to a Unicode character using the UTF-8 encoding and failing, since a lone 225 byte isn't valid UTF-8.
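The asker's diagnosis can be checked at the byte level. A minimal sketch, assuming the form value ends up decoded as UTF-8 (ASP.NET's default request encoding when nothing else is configured); it shows what each decoder makes of that single byte, not how ASP.NET itself reads the request:

```csharp
using System;
using System.Text;

// The single byte seen in Wireshark: 225 (0xE1), "á" in ISO-8859-1.
byte[] body = { 225 };

// As UTF-8, 0xE1 starts a three-byte sequence that never completes,
// so the decoder substitutes the replacement character U+FFFD.
string asUtf8 = Encoding.UTF8.GetString(body);

// As ISO-8859-1 (built into both .NET Framework and .NET Core), it is "á".
string asLatin1 = Encoding.GetEncoding("ISO-8859-1").GetString(body);

Console.WriteLine((int)asUtf8[0]);   // 65533 (U+FFFD)
Console.WriteLine((int)asLatin1[0]); // 225 (U+00E1)
```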
So my question is: Is there a way to override the parsing of those bytes to string?
You could try to add a HttpModule and try to overwrite the ContentEncoding property of the Request object. Though I'm not sure this will work.
It's possible to set the default encoding in Web.config's GlobalizationSection. The setting is called RequestEncoding, and it takes effect only if the HTTP request from your service does not contain a Content-Type header. See http://msdn.microsoft.com/en-us/library/system.web.configuration.globalizationsection.requestencoding.aspx
You can further use a <location> element inside Web.config to apply the above setting only to a specific directory / MVC controller.

Which HttpUtility decode method to use?

This may be a silly question, but it trips me up every time.
HttpUtility has the methods HtmlDecode and UrlDecode. Do these two methods decode anything (Html/Http related) I might find? When do I have to use them, and which one am I supposed to use?
Just now I hit an error. This is my error log:
Payment receiver was not payment@mysite.com. (it was payment%40mysite.com).
But I wrapped the email address in HttpUtility.HtmlDecode before using it. It turns out I have to use UrlDecode instead, but this email address didn't come from a URL, so this wasn't obvious to me.
Can someone clarify this?
See What is meant by htmlencode and urlencode?
It's the reverse of your case, but essentially you need to use UrlEncode/UrlDecode any time you are dealing with an address of sorts (URLs and, yes, email addresses). HtmlEncode/HtmlDecode is for content that a browser would typically render (HTML/XML tags).
This same encoding is used in form POST requests as well.
My guess is something read it 'naked' without decoding it.
HTML encoding/decoding is only used to escape strings containing characters that would otherwise be interpreted as HTML control characters. The process turns those characters into HTML entities and back again.
URL encoding gets around the fact that many characters are not allowed in URIs, or could be misinterpreted there too. Hence percent encoding.
Percent encoding is also used in the body of HTTP requests.
In both cases, of course, it's also a way of expressing a specific character code in a request/response independently of character sets; equally, though, interpreting what a particular code means can depend on knowing the character set. Generally you don't worry about that, but it can be important (especially in the HTML case).
UrlEncode converts characters that aren't allowed in a URL into escaped equivalents that can be parsed as part of a URL. In your example, @ became %40. UrlDecode reverses this.
HtmlEncode is similar to UrlEncode, but the target environment is text NESTED inside HTML. It keeps the browser from interpreting your content as HTML, while the rendered output still looks like the decoded version. HtmlDecode reverses this.
When you see %xx, percent encoding has occurred. This is a URL encoding scheme, so you need UrlEncode/UrlDecode.
The HtmlEncode and HtmlDecode methods are for encoding and decoding elements for HTML display, so things like & get encoded to &amp; and > to &gt;.
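A quick way to see the distinction: each decoder recognizes only its own escape syntax and leaves everything else alone. A minimal sketch using the asker's logged value plus a made-up HTML string:

```csharp
using System;
using System.Web;

// Percent-encoded, as in the asker's log: a URL/form-encoding artifact.
string fromForm = "payment%40mysite.com";
string htmlDecodedForm = HttpUtility.HtmlDecode(fromForm); // unchanged: no entities to decode
string urlDecodedForm = HttpUtility.UrlDecode(fromForm);   // payment@mysite.com

// Entity-encoded: an HTML artifact.
string fromHtml = "AT&amp;T &gt; others";
string htmlDecodedHtml = HttpUtility.HtmlDecode(fromHtml); // AT&T > others
string urlDecodedHtml = HttpUtility.UrlDecode(fromHtml);   // unchanged: no %xx or '+'

// Encoding goes the other way.
string urlEncoded = HttpUtility.UrlEncode("payment@mysite.com"); // payment%40mysite.com
string htmlEncoded = HttpUtility.HtmlEncode("<b>AT&T</b>");      // &lt;b&gt;AT&amp;T&lt;/b&gt;

Console.WriteLine(urlDecodedForm);
Console.WriteLine(htmlDecodedHtml);
```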
