Decoding querystring values on the server side - c#

I'm running a web service on my server using WCF and .NET 4. The contract type is WebGet. Here is my problem: at one point, someone was sending URL-encoded data through the query string, so I added HttpUtility.UrlDecode to decode the parameters. I think that fixed my issue at the time. Now I've sent a URL-encoded string to it, and I see that the string is already URL-decoded when it comes into the method (before it even reaches HttpUtility.UrlDecode).
So now I'm confused: if the .NET code is decoding it before it gets to my method, why would I need to call decode explicitly? For a time it apparently wasn't decoding, so is this a recent change to the underlying .NET Framework?
My problem now is that my users are sending unencoded data that looks like "abc%1234", and I'm getting "abc34"; the decoding is eating three characters. However, if I URL-encode the % sign so the value is "abc%251234", the value coming into the method is "abc%1234" (what I expected), and then the call to HttpUtility.UrlDecode changes it to "abc34" (which is not what I expected).
I'm not sure how to proceed here. Do I rip out the explicit call to UrlDecode until the data starts coming across encoded again, or is there a better way to handle this?

It's a subtle thing in documentation, easily missed:
HttpRequest.QueryString Property
Property Value
NameValueCollection
The query string variables sent by the client. Keys and values are
URL-decoded.
So if you access the query string via the HttpRequest.QueryString (or Params) collection, the values are already decoded.
You can get the raw string from RawUrl or QueryString.ToString(), but then you have to parse it manually (split, manipulate, etc.).
At the end of the day, regarding %:
Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI.
REF: RFC3986
Hth
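To make the double-decoding problem concrete, here is a minimal sketch. It assumes the framework's automatic decoding behaves like HttpUtility.UrlDecode; the class and method names are hypothetical, and on .NET Framework the project needs a reference to System.Web.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

public static class QueryDecodeDemo
{
    // Stand-in (assumption) for the decoding the framework applies
    // before the service method is called.
    public static string FrameworkDecode(string rawValue) => HttpUtility.UrlDecode(rawValue);

    public static void Main()
    {
        string onTheWire = "abc%251234";               // client encoded "%" as "%25"
        string parameter = FrameworkDecode(onTheWire); // what the method receives
        Console.WriteLine(parameter);                  // abc%1234

        // A second, explicit decode treats "%12" as an escape sequence,
        // so the value no longer matches what the client sent.
        string decodedTwice = HttpUtility.UrlDecode(parameter);
        Console.WriteLine(decodedTwice == parameter);  // False
    }
}
```

In other words, if the value arrives already decoded, the explicit UrlDecode call is the step that corrupts it, and clients should send a literal % as %25.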

Related

Header encoding issues with Mailkit and Mimekit

When I use MailKit to send emails, I noticed that it automatically decides to encode both the content and the headers. The content encoding is perfect; however, some email clients have difficulty decoding headers like the ones below.
Is there a way to instruct the client not to encode certain headers?
List-Unsubscribe:
=?us-ascii?q?=3Chttps=3A=2F=2Fbarlinkar=2Eus19=2Elist-manage=2Ecom=2Funsubscribe=3Fu=3D8c60690?=
=?us-ascii?q?5a7e637766f218816b&id=3D2e47bac84d&e=3D407e758886&c=3De27229afde=3E=2C?=
=?us-ascii?q?_=3Cmailto=3Aunsubscribe-mc=2Eus19=5F8c606905a7e637766f218816b=2Ee27229a?=
=?us-ascii?q?fde-407e758886=40mailin=2Emcsv=2Enet=3Fsubject=3Dunsubscribe=3E?=
X-Report-Abuse:
=?us-ascii?q?=3Chttps=3A=2F=2Fmailchimp=2Ecom=2Fcontact=2Fabuse=2F=3Fu=3D8c606905a7e637766f218?=
=?us-ascii?q?816b&id=3De27229afde&e=3D407e758886=3E?=
To: k****#****.***
EDIT: Jstedfast pointed out some errors and I fixed them but the overall result is the same.
I doubt the problem is that the header value is encoded. Your value is invalid to begin with.
Here's the raw value that you are using:
https://barlinkar.us19.list-manage.com/unsubscribe?u=8c606905a7e637766f218816b&id=2e47bac84d&e=407e758886&c=e27229afde>, <mailto:unsubscribe-mc.us19_8c606905a7e637766f218816b.e27229afde-407e758886@mailin.mcsv.net?subject=unsubscribe>List-Unsubscribe-Post: List-Unsubscribe=One-Click
Do you see anything wrong with that?
First, each URL should be enclosed in <>'s. Your first URL is missing the leading < character.
Secondly, you are including the List-Unsubscribe-Post header in the value of the List-Unsubscribe header. They need to be 2 distinct headers.
In other words, the receiving client is probably getting confused as to what the value is supposed to be because it is completely borked.
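As an illustration of the two points above, this sketch builds the two header values separately, with every URI wrapped in angle brackets. The class name, helper name, and example URIs are hypothetical. With MimeKit, each name/value pair would presumably be added to the message through its header collection (for example message.Headers.Add(name, value)); that step is left out so the sketch runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListUnsubscribeHeaders
{
    // Wraps every URI in angle brackets and joins them with ", ".
    public static string FormatUriList(IEnumerable<string> uris) =>
        string.Join(", ", uris.Select(u => "<" + u + ">"));

    public static void Main()
    {
        // Placeholder URIs, not the ones from the question.
        string value = FormatUriList(new[]
        {
            "https://example.com/unsubscribe?u=1&id=2",
            "mailto:unsubscribe@example.com?subject=unsubscribe"
        });

        // Two distinct headers, each with its own name and value.
        var headers = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("List-Unsubscribe", value),
            new KeyValuePair<string, string>("List-Unsubscribe-Post", "List-Unsubscribe=One-Click")
        };

        foreach (var h in headers)
            Console.WriteLine(h.Key + ": " + h.Value);
    }
}
```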

Accessing Key/Value pairs using HttpListener

EDIT: Please note that this is a WinForms application, not a web app.
I am using the WebClient.UploadValues(Uri, "POST", NameValueCollection) to send values to an instance of HttpListener. On the listener side, when the HttpListener.GetContext() method returns, I can access the sent data as a byte [].
I can convert this data to text using EncodingXXX.GetString(buffer) which returns the following:
Key1=Value1
Key2=Value2
...
Each item in the string is delimited by the ampersand sign (&). Both the key and the value are encoded using HttpUtility.HtmlEncode/HttpUtility.HtmlDecode, so I can split the data on ampersands without trouble. The equals sign (=), however, does not get encoded if the key or value contains it.
The equal sign in the data is to be expected and since HtmlEncode does not take care of it, are there other standard utility classes that can help out? I'd like to avoid manual string replacement if possible since it is error-prone.
It turns out HttpUtility.UrlEncode / HttpUtility.UrlDecode are better suited to this kind of data.
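A minimal sketch of why URL encoding fits here: it escapes both & and =, so the delimiters stay unambiguous. The class and method names are hypothetical, and the listener's body is assumed to be already available as a string; HttpUtility.ParseQueryString does the splitting and decoding in one step.

```csharp
using System;
using System.Collections.Specialized;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

public static class FormBodyDemo
{
    // Builds a form-encoded body: pairs joined by '&', key and value joined by '='.
    public static string Encode(NameValueCollection values)
    {
        var parts = new string[values.Count];
        for (int i = 0; i < values.Count; i++)
            parts[i] = HttpUtility.UrlEncode(values.GetKey(i)) + "=" + HttpUtility.UrlEncode(values[i]);
        return string.Join("&", parts);
    }

    // Splits on '&' and '=' and URL-decodes every key and value.
    public static NameValueCollection Decode(string body) => HttpUtility.ParseQueryString(body);

    public static void Main()
    {
        var sent = new NameValueCollection { { "Key1", "a=b&c" }, { "Key2", "plain" } };
        string body = Encode(sent);
        NameValueCollection received = Decode(body);
        Console.WriteLine(received["Key1"]); // a=b&c
    }
}
```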

C# Converting a string to bytes and then back to a string with default encoder mangles the string

I am troubleshooting a strange issue reported by a client which is caused by the application trying to parse invalid XML. I believe the root cause to be related to how the XML string is encoded and then decoded. I have an internal API that gets the XML string (which I know to be valid to begin with), then converts it to a byte array and wraps it with a readonly MemoryStream. Then on the other side, the stream is converted back to a string and then passed to XDocument.Parse(string). The latter call fails, saying "Data at the root level is invalid. Line 1, position 1." Anyway, I believe the root cause has to do with how I am encoding and then decoding the string. In fact, the following line of debugging code returns a different string than what was passed in.
Encoding.Default.GetString(Encoding.Default.GetBytes(GetMeAnXmlString()));
Using Encoding.Default on the way in and then back out yields a different string than what I started with. That's craaaazy. Any ideas?
Note:
I am using an API which I cannot change which retrieves the stream containing the XML, so I cannot alter the use of Encoding.Default. Doing so will risk production issues (a.k.a showstoppers) for clients where everything is working fine.
The long and short of it is that Encoding.Default is sketchy because of the code page aspect that Weeble mentioned.
http://msdn.microsoft.com/en-us/library/system.text.encoding.default%28v=vs.110%29.aspx
and http://blogs.msdn.com/b/shawnste/archive/2005/03/15/don-t-use-encoding-default.aspx
You'd likely be better off just deciding to use Encoding.Unicode or Encoding.UTF8.
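The effect can be reproduced with a small round-trip sketch. Encoding.ASCII is used here only as a stand-in for a code page that cannot represent every character in the string; the actual behavior of Encoding.Default depends on the machine. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Encodes the text to bytes and decodes it again with the same encoding.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));

    public static void Main()
    {
        string xml = "<name>Zo\u00eb \u20ac5</name>"; // contains non-ASCII characters

        // UTF-8 can represent every character, so the string survives.
        Console.WriteLine(RoundTrip(xml, Encoding.UTF8) == xml);  // True

        // An encoding that lacks some characters replaces them with '?'.
        Console.WriteLine(RoundTrip(xml, Encoding.ASCII) == xml); // False
    }
}
```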

Which HttpUtility decode method to use?

This may be a silly question, but it trips me up every time.
HttpUtility has the methods HtmlDecode and UrlDecode. Do these two methods decode anything (Html/Http related) I might find? When do I have to use them, and which one am I supposed to use?
Just now I hit an error. This is my error log:
Payment receiver was not payment@mysite.com. (it was payment%40mysite.com).
But, I wrapped the email address here in HttpUtility.HtmlDecode before using it. It turns out I have to use .UrlDecode instead, but this email address didn't come from a URL so this wasn't obvious to me.
Can someone clarify this?
See What is meant by htmlencode and urlencode?
It's the reverse of your case, but essentially you need to use UrlEncode/Decode anytime you are using an address of sorts (urls and yes, email addresses). HtmlEncode/Decode is for code that typically a browser would render (html/xml tags).
This same encoding is also used in Form POST requests as well.
My guess is something read it 'naked' without decoding it.
Html Encoding/Decoding is only used to escape strings that contain characters that would otherwise be interpreted as html control characters. The process turns the characters into html entities and back again.
Url Encoding is to get around the fact that many characters are not allowed in Uris; or because they too could be misinterpreted. Thus the percent encoding is used.
Percent encoding is also used in the body of http requests.
In both cases, of course, it's also a way of expressing a specific character code in a request/response independent of character sets; but equally, interpreting what is meant by a particular code can also be dependent on knowing a particular character set. Generally you don't worry about that - but it can be important (especially in the HTML case).
UrlEncode converts characters that aren't allowed in a URL into character equivalents that can be parsed as part of a URL. In your example, @ became %40. UrlDecode reverses this.
HtmlEncode is similar to UrlEncode, but the target environment is text nested inside HTML. It prevents the browser from interpreting your content as HTML, but when rendered it should look like the decoded version. HtmlDecode reverses this.
When you see %xx, percent encoding has occurred. This is a URL encoding scheme, so you need to use UrlEncode / UrlDecode.
The HtmlEncode and HtmlDecode methods are for encoding and decoding elements for HTML display, so things like & get encoded to &amp; and > to &gt;.
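A short sketch of the difference, using the value from the error log above. The class name is hypothetical; the calls are the standard HttpUtility methods discussed in this thread.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

public static class DecodeChoiceDemo
{
    public static void Main()
    {
        string logged = "payment%40mysite.com";

        // HtmlDecode only touches HTML entities, so percent escapes are left alone.
        Console.WriteLine(HttpUtility.HtmlDecode(logged)); // payment%40mysite.com

        // UrlDecode understands %xx escapes.
        Console.WriteLine(HttpUtility.UrlDecode(logged));  // payment@mysite.com

        // HTML encoding turns markup characters into entities and back.
        Console.WriteLine(HttpUtility.HtmlEncode("a & b > c"));        // a &amp; b &gt; c
        Console.WriteLine(HttpUtility.HtmlDecode("a &amp; b &gt; c")); // a & b > c
    }
}
```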

Url encoded characters, ViewState and byte[] arrays oh my

I am trying to throw together a screen scraper and keep getting invalid viewstate issues. It appears that during the System.Net.WebClient download of data or the System.Text.UTF8Encoding.Default.GetString call to convert the byte array returned by the WebClient DownloadData call to a string - that strings which match url character codes are being converted.
i.e., URL-encoded character strings like %2B are being converted to their normal characters (+ for %2B).
Is this happening in the WebClient class? Is it the way I am converting the byte array to a string?
EDIT:
Based on suggestions I tried changing to the DownloadString call from the WebClient class and the resulting string has converted the character codes to the specific character so it appears WebClient is the culprit.
EDIT 2:
Solved. By making a call to System.Web.HttpUtility.UrlEncode I was able to convert the + back to %2B before sending the viewstate string back up to the server in subsequent requests. I am still at a loss as to where and why the problem was occurring but the server was expecting a viewstate string that contained ...%2B... and was getting ...+... and determining the viewstate to be invalid and throwing the exception. Kudos to Jon & Henk for forcing me to rethink my assumptions.
If you use System.Text.UTF8Encoding.Default then you're not using UTF-8 - you're using the default encoding for the system. It's equivalent to Encoding.Default, but in a more confusing form. Use Encoding.UTF8 to get a UTF-8 encoding... or use WebClient.DownloadString as Henk suggested.
On the other hand, it's not clear what you're trying to download. If you're trying to download genuinely binary data then you shouldn't be trying to convert it to a string at all.
It would help if you would clarify your question. Try to provide a lot more context about what's making the requests, what's having problems, etc.
And what happens if you just use WebClient.DownloadString() instead of opening a binary stream?
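One plausible explanation for the fix in EDIT 2, not confirmed anywhere in the thread: in form-encoded data a literal + is read back as a space, so a value containing + has to be sent as %2B. The sketch below uses a made-up viewstate fragment and a hypothetical class name.

```csharp
using System;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

public static class PlusSignDemo
{
    public static void Main()
    {
        string viewState = "dDwt+Nzk4"; // made-up value containing '+'

        // Sent as-is in a form-encoded body, '+' is decoded as a space.
        Console.WriteLine(HttpUtility.UrlDecode(viewState)); // dDwt Nzk4

        // Encoded first, the '+' survives the server-side decode.
        string encoded = HttpUtility.UrlEncode(viewState);
        Console.WriteLine(HttpUtility.UrlDecode(encoded) == viewState); // True
    }
}
```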
