URL unicode parameters decoding C#

URL unicode parameters decoding C# - c#

I got a URL which contains parameters, one of which is with Cyrillic letters.
http://localhost/Print.aspx?id=4&subwebid=243572&docnumber=%u0417%u041f005637-1&deliverypoint=4630013519990
Doc-number must be ЗП005637-1.
I have tried the following code, but string is still with those characters %u0417%u041f.
public static String DecodeUrlString(this String url)
{
String newUrl;
while ((newUrl = Uri.UnescapeDataString(url)) != url)
url = newUrl;
return newUrl;
}
It's not a possibility to use HttpUtility.

If your goal is to avoid a dependency on System.Web.dll, then you would normally use the equivalent method in the WebUtility Class: WebUtility.UrlDecode Method.
However, you will find that, even then, your url won't get decoded the way you want it to.
This is because WebUtility.UrlDecode does not handle the %uNNNN escape notation on purpose. Notice this comment in the source code:
// *** Source: alm/tfs_core/Framework/Common/UriUtility/HttpUtility.cs
// This specific code was copied from above ASP.NET codebase.
// Changes done - Removed the logic to handle %Uxxxx as it is not standards compliant.
As stated in the comment, the %uNNNN escape format is not standard compliant and should be avoided if possible. You can find more info on this and on the proper way of encoding urls from this thread.
If you have any control over how the url is generated, consider changing it to be standard-compliant. Otherwise, consider adding System.Web.dll as a dependency, find another third-party library that does the job, or write your own decoder. As commented already, the source code is out there.

Related

How to use AntiXss with a Web API

This is a question that has been asked before, but I've not found the information I'm looking for or maybe I'm just missing the point so please bear with me. I can always adjust my question if I'm asking it the wrong way.
If for example, I have a POST endpoint that use a simply DTO object with 2 properties (i.e. companyRequestDto) and contains a script tag in one of its properties. When I call my endpoint from Postman I use the following:
{
"company": "My Company<script>alert(1);</script>",
"description": "This is a description"
}
When it is received by the action in my endpoint,
public void Post(CompanyRequestDto companyRequestDto)
my DTO object will automatically be set and its properties will be set to:
companyDto.Company = "My Brand<script>alert(1);</script>";
companyDto.Description = "This is a description";
I clearly don't want this information to be stored in our database as is, nor do I want it stored as an escaped string as displayed above.
1) Request: So my first question is how do I throw an error if the DTO posted contains some invalid content such as the tag?
I've looked at Microsoft AntiXss but I don't understand how to handle this as the data provided in the properties of a DTO object is not an html string but just a string, so What I am missing here as I don't understand how this is helping sanitizing or validating the passed data.
When I call
var test = AntiXss.AntiXssEncoder.HtmlEncode(companyRequestDto.Company, true);
It returns an encoded string, but then what??
Is there a way to remove disallowed keywords or just simply throw an error?
2) Response: Assuming 1) was not implemented or didn't work properly and it ended up being stored in our database, am I suppose to return encoded data as a json string, so instead of returning:
"My company"
Am I suppose to return:
"My Company<script>alert(1)</script>"
Is the browser (or whatever app) just supposed to display as below then?:
"My Company<script>alert(1)</script>"
3) Code: Assuming there is a way to sanitize or throw an error, should I use this at the property level using attribute on all the properties of my various DTO objects or is there a way to apply this at the class level using an attribute that will validate and/or sanitize all string properties of a DTO object for example?
I found interesting articles but none really answering my problems or I'm having other problems with some of the answers:
asp.net mvc What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?
Stopping XSS when using WebAPI (currently looking into this one but don't see how example is solving problem as property is always failing whether I use the script tag or not)
how to sanitize input data in web api using anti xss attack (also looking at this one but having a problem calling ReadFromStreamAsync from my project at work. Might be down to some of the settings in my web.config but haven't figured out why but it always seems to return an empty string)
Thanks.
UPDATE 1:
I've just finished going through the answer from Stopping XSS when using WebAPI
This is probably the closest one to what I am looking for. Except I don't want to encode the data, as I don't want to store it in my database, so I'll see if I can figure out how to throw an error but I'm not sure what the condition will be. Maybe I should just look for characters such as <, >, ; , etc... as these will not likely be used in any of our fields.

You need to consider where your data will be used when you think about encoding, so that data with in it is only a problem if it's rendered as HTML so if you are going to display data that has been provided by users anywhere, it's probably at the point you are going to display it that you would want to html encode it for display (you want to avoid repeatedly html encoding the same string when saving it for example).
Again, it depends what the response is going to be used for... you probably want to html encode it at the point it's going to be displayed... remember if you are encoding something in the response it may not match whats in data so if the calling code could do something like call your API to search for a company with that name that could cause problems. If the browser does display the html encoded version it might look ugly but it's better than users being compromised by XSS attacks.
It's quite difficult to sanitize text for things like tags if you allow most characters for normal use. It's easier if you can whitelist characters allowed and only allow, say, alphanumeric but that isn't often possible. This can be done using a regex validation attribute on the DTO object. The best approach I think is to encode values for display if you can't stop certain characters. It's really difficult to try to allow all characters but avoid things like as people can start using ascii characters etc.

Preserve special characters when parsing variables from http request to c# string

I am working on a small application based on owin and katana to handle links internally.
So the application handles HttpGet requests. When someone calls
http:localhost/?document=path/to/my/document/foo.doc
the application opens this document.
My problem is: When the document name contains a special character like '+' my code interprets the + sign as space because the variable is parsed into a string.
[HttpGet]
[Route("")]
public HttpResponseMessage Get(string document = "")
{
//open document
}
So how to preserve the special characters and don't allow c# string to convert them before executing any code?
I tried with HttpResponseMessage Get([FromUri]string document = "")
I tried encoding the document variable afterwards with HttpUtility.UrlEncode but it will also encode the legit spaces.

Meanwhile I got a workaround:
this.Request.RequestUri.OriginalString
inside of the GET Method will give the the full link as string without any interpretations. The rest is to get the relevant variable with string.substring operations.
Nevertheless it would like to know if someone knows a more elegant solution.

How can I check if a string is url friendly

I'm making an ecommerce application and I want the user to be able to put content at a URL they have specified. IF a user were to put in something like "/thank-you!", how can I clean the string to either be a valid URL or check this is valid URL format? I would want the url to basically always be hyphened between words so like "/thank-you" from "/thankyou". What's the best approach for achieving such a thing. I'm within c# using .NET MVC 4.

Alas, I cannot comment 'possible duplicate' yet (How to check whether a string is a valid HTTP URL?).
As this must be an answer however, one way to validate a string URL would be using the URI.TryCreate functioanlity. See here also https://msdn.microsoft.com/en-us/library/system.uri.trycreate(v=vs.110).aspx
URI is also the preferred data type for URLs, rather than strings.

Request.QueryString[] vs. Request.Query.Get() vs. HttpUtility.ParseQueryString()

I searched SO and found similar questions, but none compared all three. That surprised me, so if someone knows of one, please point me to it.
There are a number of different ways to parse the query string of a request... the "correct" way (IMO) should handle null/missing values, but also decode parameter values as appropriate. Which of the following would be the best way to do both?
Method 1
string suffix = Request.QueryString.Get("suffix") ?? "DefaultSuffix";
Method2
string suffix = Request.QueryString["suffix"] ?? "DefaultSuffix";
Method 3
NameValueCollection params = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = params.Get("suffix") ?? "DefaultSuffix";
Method 4
NameValueCollection params = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = params["suffix"] ?? "DefaultSuffix";
Questions:
Would Request.QueryString["suffix"] return a null if no suffix was specified?
(Embarrassingly basic question, I know)
Does HttpUtility.ParseQueryString() provide any extra functionality over accessing Request.QueryString directly?
The MSDN documentation lists this warning:
The ParseQueryString method uses query strings that might contain user input, which is a potential security threat. By default, ASP.NET Web pages validate that user input does not include script or HTML elements. For more information, see Script Exploits Overview.
But it's not clear to me if that means ParseQueryString() should be used to handle that, or is exposed to security flaws because of it... Which is it?
ParseQueryString() uses UTF8 encoding by default... do all browsers encode the query string in UTF8 by default?
ParseQueryString() will comma-separate values if more than one is specified... does Request.QueryString() do that as well, or what happens if it doesn't?
Which of those methods would correctly decode "%2b" to be a "+"?
Showing my Windows development roots again... and I would be a much faster developer if I didn't wonder about these things so much... : P

Methods #1 and #2 are the same thing, really. (I think the .Get() method is provided for language compatibility.)
ParseQueryString returns you something that is the functional equivalent of Request.Querystring. You would usually use it when you have a raw URL and no other way to parse the query string parameters from it. Request.Querystring does that for you, so in this case, it's not needed.
You can't leave off "suffix". You either have to pass a string or an index number. If you leave off the [] entirely, you get the whole NameValueCollection. If you mean what if "suffix" was not one of the QueryString values then yes; you would get null if you called Request.QueryString["suffix"].
No. The most likely time you would use it is if you had an external URL and wanted to parse the query string parameters from it.
ParseQueryString does not handle it... neither does pulling the values straight from Request.QueryString. For ASP.NET, you usually handle form values as the values of controls, and that is where ASP.NET usually 'handles' these things for you. In other words: DON'T TRUST USER INPUT Ever. No matter what framework is doing what ever for you.
I have no clue (I think no). However, I think what you are reading is telling you that ParseQueryString is returning UTF-8 encoded text - regardless if it was so encoded when it came in.
Again: ParseQueryString returns basically the same thing you get from Request.QueryString. In fact, I think ParseQueryString is used internally to provide Request.QueryString.
They would produce the equivalent; they will all properly decode the values submitted. If you have URL: http://site.com/page.aspx?id=%20Hello then call Request.QueryString["id"] the return value will be " Hello", because it automatically decodes.

Example 1:
string itsMeString = string.IsNullOrEmpty(Request.QueryString["itsMe"]) ? string.Empty : HttpUtillity.UrlDecode(Request.QueryString["itsMe"]);
Stright to your questions:
Not quite sure what do you mean by suffix, if you are asking what happens if the key is not present(you don't have it in the QueryString) - yes it will return null.
My GUESS here is that when constructed, Request.QueryString internally calls HttpUtillity.ParseQueryString() method and caches the NameValueCollection for subsequential access. I think the first is only left so you can use it over a string that is not present in the Request, for example if you are scrapping a web page and need to get some arguments from a string you've found in the code of that page. This way you won't need to construct an Uri object but will be able to get just the query string as a NameValueCollection if you are sure you only need this. This is a wild guess ;).)
This is implemented on a page level so if you are accessing the QueryString let's say in Page_Load event handler, you are having a valid and safe string (ASP.NET will throw an exception otherwise and will not let the code flow enter the Page_Load so you are protected from storing XSS in your database, the exception will be: "A potentially dangerous Request.QueryString value was detected from the client, same as if a post variable contains any traces of XSS but instead Request.Form the exception says Request.QueryString."). This is so if you let the "validateRequest" switched on (by default it is). The ASP.NET pipeline will throw an exception earlier, so you don't have the chance to save any XSS things to your store (Database). Switching it off implies you know what you're doing so you will then need to implement the security yourself (by checking what's comming in).
Probably it will be safe to say yes. Anyway, since you will in most cases generating the QueryString on your own (via JavaScript or server side code - be sure to use HttpUtillity.UrlEncode for backend code and escape for JavaScript). This way the browser will be forced to turn "It's me!" to "It%27s%20me%21". You can refer to this article for more on Url Encoding in JavaScript: http://www.javascripter.net/faq/escape.htm.
Please elaborate on that, couldn't quite get what do you mean by "will comma-separate values if more than one is specified.".
As far as I remember, none of them will. You will probably need to call HttpUtillity.UrlDecode / HttpUtillity.HtmlDecode (based on what input do you have) to get the string correctly, in the above example with "It's me!" you will do something like (see Example 1 as something's wrong with the code formatting if I put it after the numbered list).

How do I parse and rebuild a URL in ASP.Net?

I'm devloping a C#/ASP.Net app and I'm trying to find a means of breaking down a URL into its component parts, then swapping out, or deleting these parts and creating a new URL.
For example if I have the following URL:
https://www.site.com/page.aspx?parm1=value1&parm2=value2
I'd like to split the URL down into:
Protocol (http, https, ftp, etc)
Domain (www.site.com)
Page (page.aspx)
URL parameters (parm1 = value1, parm2 = value2)
Once the URL is split down I'd like to manipulate each of the parts, for example:
add or remove parameters
change the value of parameters
change the page from page.aspx to page2.aspx
Then once I'm done create a new URL ready for use with the above changes.
I've checked out the MSDN documentation etc and can't find a utility class in .Net to take care of this. Any ideas?
Cheers,
Steve

The framework comes with the UriBuilder class for this purpose.
It has get/set properties for the things you need:
Protocol: Scheme property
Domain: Host property
Page: Path property (will give you whole path, you might need to do some processing here).
Parameters: Query property (exposed as a string, you might need to do some processing on the string your self).
When you are done manipulating the UriBuilder, use the Uri property to get the result as a Uri object, or just ToString() if you just need the URL as a string.

Start by using UriBuilder (see driis's answer).
To parse the Query property use:
NameValueCollection q=HttpUtility.ParseQueryString(uri.Query);
You actually get an HttpValueCollection (internal) - so when you later call q.ToString() you'll get an url encoded query string back.
Since the class is internal you need to call
NameValueCollection q=HttpUtility.ParseQueryString("");
if you want to build the query string from scratch.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.