How to use AntiXss with a Web API

How to use AntiXss with a Web API - c#

This is a question that has been asked before, but I've not found the information I'm looking for or maybe I'm just missing the point so please bear with me. I can always adjust my question if I'm asking it the wrong way.
If for example, I have a POST endpoint that use a simply DTO object with 2 properties (i.e. companyRequestDto) and contains a script tag in one of its properties. When I call my endpoint from Postman I use the following:
{
"company": "My Company<script>alert(1);</script>",
"description": "This is a description"
}
When it is received by the action in my endpoint,
public void Post(CompanyRequestDto companyRequestDto)
my DTO object will automatically be set and its properties will be set to:
companyDto.Company = "My Brand<script>alert(1);</script>";
companyDto.Description = "This is a description";
I clearly don't want this information to be stored in our database as is, nor do I want it stored as an escaped string as displayed above.
1) Request: So my first question is how do I throw an error if the DTO posted contains some invalid content such as the tag?
I've looked at Microsoft AntiXss but I don't understand how to handle this as the data provided in the properties of a DTO object is not an html string but just a string, so What I am missing here as I don't understand how this is helping sanitizing or validating the passed data.
When I call
var test = AntiXss.AntiXssEncoder.HtmlEncode(companyRequestDto.Company, true);
It returns an encoded string, but then what??
Is there a way to remove disallowed keywords or just simply throw an error?
2) Response: Assuming 1) was not implemented or didn't work properly and it ended up being stored in our database, am I suppose to return encoded data as a json string, so instead of returning:
"My company"
Am I suppose to return:
"My Company<script>alert(1)</script>"
Is the browser (or whatever app) just supposed to display as below then?:
"My Company<script>alert(1)</script>"
3) Code: Assuming there is a way to sanitize or throw an error, should I use this at the property level using attribute on all the properties of my various DTO objects or is there a way to apply this at the class level using an attribute that will validate and/or sanitize all string properties of a DTO object for example?
I found interesting articles but none really answering my problems or I'm having other problems with some of the answers:
asp.net mvc What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?
Stopping XSS when using WebAPI (currently looking into this one but don't see how example is solving problem as property is always failing whether I use the script tag or not)
how to sanitize input data in web api using anti xss attack (also looking at this one but having a problem calling ReadFromStreamAsync from my project at work. Might be down to some of the settings in my web.config but haven't figured out why but it always seems to return an empty string)
Thanks.
UPDATE 1:
I've just finished going through the answer from Stopping XSS when using WebAPI
This is probably the closest one to what I am looking for. Except I don't want to encode the data, as I don't want to store it in my database, so I'll see if I can figure out how to throw an error but I'm not sure what the condition will be. Maybe I should just look for characters such as <, >, ; , etc... as these will not likely be used in any of our fields.

You need to consider where your data will be used when you think about encoding, so that data with in it is only a problem if it's rendered as HTML so if you are going to display data that has been provided by users anywhere, it's probably at the point you are going to display it that you would want to html encode it for display (you want to avoid repeatedly html encoding the same string when saving it for example).
Again, it depends what the response is going to be used for... you probably want to html encode it at the point it's going to be displayed... remember if you are encoding something in the response it may not match whats in data so if the calling code could do something like call your API to search for a company with that name that could cause problems. If the browser does display the html encoded version it might look ugly but it's better than users being compromised by XSS attacks.
It's quite difficult to sanitize text for things like tags if you allow most characters for normal use. It's easier if you can whitelist characters allowed and only allow, say, alphanumeric but that isn't often possible. This can be done using a regex validation attribute on the DTO object. The best approach I think is to encode values for display if you can't stop certain characters. It's really difficult to try to allow all characters but avoid things like as people can start using ascii characters etc.

Related

SELECT from PHP (MySQL) Back Into Android (C#)

I have an Android app and I'm attempting to use PHP/MySQL.
I'm having a lot of trouble getting my results from PHP accessible in C#/Android.
This is my PHP so far:
$sql = "SELECT Name FROM Employees WHERE Password='$password'";
if(!$result = $mysqli->query($sql)) {
echo "Sorry, the query was unsuccessful";
}
while($employee = $result->fetch_assoc()) {
$jsonResult = json_encode($employee);
$employee->close();
}
I've left out the basic connection code as I have all that up and running. Here is my C#:
private void OnLoginButtonClick()
{
var mClient = new WebClient();
mClient.DownloadDataAsync(new Uri("https://127.0.0.1/JMapp/Login.php?password=" + _passwordEditText.Text));
}
As you can see I really am at a very basic stage. I've installed Newtonsoft so I'm ready to deal with the Json that is coming back, however I have a few questions.
I'm well aware of SQL injection, and the way that my variable (password) is passed to the PHP concerns me. Is there a safer way of doing this?
Secondly, I am now unsure of how to get the 'Employees' that match the MySQL command in PHP back into C#. How am I able to access the object that is passed back from PHP?

Leaving aside other aspects of the code in the question, I sugest some reading on sanitizing and escaping user data.
For this specific case of a password see #Jay Blanchard comments. For other input you would not trasform upon input, the idea is to sanitize it as soon as you receive it.
This is to make sure you receive what you were expecting. In the case of a String, trim() the text, match it against a regex of allowed characters. If you allow html tags or not you can match it against a white list of them. Max length.
Then you would validate it. This is that it makes sense and meets the business requirements.
At the time of storing it in the database you can avoid sqlinjection by using prepared statements. By doing this it is clear what is text to be stored and what is sql instructions.
At the time of using the data, you will escape it accoring to where it is going to be used, for example, if it is html content you escape it for html content, if it is an html attribute, or an URL parameter, you do the escaping accordingly for each case. (Wordpress has a nice suite of functions that do this)
Also don't send passwords as URL parameters. Use a form instead with method POST. Urls are seen in the Browser's address widget. And they also get copy pasted in emails, facebook, etc

Request.QueryString[] vs. Request.Query.Get() vs. HttpUtility.ParseQueryString()

I searched SO and found similar questions, but none compared all three. That surprised me, so if someone knows of one, please point me to it.
There are a number of different ways to parse the query string of a request... the "correct" way (IMO) should handle null/missing values, but also decode parameter values as appropriate. Which of the following would be the best way to do both?
Method 1
string suffix = Request.QueryString.Get("suffix") ?? "DefaultSuffix";
Method2
string suffix = Request.QueryString["suffix"] ?? "DefaultSuffix";
Method 3
NameValueCollection params = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = params.Get("suffix") ?? "DefaultSuffix";
Method 4
NameValueCollection params = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = params["suffix"] ?? "DefaultSuffix";
Questions:
Would Request.QueryString["suffix"] return a null if no suffix was specified?
(Embarrassingly basic question, I know)
Does HttpUtility.ParseQueryString() provide any extra functionality over accessing Request.QueryString directly?
The MSDN documentation lists this warning:
The ParseQueryString method uses query strings that might contain user input, which is a potential security threat. By default, ASP.NET Web pages validate that user input does not include script or HTML elements. For more information, see Script Exploits Overview.
But it's not clear to me if that means ParseQueryString() should be used to handle that, or is exposed to security flaws because of it... Which is it?
ParseQueryString() uses UTF8 encoding by default... do all browsers encode the query string in UTF8 by default?
ParseQueryString() will comma-separate values if more than one is specified... does Request.QueryString() do that as well, or what happens if it doesn't?
Which of those methods would correctly decode "%2b" to be a "+"?
Showing my Windows development roots again... and I would be a much faster developer if I didn't wonder about these things so much... : P

Methods #1 and #2 are the same thing, really. (I think the .Get() method is provided for language compatibility.)
ParseQueryString returns you something that is the functional equivalent of Request.Querystring. You would usually use it when you have a raw URL and no other way to parse the query string parameters from it. Request.Querystring does that for you, so in this case, it's not needed.
You can't leave off "suffix". You either have to pass a string or an index number. If you leave off the [] entirely, you get the whole NameValueCollection. If you mean what if "suffix" was not one of the QueryString values then yes; you would get null if you called Request.QueryString["suffix"].
No. The most likely time you would use it is if you had an external URL and wanted to parse the query string parameters from it.
ParseQueryString does not handle it... neither does pulling the values straight from Request.QueryString. For ASP.NET, you usually handle form values as the values of controls, and that is where ASP.NET usually 'handles' these things for you. In other words: DON'T TRUST USER INPUT Ever. No matter what framework is doing what ever for you.
I have no clue (I think no). However, I think what you are reading is telling you that ParseQueryString is returning UTF-8 encoded text - regardless if it was so encoded when it came in.
Again: ParseQueryString returns basically the same thing you get from Request.QueryString. In fact, I think ParseQueryString is used internally to provide Request.QueryString.
They would produce the equivalent; they will all properly decode the values submitted. If you have URL: http://site.com/page.aspx?id=%20Hello then call Request.QueryString["id"] the return value will be " Hello", because it automatically decodes.

Example 1:
string itsMeString = string.IsNullOrEmpty(Request.QueryString["itsMe"]) ? string.Empty : HttpUtillity.UrlDecode(Request.QueryString["itsMe"]);
Stright to your questions:
Not quite sure what do you mean by suffix, if you are asking what happens if the key is not present(you don't have it in the QueryString) - yes it will return null.
My GUESS here is that when constructed, Request.QueryString internally calls HttpUtillity.ParseQueryString() method and caches the NameValueCollection for subsequential access. I think the first is only left so you can use it over a string that is not present in the Request, for example if you are scrapping a web page and need to get some arguments from a string you've found in the code of that page. This way you won't need to construct an Uri object but will be able to get just the query string as a NameValueCollection if you are sure you only need this. This is a wild guess ;).)
This is implemented on a page level so if you are accessing the QueryString let's say in Page_Load event handler, you are having a valid and safe string (ASP.NET will throw an exception otherwise and will not let the code flow enter the Page_Load so you are protected from storing XSS in your database, the exception will be: "A potentially dangerous Request.QueryString value was detected from the client, same as if a post variable contains any traces of XSS but instead Request.Form the exception says Request.QueryString."). This is so if you let the "validateRequest" switched on (by default it is). The ASP.NET pipeline will throw an exception earlier, so you don't have the chance to save any XSS things to your store (Database). Switching it off implies you know what you're doing so you will then need to implement the security yourself (by checking what's comming in).
Probably it will be safe to say yes. Anyway, since you will in most cases generating the QueryString on your own (via JavaScript or server side code - be sure to use HttpUtillity.UrlEncode for backend code and escape for JavaScript). This way the browser will be forced to turn "It's me!" to "It%27s%20me%21". You can refer to this article for more on Url Encoding in JavaScript: http://www.javascripter.net/faq/escape.htm.
Please elaborate on that, couldn't quite get what do you mean by "will comma-separate values if more than one is specified.".
As far as I remember, none of them will. You will probably need to call HttpUtillity.UrlDecode / HttpUtillity.HtmlDecode (based on what input do you have) to get the string correctly, in the above example with "It's me!" you will do something like (see Example 1 as something's wrong with the code formatting if I put it after the numbered list).

Which HttpUtility decode method to use?

this may be a silly question, but it trips me up every time.
HttpUtility has the methods HtmlDecode and UrlDecode. Do these two methods decode anything (Html/Http related) I might find? When do I have to use them, and which one am I supposed to use?
Just now I hit an error. This is my error log:
Payment receiver was not payment#mysite.com. (it was payment%40mysite.com).
But, I wrapped the email address here in HttpUtility.HtmlDecode before using it. It turns out I have to use .UrlDecode instead, but this email address didn't come from a URL so this wasn't obvious to me.
Can someone clarify this?

See What is meant by htmlencode and urlencode?
It's the reverse of your case, but essentially you need to use UrlEncode/Decode anytime you are using an address of sorts (urls and yes, email addresses). HtmlEncode/Decode is for code that typically a browser would render (html/xml tags).

This same encoding is also used in Form POST requests as well.
My guess is something read it 'naked' without decoding it.
Html Encoding/Decoding is only used to escape strings that contain characters that would otherwise be interpreted as html control characters. The process turns the characters into html entities and back again.
Url Encoding is to get around the fact that many characters are not allowed in Uris; or because they too could be misinterpreted. Thus the percent encoding is used.
Percent encoding is also used in the body of http requests.
In both cases, of course, it's also a way of expressing a specific character code in a request/response independent of character sets; but equally, interpreting what is meant by a particular code can also be dependent on knowing a particular character set. Generally you don't worry about that - but it can be important (especially in the HTML case).

URLEncode converts characters that aren't allowed in a URL into character equivalents which are parsable as a URL. In your example # became %40. URLDecode reverses this.
HTMLEncode is similar to URLEncode, but the target environment is text NESTED inside of HTML. This helps the browser from interpereting your content as HTML, but when rendered it should look like the decoded version. HTMLDecode reverses this.

When you see %xx this means percent encoding has occured - this is a URL encoding scheme, so you need to use UrlEncode / UrlDecode.
The HtmlEncode and HtmlDecode methods are for encoding and decoding elements for HTML display - so things like & get encoded to & and > to >.

How do I implement a search box in an ASP.NET MVC application?

I need to implement a "Search" box on a C# MVC application that I'm writting.
I've never had to implement a "Search" box before and I've been looking for some best practices and I'm not quite finding what I'm looking for.
I really like how the search works on stackoverflow.
If I type in a few random words, it navigates to the url http://stackoverflow/search?q=few+random+words.
If I type in title:random, it navigates to the url https://stackoverflow.com/search?q=title%3Arandom
What is happening both on the client (when I hit the enter key) and on the server to make the search happen?
I've purposely left out any thoughts I've already had on what is happening because I don't want to bias the answers (or show my ignorance).
EDIT: I'm adding some specifics to this question.
Where and how are the search terms transformed into the querystring parameters? ie few random words transformed into few+random+words,title:random transformed into title%3Arandom
Where and how is few+random+words tranformed into variables used in a query?
Is the query just one big Where clause that keeps appending "and" for each item that lands between the + signs?
I guess you could parse through the strings and do some replaces to achieve 1 and 2 but it sure looks like there is something already available that would automatically convert (and revert) the search strings. I'm trying to be prepared for my user's typing ANYTHING in the search box.

These posts will help you in understanding how it works
https://blog.stackoverflow.com/2008/10/stack-overflow-search-now-51-less-crappy/
https://blog.stackoverflow.com/2009/07/stack-overflow-search-now-61-less-crappy/
https://blog.stackoverflow.com/2011/01/stack-overflow-search-now-81-less-crappy/

Those URL's use what is called a query string. It is a "GET" request that allows the client script (javascript) as well as the back end code retrieve the users "query". In a URL whenever you see a '?' it is the beginning of the query string. This allows somebody to be like:
http://google.com?q=Stuff%20to%20Search%20here
Multiple parameters can be added via &anothercommand=somethingelse
Thus allowing a program or script to invoke a search on google without having to type anything into the box.
You can get access to the query string using the C# "Request.QueryString["parameter"]" where in this case the parameter for those stack overflow URL's would be "q".
After that, you query your database and return the results. Since I'm not sure how good at coding you are, I am not sure if you're trying to ask for the Web site, or the C# SQL side. If I am wrong, apologies.
On the Client:
The way I imagine it is happening on the client is that the script on the textbox when the form is submitted redirects to the url you mentioned and adds in those query parameters into the url string. Don't forget to url encode. This is built into javascript. i.e. space ' ' becomes '%20'
When the form submits, the server code does a check to see if there are any query string parameters of the form "q". If there are, and it is not null, it would query the database, returning that in one of a few ways, most likely through a server control.
1) That is what URL encoding is. It is a list of characters that are not supported in a URL. Thus they need to be changed. There is a standard set such as %20 for space. In javascript, you would redirect to the results page with the query string you want. prior to redirecting, use the information here to encode it. i.e. change ' ' into + or %20 (it really should be %20, I find + is usually the internet explorer way.
)
2) Query string works like a hashtable of key pair values. Using the Request.QueryString, you can select the key "q" and receive the string "few random words". That would then get substituted into your SQL query. This is done on the C# side as a very first check to see if the parameter q exists.
3) you can do your query many different ways. However, searching for "and" etc will give you many different results. What you can do is parse out a list of common words, and then rank results based on the number of results of each word. i.e. in the most simplistic of searches which would be ill advised for LARGE databases "..... Where like '%word% or '%word2%' etc. To get each word, do a string.split.

As much as I hate to do it, I have to answer my own question. What I couldn't understand is how the search words where seemingly automatically transformed into querystring encoded parameters (ie all spaces where replaced with the + sign vs. being replace with %20). I didn't understand how that was being achieved and I like it so I wanted the same abilities.
In the end, what I should have done was copy the html from SO and tried it out on my own MVC site because it turns out that the encoding is built in/automatic. I didn't have to do anything to get the functionality.
Here is the basic HTML for the search box:
<form id="frmsearch" action="~/Catalog/Search" method="get">
<input id="q" name="q" value="#q" style="width:275px;"/>
<input id="submit" name="submit" type="submit" style="font-weight:bold;" value="Search" />
</form>
Now, if you type "few random words" in the text box named "q" and click the submit button, the form action automatically takes you to "~/Catalog/Search?q=few+random+words" without any additional coding.
Now for the best part, in the controller code, the "q" parameter is automatically available as "few random words" without any additional coding as well.
Example:
public ActionResult Search(string q)
{
//q = "few random words" (no need to remove '+' signs)
var model = GetSearchResults(q)
return View(model);
}
The only thing I haven't tested out it how it would handle scripting attacks but I think I'm going to get that for free as well. : )
Hope this helps anyone who stumbles across this answer. Thank you to everyone who submitted answers trying to help. I'm sorry if my question wasn't clear enough.

How do I use a pattern Url to extract a segment from an actual Url?

If I have a series of "pattern" Urls of the form:
http://{username}.sitename.com/
http://{username}.othersite.net/
http://mysite.com/{username}
and I have an actual Url of the form:
http://joesmith.sitename.com/
Is there any way that I can match a pattern Url and in turn use it to extract the username portion out the actual Url? I've thought of nasty ways to do it, but it just seems like there should be a more intuitive way to accomplish this.
ASP.NET MVC uses a similar approach to extract the various segments of the URL when it is building its routes. Given the example:
{controller}/{action}
So given the Url of the form, Home/Index, it knows that it is the Home controller calling the Index action method.

Not sure I understand this question correctly but you can just use a regular expression to match anything between 'http://' and the first dot.

A very simple regex will do:
':https?://([a-z0-9\.-]*[a-z0-9])\.sitename\.com'
This will allow any subdomain that only contains valid subdomain characters. Example of allowed subdomains:
joesmith.sitename.com
joe.smith.sitename.com
joe-smith.sitename.com
a-very-long-subdomain.sitename.com
As you can see, you might want to complicate the regex slightly. For instance, you could limit it to only allow a certain amount of characters in the subdomain.

It seems the the quickest and easiest solution is going off of Machine's answer.
var givenUri = "http://joesmith.sitename.com/";
var patternUri = "http://{username}.sitename.com/";
patternUri = patternUri.Replace("{username}", #"([a-z0-9\.-]*[a-z0-9]");
var result = Regex.Match(givenUri, patternUri, RegexOptions.IgnoreCase).Groups;
if(!String.IsNullOrEmpty(result[1].Value))
return result[1].Value;
Seems to work great.

Well, this "pattern URL" is a format you've made up, right? You basically you'll just need to process it.
If the format of it is:
anything inside "{ }" is a thing to capture, everything else must be as is
Then you'd just find the start/end index of those brackets, and match everything else. Then when you get to a place where one is, make sure you only look for chars such that they don't match whatever 'token' comes after the next ending '}'.

There are definitely different ways - ultimately though your server must be configured to handle (and possibly route) these different subdomain requests.
What I would do would be to answer all subdomain requests (except maybe some reserved words, like 'www', 'mail', etc.) on sitename.com with a single handler or page (I'm assuming ASP.NET here based on your C# tag).
I'd use the request path, which is easy enough to get, with some simple string parsing/regex routines (remove the 'http://', grab the first token up until '.' or '/' or '\', etc.) and then use that in a session, making sure to observe URL changes.
Alternately, you could map certain virtual paths to request urls ('joesmith.sitename.com' => 'sitename.com/index.aspx?username=joesmith') via IIS but that's kind of nasty too.
Hope this helps!

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.