Regex to detect JavaScript in a string - C#

I am trying to detect JavaScript in my querystrings value.
I have the following c# code
private bool checkForXSS(string value)
{
    Regex regex = new Regex(@"/((\%3C)|<)[^\n]+((\%3E)|>)/I");
    if (regex.Match(value).Success) return true;
    return false;
}
This works for detecting <script></script> tags, but unfortunately if there are no tags, no match is found.
Is it possible for a regex to match on JavaScript keywords and semi-colons etc?
This is not meant to cover all XSS attack bases. Just a way to detect simple JS attacks that can be in a string value.
Thanks

Nº 1 Rule: Use a whitelist, not a blacklist.
You are blocking only one way to perform XSS, not all of them. Instead, validate the input against what you expect to accept from the user, e.g.:
If you expect a number, validate the input against /^\d{1,n}$/ (where n is the maximum number of digits)
If you expect a string, validate it against /^[\s\w\.\,]+$/, etc...
For further info, start reading the Wikipedia entry, the entry at OWASP, webappsec articles and some random blog entries written by unknown people
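
A minimal C# sketch of the whitelist idea above. The class name InputWhitelist, the method names, and the 6-digit limit are illustrative assumptions, not something from the question.

```csharp
using System.Text.RegularExpressions;

public static class InputWhitelist
{
    // Assumption: numeric fields hold 1 to 6 ASCII digits; adjust the bound as needed.
    // \z is used instead of $ because $ in .NET also matches before a trailing newline.
    private static readonly Regex Number = new Regex(@"^[0-9]{1,6}\z");

    // Word characters, whitespace, dot and comma only (the second pattern from the answer).
    private static readonly Regex PlainText = new Regex(@"^[\s\w.,]+$");

    public static bool IsNumber(string value) => value != null && Number.IsMatch(value);

    public static bool IsPlainText(string value) => value != null && PlainText.IsMatch(value);
}
```

Anything that fails the check for its field is rejected outright rather than "cleaned", which is the point of a whitelist.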

That's a pretty lame way of preventing cross-site scripting attacks. You need to use a completely different approach: make sure that your user-supplied input is:
Validated such that it matches the semantics of the data being gathered;
Appropriately quoted every time that it is used to construct expressions to be interpreted by some language interpreter (SQL, HTML, Javascript - even when going to a plain-text logfile). Appropriate quoting completely depends on the output context, and there is no single way to do it.
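
As a rough illustration of context-dependent quoting, here is a small C# sketch using two framework calls (WebUtility.HtmlEncode and Uri.EscapeDataString). The class and method names are hypothetical, and it covers only two output contexts; SQL, JavaScript and log files each need their own treatment.

```csharp
using System;
using System.Net;

public static class OutputEncoding
{
    // Encode a value before writing it into HTML text or an attribute value.
    public static string ForHtml(string value) => WebUtility.HtmlEncode(value);

    // Encode a value before using it as a single query-string value in a URL.
    public static string ForUrlValue(string value) => Uri.EscapeDataString(value);
}
```

The same input produces different output in each context, which is why a single "sanitize" step at input time cannot replace encoding at output time.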

There are many ways to embed javascript. E.g.
%3Cp+style="expression(alert('hi'))"
will make it through your filter.
You probably can't find a magical regexp that will find all JS and that won't reject a lot of valid query strings.
This kind of checking might be useful, but it should only be one part of a defense-in-depth.
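
To make both failure modes concrete, here is a hedged C# sketch. It uses the question's pattern with the /.../I delimiters replaced by RegexOptions.IgnoreCase (an assumption about the intent, since .NET would treat the slashes as literal text). The class and method names are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class FilterDemo
{
    // The question's tag pattern, case-insensitive, without the surrounding slashes.
    private static readonly Regex TagFilter =
        new Regex(@"((%3C)|<)[^\n]+((%3E)|>)", RegexOptions.IgnoreCase);

    public static bool LooksLikeTag(string value) => TagFilter.IsMatch(value);
}
```

With this filter, "<script>alert(1)</script>" is flagged, the payload above is not (it never contains > or %3E), and a harmless value such as "a < b and c > d" is flagged as well.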

It should be enough for you to check if the tag <script is present.
private bool checkForXSS(string value)
{
return value.IndexOf("<script") != -1;
}

Related

Regex expression to validate a user input

I am building a system where the user builds a query by selecting his operands from a combobox(names of operands are then put between $ sign).
eg. $TotalPresent$+56
eg. $Total$*100
eg 100*($TotalRegistered$-$NumberPresent$)
Things like that,
However, since the user is allowed to enter brackets and the operators +, -, * and /,
he can also make mistakes like
eg. $Total$+1a
eg. 78iu+$NumberPresent$
ETC...
I need a way to validate the query built by the user.
How can I do that ?
A regex will never be able to properly validate a query like that. Either your validation would be incomplete, or you would reject valid input.
As you're building a query, you must already have a way to parse and execute it. Why not use your parsing code to validate the user input? If you want to have client-side validation you could use an ajax call to the server.
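
If no parser exists yet, a small recursive-descent validator is one way to do this. The sketch below is only an illustration under assumed rules: operands are non-negative integers or $name$ placeholders, operators are + - * /, parentheses may nest, and no whitespace is allowed. The class name QueryValidator is hypothetical.

```csharp
using System;

public static class QueryValidator
{
    // Returns true when the whole input is a well-formed expression.
    public static bool IsValid(string input)
    {
        if (string.IsNullOrEmpty(input)) return false;
        int pos = 0;
        return ParseExpression(input, ref pos) && pos == input.Length;
    }

    // expression := operand (operator operand)*
    private static bool ParseExpression(string s, ref int pos)
    {
        if (!ParseOperand(s, ref pos)) return false;
        while (pos < s.Length && "+-*/".IndexOf(s[pos]) >= 0)
        {
            pos++; // consume the operator
            if (!ParseOperand(s, ref pos)) return false;
        }
        return true;
    }

    // operand := number | $name$ | '(' expression ')'
    private static bool ParseOperand(string s, ref int pos)
    {
        if (pos >= s.Length) return false;

        if (s[pos] == '(')
        {
            pos++;
            if (!ParseExpression(s, ref pos)) return false;
            if (pos >= s.Length || s[pos] != ')') return false;
            pos++; // consume the closing parenthesis
            return true;
        }

        if (s[pos] == '$')
        {
            int end = s.IndexOf('$', pos + 1);
            if (end <= pos + 1) return false; // missing closing $ or empty name
            pos = end + 1;
            return true;
        }

        int start = pos;
        while (pos < s.Length && s[pos] >= '0' && s[pos] <= '9') pos++;
        return pos > start; // at least one digit
    }
}
```

Because parentheses are handled by recursion, unbalanced input such as "(1+2" is rejected, which a plain regex cannot guarantee.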
I need a way to validate the query built by the user.
Personally, I don't think it is a good idea to use regex here. It may be possible with the help of some extensions (see here, for example), but original Kleene expressions aren't fit for checking whether an unlimited number of parentheses is balanced. Even worse, an overly complex expression may consume significant time and memory, opening the door to denial-of-service attacks (if your service is public).
You can make use of a weak expression, though: one which is easy to write and match with and forbids the most obvious mistakes. Some illegal inputs will still get through, but you will discover those on parsing, as Menno van den Heuvel suggested. Something like this should do:
^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$
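
A quick C# sketch for trying the weak pattern above. The wrapper class WeakQueryCheck is hypothetical; the pattern itself is copied unchanged and only split across two verbatim strings for readability.

```csharp
using System.Text.RegularExpressions;

public static class WeakQueryCheck
{
    private static readonly Regex Pattern = new Regex(
        @"^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)" +
        @"(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$");

    public static bool PassesWeakCheck(string query) => Pattern.IsMatch(query);
}
```

It rejects "$Total$+1a" and "78iu+$NumberPresent$" but accepts the unbalanced "((1+2", which is exactly the kind of input that has to be caught later by the parser.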
Hey guys, I managed to reach my goal (thanks to Anirudh's answer in "Validating a String using regex").
I am posting my answer as it may help future visitors.
string UserFedData = ttextBox1.Text.Trim().ToString();
//this is a regex to detect conflicting user built queries.
var troublePattern = new Regex(@"^(\(?\d+\)?|\(?[$][^$]+[$]\)?)([+*/-](\(?\d+\)?|\(?[$][^$]+[$]\)?))*$");
//var troublePattern = new Regex(@"^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$");
string TroublePattern = troublePattern.ToString();
//readyToGo is the boolean that indicates if further processing of data is safe or not
bool readyToGo = Regex.IsMatch(UserFedData, TroublePattern, RegexOptions.None);

RegEx to Validate URL with optional Scheme

I want to validate a URL using regular expression. Following are my conditions to validate the URL:
Scheme is optional
Subdomains should be allowed
Port number should be allowed
Path should be allowed.
I was trying the following pattern:
((http|https)://)?([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
But I am not getting the desired results. Even an invalid URL like '*.example.com' is getting matched.
What is wrong with it?
Are you matching the entire string? You don't say what language you are using, but in Python it looks like you may be using search instead of match.
One way to fix this is to start your regexp with ^ and end it with $.
While parsing URLs is best left to a library (since I know Perl best, I would suggest something like http://search.cpan.org/dist/URI/), if you want some help debugging that pattern, it might be best to try it in a debugger, something like http://www.debuggex.com/.
I think one of the main reasons it matches is that you don't use start-of-string and end-of-string anchors. Without them, the regex only has to match some part of the input, so it could be matching just 'example.com' in your string rather than the entire input.
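
The effect of the anchors can be seen with a short C# sketch. It uses only the host portion of the question's pattern to keep things small; the class and method names are made up for the example.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Host portion of the question's pattern.
    private const string Host = @"([\w-]+\.)+[\w-]+";

    // Unanchored: succeeds if any substring of the input matches.
    public static bool MatchesAnywhere(string input) => Regex.IsMatch(input, Host);

    // Anchored: the whole input has to match.
    public static bool MatchesWhole(string input) => Regex.IsMatch(input, "^(" + Host + ")$");
}
```

For "*.example.com", MatchesAnywhere returns true because "example.com" matches on its own, while MatchesWhole returns false because the leading * is not allowed by the pattern.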
Found the regular expression for my condition with help from your inputs
^(http(s)?://)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
Following code works for me in c#
private static bool IsValidUrl(string url)
{
    return new Regex(@"^(http|http(s)?://)?([\w-]+\.)+[\w-]+[.\w]+(\[\?%&=]*)?").IsMatch(url)
        && !new Regex(@"[^a-zA-Z0-9]+$").IsMatch(url);
}
It allows "something.anything" (at least 2 letters after the period), with or without http(s) and www.
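
For comparison, here is a sketch of the "leave it to a library" route mentioned earlier in this thread, using System.Uri. The helper name IsHttpUrl and the rule "no :// means prepend http://" are assumptions made for the example, not part of any answer above.

```csharp
using System;

public static class UrlCheck
{
    // Hypothetical helper: accepts http(s) URLs, with the scheme optional.
    public static bool IsHttpUrl(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return false;

        // Assumption: input without "://" is treated as if it began with http://
        string candidate = input.Contains("://") ? input : "http://" + input;

        Uri uri;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out uri)) return false;

        // Only http and https are accepted; ftp, file, etc. are rejected.
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

Note that Uri is more permissive than the patterns above in some respects (for example it accepts single-label hosts such as localhost), so an extra host check may still be needed depending on the requirements.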

How to validate a textbox that it must contain the values starting from fixed combination of words?

I am trying to validate a textbox so that its value must start with the fixed word "temp". The user must enter temp before entering anything else in the textbox.
Please help.
Regards.
Have you tried regular expressions? Regular expressions are a way to see if a string contains a specified sequence of characters, and they are much more robust than a simple 'search'! They're a powerful tool, and I would suggest searching Google for a tutorial.
I noticed you said this is client side, so here's a page describing regexp in javascript. I haven't used regular expressions in javascript, but they can be very useful. Of course, regular expressions are also available in C#.
Basically you'll want to use "^temp" as your pattern. The '^' will make sure that the matching starts at the beginning of the string you're testing, and check to see if 'temp' is there. If the pattern doesn't match, the string doesn't have 'temp' at the start of it.
var stringToTest = "TemP this should match"
var pattern = /^temp/i
var result = pattern.test(stringToTest)
Above is a simple example that I pulled from W3Schools. As you see, the pattern uses '^temp' as its pattern, and it uses the modifier 'i' to make the check case-insensitive, so that it doesn't matter how the user types in 'temp'(Could be Temp, temP, teMp, teMP, tEmp, etc).
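
Since regular expressions are also available in C#, the same check might look like the sketch below on the server side. The class and method names are hypothetical; the second method shows a regex-free alternative with String.StartsWith.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixCheck
{
    // Same pattern as the JavaScript example: ^temp, case-insensitive.
    public static bool StartsWithTempRegex(string text) =>
        Regex.IsMatch(text, "^temp", RegexOptions.IgnoreCase);

    // Equivalent check without a regex.
    public static bool StartsWithTemp(string text) =>
        text.StartsWith("temp", StringComparison.OrdinalIgnoreCase);
}
```

Both methods accept "TemP this should match" and reject "a temp".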

Regex for google referrer validation

I am totally new to Regex and have been trying to do this with little success.
Basically what I want to do is to create a regex that matches any google domain such as Google.com, Google.co.uk, etc.
So far I have ^http://www.google\.com/.*$, but this only matches Google.com. How can I modify it to allow any extension besides com?
Thanks!
You could use alternation, but then you would have to supply all TLDs you want to allow:
^http://www\.google\.(?:com|co\.uk|de|es)/.*$
Add more options separated by pipes. Alternatively, you could allow any TLD (whether valid or not) with this:
^http://www\.google\.[a-z.]+/.*$
However this would also match something like http://www.google.myowndomain.com/. I don't think there would be any approach that allows only valid domains without listing them all.
By the way, if you want to make that slash and the path/query at the end optional, change that to one of the following:
^http://www\.google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://www\.google\.[a-z.]+(?:/.*)?$
And then you could go another step further and make the www. optional:
^http://(?:www\.)?google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://(?:www\.)?google\.[a-z.]+(?:/.*)?$
You see, matching all possible but valid URLs for a given problem is not an easy task, but one that needs careful consideration ;).
Depending on the language you are using there might be better options with built-in URL-parsing functions. In PHP for instance, this would be a much easier approach:
$domain = parse_url($urlStr, PHP_URL_HOST);
$isGoogle = preg_match('/^(?:www\.)?google\.[a-z.]+/', $domain);
Or (since this is not perfect anyway, as outlined above) you could abandon regex altogether and do the check like this:
$isGoogle = strpos($domain, 'google.') !== false;
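
The same "parse first, then check only the host" idea can be sketched in C# with System.Uri. The class name ReferrerCheck and the short TLD list (taken from the alternation above) are illustrative assumptions; a real list would need to be much longer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReferrerCheck
{
    // Anchored at both ends so that google.myowndomain.com is not accepted.
    private static readonly Regex GoogleHost =
        new Regex(@"^(?:www\.)?google\.(?:com|co\.uk|de|es)$", RegexOptions.IgnoreCase);

    public static bool IsGoogleReferrer(string referrer)
    {
        Uri uri;
        if (!Uri.TryCreate(referrer, UriKind.Absolute, out uri)) return false;

        // Only the host is checked; scheme, path and query no longer matter.
        return GoogleHost.IsMatch(uri.Host);
    }
}
```

Because the host is extracted by the URI parser, the pattern only has to describe the domain itself.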

Request.QueryString[] vs. Request.Query.Get() vs. HttpUtility.ParseQueryString()

I searched SO and found similar questions, but none compared all three. That surprised me, so if someone knows of one, please point me to it.
There are a number of different ways to parse the query string of a request... the "correct" way (IMO) should handle null/missing values, but also decode parameter values as appropriate. Which of the following would be the best way to do both?
Method 1
string suffix = Request.QueryString.Get("suffix") ?? "DefaultSuffix";
Method 2
string suffix = Request.QueryString["suffix"] ?? "DefaultSuffix";
Method 3
NameValueCollection queryParams = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = queryParams.Get("suffix") ?? "DefaultSuffix";
Method 4
NameValueCollection queryParams = HttpUtility.ParseQueryString(Request.RawUrl);
string suffix = queryParams["suffix"] ?? "DefaultSuffix";
Questions:
1. Would Request.QueryString["suffix"] return a null if no suffix was specified? (Embarrassingly basic question, I know)
2. Does HttpUtility.ParseQueryString() provide any extra functionality over accessing Request.QueryString directly?
3. The MSDN documentation lists this warning:
The ParseQueryString method uses query strings that might contain user input, which is a potential security threat. By default, ASP.NET Web pages validate that user input does not include script or HTML elements. For more information, see Script Exploits Overview.
But it's not clear to me if that means ParseQueryString() should be used to handle that, or is exposed to security flaws because of it... Which is it?
4. ParseQueryString() uses UTF8 encoding by default... do all browsers encode the query string in UTF8 by default?
5. ParseQueryString() will comma-separate values if more than one is specified... does Request.QueryString do that as well, or what happens if it doesn't?
6. Which of those methods would correctly decode "%2b" to be a "+"?
Showing my Windows development roots again... and I would be a much faster developer if I didn't wonder about these things so much... : P
Methods #1 and #2 are the same thing, really. (I think the .Get() method is provided for language compatibility.)
ParseQueryString returns you something that is the functional equivalent of Request.QueryString. You would usually use it when you have a raw URL and no other way to parse the query string parameters from it. Request.QueryString does that for you, so in this case, it's not needed.
1. You can't leave off "suffix". You either have to pass a string or an index number. If you leave off the [] entirely, you get the whole NameValueCollection. If you mean what happens when "suffix" is not one of the QueryString values, then yes: you would get null if you called Request.QueryString["suffix"].
2. No. The most likely time you would use it is if you had an external URL and wanted to parse the query string parameters from it.
3. ParseQueryString does not handle it, and neither does pulling the values straight from Request.QueryString. For ASP.NET, you usually handle form values as the values of controls, and that is where ASP.NET usually 'handles' these things for you. In other words: DON'T TRUST USER INPUT. Ever. No matter what the framework is doing for you.
4. I have no clue (I think no). However, I think what you are reading is telling you that ParseQueryString returns UTF-8 encoded text, regardless of whether it was so encoded when it came in.
5. Again: ParseQueryString returns basically the same thing you get from Request.QueryString. In fact, I think ParseQueryString is used internally to provide Request.QueryString.
6. They would produce the equivalent; they will all properly decode the values submitted. If you have the URL http://site.com/page.aspx?id=%20Hello and then call Request.QueryString["id"], the return value will be " Hello", because it automatically decodes.
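
The behaviours discussed above (null for a missing key, automatic decoding, comma-joined repeated keys) can be checked outside a web request with HttpUtility.ParseQueryString. This is a minimal sketch; the class and method names are hypothetical, and on .NET Framework it needs a reference to System.Web.

```csharp
using System.Collections.Specialized;
using System.Web;

public static class QueryStringDemo
{
    // Returns the decoded value for key, or fallback when the key is absent.
    public static string GetOrDefault(string query, string key, string fallback)
    {
        NameValueCollection values = HttpUtility.ParseQueryString(query);
        return values[key] ?? fallback; // the indexer returns null for a missing key
    }
}
```

For example, GetOrDefault("?suffix=a%2bb", "suffix", "DefaultSuffix") gives "a+b", a query without suffix gives "DefaultSuffix", and "?x=1&x=2" gives "1,2" for x. A key that is present but empty ("?suffix=") yields an empty string, not the fallback.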
Example 1:
string itsMeString = string.IsNullOrEmpty(Request.QueryString["itsMe"]) ? string.Empty : HttpUtility.UrlDecode(Request.QueryString["itsMe"]);
Straight to your questions:
1. Not quite sure what you mean by suffix. If you are asking what happens if the key is not present (you don't have it in the QueryString), then yes, it will return null.
2. My GUESS here is that when constructed, Request.QueryString internally calls the HttpUtility.ParseQueryString() method and caches the NameValueCollection for subsequent access. I think the latter is only exposed so you can use it on a string that is not part of the Request, for example if you are scraping a web page and need to get some arguments from a string you've found in the code of that page. This way you won't need to construct a Uri object but will be able to get just the query string as a NameValueCollection if you are sure you only need this. This is a wild guess ;).
3. This is implemented at the page level, so if you access the QueryString in, say, the Page_Load event handler, you have a valid and safe string. Otherwise ASP.NET throws an exception and does not let the code flow enter Page_Load, so you are protected from storing XSS in your database. The exception will be "A potentially dangerous Request.QueryString value was detected from the client", the same as when a post variable contains traces of XSS, except that the message says Request.QueryString instead of Request.Form. This holds as long as you leave "validateRequest" switched on (it is by default). The ASP.NET pipeline throws the exception early, so you don't get the chance to save any XSS to your store (database). Switching it off implies you know what you're doing, so you will then need to implement the security yourself (by checking what's coming in).
4. Probably it will be safe to say yes. Anyway, in most cases you will be generating the QueryString on your own (via JavaScript or server-side code), so be sure to use HttpUtility.UrlEncode for backend code and escape for JavaScript. This way the browser will be forced to turn "It's me!" into "It%27s%20me%21". You can refer to this article for more on URL encoding in JavaScript: http://www.javascripter.net/faq/escape.htm.
5. Please elaborate on that; I couldn't quite get what you mean by "will comma-separate values if more than one is specified."
6. As far as I remember, none of them will. You will probably need to call HttpUtility.UrlDecode / HttpUtility.HtmlDecode (depending on what input you have) to get the string correctly. In the above example with "It's me!" you would do something like Example 1 (placed above because the code formatting breaks if I put it after the numbered list).
