Regex expression to validate a user input - c#

I am building a system where the user builds a query by selecting operands from a combobox (the operand names are then wrapped in $ signs).
e.g. $TotalPresent$+56
e.g. $Total$*100
e.g. 100*($TotalRegistered$-$NumberPresent$)
and so on.
However, since the user is also allowed to type brackets and the operators +, -, * and /, he can make mistakes like
e.g. $Total$+1a
e.g. 78iu+$NumberPresent$
etc.
I need a way to validate the query built by the user.
How can I do that?

A regex will never be able to properly validate a query like that. Either your validation would be incomplete, or you would reject valid input.
As you're building a query, you must already have a way to parse and execute it. Why not use your parsing code to validate the user input? If you want client-side validation, you could use an ajax call to the server.
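As a rough illustration of validating by parsing, here is a minimal recursive-descent checker. It is sketched in JavaScript (so it could also run client-side) and is not the asker's actual parser. The grammar, the function name isValidQuery, the integer-only numbers and the support for a leading unary minus are all assumptions inferred from the examples in the question.

```javascript
// Assumed grammar (inferred from the question's examples, not from real code):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '$' NAME '$' | '(' expr ')' | '-' factor
function isValidQuery(input) {
  // Split into operands, integers, operators and brackets. Any other
  // non-space character becomes a one-character token that no rule accepts.
  const tokens = input.match(/\$[A-Za-z_][A-Za-z0-9_]*\$|\d+|[-+*\/()]|\S/g) || [];
  let pos = 0;

  const peek = () => tokens[pos];
  const isOperand = (t) => t !== undefined && /^(?:\$.+\$|\d+)$/.test(t);

  function factor() {
    const t = peek();
    if (t === "-") { pos++; return factor(); }   // unary minus
    if (t === "(") {
      pos++;
      if (!expr()) return false;
      if (peek() !== ")") return false;          // unbalanced bracket
      pos++;
      return true;
    }
    if (isOperand(t)) { pos++; return true; }
    return false;
  }

  function term() {
    if (!factor()) return false;
    while (peek() === "*" || peek() === "/") {
      pos++;
      if (!factor()) return false;
    }
    return true;
  }

  function expr() {
    if (!term()) return false;
    while (peek() === "+" || peek() === "-") {
      pos++;
      if (!term()) return false;
    }
    return true;
  }

  // Valid only if one expression consumes every token.
  return tokens.length > 0 && expr() && pos === tokens.length;
}
```

Unlike a plain regex, this rejects unbalanced brackets such as (($Total$+1 as well as stray characters such as $Total$+1a.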

I need a way to validate the query built by the user.
Personally, I don't think it is a good idea to use regex here. It is possible with the help of some extensions (see here, for example), but original Kleene expressions cannot check whether an unbounded number of parentheses is balanced. Even worse, an overly complex expression may consume significant time and memory, opening the door to denial-of-service attacks (if your service is public).
You can use a weak expression, though: one that is easy to write and match with, and that forbids the most obvious mistakes. Some illegal inputs will still get through, but you will discover them on parsing, as Menno van den Heuvel suggested. Something like this should do:
^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$
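To see what this weak expression does and does not catch, it can be run against the inputs from the question. The sketch below uses JavaScript, where this particular pattern behaves the same as in .NET as far as I can tell. The names weakPattern and samples are only illustrative.

```javascript
// The pattern from the answer, unchanged. String.raw keeps the backslashes intact.
const weakPattern = new RegExp(
  String.raw`^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$`
);

const samples = [
  "$TotalPresent$+56",                        // valid
  "100*($TotalRegistered$-$NumberPresent$)",  // valid
  "$Total$+1a",                               // typo: rejected
  "78iu+$NumberPresent$",                     // typo: rejected
  "(($Total$+1",                              // unbalanced, yet accepted
];

for (const sample of samples) {
  console.log(sample, "=>", weakPattern.test(sample));
}
```

The last sample shows the limitation described above: brackets are never counted, so unbalanced input still has to be caught by the parser.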

Hey guys, I managed to reach my goal (thanks to Anirudh, see "Validating a String using regex").
I am posting my answer as it may help further visitors.
string UserFedData = ttextBox1.Text.Trim();
// A query is accepted only if it matches this regex; anything else is treated as trouble.
var troublePattern = new Regex(@"^(\(?\d+\)?|\(?[$][^$]+[$]\)?)([+*/-](\(?\d+\)?|\(?[$][^$]+[$]\)?))*$");
// Alternative pattern:
// var troublePattern = new Regex(@"^(?:[-]?\(*)?(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+)(?:\)*[+/*-]\(*(?:\$[A-Za-z][A-Za-z0-9_]*\$|\d+))*(?:\)*)$");
// readyToGo indicates whether further processing of the data is safe.
bool readyToGo = troublePattern.IsMatch(UserFedData);

Related

How to validate a textbox that it must contain the values starting from fixed combination of words?

I am trying to validate a textbox that it must contain the values starting from fixed word "temp", User must enter temp before entering any other thing in the textbox.
Please help.
Regards.
Have you tried regular expressions? Regular expressions are a way to see whether a string contains a specified sequence of characters, and they are much more robust than a simple 'search'! They're a powerful tool, and I would suggest googling for a tutorial.
I noticed you said this is client side, so here's a page describing regexp in javascript. I haven't used regular expressions in javascript, but they can be very useful. Of course, regular expressions are also available in C#.
Basically you'll want to use "^temp" as your pattern. The '^' will make sure that the matching starts at the beginning of the string you're testing, and check to see if 'temp' is there. If the pattern doesn't match, the string doesn't have 'temp' at the start of it.
var stringToTest = "TemP this should match"
var pattern = /^temp/i
var result = pattern.test(stringToTest)
Above is a simple example that I pulled from W3Schools. As you can see, it uses '^temp' as its pattern and the modifier 'i' to make the check case-insensitive, so it doesn't matter how the user types 'temp' (it could be Temp, temP, teMp, teMP, tEmp, etc.).

Regex for google referrer validation

I am totally new to Regex and have been trying to do this with little success.
Basically what I want to do is to create a regex that matches any google domain such as Google.com, Google.co.uk, etc.
So far I have ^http://www.google\.com/.*$, but this only matches Google.com. How can I modify it to allow any extension besides com?
Thanks!
You could use alternation, but then you would have to supply all TLDs you want to allow:
^http://www\.google\.(?:com|co\.uk|de|es)/.*$
Add more options separated by pipes. Alternatively, you could allow any TLD (whether valid or not) with this:
^http://www\.google\.[a-z.]+/.*$
However this would also match something like http://www.google.myowndomain.com/. I don't think there would be any approach that allows only valid domains without listing them all.
By the way, if you want to make that slash and the path/query at the end optional, change that to one of the following:
^http://www\.google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://www\.google\.[a-z.]+(?:/.*)?$
And then you could go another step further and make the www. optional:
^http://(?:www\.)?google\.(?:com|co\.uk|de|es)(?:/.*)?$
^http://(?:www\.)?google\.[a-z.]+(?:/.*)?$
You see, matching all possible but valid URLs for a given problem is not an easy task, but one that needs careful consideration ;).
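The trade-off between the two families of patterns above can be checked quickly. This is a small JavaScript sketch in which the variable names and sample URLs are made up for illustration:

```javascript
// Final variants from above: optional "www." and optional path.
const listedTlds = /^http:\/\/(?:www\.)?google\.(?:com|co\.uk|de|es)(?:\/.*)?$/;
const anyTld = /^http:\/\/(?:www\.)?google\.[a-z.]+(?:\/.*)?$/;

const urls = [
  "http://www.google.co.uk/search?q=x",  // listed TLD
  "http://google.de",                    // no www, no path
  "http://www.google.fr/",               // real Google domain, but not listed
  "http://www.google.myowndomain.com/",  // not Google at all
];

for (const url of urls) {
  console.log(url, "listed:", listedTlds.test(url), "any:", anyTld.test(url));
}
```

The listed variant misses google.fr until it is added, while the permissive variant accepts google.myowndomain.com.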
Depending on the language you are using there might be better options with built-in URL-parsing functions. In PHP for instance, this would be a much easier approach:
$domain = parse_url($urlStr, PHP_URL_HOST);
$isGoogle = preg_match('/^(?:www\.)?google\.[a-z.]+/', $domain);
Or (since this is not perfect anyway, as outlined above) you could abandon regex altogether and do the check like this:
$isGoogle = strpos($domain, 'google.') !== false;
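The same "parse first, then test only the host" idea can be sketched in JavaScript with the standard URL class. The function name isGoogleReferrer and the short TLD list are assumptions for illustration, and the host pattern is anchored at both ends so that look-alike hosts fail.

```javascript
// Returns true only when the referrer's host is google.<listed TLD>,
// optionally prefixed with "www.".
function isGoogleReferrer(referrer) {
  let host;
  try {
    host = new URL(referrer).hostname; // the URL parser lower-cases http(s) hosts
  } catch {
    return false; // not an absolute URL
  }
  // Assumed allow-list; extend it with the TLDs you actually expect.
  return /^(?:www\.)?google\.(?:com|co\.uk|de|es)$/.test(host);
}

console.log(isGoogleReferrer("https://www.google.co.uk/search?q=regex"));
console.log(isGoogleReferrer("http://notgoogle.com/"));
```

Note that a plain substring test for 'google.' would accept a host such as notgoogle.com, which is why the sketch anchors the pattern instead.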

Request.QueryString[] vs. Request.Query.Get() vs. HttpUtility.ParseQueryString()

I searched SO and found similar questions, but none compared all three. That surprised me, so if someone knows of one, please point me to it.
There are a number of different ways to parse the query string of a request... the "correct" way (IMO) should handle null/missing values, but also decode parameter values as appropriate. Which of the following would be the best way to do both?
Method 1
string suffix = Request.QueryString.Get("suffix") ?? "DefaultSuffix";
Method 2
string suffix = Request.QueryString["suffix"] ?? "DefaultSuffix";
Method 3
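// "params" is a reserved word in C#, so the collection needs another name.
// ParseQueryString expects only the query part; Request.RawUrl also contains the path.
NameValueCollection queryParams = HttpUtility.ParseQueryString(Request.Url.Query);
string suffix = queryParams.Get("suffix") ?? "DefaultSuffix";
Method 4
NameValueCollection queryParams = HttpUtility.ParseQueryString(Request.Url.Query);
string suffix = queryParams["suffix"] ?? "DefaultSuffix";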
Questions:
1. Would Request.QueryString["suffix"] return null if no suffix was specified? (Embarrassingly basic question, I know)
2. Does HttpUtility.ParseQueryString() provide any extra functionality over accessing Request.QueryString directly?
3. The MSDN documentation lists this warning:
The ParseQueryString method uses query strings that might contain user input, which is a potential security threat. By default, ASP.NET Web pages validate that user input does not include script or HTML elements. For more information, see Script Exploits Overview.
But it's not clear to me if that means ParseQueryString() should be used to handle that, or is exposed to security flaws because of it... Which is it?
4. ParseQueryString() uses UTF8 encoding by default... do all browsers encode the query string in UTF8 by default?
5. ParseQueryString() will comma-separate values if more than one is specified... does Request.QueryString do that as well, or what happens if it doesn't?
6. Which of those methods would correctly decode "%2b" to be a "+"?
Showing my Windows development roots again... and I would be a much faster developer if I didn't wonder about these things so much... : P
Methods #1 and #2 are the same thing, really. (I think the .Get() method is provided for language compatibility.)
ParseQueryString returns something that is the functional equivalent of Request.QueryString. You would usually use it when you have a raw URL and no other way to parse the query string parameters from it. Request.QueryString does that for you, so in this case it's not needed.
1. You can't leave off "suffix". You either have to pass a string or an index number. If you leave off the [] entirely, you get the whole NameValueCollection. If you mean what happens if "suffix" is not one of the QueryString values, then yes: you would get null if you called Request.QueryString["suffix"].
2. No. The most likely time you would use it is if you had an external URL and wanted to parse the query string parameters from it.
3. ParseQueryString does not handle it... neither does pulling the values straight from Request.QueryString. For ASP.NET, you usually handle form values as the values of controls, and that is where ASP.NET usually 'handles' these things for you. In other words: DON'T TRUST USER INPUT. Ever. No matter what the framework is doing for you.
4. I have no clue (I think no). However, I think what you are reading is telling you that ParseQueryString returns UTF-8 encoded text, regardless of whether it was so encoded when it came in.
5. Again: ParseQueryString returns basically the same thing you get from Request.QueryString. In fact, I think ParseQueryString is used internally to provide Request.QueryString.
6. They would produce the equivalent; they will all properly decode the values submitted. If you have the URL http://site.com/page.aspx?id=%20Hello and then call Request.QueryString["id"], the return value will be " Hello", because it decodes automatically.
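The behaviours discussed in this answer (null for a missing key, automatic percent-decoding, several values under one key) are not specific to ASP.NET. The sketch below is only a JavaScript analogue using the standard URLSearchParams class. It is not the ASP.NET API and its edge cases may differ, for example it keeps repeated values in a list rather than comma-joining them for you.

```javascript
// Hypothetical query string with an encoded space, an encoded plus sign,
// and a key that appears twice.
const query = new URLSearchParams("id=%20Hello&op=%2b&tag=a&tag=b");

const id = query.get("id");                            // percent-decoded automatically
const op = query.get("op");                            // "%2b" is decoded as well
const suffix = query.get("suffix") ?? "DefaultSuffix"; // missing key gives null
const tags = query.getAll("tag").join(",");            // join repeated values by hand

console.log(JSON.stringify({ id, op, suffix, tags }));
```

A raw, unencoded + in a query string is decoded as a space, which is why a literal plus sign has to be sent as %2b in the first place.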
Example 1:
string itsMeString = string.IsNullOrEmpty(Request.QueryString["itsMe"]) ? string.Empty : HttpUtility.UrlDecode(Request.QueryString["itsMe"]);
Straight to your questions:
1. I'm not quite sure what you mean by suffix. If you are asking what happens when the key is not present (you don't have it in the QueryString), then yes, it will return null.
2. My GUESS here is that, when constructed, Request.QueryString internally calls the HttpUtility.ParseQueryString() method and caches the NameValueCollection for subsequent access. I think ParseQueryString is exposed only so that you can use it on a string that does not come from the Request, for example if you are scraping a web page and need to get some arguments from a string you've found in the code of that page. This way you won't need to construct a Uri object, but will be able to get just the query string as a NameValueCollection if you are sure that is all you need. (This is a wild guess ;).)
3. This is implemented at the page level, so if you access the QueryString in, say, the Page_Load event handler, you already have a valid and safe string. Otherwise ASP.NET throws an exception before the code flow enters Page_Load, so you are protected from storing XSS in your database. The exception is "A potentially dangerous Request.QueryString value was detected from the client", the same one raised when a post variable contains traces of XSS, except that it says Request.QueryString instead of Request.Form. This holds as long as you leave "validateRequest" switched on (it is by default). Switching it off implies you know what you're doing, so you then need to implement the security yourself (by checking what's coming in).
4. It is probably safe to say yes. In any case, you will in most cases be generating the QueryString yourself (via JavaScript or server-side code), so be sure to use HttpUtility.UrlEncode in backend code and escape in JavaScript. This way the browser will be forced to turn "It's me!" into "It%27s%20me%21". You can refer to this article for more on URL encoding in JavaScript: http://www.javascripter.net/faq/escape.htm.
5. Please elaborate on that; I couldn't quite get what you mean by "will comma-separate values if more than one is specified".
6. As far as I remember, none of them will. You will probably need to call HttpUtility.UrlDecode / HttpUtility.HtmlDecode (depending on what input you have) to get the string correctly. In the above example with "It's me!" you would do something like Example 1 (placed at the top because something goes wrong with the code formatting if I put it after the numbered list).

Regular expression in C# , is this possible?

I have never used regular expressions before and plan to use them to solve my problem, but I am not sure whether they can help me.
I have a situation where I need to store a rule or formula for building string values, like the following examples, in a database field, and then retrieve this rule and build the string value.
FacilityCode + Left(ModelNO,2)
Right(PO,3) + Left(Serial,2)
Is this achievable using .NET regular expressions? Are there any good tutorials or simple examples for this problem?
Regexp : http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
But it doesn't seem fitting :)
It might be better to code some random string generator. Regex is for searching data not creating data.
The thing to remember about regex is that it is like an aircraft carrier; it does one thing very very well, it does not do other jobs very well at all.
An aircraft carrier moves planes very well on the ocean; it does not make a cheese sandwich well AT ALL!!
That is to say, if you use regex when you shouldn't, you will almost certainly use far more processing power than if you used another tool for that job. HTML parsing comes to mind.
Regex is provided as part of System.Text.RegularExpressions, but you can't rely exclusively on it. It'll let you search existing strings, but you'll need to implement your own logic for building new strings based on what you find in the existing data.
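As a rough sketch of that split between "regex recognises, your code builds", here is a tiny rule evaluator in JavaScript (the same idea ports directly to System.Text.RegularExpressions). The rule format, the function name buildValue and the VB-style meaning of Left/Right (first or last n characters) are assumptions based on the two example rules in the question.

```javascript
// Hypothetical rule format: terms joined by "+", where each term is either a
// field name or Left(field, n) / Right(field, n).
function buildValue(rule, record) {
  const lookup = (field) => {
    if (!Object.prototype.hasOwnProperty.call(record, field)) {
      throw new Error(`Unknown field: ${field}`);
    }
    return String(record[field]);
  };

  return rule
    .split("+")
    .map((rawTerm) => {
      const term = rawTerm.trim();
      // The regex only recognises the shape of a function call...
      const call = /^(Left|Right)\(\s*(\w+)\s*,\s*(\d+)\s*\)$/i.exec(term);
      if (call) {
        // ...and ordinary string code builds the result.
        const value = lookup(call[2]);
        const count = Number(call[3]);
        if (call[1].toLowerCase() === "left") return value.slice(0, count);
        return count === 0 ? "" : value.slice(-count);
      }
      if (/^\w+$/.test(term)) return lookup(term);
      throw new Error(`Unrecognised term: ${term}`);
    })
    .join("");
}

// Made-up sample record for illustration.
const sample = { FacilityCode: "FC01", ModelNO: "XY123", PO: "PO98765", Serial: "SN42" };
console.log(buildValue("FacilityCode + Left(ModelNO,2)", sample));
console.log(buildValue("Right(PO,3) + Left(Serial,2)", sample));
```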
Also, keep in mind that System.Text.RegularExpressions works differently from regexp in Perl and other implementations. For example, it doesn't recognize POSIX character class definitions.
Since you're new to regex, you might want to check out the "Regular Expressions User Guide" on zytrax.com. It's not as comprehensive as an O'Reilly manual, but it'll do as a start.

Regex to detect Javascript In a string

I am trying to detect JavaScript in my querystrings value.
I have the following c# code
private bool checkForXSS(string value)
{
// .NET patterns take no /.../ delimiters; case-insensitivity is passed as an option.
Regex regex = new Regex(@"((%3C)|<)[^\n]+((%3E)|>)", RegexOptions.IgnoreCase);
return regex.IsMatch(value);
}
This works for detecting <script></script> tags but unfortunately if there were no tags a match is not reached.
Is it possible for a regex to match on JavaScript keywords and semi-colons etc?
This is not meant to cover all XSS attack bases. Just a way to detect simple JS attacks that can be in a string value.
Thanks
Rule No. 1: Use a whitelist, not a blacklist.
You are blocking one way of performing XSS, not all of them. To block them all, you must validate the input against what you are prepared to accept as user input, i.e.
If you expect a number, validate the input against /^\d{1,n}$/ (where n is the maximum number of digits)
If you expect a string, validate it against /^[\s\w.,]+$/, etc.
For further info, start reading the Wikipedia entry, the entry at OWASP, webappsec articles and some random blog entries written by unknown people
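A minimal sketch of the whitelist idea, in JavaScript: each expected field gets its own pattern, and anything that does not match (or is not expected at all) is refused. The field names quantity and comment and the six-digit limit are made up for illustration.

```javascript
// One allow-list pattern per expected field (hypothetical field names).
const validators = new Map([
  ["quantity", /^\d{1,6}$/],     // a number with 1 to 6 digits
  ["comment", /^[\s\w.,]+$/],    // letters, digits, underscore, whitespace, '.' and ','
]);

// Accept a value only if its field is known and the whole value matches.
function isAllowed(field, value) {
  const pattern = validators.get(field);
  return pattern !== undefined && pattern.test(value);
}

console.log(isAllowed("quantity", "123"));
console.log(isAllowed("comment", "<script>alert(1)</script>"));
```

The script payload is refused simply because < and > are not on the list, with no need to anticipate how an attack might be written.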
That's a pretty lame way of preventing cross-site scripting attacks. You need to use a completely different approach: make sure that your user-supplied input is:
Validated such that it matches the semantics of the data being gathered;
Appropriately quoted every time that it is used to construct expressions to be interpreted by some language interpreter (SQL, HTML, Javascript - even when going to a plain-text logfile). Appropriate quoting completely depends on the output context, and there is no single way to do it.
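For one output context only, HTML text and quoted attribute values, "appropriate quoting" can be sketched as below in JavaScript. The function name escapeHtml is illustrative. Other contexts (SQL, JavaScript literals, URLs, log files) need their own encoders or parameterised APIs, and in real code a framework's built-in encoder is usually the better choice.

```javascript
// Replace the five characters that are significant in HTML text and in
// quoted attribute values with their entity forms.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

console.log(escapeHtml(`<p style="x">Tom & 'Jerry'</p>`));
```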
There are many ways to embed javascript. E.g.
%3Cp+style="expression(alert('hi'))"
will make it through your filter.
You probably can't find a magical regexp that will find all JS and that won't reject a lot of valid query strings.
This kind of checking might be useful, but it should only be one part of a defense-in-depth.
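Both failure modes can be reproduced with the filter from the question. The sketch below uses JavaScript; tagFilter is the question's pattern written as a JavaScript regex literal with the i flag, and the "harmless" sample string is made up.

```javascript
// The question's filter: an opening "<" (or %3C), anything, then ">" (or %3E).
const tagFilter = /(?:%3C|<)[^\n]+(?:%3E|>)/i;

const payload = `%3Cp+style="expression(alert('hi'))"`; // no closing ">" at all
const scriptTag = "<script>alert(1)</script>";
const harmless = "a < b and c > d";

console.log(tagFilter.test(payload));   // slips through the filter
console.log(tagFilter.test(scriptTag)); // caught
console.log(tagFilter.test(harmless));  // rejected although it is harmless
```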
It should be enough for you to check if the tag <script is present.
private bool checkForXSS(string value)
{
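// Compare case-insensitively so that "<SCRIPT" is detected too (needs "using System;").
return value.IndexOf("<script", StringComparison.OrdinalIgnoreCase) != -1;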
}
