Regular expression for accepting alphanumeric characters (6-10 chars) in .NET, C#

I am building a user registration form using C# with .NET.
I have a requirement to validate user entered password fields.
Validation requirement is as below.
It should be alphanumeric (a-z , A-Z , 0-9)
It should accept 6-10 characters (minimum 6 characters, maximum 10 characters)
It should contain at least one letter and one number (example: stack1over)
I am using a regular expression as below.
^([a-zA-Z0-9]{6,10})$
It satisfies my first 2 conditions.
It fails the third condition: it still accepts input that contains only letters or only numbers.

Pass it through multiple regexes if you can. It'll be a lot cleaner than those look-ahead monstrosities :-)
^[a-zA-Z0-9]{6,10}$
[a-zA-Z]
[0-9]
Though some might consider it clever, it's not necessary to do everything with a single regex (or even with any regex, sometimes - just witness the people who want a regex to detect numbers between 75 and 4093).
Would you rather see some nice clean code like:
if not checkRegex(str,"^[0-9]+$")
return false
val = string_to_int(str);
return (val >= 75) and (val <= 4093)
or something like:
return checkRegex(str,"^7[5-9]$|^[89][0-9]$|^[1-9][0-9][0-9]$|^[1-3][0-9][0-9][0-9]$|^40[0-8][0-9]$|^409[0-3]$")
I know which one I'd prefer to maintain :-)
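As a rough illustration of the multiple-regex approach applied to the password rules from the question, here is a minimal C# sketch. The class and method names (`PasswordRules`, `IsValid`) are made up for this example, and `\z` is used instead of `$` on the assumption that a trailing newline should not be accepted.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: three simple checks instead of one lookahead-heavy pattern.
public static class PasswordRules
{
    public static bool IsValid(string password)
    {
        if (password == null) return false;

        return Regex.IsMatch(password, @"^[a-zA-Z0-9]{6,10}\z") // only letters and digits, 6 to 10 long
            && Regex.IsMatch(password, "[a-zA-Z]")              // at least one letter
            && Regex.IsMatch(password, "[0-9]");                // at least one digit
    }
}
```

Called as `PasswordRules.IsValid("stack1over")`, each rule can be read, tested, and changed on its own.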

Use positive lookahead
^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{6,10}$
Lookarounds are also called zero-width assertions. They are zero-width just like the start and end of line (^, $). The difference is that lookarounds actually match characters, but then give up the match and only return the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not.
The syntax for look around:
(?=REGEX) Positive lookahead
(?!REGEX) Negative lookahead
(?<=REGEX) Positive lookbehind
(?<!REGEX) Negative lookbehind
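To make the "zero-width" idea concrete, here is a small C# sketch. The class name, method names, and sample inputs are invented for illustration. The lookaround text ("px" or "$") has to be present for the match to succeed, but it is not part of the returned value.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of positive lookahead and positive lookbehind.
public static class LookaroundDemo
{
    // Digits that are immediately followed by "px"; "px" itself is not consumed.
    public static string DigitsBeforePx(string input) =>
        Regex.Match(input, @"\d+(?=px)").Value;

    // Digits that are immediately preceded by "$"; "$" itself is not consumed.
    public static string DigitsAfterDollar(string input) =>
        Regex.Match(input, @"(?<=\$)\d+").Value;
}
```

When nothing matches, `Match.Value` is an empty string, so both helpers return "" in that case.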

string r = @"^(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9]{6,10}$";
Regex x = new Regex(r);
var z = x.IsMatch(password);
http://www.regular-expressions.info/refadv.html
http://www.regular-expressions.info/lookaround.html

Related

Regular Expressions C#

I know there are many questions about making regular expressions, but they all seem to be about a single problem rather than general usage. I, too, have a problem I'd like to solve. I have tried to learn by reading about regular expressions, but it gets tricky quickly. Here's my question:
C#
I need to validate two textboxes that exist on the same form. The math operations I've coded can handle any floating point number. For this particular application I know of three formats the numbers will be in; anything else is a mistake on the user's part. I'd like to prevent those mistakes, for example if an extra number is accidentally typed or if Enter is hit too early.
Here are the formats: "#.####" "##.####" "###.##" where the "#" represents a mandatory digit. The formats starting with a one or two digit whole number must have 4 trailing digits or more. I've capped it at 8, or at least I tried to. The format starting with a three digit whole number should never be allowed to have more than two digits trailing the decimal.
Here's what I have tried thus far.
Regex acceptedInputRegex = new Regex(@"^\b[0-9]{3}.[0-9]{2}|[0-9]{1,2}.[0-9]{4,8}$");
Regex acceptedInputRegex = new Regex(@"^\b\d{3}.\d{2} | \d{1,2}.\d{4,8}$");
I have tried it both ways: treating a match as the result I want, and treating a match against a negated expression as a sign of a problem. I was unsuccessful in both attempts. This is the code:
if (acceptedInputRegex.IsMatch(txtMyTextBox1.Text) || acceptedInputRegex.IsMatch(txtMyTextBox2.Text))
{
} else
{
MessageBox.Show("Numbers are not in the right format", "Invalid Input!");
return;
}
Are regular expressions what I should be using to solve this problem?
If not, please tell me what you recommend. If so, please help me correct my regex.
Thanks.
You are close. You need to escape the dots and group the alternatives so that the ^ and $ anchors apply to both of them:
@"^(?:\d{3}\.\d{2}|\d{1,2}\.\d{4,8})$"
See the regex demo.
Details:
^ - start of string
(?: - start of a non-capturing group matching either of the two alternatives:
\d{3}\.\d{2} - 3 digits, . and 2 digits
| - or
\d{1,2}\.\d{4,8} - 1 or 2 digits, ., 4 to 8 digits
) - end of the non-capturing group
$ - end of string.
To make \d match only ASCII digits, use RegexOptions.ECMAScript option:
var isValid = Regex.IsMatch(s, @"^(?:\d{3}\.\d{2}|\d{1,2}\.\d{4,8})$", RegexOptions.ECMAScript);
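A minimal sketch of how the corrected pattern might be wired up for the two textboxes. The class and method names are hypothetical. It assumes both inputs must be well formed, so the two checks are combined with `&&`; the `||` in the question's code only reports an error when both inputs are wrong.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the corrected pattern.
public static class ReadingFormat
{
    // ECMAScript option restricts \d to the ASCII digits 0-9.
    private static readonly Regex Accepted = new Regex(
        @"^(?:\d{3}\.\d{2}|\d{1,2}\.\d{4,8})$", RegexOptions.ECMAScript);

    public static bool IsAccepted(string text) =>
        text != null && Accepted.IsMatch(text);

    // Assumption: a form is valid only when both inputs are valid.
    public static bool BothAccepted(string first, string second) =>
        IsAccepted(first) && IsAccepted(second);
}
```

In the form's handler this would read something like `if (!ReadingFormat.BothAccepted(txtMyTextBox1.Text, txtMyTextBox2.Text)) { /* show the message */ }`.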

Matching a number preceded by a known string, followed by an unknown number of characters

[SOME_WORDS:200:1000]
Trying to match just the last 1000 part. Both numbers are variable and can contain an unknown number of characters (although they are expected to contain digits, I cannot rule out that they may also contain other characters). The SOME_WORDS part is known and does not change.
So I begin by doing a positive lookbehind for [SOME_WORDS: followed by a positive lookahead for the trailing ]
That gives us the pattern (?<=\[SOME_WORDS:).*(?=])
And captures the part 200:1000
Now because I don't know how many characters are after SOME_WORDS:, but I know that it ends with another : I use .*: to indicate any character any amount of time followed by :
That gives us the pattern (?<=\[SOME_WORDS:.*:).*(?=])
However at this point the pattern no longer matches anything and this is where I become confused. What am I doing wrong here?
If I assume that the first number will always be 3 characters long I can replace .* with ... to get the pattern (?<=\[SOME_WORDS:...:).*(?=]) and this correctly captures just the 1000 part. However I don't understand why replacing ... with .* makes the pattern not capture anything.
EDIT:
It seems like the online tool I was using to test the regex pattern wasn't working correctly. The pattern (?<=\[SOME_WORDS:.*:).*(?=]) matches the 1000 with no issues when actually done in .net
You usually cannot use a + or a * in a lookbehind, only in a lookahead.
If C# does allow these, then you could use .*? instead of .*, as the .* will eat the second :
Try this:
(?<=\[SOME_WORDS:)(?=\d+:(\d+)])
The match will be in the first capture group.
Quote from http://www.regular-expressions.info/lookaround.html
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. The regular expression engine needs to be able to figure out how many characters to step back before checking the lookbehind. When evaluating the lookbehind, the regex engine determines the length of the regex inside the lookbehind, steps back that many characters in the subject string, and then applies the regex inside the lookbehind from left to right just as it would with a normal regex.
As Robert Smit mentions, this is due to * being a greedy operator. Greedy operators consume as many characters as they possibly can when they are first matched. They only give up characters if the match fails. If you make the operator lazy (*?), matching consumes as few characters as possible for the match to succeed, so the : is not consumed by *. You can also use [^:]*, which matches any character other than :.
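The [^:]* suggestion can be sketched in C# as follows. The class and method names are invented for this example. The first helper relies on .NET accepting a variable-length pattern inside a lookbehind, as the question's edit reports; the second avoids lookarounds entirely and reads the value from a capture group.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for extracting the last field of "[SOME_WORDS:first:last]".
public static class LastField
{
    // Lookbehind version: [^:]* cannot run past a colon, so no backtracking surprises.
    public static string WithLookbehind(string input) =>
        Regex.Match(input, @"(?<=\[SOME_WORDS:[^:]*:)[^\]]*(?=\])").Value;

    // Capture-group version: works in flavors without variable-length lookbehind too.
    public static string WithCaptureGroup(string input) =>
        Regex.Match(input, @"\[SOME_WORDS:[^:]*:([^\]]*)\]").Groups[1].Value;
}
```

Both helpers return an empty string when the input does not have the expected shape.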

Regular Expression to not allow 3 consecutive characters

I have the following regex:
Regex pattern = new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}/(.)$");
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,20} //should contain at least 8 characters and maximum of 20
My problem is I also need to check if 3 consecutive characters are identical. Upon searching, I saw this solution:
/(.)\1\1/
However, I can't make it work when I combine it with my existing regex:
Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/(.)\1\1/");
What did I miss here? Thanks!
The problem is that /(.)\1\1/ includes the surrounding / characters which are used to quote literal regular expressions in some languages (like Perl). But even if you don't use the quoting characters, you can't just add it to a regular expression.
At the beginning of your regex, you have to say "What follows cannot contain a character followed by itself and then itself again", like this: (?!.*(.)\1\1). The (?! starts a zero-width negative lookahead assertion. The "zero-width" part means that it does not consume any characters in the input string, and the "negative lookahead assertions" means that it looks forward in the input string to make sure that the given pattern does not appear anywhere.
All told, you want a regex like this:
new Regex(@"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$")
I solved it by trial and error:
Regex pattern = new Regex(@"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");
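For anyone who wants to try the combined pattern, here is a small C# sketch. The class and method names are hypothetical and the sample passwords are made up. `HasTripleRepeat` shows the `(.)\1\1` check on its own, in case a separate check is easier to maintain than the negative lookahead.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the combined pattern from this thread.
public static class PasswordPolicy
{
    private static readonly Regex Combined = new Regex(
        @"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");

    // True when any character occurs three times in a row.
    public static bool HasTripleRepeat(string s) =>
        Regex.IsMatch(s, @"(.)\1\1");

    // 8-20 alphanumerics, at least one digit, one lower, one upper, no triple repeat.
    public static bool IsValid(string s) =>
        s != null && Combined.IsMatch(s);
}
```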

Regex to ensure that in a string such as "05123:12315", the first number is less than the second?

I must have strings in the format x:y where x and y have to be five digits (zero padded) and x <= y.
Example:
00515:02152
What Regex will match this format?
If possible, please explain the solution briefly to help me learn.
EDIT: Why do I need Regex? I've written a generic tool that takes input and validates it according to a configuration file. An unexpected requirement popped up that would require me to validate a string in the format I've shown (using the configuration file). I was hoping to solve this problem using the existing configuration framework I've coded up, as splitting and parsing would be out of the scope of this tool. For an outstanding requirement such as this, I don't mind having some unorthodox/messy regex, as long as it's not 10000 lines long. Any intelligent solutions using Regex are appreciated! Thanks.
Description
This expression will validate that the first 5 digit number is smaller than the second 5 digit number, where the zero padded 5 digit numbers are in a : delimited string formatted as 01234:23456.
^
(?:
(?=0....:[1-9]|1....:[2-9]|2....:[3-9]|3....:[4-9]|4....:[5-9]|5....:[6-9]|6....:[7-9]|7....:[8-9]|8....:[9])
|(?=(.)(?:0...:\1[1-9]|1...:\1[2-9]|2...:\1[3-9]|3...:\1[4-9]|4...:\1[5-9]|5...:\1[6-9]|6...:\1[7-9]|7...:\1[8-9]|8...:\1[9]))
|(?=(..)(?:0..:\2[1-9]|1..:\2[2-9]|2..:\2[3-9]|3..:\2[4-9]|4..:\2[5-9]|5..:\2[6-9]|6..:\2[7-9]|7..:\2[8-9]|8..:\2[9]))
|(?=(...)(?:0.:\3[1-9]|1.:\3[2-9]|2.:\3[3-9]|3.:\3[4-9]|4.:\3[5-9]|5.:\3[6-9]|6.:\3[7-9]|7.:\3[8-9]|8.:\3[9]))
|(?=(....)(?:0:\4[1-9]|1:\4[2-9]|2:\4[3-9]|3:\4[4-9]|4:\4[5-9]|5:\4[6-9]|6:\4[7-9]|7:\4[8-9]|8:\4[9]))
)
\d{5}:\d{5}$
Live demo: http://www.rubular.com/r/w1QLZhNoEa
Note that this uses the x option to ignore all white space and allow comments. If you use it without x, the expression will need to be all on one line.
The language you want to recognize is finite, so the easiest thing to do is just list all the cases separated by "or". The regexp you want is:
(00000:[00000|00001| ... 99999])| ... |(99998:[99998|99999])|(99999:99999)
That regexp will be several billion characters long and take quite some time to execute, but it is what you asked for: a regular expression that matches the stated language.
Obviously that's impractical. Now is it clear why regular expressions are the wrong tool for this job? Use a regular expression to match 5 digits - colon - five digits, and then once you know you have that, split up the string and convert the two sets of digits to integers that you can compare.
x <= y.
Well, you are using the wrong tool. Really, regex can't help you here. Even if you get a solution, it will be too complex and too difficult to extend.
Regex is a text-processing tool for matching patterns in regular languages. It is very weak when it comes to semantics: it cannot identify meaning in the given string. In your case, to check the x <= y condition, you need to know the numerical values.
For example, it can match digits in a sequence, or a mix of digits and characters, but it cannot do things like:
match a number greater than 15 and less than 1245, or
match a pattern which is a date between given two dates.
So wherever matching a pattern involves applying semantics to the matched string, regex is not an option.
The appropriate way here would be to split the string on the colon and then compare the numbers. For the leading zeros, you can find some workaround.
You can't generally* do this with regex. You can use regex to match the pattern and extract the numbers, then compare the numbers in your code.
For example to match such format (without comparing the numbers) and get the numbers you could use:
^(\d{5}):(\d{5})\z
*) You probably could in this case (as the numbers are always 5 digits and zero padded), but it wouldn't be nice.
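A minimal C# sketch of the match-then-compare idea. The class and method names are hypothetical. It uses [0-9] rather than \d, on the assumption that only ASCII digits should be accepted and handed to int.Parse.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical validator: the regex checks the shape, ordinary code checks x <= y.
public static class RangePair
{
    // Two groups of exactly five ASCII digits separated by a colon; \z forbids a trailing newline.
    private static readonly Regex Format = new Regex(@"^([0-9]{5}):([0-9]{5})\z");

    public static bool IsValid(string s)
    {
        if (s == null) return false;

        Match m = Format.Match(s);
        if (!m.Success) return false;

        // Leading zeros are fine for int.Parse: "00515" parses as 515.
        int x = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        int y = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return x <= y;
    }
}
```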
You should do something like this instead:
bool IsCorrect(string s)
{
string[] split = s.Split(':');
int number1, number2;
if (split.Length == 2 && split[0].Length == 5 && split[1].Length == 5)
{
if (int.TryParse(split[0], out number1) && int.TryParse(split[1], out number2) && number1 <= number2)
{
return true;
}
}
return false;
}
With regex you can't make comparisons to see if a number is bigger than another number.
Let me show you a good example of why you shouldn't try to do this. This is a regex that (nearly) does the same job.
https://gist.github.com/anonymous/ad74e73f0350535d09c1
Raw file:
https://gist.github.com/anonymous/ad74e73f0350535d09c1/raw/03ea835b0e7bf7ac3c5fb6f9c7e934b83fb09d95/gistfile1.txt
Except it's just for 3 digits. For 4, the program that generates these fails with an OutOfMemoryException. With gcAllowVeryLargeObjects enabled. It went on until 5GB until it crashed. You don't want most of your app to be a Regex, right?
This is not a Regex's job.
This is a two-step process because regex is a text parser, not an analyzer. That said, regex is perfect for validating that we have the 5:5 number pattern: the pattern \d\d\d\d\d:\d\d\d\d\d determines whether the input has that form. If that form is not found, the match fails and the whole validation fails. If it is valid, we can use regex/LINQ to parse out the numbers and then check them.
This code would be inside a method to do the check
var data = "00515:02151";
var pattern = @"
^ # starting from the beginning of the string...
(?=[\d:]{11}) # Are there at least 11 characters consisting only of digits and ':'? Fail if not
(?=\d{5}:\d{5}) # Does it fall into our pattern? If not fail the match
((?<Values>[^:]+)(?::?)){2}
";
// IgnorePatternWhitespace only allows us to comment the pattern, it does not affect the regex parsing
var result = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => mt.Groups["Values"].Captures
.OfType<Capture>()
.Select (cp => int.Parse(cp.Value)))
.FirstOrDefault();
// Two values at this point 515, 2151
bool valid = ((result != null) && (result.First () <= result.Last ())); // the question asks for x <= y
Console.WriteLine (valid); // True
Using Javascript this can work.
var string = "00515:02152";
string.replace(/(\d{5})\:(\d{5})/, function($1,$2,$3){
return (parseInt($2)<=parseInt($3))?$1:null;
});
FIDDLE http://jsfiddle.net/VdzF7/

.NET Regular expression which check length and non-alphanumeric characters

I need a regexp to validate that a string has a minimum length of 6 and contains at least one non-alphanumeric character, e.g.: "eN%{S$u)", "h9YI!>4j", "{9YI!;4j", "eN%{S$usdf)", "dfh9YI!>4j", "ghffg{9YI!;4j".
This one works well: ^.*(?=.{6,})(?=.*\\d).*$ but when the string does not contain any digits (e.g. "eN%{S$u)") it does not work.
^(?=.{6})(.*[^0-9a-zA-Z].*)$
We use positive lookahead to assure there are at least 6 characters. Then we match the pattern that looks for at least one non-alphanumeric character ([^0-9a-zA-Z]). The .*'s match any number of any characters around this one non-alphanumeric character, but by the time we've reached here we've already checked that we're matching at least 6.
^.*(?=.{6,})(?=.*\\d).*$
is the regex you tried. Here are some suggestions:
You don't need to match more than 6 characters in the lookahead. Matching only 6 here does not restrict the rest of the regular expression from matching more than 6.
\d matches a digit, and (?=.*\\d) is a lookahead for one of them. This is why you are experiencing the problems you mentioned with strings like eN%{S$u).
Even if a digit were actually required and the regular expression were correct, you could combine the second lookahead with the .* that follows by just using .*\\d.*.
marcog's answer is pretty good, but I'd do it the other way around so that it's easier to add even more conditions (such as having at least one digit or whatever), and I'd use lazy quantifiers because they are cheaper for certain patterns:
^(?=.*?[^0-9a-zA-Z]).{6}
So if you were to add the mentioned additional condition, it would be like this:
^(?=.*?[^0-9a-zA-Z])(?=.*?[0-9]).{6}
As you can see, this pattern is easily extensible. Note that it is designed to be used for checking matches only; its capture is not useful.
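The two lookahead patterns above could be exercised from C# roughly like this. The class and method names are made up for illustration. Note that . does not match a newline by default, so these helpers assume single-line input.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrappers around the two lookahead patterns from this answer.
public static class MinLengthWithSymbol
{
    // At least 6 characters and at least one non-alphanumeric character.
    public static bool IsValid(string s) =>
        s != null && Regex.IsMatch(s, @"^(?=.*?[^0-9a-zA-Z]).{6}");

    // Same, with the extra "at least one digit" condition added as another lookahead.
    public static bool IsValidWithDigit(string s) =>
        s != null && Regex.IsMatch(s, @"^(?=.*?[^0-9a-zA-Z])(?=.*?[0-9]).{6}");
}
```

Each additional rule is one more lookahead in front of the final .{6}, which is the extensibility the answer refers to.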
Keep it easy.
// long enough and contains something not digit or a-z
x.Length >= 6 && Regex.IsMatch(x, @"[^\da-zA-Z]")
Happy coding.
Edit, pure "regular expression":
This first asserts in the look-ahead that there are 6 characters of anything, and then ensures that there is something that is not alphanumeric (it will "throw away" up to the first 5 characters trying to match).
(?=.{6}).{0,5}[^\da-zA-Z]
What about that(fixed): ^(?=.{6})(.*[^\w].*)$
Check out http://www.ultrapico.com/Expresso.htm, a cool tool that could help you a lot in learning regexps.
