.NET Regular expression which check length and non-alphanumeric characters - c#

I need Regexp to validate string has minimum length 6 and it is contains at least one non-alphanumeric character e.g: "eN%{S$u)", "h9YI!>4j", "{9YI!;4j", "eN%{S$usdf)", "dfh9YI!>4j", "ghffg{9YI!;4j".
This one is working well ^.*(?=.{6,})(?=.*\\d).*$" but in cases when string does not contain any numbers(e.g "eN%{S$u)") it is not working.

^(?=.{6})(.*[^0-9a-zA-Z].*)$
We use positive lookahead to assure there are at least 6 characters. Then we match the pattern that looks for at least one non-alphanumeric character ([^0-9a-zA-Z]). The .*'s match any number of any characters around this one non-alphanumeric character, but by the time we've reached here we've already checked that we're matching at least 6.
^.*(?=.{6,})(?=.*\\d).*$"
is the regex you tried. Here are some suggestions:
You don't need to match more than 6 characters in the lookahead. Matching only 6 here does no restrict the rest of the regular expression from matching more than 6.
\d matches a digit, and (?=.*\\d) is a lookahead for one of them. This is why you are experiencing the problems you mentioned with strings like eN%{S$u).
Even if the point above wasn't incorrect and the regular expression here was correct, you can combine the second lookahead with the .* that follows by just using .*\\d.*.

marcog's answer is pretty good, but I'd do it the other way around so that it's easier to add even more conditions (such as having at least one digit or whatever), and I'd use lazy quantifiers because they are cheaper for certain patterns:
^(?=.*?[^0-9a-zA-Z]).{6}
So if you were to add the mentioned additional condition, it would be like this:
^(?=.*?[^0-9a-zA-Z])(?=.*?[0-9]).{6}
As you can see, this pattern is easily extensible. Note that is is designed to be used for checking matches only, its capture is not useful.

Keep it easy.
// long enough and contains something not digit or a-z
x.Length >= 6 && Regex.IsMatch(x, #"[^\da-zA-Z]")
Happy coding.
Edit, pure "regular expression":
This first asserts there are 6 letters of anything in the look-ahead, and then ensures that within the look-ahead there is something that is not alpha-numeric (it will "throw away" up to the first 5 characters trying to match).
(?=.{6}).{0,5}[^\da-zA-Z]

What about that(fixed): ^(?=.{6})(.*[^\w].*)$
Check this out http://www.ultrapico.com/Expresso.htm it is cool tool which could help you a lot in Regexps learning.

Related

Simple phone number regex to match numbers, spaces, etc

I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The patterns is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although i can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
The + after the character set is a quantifier (meaning the preceeding character, character set or group is repeated) at least one, and unlimited number of times and it's greedy (matched the most possible).
Then [0-9().+\s]+ will match any character in set one or more times.

Matching a number preceeded by a know string, followed by an unknown number of characters

[SOME_WORDS:200:1000]
Trying to match just the last 1000 part. Both numbers are variable and can contain an unknown number of characters (although they are expected to contain digits, I cannot rule out that they may also contain other characters). The SOME_WORDS part is known and does not change.
So I begin by doing a positive lookbehind for [SOME_WORDS: followed by a positive lookahead for the trailing ]
That gives us the pattern (?<=\[SOME_WORDS:).*(?=])
And captures the part 200:1000
Now because I don't know how many characters are after SOME_WORDS:, but I know that it ends with another : I use .*: to indicate any character any amount of time followed by :
That gives us the pattern (?<=\[SOME_WORDS:.*:).*(?=])
However at this point the pattern no longer matches anything and this is where I become confused. What am I doing wrong here?
If I assume that the first number will always be 3 characters long I can replace .* with ... to get the pattern (?<=\[SOME_WORDS:...:).*(?=]) and this correctly captures just the 1000 part. However I don't understand why replacing ... with .* makes the pattern not capture anything.
EDIT:
It seems like the online tool I was using to test the regex pattern wasn't working correctly. The pattern (?<=\[SOME_WORDS:.*:).*(?=]) matches the 1000 with no issues when actually done in .net
You usually cannot use a + or a * in a lookbehind, only in a lookahead.
If c# does allow these than you could use a .*? instead of a .* as the .* will eat the second :
Try this:
(?<=\[SOME_WORDS:)(?=\d+:(\d+)])
The match wil be in the first capture group
Quote from http://www.regular-expressions.info/lookaround.html
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. The regular expression engine needs to be able to figure out how many characters to step back before checking the lookbehind. When evaluating the lookbehind, the regex engine determines the length of the regex inside the lookbehind, steps back that many characters in the subject string, and then applies the regex inside the lookbehind from left to right just as it would with a normal regex.
As Robert Smit mentions this is due to the * being a greedy operator. Greedy operators consume as many characters as they possibly can when they are matched first. They only give up characters if the match fails. If you make the greedy operator lazy(*?), then matching consumes as little number of characters as possible for the match to succeed, so the : is not consumed by *. You can also use [^:]* which is match any character other than :.

Assist me on building my own regex

I'm completely new in this area, I need a regex that follows these rules:
Only numbers and symbols are allowed.
Must start with a number and ends with a number.
Must not contain more than 1 symbol in a row. (for example 123+-4567 is not accepted but 12+345-67 is accepted.
I tried ^[0-9]*[+-*/][0-9]*$ but I think it's a stupid try.
You were close with your attempt. This one should work.
^[0-9]+([+*/-][0-9]+)*$
explanation:
^ matches beginning of the string
[0-9]+ matches 1 or more digits.
[+*/-] matches one from specified symbols
([+*/-][0-9]+)* matches group of symbol followed by at least one digit, repeated 0 or more times
$ matches end of string
We'll build that one from individual parts and then we'll see how we can be smarter about that:
Numbers
\d+
will match an integer. Not terribly fancy, but we need to start somewhere.
Must start with a number and end with a number:
^\d+.*\d+$
Pretty straightforward. We don't know anything about the part in between, though (also the last \d+ will only match a single digit; we might want to fix that eventually).
Only numbers and symbols are allowed. Depending on the complexity of the rest of the regex this might be easier by explicitly spelling it out or using a negative lookahead to make sure there is no non-(number|symbol) somewhere in the string. I'll go for the latter here because we need that again:
(?!.*[^\d+*/-])
Sticking this to the start of the regex makes sure that the regex won't match if there is any non-(number|symbol) character anywhere in the string. Also note that I put the - at the end of the character class. This is because it has a certain special meaning when used between two other characters in a character class.
Must not contain more than one symbol in a row. This is a variation on the one before. We just make sure that there never is more than one symbol by using a negative lookahead to disallow two in sequence:
(?!.*[+/*-]{2})
Putting it all together:
(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$
Testing it:
PS Home:\> '123+-4567' -match '(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$'
False
PS Home:\> '123-4567' -match '(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$'
True
However, I only literally interpreted your rules. If you're trying to match arithmetic expressions that can have several operands and operators in sequence (but without parentheses), then you can approach that problem differently:
Numbers again
\d+
Operators
[+/*-]
A number followed by an operator
\d+[+/*-]
Using grouping and repetition to match a number followed by any number of repetitions of an operator and another number:
\d+([+/*-]\d+)*
Anchoring it so we match the whole string:
^\d+([+/*-]\d+)*$
Generally, for problems where it works, this latter approach works better and leads to more understandable expressions. The former approach has its merits, but most often only in implementing password policies (apart from »cannot repeat any of your previous 30689 passwords«).

What does .* do in regex?

After extensive search, I am unable to find an explanation for the need to use .* in regex. For example, MSDN suggests a password regex of
#\"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})"
for length >= 6, 1+ digit and 1+ special character.
Why can't I just use:
#\"(?=.{6,})(?=(\d){1,})(?=(\W){1,})"
.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
In your example above, this is important, since they want to force the password to contain a special character and a number, while still allowing all other characters. If you used \d instead of .*, for example, then that would restrict that portion of the regex to only match decimal characters (\d is shorthand for [0-9], meaning any decimal). Similarly, \W instead of .*\W would cause that portion to only match non-word characters.
A good reference containing many of these tokens for .NET can be found on the MSDN here: Regular Expression Language - Quick Reference
Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and begginner-friendly regex references I've seen online.
Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:
(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
..but it should be:
(?=.{8,})(?=.*\d)(?=.*\W)
The first lookahead tries to match eight or more of any characters. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
The .* portion just allows for literally any combination of characters to be entered. It's essentially allowing for the user to add any level of extra information to the password on top of the data you are requiring
Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.

Regular Expression to not allow 3 consecutive characters

I have the following regex:
Regex pattern = new Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}/(.)$");
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,20} //should contain at least 8 characters and maximum of 20
My problem is I also need to check if 3 consecutive characters are identical. Upon searching, I saw this solution:
/(.)\1\1/
However, I can't make it to work if I combined it to my existing regex, still no luck:
Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/(.)\1\1/");
What did I missed here? Thanks!
The problem is that /(.)\1\1/ includes the surrounding / characters which are used to quote literal regular expressions in some languages (like Perl). But even if you don't use the quoting characters, you can't just add it to a regular expression.
At the beginning of your regex, you have to say "What follows cannot contain a character followed by itself and then itself again", like this: (?!.*(.)\1\1). The (?! starts a zero-width negative lookahead assertion. The "zero-width" part means that it does not consume any characters in the input string, and the "negative lookahead assertions" means that it looks forward in the input string to make sure that the given pattern does not appear anywhere.
All told, you want a regex like this:
new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$")
I solved by using trial and error:
Regex pattern = new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");

Categories

Resources