regex for alphanumeric word, must be 6 characters long - c#

What is the regex for a alpha numeric word, at least 6 characters long (but at most 50).

/[a-zA-Z0-9]{6,50}/
You can use word boundaries at the beginning/end (\b) if you want to actually match a word within text.
/\b[a-zA-Z0-9]{6,50}\b/

\b\w{6,50}\b
\w is any 'word' character - depending on regex flavour it might be just [a-z0-9_] or it might include others (e.g. accented chars/etc).
{6,50} means between 6 and 50 (inclusive)
\b means word boundary (ensuring the word does not exceed the 50 at either end).
After re-reading, it appears that what you want do is ensure the entire text matches? If so...
^\w{6,50}$

With PCRE regex you could do this:
/[a-zA-Z0-9]{6,50}/
It would be very hard to do in regex without the min/max quantifiers so hopefully your language supports them.

Related

Regex for first name

I am quite new to regex thing and need regex for first name which satisfies following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
This short regex should do it ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$ ,
([a-zA-Z]+?) - Means the String should start with alphabets.
([-\s'][a-zA-Z]+)*? - Means the string must have hyphen,space or apostrophe followed by alphabets.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo

Simple phone number regex to match numbers, spaces, etc

I'm trying to modify a fairly basic regex pattern in C# that tests for phone numbers.
The patterns is -
[0-9]+(\.[0-9][0-9]?)?
I have two questions -
1) The existing expression does work (although it is fairly restrictive) but I can't quite understand how it works. Regexps for similar issues seem to look more like this one -
/^[0-9()]+$/
2) How could I extend this pattern to allow brackets, periods and a single space to separate numbers. I tried a few variations to include -
[0-9().+\s?](\.[0-9][0-9]?)?
Although i can't seem to create a valid pattern.
Any help would be much appreciated.
Thanks,
[0-9]+(\.[0-9][0-9]?)?
First of all, I recommend checking out either regexr.com or regex101.com, so you yourself get an understanding of how regex works. Both websites will give you a step-by-step explanation of what each symbol in the regex does.
Now, one of the main things you have to understand is that regex has special characters. This includes, among others, the following: []().-+*?\^$. So, if you want your regex to match a literal ., for example, you would have to escape it, since it's a special character. To do so, either use \. or [.]. Backslashes serve to escape other characters, while [] means "match any one of the characters in this set". Some special characters don't have a special meaning inside these brackets and don't require escaping.
Therefore, the regex above will match any combination of digits of length 1 or more, followed by an optional suffix (foobar)?, which has to be a dot, followed by one or two digits. In fact, this regex seems more like it's supposed to match decimal numbers with up to two digits behind the dot - not phone numbers.
/^[0-9()]+$/
What this does is pretty simple - match any combination of digits or round brackets that has the length 1 or greater.
[0-9().+\s?](\.[0-9][0-9]?)?
What you're matching here is:
one of: a digit, round bracket, dot, plus sign, whitespace or question mark; but exactly once only!
optionally followed by a dot and one or two digits
A suitable regex for your purpose could be:
(\+\d{2})?((\(0\)\d{2,3})|\d{2,3})?\d+
Enter this in one of the websites mentioned above to understand how it works. I modified it a little to also allow, for example +49 123 4567890.
Also, for simplicity, I didn't include spaces - so when using this regex, you have to remove all the spaces in your input first. In C#, that should be possible with yourString.Replace(" ", ""); (simply replacing all spaces with nothing = deleting spaces)
The + after the character set is a quantifier (meaning the preceeding character, character set or group is repeated) at least one, and unlimited number of times and it's greedy (matched the most possible).
Then [0-9().+\s]+ will match any character in set one or more times.

At least one digit, minimum 8 chars length, with unicode

I know that regex questions have been asked many times before, but I just can't make it to work as I need. What I need is a regex, with a minimum of 8 characters, containing at least one digit (digits can appear in the start, end or after other characters), and supporting Unicode, so that Hebrew, Arabic etc. characters can be used.
Here's the basic regex:
^(?=.*?\d).{8}
^.{8} will match any string that has at least 8 characters. (?=.*?\d) will assert there's a digit in there.
As for the Unicode support, that's up to the regex engine. If Unicode is supported, . should match a Unicode character. If you want to match graphemes instead, your regex flavor may support \X, which you could use instead of ..
If you want to allow non-latin digits, you may need to replace \d with \p{N} depending on your regex engine.
Update for the .NET flavor:
\d already matches Unicode digits so you don't need to use \p{N}
\X is not supported so you'll have to stick with . or use a workaround like (?>\P{M}\p{M}*).
Assuming you are using a C# or Java like regex flavor, and you mean
with characters a character of the unicode category "letter" you can
use:
(?=\p{L}*?\p{Nd})[\p{L}\p{Nd}]{8,}

What does .* do in regex?

After extensive search, I am unable to find an explanation for the need to use .* in regex. For example, MSDN suggests a password regex of
#\"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})"
for length >= 6, 1+ digit and 1+ special character.
Why can't I just use:
#\"(?=.{6,})(?=(\d){1,})(?=(\W){1,})"
.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
In your example above, this is important, since they want to force the password to contain a special character and a number, while still allowing all other characters. If you used \d instead of .*, for example, then that would restrict that portion of the regex to only match decimal characters (\d is shorthand for [0-9], meaning any decimal). Similarly, \W instead of .*\W would cause that portion to only match non-word characters.
A good reference containing many of these tokens for .NET can be found on the MSDN here: Regular Expression Language - Quick Reference
Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and begginner-friendly regex references I've seen online.
Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:
(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
..but it should be:
(?=.{8,})(?=.*\d)(?=.*\W)
The first lookahead tries to match eight or more of any characters. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
The .* portion just allows for literally any combination of characters to be entered. It's essentially allowing for the user to add any level of extra information to the password on top of the data you are requiring
Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.

Regular Expression for alphanumeric and space

What is the regular exp for a text that can't contain any special characters except space?
Because Prajeesh only wants to match spaces, \s will not suffice as it matches all whitespace characters including line breaks and tabs.
A character set that should universally work across all RegEx parsers is:
[a-zA-Z0-9 ]
Further control depends on your needs. Word boundaries, multi-line support, etc... I would recommend visiting Regex Library which also has some links to various tutorials on how Regular Expression Parsing works.
[\w\s]*
\w will match [A-Za-z0-9_] and the \s will match whitespaces.
[\w ]* should match what you want.
Assuming "special characters" means anything that's not a letter or digit, and "space" means the space character (ASCII 32):
^[A-Za-z0-9 ]+$
You need #"^[A-Za-z0-9 ]+$". The \s character class matches things other than space (such as tab) and you since you want to match sure that no part of the string has other characters you should anchor it with ^ and $.
If you just want alphabets and spaces then you can use: #"[A-Za-z\s]+" to match at least one character or space. You could also use #"[A-Za-z ]+" instead without explicitly denoting the space.
Otherwise please clarify.
In C#, I'd believe it's ^(\w|\s)*$

Categories

Resources