Regular expression thingy - c#

I would like to create a custom regular expression. Which shall track that the first character of a username should be an alphabet.
Followed by alphanumeric or can have maximum one occurrence of a special character (- or _). I can check for username starts with the alphabet with this ^[a-zA-Z]+$ but not sure what to do to check at most one occurrence of a special character. Any ideas are welcome.
Thanks

From what I understood of your post, you want the following to match.
a-afdsafd
aafdsafd
aafdsa_fd
aafdsa-fd
aAfdsa-FD
And the following to not match:
aa-dsa-fd
aa-dsa_fd
-afdsafd
_afdsafd
Try /^[a-z](?:(?![a-z]+[\-_])[\-_])?[a-z]+(?:(?<![a-z]+[\-_])[\-_]?)[a-z]+?$/i
The i modifier enables case-insensitive matching.
The ^ and $ anchors ensure that the entire string matches our regex.
[a-z] checks that the first character is an alphabet.
(?:(?![a-z]+[\-_])[\-_])?) looks ahead to check that there is no "special character" used later and if there is none, we optionally match one special character.
[a-z]+ Match one or more alphabets.
(?:(?<![a-z]+[\-_])[\-_]?) does the same thing as 4 except it looks behind.
[a-z]+? Optionally match one or more alphabets.
https://regexr.com/3t86l
Edit: I noticed that aAfdsaFd_ should also match. The above does not match this. Slightly modifying #Wiktor Stribiżew's comment, ^[a-zA-Z][a-zA-Z0-9]*(?:[-_][a-zA-Z0-9]*)?$ seems to work fine with all cases. That's cleaner and more efficient. All credit to #Wiktor Stribiżew.

You could match an upper or lowercase character from the start of the string ^[a-zA-Z], match zero or more times alphanumeric [a-zA-Z0-9]* followed by an optional hyphen or underscore [-_]?.
At the end match zero or more times alphanumeric [a-zA-Z0-9]*$ until the end of the string.
^[a-zA-Z][a-zA-Z0-9]*[-_]?[a-zA-Z0-9]*$

Related

Regex for first name

I am quite new to regex thing and need regex for first name which satisfies following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
This short regex should do it ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$ ,
([a-zA-Z]+?) - Means the String should start with alphabets.
([-\s'][a-zA-Z]+)*? - Means the string must have hyphen,space or apostrophe followed by alphabets.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo

Regex to restrict special characters in the beginning of an email address

PFB the regex. I want to make sure that the regex should not contain any special character just after # and just before. In-between it can allow any combination.
The regex I have now:
#"^[^\W_](?:[\w.-]*[^\W_])?#(([a-zA-Z0-9]+)(\.))([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"))"
For example, the regex should not match
abc#.sj.com
abc#-.sj-.com
SSDFF-SAF#-_.SAVAVSAV-_.IP
Since you consider _ special, I'd recommend using [^\W_] at the beginning and then rearrange the starting part a bit. To prevent a special char before a #, just make sure there is a letter or digit there. I also recommend to remove redundant capturing groups/convert them into non-capturing:
#"^[^\W_](?:[\w.-]*[^\W_])?#(?:\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.|(?:[\w-]+\.)+)(?:[a-zA-Z]{2,3}|[0-9]{1,3})\]?$"
Here is a demo of how this regex matches now.
The [^\W_](?:[\w.-]*[^\W_])? matches:
[^\W_] - a digit or a letter only
(?:[\w.-]*[^\W_])? - a 1 or 0 occurrences of:
[\w.-]* - 0+ letters, digits, _, . and -
[^\W_] - a digit or a letter only
Change the initial [\w-\.]+ for [A-Za-z0-9\-\.]+.
Note that this excludes many acceptable email addresses.
Update
As pointed out, [A-Za-z0-9] is not an exact translation of \w. However, you appear to have a specific definition as to what you consider special characters and so it is probably easier for you to define within the square brackets what you class as allowable.

How can I match two capital letters together, that aren't preceded by special characters, using regex?

I've read through a lot of really interesting stuff recently about regex. Especially about creating your own regex boundaries
One thing that I don't think I've seen done (I'm 100% it has been done, but I haven't noticed any examples) is how to exclude a regex match if it's preceded by a 'special character', such as & ! % $ #. For example:
If I use the regex (Note this is from C#)
([A-Z]{2,}\\b)
It will match any capital letters that are two or more in length, and use the \b boundary to make sure the two capital letters don't start with or end with any other letters. But here's where I'm not sure how this would behave:
AA -Match
sAB -No Match
ACs -No Match
!AD -Match
AF! -Match
I would like to know how to select only two or more capital letters that aren't preceded by a lower case letter/number/symbol, or followed by a lower case letter/number/special characters.
I've seen people use spaces, so make sure the string starts with or ends with a space, but that doesn't work if it's at the beginning or end of a line.
So, the output I would look for from the example above would be:
AA -Match
sAB -No Match
ACs -No Match
!AD -No Match
AF! -No Match
Any help is appreciated.
You just need to use a lookbehind and a lookahead:
(?<![a-z\d!##$%^&*()])[A-Z]{2,}(?![a-z\d!##$%^&*()])
See regex demo
The (?<![a-z\d!##$%^&*()]) lookbehind makes sure there is no lowercase letters ([a-z]), digits (\d), or special characters that you defined. If there is one, the match is failed, nothing is returned.
The (?![a-z\d!##$%^&*()]) lookahead also fails a match if the same characters are found after the ALLCAPS letters.
See more details on Lookahead and Lookbehind Zero-Length Assertions here.
I think it's enough to just precede the pattern you have with a negation of lower case letter and any symbols you want to exclude. My example only excludes !, but you can add to the list as appropriate. ^ inside brackets negates what is inside them. So, for example, you can incorporate the pattern
/[^a-z!][A-Z]{2,}[^a-z!]/g

What does .* do in regex?

After extensive search, I am unable to find an explanation for the need to use .* in regex. For example, MSDN suggests a password regex of
#\"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})"
for length >= 6, 1+ digit and 1+ special character.
Why can't I just use:
#\"(?=.{6,})(?=(\d){1,})(?=(\W){1,})"
.* just means "0 or more of any character"
It's broken down into two parts:
. - a "dot" indicates any character
* - means "0 or more instances of the preceding regex token"
In your example above, this is important, since they want to force the password to contain a special character and a number, while still allowing all other characters. If you used \d instead of .*, for example, then that would restrict that portion of the regex to only match decimal characters (\d is shorthand for [0-9], meaning any decimal). Similarly, \W instead of .*\W would cause that portion to only match non-word characters.
A good reference containing many of these tokens for .NET can be found on the MSDN here: Regular Expression Language - Quick Reference
Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and begginner-friendly regex references I've seen online.
Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:
(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
..but it should be:
(?=.{8,})(?=.*\d)(?=.*\W)
The first lookahead tries to match eight or more of any characters. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
The .* portion just allows for literally any combination of characters to be entered. It's essentially allowing for the user to add any level of extra information to the password on top of the data you are requiring
Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.

C# Regex explicitly match string

I want to match only words from A-Z and a-z with an optional period at the end. This is the code I have so far:
return Regex.IsMatch(word, #"[A-Za-z]\.?")
This means the following should return true: test, test..
The following should return false: t3st, test.., ., .
Right now this regex returns true for everything.
Try this regex:
#"^[A-Za-z]+\.?$"
Boundary matchers
^ means beginning of a line
$ means end of a line
Greedy quantifier
[A-Za-z]+ means [A-Za-z], one or more times
Your regex only asks for a single letter followed by an optional period. Since all your words start with a letter, that regex returns true.
Notice the + in Prince John Wesley's answer - that says to use one or more letters, so it'll "eat" all the letters in the word, stopping at a non-letter. Then the \.? tests for an optional period.
I don't think the ^ and $ are needed in this case.
You might also want to include the hyphen and the single-quote in your regex; otherwise you'll have problems with hyphenated words and contractions.
That may get messy; some words have more than one hyphen: "port-of-call" or "coat-of-arms", for example. Very rarely you'll find a word with more than one quote: "I'd've" for "I would have". They're rare enough you can probably forget about 'em. (oops! there's a word that starts with a quote... :)

Categories

Resources