C# Regex explicitly match string

I want to match only words from A-Z and a-z with an optional period at the end. This is the code I have so far:
return Regex.IsMatch(word, @"[A-Za-z]\.?");
This means "test" and "test." should return true.
The following should return false: t3st, test.., ., .
Right now this regex returns true for everything.

Try this regex:
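@"^[A-Za-z]+\.?$"
Boundary matchers
^ matches the beginning of the string (or of each line with RegexOptions.Multiline)
$ matches the end of the string (or of each line with RegexOptions.Multiline)
Greedy quantifier
[A-Za-z]+ means [A-Za-z], one or more times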
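As a rough check of the explanation above, here is a minimal sketch that runs the question's sample words through both the anchored pattern and the original unanchored one. The helper names IsWord and IsWordUnanchored are made up for illustration; only Regex.IsMatch comes from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the anchored pattern suggested in the answer.
static bool IsWord(string word) => Regex.IsMatch(word, @"^[A-Za-z]+\.?$");

// Hypothetical helper: the unanchored pattern from the question, for comparison.
static bool IsWordUnanchored(string word) => Regex.IsMatch(word, @"[A-Za-z]\.?");

foreach (var w in new[] { "test", "test.", "t3st", "test..", "." })
{
    // Print both results side by side for each sample word.
    Console.WriteLine($"{w,-7} anchored={IsWord(w)} unanchored={IsWordUnanchored(w)}");
}
```

The unanchored version succeeds as soon as any single letter is found, which is why it accepts "t3st" and "test..".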

Your regex only asks for a single letter followed by an optional period. Since all your words start with a letter, that regex returns true.
Notice the + in Prince John Wesley's answer - that says to use one or more letters, so it'll "eat" all the letters in the word, stopping at a non-letter. Then the \.? tests for an optional period.
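Note that the ^ and $ are needed in this case: Regex.IsMatch returns true when the pattern matches anywhere in the string, so without the anchors "t3st" and "test.." would still match.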
You might also want to include the hyphen and the single-quote in your regex; otherwise you'll have problems with hyphenated words and contractions.
That may get messy; some words have more than one hyphen: "port-of-call" or "coat-of-arms", for example. Very rarely you'll find a word with more than one quote: "I'd've" for "I would have". They're rare enough you can probably forget about 'em. (oops! there's a word that starts with a quote... :)

Related

Regular expression thingy

I would like to create a custom regular expression that checks that the first character of a username is a letter.
It may be followed by alphanumeric characters, with at most one occurrence of a special character (- or _). I can check that the username starts with a letter using ^[a-zA-Z]+$, but I am not sure how to check for at most one occurrence of a special character. Any ideas are welcome.
Thanks
From what I understood of your post, you want the following to match.
a-afdsafd
aafdsafd
aafdsa_fd
aafdsa-fd
aAfdsa-FD
And the following to not match:
aa-dsa-fd
aa-dsa_fd
-afdsafd
_afdsafd
Try /^[a-z](?:(?![a-z]+[\-_])[\-_])?[a-z]+(?:(?<![a-z]+[\-_])[\-_]?)[a-z]+?$/i
The i modifier enables case-insensitive matching.
The ^ and $ anchors ensure that the entire string matches our regex.
[a-z] checks that the first character is a letter.
(?:(?![a-z]+[\-_])[\-_])? looks ahead to check that no "special character" is used later and, if there is none, optionally matches one special character.
[a-z]+ matches one or more letters.
(?:(?<![a-z]+[\-_])[\-_]?) does the same thing as the previous group, except that it looks behind.
[a-z]+? lazily matches one or more letters.
https://regexr.com/3t86l
Edit: I noticed that aAfdsaFd_ should also match. The above does not match this. Slightly modifying @Wiktor Stribiżew's comment, ^[a-zA-Z][a-zA-Z0-9]*(?:[-_][a-zA-Z0-9]*)?$ seems to work fine with all cases. That's cleaner and more efficient. All credit to @Wiktor Stribiżew.
You could match an upper or lowercase character from the start of the string ^[a-zA-Z], match zero or more times alphanumeric [a-zA-Z0-9]* followed by an optional hyphen or underscore [-_]?.
At the end match zero or more times alphanumeric [a-zA-Z0-9]*$ until the end of the string.
^[a-zA-Z][a-zA-Z0-9]*[-_]?[a-zA-Z0-9]*$
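To see how this simpler pattern behaves on the sample usernames listed earlier in the thread, a minimal C# sketch could look like this. IsValidUsername is a hypothetical helper name; the pattern itself is copied from the line above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: letter first, then alphanumerics with at most one '-' or '_'.
static bool IsValidUsername(string name) =>
    Regex.IsMatch(name, @"^[a-zA-Z][a-zA-Z0-9]*[-_]?[a-zA-Z0-9]*$");

string[] shouldMatch = { "a-afdsafd", "aafdsafd", "aafdsa_fd", "aafdsa-fd", "aAfdsa-FD", "aAfdsaFd_" };
string[] shouldNotMatch = { "aa-dsa-fd", "aa-dsa_fd", "-afdsafd", "_afdsafd" };

// Print the result for every sample so the two groups can be compared.
foreach (var s in shouldMatch) Console.WriteLine($"{s}: {IsValidUsername(s)}");
foreach (var s in shouldNotMatch) Console.WriteLine($"{s}: {IsValidUsername(s)}");
```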

Regex for first name

I am quite new to regex and need a regex for a first name that satisfies the following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia  St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes; I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated the answer to address the issue in your comment. Rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes is that each time you match a character, that character is consumed. We've already consumed the alpha char before the special character, so the "specialchars" group can never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with 'Joe, the unmatched remainder of the string.
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is (?=...), where ... is what you want to match. E.g. (?=[A-Z]) would look ahead for a single character that is an uppercase letter.
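A small sketch may make the "match without consuming" idea concrete. It compares a consuming pattern with a lookahead pattern on part of the "Sam D'Joe" example; the variable names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Consuming version: the letter and the apostrophe both become part of the match.
Match consuming = Regex.Match("D'Joe", @"[A-Za-z]'");

// Lookahead version: the apostrophe must be there, but it is not consumed.
Match lookahead = Regex.Match("D'Joe", @"[A-Za-z](?=')");

Console.WriteLine(consuming.Value);  // D'
Console.WriteLine(lookahead.Value);  // D
```

Because the lookahead leaves the apostrophe unconsumed, whatever comes next in the pattern can still match it.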
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex to look ahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> (?=[A-Za-z])). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) matches the alpha char and the special char, and looks ahead to the subsequent alpha char.
6) We use the * quantifier to repeat our previous three rules any number of times, and we match until the end of the line.
I added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups. The updated regex is:
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
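As a sanity check, the updated regex can be run against the examples from the question and the comment. This is only a sketch: IsValidFirstName is a hypothetical helper name, and the sample names beyond those quoted in the thread are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the updated regex with named groups.
static bool IsValidFirstName(string name) => Regex.IsMatch(name,
    @"^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$");

string[] valid = { "John", "Sam D'Joe", "Anne-Marie", "O'Neil" };
string[] invalid = { "John'-s", "John 's", "Annia  St", "1John", "John-" };

// Print the result for every sample name.
foreach (var n in valid) Console.WriteLine($"{n}: {IsValidFirstName(n)}");
foreach (var n in invalid) Console.WriteLine($"{n}: {IsValidFirstName(n)}");
```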
This short regex should do it: ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$
([a-zA-Z]+?) - means the string must start with one or more letters.
([-\s'][a-zA-Z]+)*? - means the string may then contain any number of groups made of a hyphen, whitespace character or apostrophe followed by one or more letters.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo

How can I match two capital letters together, that aren't preceded by special characters, using regex?

I've read through a lot of really interesting stuff recently about regex, especially about creating your own regex boundaries.
One thing that I don't think I've seen done (I'm 100% sure it has been done, but I haven't noticed any examples) is how to exclude a regex match if it's preceded by a 'special character', such as & ! % $ #. For example:
If I use the regex (Note this is from C#)
([A-Z]{2,}\\b)
It will match any capital letters that are two or more in length, and use the \b boundary to make sure the two capital letters don't start with or end with any other letters. But here's where I'm not sure how this would behave:
AA -Match
sAB -No Match
ACs -No Match
!AD -Match
AF! -Match
I would like to know how to select only two or more capital letters that aren't preceded by a lower case letter/number/symbol, or followed by a lower case letter/number/special characters.
I've seen people use spaces to make sure the string starts with or ends with a space, but that doesn't work if it's at the beginning or end of a line.
So, the output I would look for from the example above would be:
AA -Match
sAB -No Match
ACs -No Match
!AD -No Match
AF! -No Match
Any help is appreciated.
You just need to use a lookbehind and a lookahead:
(?<![a-z\d!@#$%^&*()])[A-Z]{2,}(?![a-z\d!@#$%^&*()])
See regex demo
The (?<![a-z\d!@#$%^&*()]) lookbehind makes sure there are no lowercase letters ([a-z]), digits (\d), or special characters that you defined immediately before the match. If there is one, the match fails and nothing is returned.
The (?![a-z\d!@#$%^&*()]) lookahead also fails the match if any of the same characters immediately follows the all-caps letters.
See more details on Lookahead and Lookbehind Zero-Length Assertions here.
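Here is a minimal sketch that runs the lookbehind/lookahead pattern over the five samples from the question. The variable name allCaps is made up, and the @ in the character class is assumed to be one of the special characters the answer intended.

```csharp
using System;
using System.Text.RegularExpressions;

// Two or more capitals, not preceded or followed by a lowercase letter, digit or listed symbol.
var allCaps = new Regex(@"(?<![a-z\d!@#$%^&*()])[A-Z]{2,}(?![a-z\d!@#$%^&*()])");

foreach (var s in new[] { "AA", "sAB", "ACs", "!AD", "AF!" })
{
    string verdict = allCaps.IsMatch(s) ? "Match" : "No Match";
    Console.WriteLine(s + " -" + verdict);
}
```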
I think it's enough to just precede the pattern you have with a negation of lower case letter and any symbols you want to exclude. My example only excludes !, but you can add to the list as appropriate. ^ inside brackets negates what is inside them. So, for example, you can incorporate the pattern
/[^a-z!][A-Z]{2,}[^a-z!]/g

Better way to write this RegEx?

I have this password regex for an application that is being built. Its purpose is to:
Make sure users use between 6 and 12 characters.
Make sure users use either one special character or one number.
Make sure it is case insensitive.
The application is in .NET.
Here is the regex for the password checker. It is a bit lengthy, so if you feel any of it is wrong, please let me know.
^(?=.*\d)(?=.*[A-Za-z]).{6-12}$|^(?=.*[A-Za-z])(?=.*[!#$%&'\(\)\*\+-\.:;<=>\?@\[\\\]\^_`\{\|\}~0x0022]|.*\s).{6,12}$
Here is a breakdown of the regex, to make sure you're all happy it's correct.
^ = start of string ”^”
(?=.*\d) = must contain “?=” any set of characters “.*” but must include a digit “\d”.
(?=.*[A-Za-z]) = must contain “?=” any set of characters “.*” but must include an insensitive case letter.
.{6-12}$ = must contain any set of characters “.” but must have between 6-12 characters and end of string “$”.
|^ = or “|” start of string “^”
(?=.*[A-Za-z]) = must contain “?=” any set of characters “.*” but must include an insensitive case letter.
(?=.*[!#$%&'\(\)\*\+-\.:;<=>\?@\[\\\]\^_`\{\|\}~0x0022]|.*\s) = must contain “?=” any set of characters “.*” but must include at least one special character we have defined or a space ”|.*\s)”. “0x0022” is Unicode for the double quote " character.
.{6,12}$ = set of characters “.” must be between 6 – 12 and this is the end of the string “$”
It's quite long-winded and seems to be doing the job, but I want to know whether there are simpler ways to write this sort of regex and how I can shorten it, if that's possible.
Thanks in advance.
Does it have to be regex? Looking at the requirements, all you need is String.Length and String.IndexOfAny().
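A minimal sketch of that regex-free approach might look like the following. The helper name and the two character lists are assumptions: they mirror the question's character class (letters, then digits, a space and the listed symbols), and only the plain space is treated as whitespace here.

```csharp
using System;

// Hypothetical helper: length check plus two IndexOfAny lookups, no regex involved.
static bool IsValidPasswordNoRegex(string s)
{
    // Assumed character lists, modelled on the question's regex.
    const string letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    const string extras = "0123456789 !#$%&'()*+,-.:;<=>?@[\\]^_`{|}~\"";

    return s.Length >= 6 && s.Length <= 12
        && s.IndexOfAny(letters.ToCharArray()) >= 0   // at least one letter
        && s.IndexOfAny(extras.ToCharArray()) >= 0;   // at least one digit, space or symbol
}

Console.WriteLine(IsValidPasswordNoRegex("abc123"));  // True
Console.WriteLine(IsValidPasswordNoRegex("abcdef"));  // False
```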
First, good job at providing comments for your regex. However, there is a much better way. Simply write your regex from the get-go in free-spacing mode with lots of comments. This way you can document your regex right in the source code (and provide indentation to improve readability when there are lots of parentheses). Here is how I would write your original regex in C# code:
if (Regex.IsMatch(usernameString,
    @"# Validate username having a digit and/or special char.
    ^                    # Either... Anchor to start of string.
    (?=.*\d)             # Assert there is a digit AND
    (?=.*[A-Za-z])       # assert there is an alpha.
    .{6-12}              # Match any name with length from 6 to 12.
    $                    # Anchor to end of string.
    | ^                  # Or... Anchor to start of string
    (?=.*[A-Za-z])       # Assert there is an alpha AND
    (?=.*                # assert there is either a special char
      [!#$%&'\(\)\*\+-\.:;<=>\?@\[\\\]\^_`\{\|\}~\x22]
    | .*\s               # or a space char.
    )                    # End specialchar-or-space assertion.
    .{6-12}              # Match any name with length from 6 to 12.
    $                    # Anchor to end of string.
    ", RegexOptions.IgnorePatternWhitespace)) {
    // Valid username.
} else {
    // Invalid username.
}
The code snippet above uses the preferable @"..." verbatim string syntax, which simplifies the escaping of metacharacters. The original regex erroneously separates the two numbers of the curly brace quantifier using a dash, i.e. .{6-12}. The correct syntax is to separate these numbers with a comma, i.e. .{6,12}. (Maybe .NET allows using the .{6-12} syntax?) I've also changed the 0x0022 (the " double quote char) to \x22.
That said, yes the original regex can be improved a bit:
if (Regex.IsMatch(usernameString,
    @"# Validate username having a digit and/or special char.
    ^                    # Anchor to start of string.
    (?=.*?[A-Za-z])      # Assert there is an alpha.
    (?:                  # Group for assertion alternatives.
      (?=.*?\d)          # Either assert there is a digit
    |                    # or assert there is a special char
      (?=.*?[!#$%&'()*+-.:;<=>?@[\\\]^_`{|}~\x22\s]) # or space.
    )                    # End group of assertion alternatives.
    .{6,12}              # Match any name with length from 6 to 12.
    $                    # Anchor to end of string.
    ", RegexOptions.IgnorePatternWhitespace)) {
    // Valid username.
} else {
    // Invalid username.
}
This regex eliminates the global alternative and instead uses a non-capture group for the "digit or specialchar" assertion alternatives. Also, you can eliminate the non-capture group for the "special char or whitespace" alternatives by simply adding the \s to the list of special chars. I've also added a lazy modifier to the dot-stars in the assertions, i.e. .*? - (this may make the regex match a bit faster.) A bunch of unnecessary escapes were removed from the specialchar character class.
But as Stema cleverly pointed out, you can combine the digit and special char to simplify this even further:
if (Regex.IsMatch(usernameString,
    @"# Validate username having a digit and/or special char.
    ^                    # Anchor to start of string
    (?=.*?[A-Za-z])      # Assert there is an alpha.
                         # Assert there is a special char, space
    (?=.*?[!#$%&'()*+-.:;<=>?@[\\\]^_`{|}~\x22\s\d]) # or digit.
    .{6,12}              # Match any name with length from 6 to 12.
    $                    # Anchor to end of string.
    ", RegexOptions.IgnorePatternWhitespace)) {
    // Valid username.
} else {
    // Invalid username.
}
Other than that, there is really nothing wrong with your original regex with regard to accuracy. However, logically, this formula allows a username to end with whitespace which is probably not a good idea. I would also explicitly specify a whitelist of allowable chars in the name rather than using the overly permissive "." dot.
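To try the combined version on a few inputs, here is a small sketch. It writes the pattern on a single line, so RegexOptions.IgnorePatternWhitespace is not needed. IsValidPassword and the sample passwords are assumptions for illustration, and the [ inside the character class is escaped here as a precaution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: one letter, one digit/space/special char, 6 to 12 characters overall.
// Note that +-. inside the class is a range (it covers + , - and .), as in the original.
static bool IsValidPassword(string s) => Regex.IsMatch(s,
    @"^(?=.*?[A-Za-z])(?=.*?[!#$%&'()*+-.:;<=>?@\[\\\]^_`{|}~\x22\s\d]).{6,12}$");

foreach (var p in new[] { "abc123", "abc!ef", "abc def", "abcdef", "123456", "ab1", "abcdefghijkl1" })
{
    Console.WriteLine($"{p}: {IsValidPassword(p)}");
}
```

The "abc def" sample shows the behaviour the answer warns about: whitespace counts toward the requirement, so a whitelist of allowed characters may be the better choice.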
I am not sure whether what you are doing makes sense, but to achieve it, your regex can be simpler:
^(?=.*[A-Za-z])(?=.*[\d\s!#$%&'\(\)\*\+-\.:;<=>\?@\[\\\]\^_`\{\|\}~0x0022]).{6,12}$
Why use alternatives? Just add \d and \s to the character class.

Using Regular Expression Match a String that contains numbers letters and dashes

I need to match this string 011Q-0SH3-936729 but not 345376346 or asfsdfgsfsdf
It has to contain characters AND numbers AND dashes
Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 and I want it to be able to match anyone of those. Reason for this is that I don't really know if the format is fixed and I have no way of finding out either so I need to come up with a generic solution for a pattern with any number of dashes and the pattern recurring any number of times.
Sorry this is probably a stupid question, but I really suck at Regular expressions.
TIA
foundMatch = Regex.IsMatch(subjectString,
    @"^                # Start of the string
    (?=.*\p{L})        # Assert that there is at least one letter
    (?=.*\p{N})        # and at least one digit
    (?=.*-)            # and at least one dash.
    [\p{L}\p{N}-]*     # Match a string of letters, digits and dashes
    $                  # until the end of the string.",
    RegexOptions.IgnorePatternWhitespace);
should do what you want. If by letters/digits you meant "only ASCII letters/digits" (and not international/Unicode letters, too), then use
foundMatch = Regex.IsMatch(subjectString,
    @"^                # Start of the string
    (?=.*[A-Z])        # Assert that there is at least one letter
    (?=.*[0-9])        # and at least one digit
    (?=.*-)            # and at least one dash.
    [A-Z0-9-]*         # Match a string of letters, digits and dashes
    $                  # until the end of the string.",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
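For a quick check against the strings from the question, the ASCII-only pattern can be written on one line, as in this sketch. LooksLikeKey is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: needs a letter, a digit and a dash, and nothing but those.
static bool LooksLikeKey(string s) =>
    Regex.IsMatch(s, @"^(?=.*[A-Z])(?=.*[0-9])(?=.*-)[A-Z0-9-]*$", RegexOptions.IgnoreCase);

foreach (var s in new[] { "011Q-0SH3-936729", "000-222-AAAA", "345376346", "asfsdfgsfsdf", "345-3763-46" })
{
    Console.WriteLine($"{s}: {LooksLikeKey(s)}");
}
```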
EDIT:
This will match any of the keys provided in your comments:
^[0-9A-Z]+(-[0-9A-Z]+)+$
This means the key starts with a digit or letter and has at least one dash symbol.
Without more info about the regularity of the dashes or otherwise, this is the best we can do:
Regex.IsMatch(input, @"[A-Z0-9\-]+\-[A-Z0-9]")
Although this will also match -A-0
Most naive implementation EVER (might get you started):
([0-9]|[A-Z])+(-)([0-9]|[A-Z])+(-)([0-9]|[A-Z])+
Tested with Regex Coach.
EDIT:
That matches only three groups; here is another, slightly better one:
([0-9A-Z]+\-)+([0-9A-Z]+)
Are you applying the regex to a whole string (i.e., validating or filtering)? If so, Tim's answer should put you right. But if you're plucking matches from a larger string, it gets a bit more complicated. Here's how I would do that:
string input = @"Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 but not 345-3763-46 or ASFS-DFGS-FSDF or ASD123FGH987.";
Regex pluckingRegex = new Regex(
    @"(?<!\S)          # start of 'word'
    (?=\S*\p{L})       # contains a letter
    (?=\S*\p{N})       # contains a digit
    (?=\S*-)           # contains a hyphen
    [\p{L}\p{N}-]+     # gobble up letters, digits and hyphens only
    (?!\S)             # end of 'word'
    ", RegexOptions.IgnorePatternWhitespace);
foreach (Match m in pluckingRegex.Matches(input))
{
    Console.WriteLine(m.Value);
}
output:
011Q-0SH3-936729
011Q-0SH3-936729-SDF3
000-222-AAAA
011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729
The negative lookarounds serve as 'word' boundaries: they ensure the matched substring starts either at the beginning of the string or after a whitespace character ((?<!\S)), and ends either at the end of the string or before a whitespace character ((?!\S)).
The three positive lookaheads work just like Tim's, except they use \S* to skip whatever precedes the first letter/digit/hyphen. We can't use .* in this case because that would allow it to skip to the next word, or the next, etc., defeating the purpose of the lookahead.
