I have a case where I am using a queue of regular expressions to filter out specific items in an Observer pattern. The filter will place the values in specific controls based on their values. However 1 of the controls pattern is that it can accept ANY ASCII Character. Let me list the filters in their order with the RegEx
Column Rule Regex
Receiving 7 DIGITS #"^[1-9]([0-9]{6}$)" --->Works
Count 2 digits, no leading 0 #"^[1-9]([0-9]{0,1})$" --->Works
Producer any ASCII char. #".*" --->too broad
MUST contain a letter
Is there a regular expression that will accept any set of ASCII characters, but 1 of them MUST be a letter (upper or lower case)?
#"^(?=.*[A-Za-z])$" -->Didn't work
examples that would need to go into expression
123 red
red
123 red123
red - 123
red
If you want to match the whole rang of ASCII chars you may use
#"^(?=[^A-Za-z]*[A-Za-z])[\x00-\x7F]*$"
If only printable chars are allowed use
#"^(?=[^A-Za-z]*[A-Za-z])[ -~]*$"
Note the (?=[^A-Za-z]*[A-Za-z]) positive lookahead is located right after ^, that is, it is only triggered at the start of a string. It requires an ASCII letter after any 0 or more chars other than an ASCII letter.
Your ^(?=.*[A-Za-z])$ pattern did not work because you wanted to match an empty string (^$) that contains (?=...) at least one ASCII letter ([A-Za-z]) after any 0+ chars other than newline (.*).
You could try [A-Za-z]+.
It matches when there is at least one letter. You want something more specific?
How about
^.*[a-zA-Z]+.*$ ?
Between start and end of line, accept any number of any characters, then at least one a-z/A-Z character, then again any number of any characters.
Related
I need a regex that checks if a password contains at least one Lowercase letter, at least one Upper case letter, at least two numbers and at least one of(_*&$). This is a MVC project.
This what i have
[RegularExpression(#"(?=\.\*\\d{2})(?=\./*[a-z])(?=\.\*[A-Z])(?=.*[_*&$])", ErrorMessage = "The password must contain at least 1 letter, 2 digits and a special symbol (_*&$)")]
There are a lot of issues with the current regex:
You escaped . and *, and now they denote literal . and * chars, while you wanted to use them as special regex metacharacters
To match at least two digits, you can't just use a \d{2} pattern because it does not match non-consecutive digits, you need \d.*\d or a more efficient (?:\D*\d){2}
You are only using lookaheads, non-consuming patterns, but the RegularExpressionAttribute requires a full string to match the pattern.
Thus, you need
#"^(?=(?:\D*\d){2})(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=[^_*&$]*[_*&$]).*"
Details
^ - start of string
(?=(?:\D*\d){2}) - two not necessarily consecutive digits
(?=[^a-z]*[a-z]) - at least one lowercase ASCII letter
(?=[^A-Z]*[A-Z]) - at least one uppercase ASCII letter
(?=[^_*&$]*[_*&$]) - at least one special char from the _*&$ set
.* - the whole string (with no line breaks) (it gets consumed).
I am quite new to regex thing and need regex for first name which satisfies following conditions:
First Name must contain letters only. It may contain spaces, hyphens, or apostrophes.
It must begin with letters.
All other characters and numbers are not valid.
Special characters ‘ and – cannot be together (e.g. John’-s is not allowed)
An alphabet should be present before and after the special characters ‘ and – (e.g. John ‘s is not allowed)
Two consecutive spaces are not allowed (e.g. Annia St is not allowed)
Can anyone help? I tried this ^([a-z]+['-]?[ ]?|[a-z]+['-]?)*?[a-z]$ but it's not working as expected.
Regexes are notoriously difficult to write and maintain.
One technique that I've used over the years is to annotate my regexes by using named capture groups. It's not perfect, but can greatly help with the readability and maintainability of your regex.
Here is a regex that meets your requirements.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
It is split down into the following parts:
1) (?<firstchar>(?=[A-Za-z])) This ensures the first character is an alpha character, upper or lowercase.
2) (?<alphachars>[A-Za-z]) We allow more alpha chars.
3) (?<specialchars>[A-Za-z]['-](?=[A-Za-z])) We allow special characters, but only with an alpha character before and after.
4) (?<spaces> (?=[A-Za-z])) We allow spaces, but only one space, which must be followed by alpha characters.
You should use a testing tool when writing regexes, I'd recommend https://regex101.com/
You can see from the screenshot below how this regex performs.
Take the regex I've given you, run it in https://regex101.com/ with samples you'd like to match against, and tweak it to fit your requirements. Hopefully I've given you enough information to be self sufficient in customising it to your needs.
You can use this link to run the regex https://regex101.com/r/O2wFfi/1/
Edit
I've updated to address the issue in your comment, rather than just give you the code, I will explain the problem and how I fixed it.
For your example "Sam D'Joe", if we run the original regex, the following happens.
^(?<firstchar>[A-Za-z])((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-][A-Za-z])|(?<spaces> [A-Za-z]))*$
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
Matches consume the characters that they match
This is where we run into a problem. Our "specialchars" part of the regex matches an alpha char, our special char and then another alpha char ((?<specialchars>[A-Za-z]['-](?=[A-Za-z]))).
The thing you need to know about regexes, is each time you match a character, that character is then consumed. We've already matched the alpha char before the special character, so our regex will never match.
Each step actually looks like this:
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) (?<spaces> [A-Za-z]) matches the space and the subsequent alpha char
and then we're left with the following
We cannot match this, because one of our rules is "An alphabet should be present before and after the special characters ‘ and –".
Lookahead
Regex has a concept called "lookahead". A lookahead allows you to match a character without consuming it!
The syntax for a lookahead is ?= followed by what you want to match. E.g. ?=[A-Z] would look ahead for a single character that is an uppercase letter.
We can fix our regex, by using lookaheads.
1) ^ matches the start of the string
2) (?<firstchar>[A-Za-z]) matches the first character
3) (?<alphachars>[A-Za-z]) matches every character up to the space
4) We now change our "spaces" regex, to lookahead to the alpha char, so we don't consume it. We change (?<spaces> [A-Za-z]) to (?<spaces> ?=[A-Za-z]). This matches the space and looks ahead to the subsequent alpha char, but doesn't consume it.
5) (?<specialchars>[A-Za-z]['-][A-Za-z]) matches the alpha char, the special char, and the subsequent alpha char.
6) We use a wildcard to repeat matching our previous 3 rules multiple times, and we match until the end of the line.
I also added lookaheads to the "firstchar", "specialchars" and "spaces" capture groups, I've bolded the changes below.
^(?<firstchar>(?=[A-Za-z]))((?<alphachars>[A-Za-z])|(?<specialchars>[A-Za-z]['-](?=[A-Za-z]))|(?<spaces> (?=[A-Za-z])))*$
This short regex should do it ^([a-zA-Z]+?)([-\s'][a-zA-Z]+)*?$ ,
([a-zA-Z]+?) - Means the String should start with alphabets.
([-\s'][a-zA-Z]+)*? - Means the string must have hyphen,space or apostrophe followed by alphabets.
^ and $ - denote start and end of string
Here's the link to regex demo.
Try this one
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo
I am currently working on a regex which needs to match exactly 8 digits. But sometimes it occurs that there are spaces or dots between those numbers. This is the regex that i am currently using.
([0-9\ ?.?]{7,16})
It works fine most of the time, but the problem I am having is that it sometimes matches number with a lot of spaces tailing it so you will get something like 1234/s/s/s/s (/s stands for space) Or sometimes it is only matching spaces.
What i want is a regex that always matches at least 8 digits and also allows spaces and dots without detecting less then 8 digits. I know it may be stupid question, but I couldn't find anything I need elswhere.
Your ([0-9\ ?.?]{7,16}) expression matches 7 to 16 occurrences of any character that is either a digit, or a space, or a ?, or .. Yes, the ? inside [...] is a literal ?, not a quantifier.
You need to use an expression that will match a digit ([0-9]) and then exactly 7 sequences of a space or period ([ .]) followed with 1 digit, and to make sure you are not matching the digits in 123.156.78.146 you may use special boundaries:
(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)
if the space or . can only be 0 to 1 in between digits; or - if the space/dot can appear 0 or more times,
(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)
See the regex demo
The (?<!\d[ .]*) is a negative lookbehind that will fail any match if it starts with a digit that is followed with .(s) or space(s), and the (?![ .]*\d) negative lookahead will fail the match if the 7 digits you need are followed with .(s) or space(s) and a digit.
To solve this, describe the problem to yourself. You want to match one digit followed by seven repetitions of space-or-dot followed by a digit. This leads to a regular expression like \d([ .]?\d){7}. To avoid collecting the seven captures add a :? after the (. To capture the whole string, enclose it in brackets. Adding both changes gives the expression (\d(:?[ .]?\d){7}). If more than one space or dot is allowed between the digits then change the ? to a *.
To get just the eight digits out of the string I suggest using the string captured above and replacing any spaces or dots with nothing.
Say I have a collection of strings "123AB", "456CDEF", "789G", "012-HI". How do I find all the strings that are number(1 or more) followed by alpha(1 or more) with no special characters, where the alpha characters are not AB?
To clarify, the regex applied to the previous collection should yield "456CDEF" and "789G". "123AB" is ignored because the alpha value is AB and "012-HI" is ignored because it contains a non-alpha. The regex for what I'm looking for, minus the special AB rule, is ^[0-9]+[A-Z]+$. Case is irrelevent. I've tried a few variations of the [^ ] rule with no success, since all the patterns I came up with allowed special characters.
To generalize, how can I match a set of alpha values that don't match a certain subset of alpha values, using a single regex pattern?
Note: "123ABC" should be accepted as well. Only strings with AB exactly should be ignored.
You want a lookahead assertion: ^[0-9]+(?!AB$)[A-Z]+$
Without look-ahead:
^[0-9]+([B-Z]|A[AC-Z])[A-Z]*$
Edit:
Withdrawn, because this won't match something like 123A. But I'll leave the answer visible as an example of what won't work.
^\d+(?:a(?:b[a-z]+|[^b][a-z]*)?|[^a][a-z]*)$
start of string, then 1 or more digit, then either (A) a followed by b followed by more letters, or (B) a followed by any letter other than b and optionally any letter after that, or (C) any letter other than a followed by any letter, then end of string
I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z. The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)
Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
5. "aaaaaaaaaaaaaaaaaaaaaaaaaa" = false; //26 letters
Here is what I have so far... can't figure out what's wrong with it though
Regex alphaPattern = new Regex("[^a-z]|[^A-Z]");
I would think that would mean that the string could contain only upper or lower case letters from a-z, but when I match it to a string with all letters it returns false...
Also, any suggestions regarding efficiency of using regex vs. other verifying methods would be greatly appreciated.
Regex lettersOnly = new Regex("^[a-zA-Z]{1,25}$");
^ means "begin matching at start of string"
[a-zA-Z] means "match lower case and upper case letters a-z"
{1,25} means "match the previous item (the character class, see above) 1 to 25 times"
$ means "only match if cursor is at end of string"
I'm trying to create a regex to verify that a given string only has alpha
characters a-z or A-Z.
Easily done as many of the others have indicated using what are known as "character classes". Essentially, these allow us to specifiy a range of values to use for matching:
(NOTE: for simplification, I am assuming implict ^ and $ anchors which are explained later in this post)
[a-z] Match any single lower-case letter.
ex: a matches, 8 doesn't match
[A-Z] Match any single upper-case letter.
ex: A matches, a doesn't match
[0-9] Match any single digit zero to nine
ex: 8 matches, a doesn't match
[aeiou] Match only on a or e or i or o or u.
ex: o matches, z doesn't match
[a-zA-Z] Match any single lower-case OR upper-case letter.
ex: A matches, a matches, 3 doesn't match
These can, naturally, be negated as well:
[^a-z] Match anything that is NOT an lower-case letter
ex: 5 matches, A matches, a doesn't match
[^A-Z] Match anything that is NOT an upper-case letter
ex: 5 matches, A doesn't matche, a matches
[^0-9] Match anything that is NOT a number
ex: 5 doesn't match, A matches, a matches
[^Aa69] Match anything as long as it is not A or a or 6 or 9
ex: 5 matches, A doesn't match, a doesn't match, 3 matches
To see some common character classes, go to:
http://www.regular-expressions.info/reference.html
The string can be up to 25 letters long.
(I'm not sure if regex can check length of strings)
You can absolutely check "length" but not in the way you might imagine. We measure repetition, NOT length strictly speaking using {}:
a{2} Match two a's together.
ex: a doesn't match, aa matches, aca doesn't match
4{3} Match three 4's together.
ex: 4 doesn't match, 44 doesn't match, 444 matches, 4434 doesn't match
Repetition has values we can set to have lower and upper limits:
a{2,} Match on two or more a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa matches
a{2,5} Match on two to five a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa doesn't match
Repetition extends to character classes, so:
[a-z]{5} Match any five lower-case characters together.
ex: bubba matches, Bubba doesn't match, BUBBA doesn't match, asdjo matches
[A-Z]{2,5} Match two to five upper-case characters together.
ex: bubba doesn't match, Bubba doesn't match, BUBBA matches, BUBBETTE doesn't match
[0-9]{4,8} Match four to eight numbers together.
ex: bubba doesn't match, 15835 matches, 44 doesn't match, 3456876353456 doesn't match
[a3g]{2} Match an a OR 3 OR g if they show up twice together.
ex: aa matches, ba doesn't match, 33 matches, 38 doesn't match, a3 DOESN'T match
Now let's look at your regex:
[^a-z]|[^A-Z]
Translation: Match anything as long as it is NOT a lowercase letter OR an upper-case letter.
To fix it so it meets your needs, we would rewrite it like this:
Step 1: Remove the negation
[a-z]|[A-Z]
Translation: Find any lowercase letter OR uppercase letter.
Step 2: While not stricly needed, let's clean up the OR logic a bit
[a-zA-Z]
Translation: Find any lowercase letter OR uppercase letter. Same as above but now using only a single set of [].
Step 3: Now let's indicate "length"
[a-zA-Z]{1,25}
Translation: Find any lowercase letter OR uppercase letter repeated one to twenty-five times.
This is where things get funky. You might think you were done here and you may well be depending on the technology you are using.
Strictly speaking the regex [a-zA-Z]{1,25} will match one to twenty-five upper or lower-case letters ANYWHERE on a line:
[a-zA-Z]{1,25}
a matches, aZgD matches, BUBBA matches, 243242hello242552 MATCHES
In fact, every example I have given so far will do the same. If that is what you want then you are in good shape but based on your question, I'm guessing you ONLY want one to twenty-five upper or lower-case letters on the entire line. For that we turn to anchors. Anchors allow us to specify those pesky details:
^ beginning of a line
(I know, we just used this for negation earlier, don't get me started)
$ end of a line
We can use them like this:
^a{3} From the beginning of the line match a three times together
ex: aaa matches, 123aaa doesn't match, aaa123 matches
a{3}$ Match a three times together at the end of a line
ex: aaa matches, 123aaa matches, aaa123 doesn't match
^a{3}$ Match a three times together for the ENTIRE line
ex: aaa matches, 123aaa doesn't match, aaa123 doesn't match
Notice that aaa matches in all cases because it has three a's at the beginning and end of the line technically speaking.
So the final, technically correct solution, for finding a "word" that is "up to five characters long" on a line would be:
^[a-zA-Z]{1,25}$
The funky part is that some technologies implicitly put anchors in the regex for you and some don't. You just have to test your regex or read the docs to see if you have implicit anchors.
/// <summary>
/// Checks if string contains only letters a-z and A-Z and should not be more than 25 characters in length
/// </summary>
/// <param name="value">String to be matched</param>
/// <returns>True if matches, false otherwise</returns>
public static bool IsValidString(string value)
{
string pattern = #"^[a-zA-Z]{1,25}$";
return Regex.IsMatch(value, pattern);
}
The string can be up to 25 letters long.
(I'm not sure if regex can check length of strings)
Regexes ceartanly can check length of a string - as can be seen from the answers posted by others.
However, when you are validating a user input (say, a username), I would advise doing that check separately.
The problem is, that regex can only tell you if a string matched it or not. It won't tell why it didn't match. Was the text too long or did it contain unallowed characters - you can't tell. It's far from friendly, when a program says: "The supplied username contained invalid characters or was too long". Instead you should provide separate error messages for different situations.
The regular expression you are using is an alternation of [^a-z] and [^A-Z]. And the expressions [^…] mean to match any character other than those described in the character set.
So overall your expression means to match either any single character other than a-z or other than A-Z.
But you rather need a regular expression that matches a-zA-Z only:
[a-zA-Z]
And to specify the length of that, anchor the expression with the start (^) and end ($) of the string and describe the length with the {n,m} quantifier, meaning at least n but not more than m repetitions:
^[a-zA-Z]{0,25}$
Do I understand correctly that it can only contain either uppercase or lowercase letters?
new Regex("^([a-z]{1,25}|[A-Z]{1,25})$")
A regular expression seems to be the right thing to use for this case.
By the way, the caret ("^") at the first place inside a character class means "not", so your "[^a-z]|[^A-Z]" would mean "not any lowercase letter, or not any uppercase letter" (disregarding that a-z are not all letters).