C# Conditional Regular Expression String Match

I am trying to use a C# regular expression to match a particular string of characters, but I cannot figure out how to do it. Any help is appreciated.
The string that I am trying to match is as follows, where A is an uppercase alpha character, X is an upper case alpha-numeric character and # is 0, 1 or 2.
AA-#-XX-X-XXX-XXXXXXX-XXXXXXXX
So any of the following would match the string above.
XY-1
MM-0-AB
MM-0-AB-1-ABC-1234567
VV-2-XX-7-CCC-ABCDEFG-12345678
And any of the following would NOT match.
QQ-7-AA (Only 0, 1, 2 are allowed at the second level.)
QQ-2-XX-7-CC (Partial characters for that level.)
QQ-2-XX-7-CCC-ABCDEFG- (Can not end in a dash.)
QQ-2-XX-7-CCC-ABCDEFG-123456 (Partial characters for that level.)
So far (not that far, really) I have @"^[A-Z]{2}" as the pattern, but I am unsure how to match the rest of the string conditionally (I'm not even sure "conditionally" is the proper term), that is, only if it is there. Do I need to write 7 different statements for this? That seems unreasonable, but I could be wrong.

Have a look at the Regular Expression Language. You need the following elements:
uppercase alpha character: [A-Z]
upper case alpha-numeric character: [A-Z0-9]
0, 1 or 2: [0-2]
dash: -
match x exactly n times: x{n}
match x zero or one time: x?
define a subexpression: (...)
Examples:
two uppercase alpha characters: [A-Z]{2}
two uppercase alpha characters, followed by a dash: [A-Z]{2}-
two uppercase alpha characters, followed by a dash, followed by 0, 1 or 2: [A-Z]{2}-[0-2]
two uppercase alpha characters, followed by a dash, followed by 0, 1 or 2, but with the subexpression consisting of the dash and 0, 1 or 2 occurring zero or one time:
[A-Z]{2}(-[0-2])?
and so on...
Resulting expression:
^[A-Z]{2}(-[0-2](-[A-Z0-9]{2}(-[A-Z0-9](-[A-Z0-9]{3}(-[A-Z0-9]{7}(-[A-Z0-9]{8})?)?)?)?)?)?$
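The nesting does not have to be typed by hand. As a minimal sketch (the names levels, codeRegex and IsValidCode are illustrative and not from the question, and the snippet assumes C# 9 or later for top-level statements), the same expression can be assembled from the innermost level outward and checked against the samples from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// One sub-pattern per optional level that follows the two leading letters.
string[] levels =
{
    "[0-2]", "[A-Z0-9]{2}", "[A-Z0-9]", "[A-Z0-9]{3}", "[A-Z0-9]{7}", "[A-Z0-9]{8}"
};

// Wrap from the innermost level outward: (-L1(-L2(...)?)?)?
string tail = "";
for (int i = levels.Length - 1; i >= 0; i--)
{
    tail = "(-" + levels[i] + tail + ")?";
}

var codeRegex = new Regex("^[A-Z]{2}" + tail + "$");

bool IsValidCode(string s) => codeRegex.IsMatch(s);

Console.WriteLine(IsValidCode("MM-0-AB"));       // True
Console.WriteLine(IsValidCode("QQ-2-XX-7-CC"));  // False
```

Because each level sits inside the optional group of the level before it, a later level can only appear when every earlier level is present and complete.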

Related

Regularexpression for duplicate pattern

I am trying to write a regex to handle these cases
contains only alphanumeric with minimum of 2 alpha characters(numbers are optional).
only special character allowed is hyphen.
cannot be all same letter ignoring hyphen.
cannot be all hyphens
cannot be all numeric
My regex: (?=[^A-Za-z]*[A-Za-z]){2}^[\w-]{6,40}$
The regex above works for most scenarios except 1) and 3).
Can anyone suggest a fix? I am stuck on this.
Regards,
Sajesh
Rule 1 eliminates rule 4 and 5: It can neither contain only hyphens, nor only digits.
/^(?=[a-z\d-]{6,40}$)[\d-]*([a-z]).*?(?!\1)[a-z].*$/i
(?=[a-z\d-]{6,40}$) look ahead for specified characters from 6 to 40
([a-z]).*?(?!\1)[a-z] checks for two letters and at least one different
See this demo at regex101
This pattern with the i flag treats A and a as the "same" letter (caseless matching) and will require another, different letter. For case-sensitive matching, here is another demo at regex101.
You can use
^(?!\d+$)(?!-+$)(?=(?:[\d-]*[A-Za-z]){2})(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*$)[A-Za-z\d-]{6,40}$
See the regex demo. If you use it in C# or PHP, consider replacing ^ with \A and $ with \z to make sure you match the entire string even in case there is a trailing newline.
Details:
^ - start of string
(?!\d+$) - fail the match if the string only consists of digits
(?!-+$) - fail the match if the string only consists of hyphens
(?=(?:[\d-]*[A-Za-z]){2}) - there must be at least two ASCII letters after any zero or more digits or hyphens
(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*$) - fail the match if every letter in the string is the same letter (the + after (?:[\d-]*\1) requires the captured letter to repeat at least once, with only digits or hyphens in between)
[A-Za-z\d-]{6,40} - six to forty alphanumeric or hyphen chars
$ - end of string. (\z might be preferable.)
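To try this from C#, here is a small sketch using the \A and \z variant mentioned above. The names idRegex and IsValidId and the sample inputs are made up for illustration, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// \A and \z anchor the whole string, so a trailing newline is rejected too.
var idRegex = new Regex(
    @"\A(?!\d+\z)(?!-+\z)(?=(?:[\d-]*[A-Za-z]){2})" +
    @"(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*\z)[A-Za-z\d-]{6,40}\z");

bool IsValidId(string s) => idRegex.IsMatch(s);

Console.WriteLine(IsValidId("ab-123"));  // True: two different letters
Console.WriteLine(IsValidId("aa-aaa"));  // False: every letter is 'a'
Console.WriteLine(IsValidId("a12345"));  // False: only one letter
```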

Regular expression that Must have at least one letter

I have a case where I am using a queue of regular expressions to filter out specific items in an Observer pattern. The filter will place the values in specific controls based on their values. However, the pattern for one of the controls has to accept ANY ASCII character. Let me list the filters in their order with the regex:
Column      Rule                                      Regex                      Result
Receiving   7 digits                                  @"^[1-9]([0-9]{6}$)"       Works
Count       2 digits, no leading 0                    @"^[1-9]([0-9]{0,1})$"     Works
Producer    Any ASCII char., MUST contain a letter    @".*"                      Too broad
Is there a regular expression that will accept any set of ASCII characters, but 1 of them MUST be a letter (upper or lower case)?
@"^(?=.*[A-Za-z])$" -->Didn't work
Examples that the expression would need to accept:
123 red
red
123 red123
red - 123
red
If you want to match the whole range of ASCII chars you may use
@"^(?=[^A-Za-z]*[A-Za-z])[\x00-\x7F]*$"
If only printable chars are allowed use
@"^(?=[^A-Za-z]*[A-Za-z])[ -~]*$"
Note the (?=[^A-Za-z]*[A-Za-z]) positive lookahead is located right after ^, that is, it is only triggered at the start of a string. It requires an ASCII letter after any 0 or more chars other than an ASCII letter.
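Your ^(?=.*[A-Za-z])$ pattern did not work because it can only match an empty string (^ immediately followed by $), while the lookahead (?=...) requires at least one ASCII letter ([A-Za-z]) after any 0+ chars other than newline (.*). Both conditions cannot hold at once, so the pattern never matches.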
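As a quick check of the two patterns, here is a sketch in C#. The names asciiWithLetter, printableWithLetter and IsProducer are only illustrative, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Any ASCII character is allowed, but the lookahead demands at least one letter.
var asciiWithLetter = new Regex(@"^(?=[^A-Za-z]*[A-Za-z])[\x00-\x7F]*$");

// Same idea, restricted to printable ASCII (space through tilde).
var printableWithLetter = new Regex(@"^(?=[^A-Za-z]*[A-Za-z])[ -~]*$");

bool IsProducer(string s) => printableWithLetter.IsMatch(s);

Console.WriteLine(IsProducer("123 red"));  // True
Console.WriteLine(IsProducer("1234567"));  // False: no letter
```

A tab character, for instance, passes the first pattern but not the printable-only one.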
You could try [A-Za-z]+.
It matches when there is at least one letter. You want something more specific?
How about
^.*[a-zA-Z]+.*$ ?
Between start and end of line, accept any number of any characters, then at least one a-z/A-Z character, then again any number of any characters.

Regex check if there is a splitter character between all elements

I've got a regex that I'm trying to use to detect if a certain input is valid. The syntax of the input should be {A|B|C}. {A|B|} should fail.
(?:
(
\{{1}
(?:[A-Z0-1-_.*]+ \| [A-Z0-1-_.*]+)*
\}{1}
)
)
This is what I have so far, but I'm starting to think this isn't the way to go. Even if it did work properly, it wouldn't allow {A} which should be valid.
So basically what I'm trying to do is check if each [A-Z0-1-_.*] element is split by | and that there are no empty elements within the {} brackets.
One concept I'm really struggling with which feels relevant here is having n amount of possible elements. Like let's say, the string to validate is Foo{A}Bar{B|C}Test
The way I would check that has 2 elements. One element to check for alphabetical characters, and another element to check the bracketed characters.
So to check the string above, I would do alphaElem*|BracketElem*|alphaElem*|BracketElem*|alphaElem*
But that's a lot of writing out, and it doesn't scale if the amount of elements increases. Is there some way I can solve this with regex?
You may use
{[A-Z0-1-_.*]+(?:\|[A-Z0-1-_.*]+)*}
Note that the last * modifier can be replaced with a limiting quantifier. E.g. {0,2} to match 0, 1 or 2 times (to match 1, 2 or 3 elements inside {...}).
See the regex demo.
Details
{ - a { char
[A-Z0-1-_.*]+ - 1 or more chars defined in the character class (uppercase ASCII letters, 0, 1, -, _, . or * chars)
(?: - a non-capturing group matching 0 or more occurrences of:
\| - a | char
[A-Z0-1-_.*]+ - 1 or more chars defined in the character class
)* - end of the grouping construct
} - a } char.
Note that you do not need to escape the { and } chars in a .NET regex: the parser is "intelligent" enough to treat { as a literal { when it is not followed by min or min,max values and a matching }.
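For validating a single {...} group on its own, the pattern can be anchored at both ends. The following is only a sketch: the names groupRegex and IsValidGroup are made up, the braces are escaped purely for readability, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ turn the pattern into a whole-string check for one bracketed group.
var groupRegex = new Regex(@"^\{[A-Z0-1-_.*]+(?:\|[A-Z0-1-_.*]+)*\}$");

bool IsValidGroup(string s) => groupRegex.IsMatch(s);

Console.WriteLine(IsValidGroup("{A|B|C}"));  // True
Console.WriteLine(IsValidGroup("{A}"));      // True
Console.WriteLine(IsValidGroup("{A|B|}"));   // False: empty last element
```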
This solution will validate everything you (seem to) want in one pass (see on regex101):
^\w+({[A-Z0-1-_.*]+(\|[A-Z0-1-_.*]+)*}\w+)*$
It's several layers of possibly-repeating sections.
Here's the breakdown:
^ anchor matching start of text
\w+ matches any amount of "word" characters
{[A-Z0-1-_.*]+(\|[A-Z0-1-_.*]+)*} matches an element in brackets, possibly followed by any number of pipes and other elements within the brackets
({[A-Z0-1-_.*]+(\|[A-Z0-1-_.*]+)*}\w+)* this is the previously-described match, allowed to repeat zero to many times, each time with another "word"
$ anchor matching end of text
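Here is a short sketch of that whole-string check in C#. The names templateRegex and IsValidTemplate are illustrative, the braces are escaped only for readability, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Word characters, then any number of "{...} followed by word characters" sections.
var templateRegex = new Regex(@"^\w+(\{[A-Z0-1-_.*]+(\|[A-Z0-1-_.*]+)*\}\w+)*$");

bool IsValidTemplate(string s) => templateRegex.IsMatch(s);

Console.WriteLine(IsValidTemplate("Foo{A}Bar{B|C}Test"));  // True
Console.WriteLine(IsValidTemplate("Foo{A|}Bar"));          // False: empty element
Console.WriteLine(IsValidTemplate("Foo{A}"));              // False: no word after the group
```

Note that, as written, the pattern expects word characters both before the first group and after every group, so a string that starts or ends with a bracketed group is rejected.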

Regular expression to match following criterias [duplicate]

I am using the following regular expression without restricting any character length:
var test = /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*$/ // Works fine
In the above when I am trying to restrict the characters length to 15 as below, it throws an error.
var test = /^(a-z|A-Z|0-9)*[^$%^&*;:,<>?()\""\']*${1,15}/ //**Uncaught SyntaxError: Invalid regular expression**
How can I make the above regular expression work with the characters limit to 15?
You cannot apply quantifiers to anchors. Instead, to restrict the length of the input string, use a lookahead anchored at the beginning:
// ECMAScript (JavaScript, C++)
^(?=.{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
^^^^^^^^^^^
// Or, in flavors other than ECMAScript and Python
\A(?=.{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
^^^^^^^^^^^^^^^
// Or, in Python
\A(?=.{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
^^^^^^^^^^^^^^^
Also, I assume you wanted to match 0 or more letters or digits with (a-z|A-Z|0-9)*. It should look like [a-zA-Z0-9]* (i.e. use a character class here).
Why not use a limiting quantifier, like {1,15}, at the end?
Quantifiers are only applied to the subpattern to the left, be it a group or a character class, or a literal symbol. Thus, ^[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']{1,15}$ will effectively restrict the length of the second character class [^$%^&*;:,<>?()\"'] to 1 to 15 characters. The ^(?:[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*){1,15}$ will "restrict" the sequence of 2 subpatterns of unlimited length (as the * (and +, too) can match unlimited number of characters) to 1 to 15 times, and we still do not restrict the length of the whole input string.
How does the lookahead restriction work?
The (?=.{1,15}$) / (?=.{1,15}\z) / (?=.{1,15}\Z) positive lookahead appears right after ^/\A (note in Ruby, \A is the only anchor that matches only start of the whole string) start-of-string anchor. It is a zero-width assertion that only returns true or false after checking if its subpattern matches the subsequent characters. So, this lookahead tries to match any 1 to 15 (due to the limiting quantifier {1,15}) characters but a newline right at the end of the string (due to the $/\z/\Z anchor). If we remove the $ / \z / \Z anchor from the lookahead, the lookahead will only require the string to contain 1 to 15 characters, but the total string length can be any.
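The question is about JavaScript, but the difference is easy to see in any flavor. Below is a sketch in C# using the \A and \z form; the variable names and sample strings are made up, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead caps the whole string at 1 to 15 characters.
var limited = new Regex(@"\A(?=.{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\""']*\z");

// Quantifier on the last class only: the total length stays unbounded.
var lastClassOnly = new Regex(@"\A[a-zA-Z0-9]*[^$%^&*;:,<>?()\""']{1,15}\z");

string thirtyLetters = new string('a', 30);

Console.WriteLine(limited.IsMatch("hello world"));        // True
Console.WriteLine(limited.IsMatch(thirtyLetters));        // False
Console.WriteLine(lastClassOnly.IsMatch(thirtyLetters));  // True
```

In the last case the first class simply absorbs 29 of the letters and leaves one for the quantified class, which is why the trailing {1,15} does not limit the input length.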
If the input string can contain a newline sequence, you should use [\s\S] portable any-character regex construct (it will work in JS and other common regex flavors):
// ECMAScript (JavaScript, C++)
^(?=[\s\S]{1,15}$)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*$
^^^^^^^^^^^^^^^^^
// Or, in flavors other than ECMAScript and Python
\A(?=[\s\S]{1,15}\z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\z
^^^^^^^^^^^^^^^^^^
// Or, in Python
\A(?=[\s\S]{1,15}\Z)[a-zA-Z0-9]*[^$%^&*;:,<>?()\"']*\Z
^^^^^^^^^^^^^^^^^^

How can I get a regex to check that a string only contains alpha characters [a-z] or [A-Z]?

I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z. The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)
Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
5. "aaaaaaaaaaaaaaaaaaaaaaaaaa" = false; //26 letters
Here is what I have so far... can't figure out what's wrong with it though
Regex alphaPattern = new Regex("[^a-z]|[^A-Z]");
I would think that would mean that the string could contain only upper or lower case letters from a-z, but when I match it to a string with all letters it returns false...
Also, any suggestions regarding efficiency of using regex vs. other verifying methods would be greatly appreciated.
Regex lettersOnly = new Regex("^[a-zA-Z]{1,25}$");
^ means "begin matching at start of string"
[a-zA-Z] means "match lower case and upper case letters a-z"
{1,25} means "match the previous item (the character class, see above) 1 to 25 times"
$ means "only match if cursor is at end of string"
I'm trying to create a regex to verify that a given string only has alpha
characters a-z or A-Z.
Easily done, as many of the others have indicated, using what are known as "character classes". Essentially, these allow us to specify a range of values to use for matching:
(NOTE: for simplification, I am assuming implicit ^ and $ anchors, which are explained later in this post)
[a-z] Match any single lower-case letter.
ex: a matches, 8 doesn't match
[A-Z] Match any single upper-case letter.
ex: A matches, a doesn't match
[0-9] Match any single digit zero to nine
ex: 8 matches, a doesn't match
[aeiou] Match only on a or e or i or o or u.
ex: o matches, z doesn't match
[a-zA-Z] Match any single lower-case OR upper-case letter.
ex: A matches, a matches, 3 doesn't match
These can, naturally, be negated as well:
[^a-z] Match anything that is NOT an lower-case letter
ex: 5 matches, A matches, a doesn't match
[^A-Z] Match anything that is NOT an upper-case letter
ex: 5 matches, A doesn't match, a matches
[^0-9] Match anything that is NOT a number
ex: 5 doesn't match, A matches, a matches
[^Aa69] Match anything as long as it is not A or a or 6 or 9
ex: 5 matches, A doesn't match, a doesn't match, 3 matches
To see some common character classes, go to:
http://www.regular-expressions.info/reference.html
The string can be up to 25 letters long.
(I'm not sure if regex can check length of strings)
You can absolutely check "length" but not in the way you might imagine. We measure repetition, NOT length strictly speaking using {}:
a{2} Match two a's together.
ex: a doesn't match, aa matches, aca doesn't match
4{3} Match three 4's together.
ex: 4 doesn't match, 44 doesn't match, 444 matches, 4434 doesn't match
Repetition has values we can set to have lower and upper limits:
a{2,} Match on two or more a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa matches
a{2,5} Match on two to five a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa doesn't match
Repetition extends to character classes, so:
[a-z]{5} Match any five lower-case characters together.
ex: bubba matches, Bubba doesn't match, BUBBA doesn't match, asdjo matches
[A-Z]{2,5} Match two to five upper-case characters together.
ex: bubba doesn't match, Bubba doesn't match, BUBBA matches, BUBBETTE doesn't match
[0-9]{4,8} Match four to eight numbers together.
ex: bubba doesn't match, 15835 matches, 44 doesn't match, 3456876353456 doesn't match
[a3g]{2} Match any two characters in a row, each being an a OR 3 OR g.
ex: aa matches, ba doesn't match, 33 matches, 38 doesn't match, a3 matches (the two characters need not be the same)
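A few of the repetition examples above can be checked with a small C# sketch. The helper name Whole is made up; it wraps each pattern in ^(?:...)$ to mimic the implicit anchors assumed in this post, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Wrap the pattern in ^(?:...)$ to mimic the implicit anchors assumed in the examples.
bool Whole(string input, string pattern) => Regex.IsMatch(input, "^(?:" + pattern + ")$");

Console.WriteLine(Whole("bubba", "[a-z]{5}"));       // True
Console.WriteLine(Whole("Bubba", "[a-z]{5}"));       // False
Console.WriteLine(Whole("BUBBETTE", "[A-Z]{2,5}"));  // False
Console.WriteLine(Whole("15835", "[0-9]{4,8}"));     // True
Console.WriteLine(Whole("aba", "a{2,}"));            // False
```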
Now let's look at your regex:
[^a-z]|[^A-Z]
Translation: Match any character that is NOT a lowercase letter, OR any character that is NOT an uppercase letter. No character is both, so every character satisfies at least one side.
To fix it so it meets your needs, we would rewrite it like this:
Step 1: Remove the negation
[a-z]|[A-Z]
Translation: Find any lowercase letter OR uppercase letter.
Step 2: While not strictly needed, let's clean up the OR logic a bit
[a-zA-Z]
Translation: Find any lowercase letter OR uppercase letter. Same as above but now using only a single set of [].
Step 3: Now let's indicate "length"
[a-zA-Z]{1,25}
Translation: Find any lowercase letter OR uppercase letter repeated one to twenty-five times.
This is where things get funky. You might think you were done here and you may well be depending on the technology you are using.
Strictly speaking the regex [a-zA-Z]{1,25} will match one to twenty-five upper or lower-case letters ANYWHERE on a line:
[a-zA-Z]{1,25}
a matches, aZgD matches, BUBBA matches, 243242hello242552 MATCHES
In fact, every example I have given so far will do the same. If that is what you want then you are in good shape but based on your question, I'm guessing you ONLY want one to twenty-five upper or lower-case letters on the entire line. For that we turn to anchors. Anchors allow us to specify those pesky details:
^ beginning of a line
(I know, we just used this for negation earlier, don't get me started)
$ end of a line
We can use them like this:
^a{3} From the beginning of the line match a three times together
ex: aaa matches, 123aaa doesn't match, aaa123 matches
a{3}$ Match a three times together at the end of a line
ex: aaa matches, 123aaa matches, aaa123 doesn't match
^a{3}$ Match a three times together for the ENTIRE line
ex: aaa matches, 123aaa doesn't match, aaa123 doesn't match
Notice that aaa matches in all cases because it has three a's at the beginning and end of the line technically speaking.
So the final, technically correct solution, for finding a "word" that is "up to twenty-five letters long" on a line would be:
^[a-zA-Z]{1,25}$
The funky part is that some technologies implicitly put anchors in the regex for you and some don't. You just have to test your regex or read the docs to see if you have implicit anchors.
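In .NET, Regex.IsMatch adds no implicit anchors, so the difference is easy to observe. Here is a small sketch (the helper name IsLettersOnlyStrict is made up, and the snippet assumes C# 9 or later for top-level statements). It also shows one .NET detail worth knowing: $ matches just before a final newline, while \z only matches at the very end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

string mixed = "243242hello242552";

Console.WriteLine(Regex.IsMatch(mixed, @"[a-zA-Z]{1,25}"));    // True: "hello" is found inside
Console.WriteLine(Regex.IsMatch(mixed, @"^[a-zA-Z]{1,25}$"));  // False: the digits break the whole-line match

// In .NET, $ also matches just before a final newline; \z does not.
Console.WriteLine(Regex.IsMatch("abc\n", @"^[a-zA-Z]{1,25}$"));    // True
Console.WriteLine(Regex.IsMatch("abc\n", @"\A[a-zA-Z]{1,25}\z"));  // False

// Strict variant: the entire string must be 1 to 25 ASCII letters.
bool IsLettersOnlyStrict(string s) => Regex.IsMatch(s, @"\A[a-zA-Z]{1,25}\z");

Console.WriteLine(IsLettersOnlyStrict("abcdef"));  // True
```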
using System.Text.RegularExpressions;

/// <summary>
/// Checks that the string contains only the letters a-z and A-Z and is 1 to 25 characters long.
/// </summary>
/// <param name="value">String to be matched</param>
/// <returns>True if matches, false otherwise</returns>
public static bool IsValidString(string value)
{
    // ^ and $ anchor the match so the whole string must consist of 1 to 25 letters.
    string pattern = @"^[a-zA-Z]{1,25}$";
    return Regex.IsMatch(value, pattern);
}
The string can be up to 25 letters long.
(I'm not sure if regex can check length of strings)
Regexes certainly can check the length of a string, as can be seen from the answers posted by others.
However, when you are validating a user input (say, a username), I would advise doing that check separately.
The problem is, that regex can only tell you if a string matched it or not. It won't tell why it didn't match. Was the text too long or did it contain unallowed characters - you can't tell. It's far from friendly, when a program says: "The supplied username contained invalid characters or was too long". Instead you should provide separate error messages for different situations.
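One way to do that is to test each rule separately and report the first one that fails. This is only a sketch: the method name TryValidateName and the message texts are made up, the 25-letter limit is taken from the question, and the snippet assumes C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Returns true when the name is acceptable; otherwise 'error' says which rule failed.
bool TryValidateName(string name, out string error)
{
    error = "";
    if (name.Length == 0)
        error = "The name must not be empty.";
    else if (name.Length > 25)
        error = "The name must be at most 25 characters long.";
    else if (!Regex.IsMatch(name, @"\A[a-zA-Z]+\z"))
        error = "The name may contain only the letters a-z and A-Z.";
    return error.Length == 0;
}

if (!TryValidateName("a2bdef", out string message))
{
    Console.WriteLine(message);  // reports the invalid-character rule
}
```

The length check uses string.Length, so the regex is left with a single job and each failure gets its own message.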
The regular expression you are using is an alternation of [^a-z] and [^A-Z]. And the expressions [^…] mean to match any character other than those described in the character set.
So overall your expression means to match either any single character other than a-z or other than A-Z.
But you rather need a regular expression that matches a-zA-Z only:
[a-zA-Z]
And to specify the length of that, anchor the expression with the start (^) and end ($) of the string and describe the length with the {n,m} quantifier, meaning at least n but not more than m repetitions:
^[a-zA-Z]{0,25}$
Do I understand correctly that it can only contain either uppercase or lowercase letters?
new Regex("^([a-z]{1,25}|[A-Z]{1,25})$")
A regular expression seems to be the right thing to use for this case.
By the way, the caret ("^") at the first place inside a character class means "not", so your "[^a-z]|[^A-Z]" would mean "not any lowercase letter, or not any uppercase letter" (disregarding that a-z are not all letters).
