Tricky Regular Expression - c#

I need to allow only alphanumeric characters (digits and uppercase letters), 0-25 characters long, and reject any value that consists of a single digit repeated throughout.
I've got the first part: Regex.IsMatch(tmpResult, "^[0-9A-Z]{0,25}$"); (that's easy)
111112 - match
AABD333434 - match
55555555 - no match
555 - no match
Could anyone please help me with this?

^(?!(.)\1*$)[0-9A-Z]{0,25}$
The extra (?!(.)\1*$) rejects any string that is composed of a single repeated character.
The (?!…) is a negative lookahead that causes the primary regex to fail if … matches, and (.)\1*$ matches a string made up of one character repeated to the end.
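As a quick check of the pattern above against the examples from the question, here is a minimal C# sketch. The class and method names (RepeatedCharDemo, IsValid) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RepeatedCharDemo
{
    // Pattern from the answer: 0-25 characters from [0-9A-Z],
    // rejected if the whole string is one character repeated.
    static readonly Regex Valid = new Regex(@"^(?!(.)\1*$)[0-9A-Z]{0,25}$");

    public static bool IsValid(string s) => Valid.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "111112", "AABD333434", "55555555", "555" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // 111112 and AABD333434 are accepted; 55555555 and 555 are rejected.
    }
}
```

Note that a one-character input such as "5" is rejected as well, because it trivially consists of one repeated character, while the empty string is accepted.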

You could just do it using a normal method... Once you have it match your first expression there, just use a subroutine to iterate through each character and return true the first time you encounter a character that differs from the first in the string.
It should return true after checking only the first 2 characters for most strings, unless it's an invalid string.
This should be equally as fast as a regex if not faster, if it is well implemented.
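The loop described above could look roughly like this. It is only a sketch: the names RepeatedCharLoop, HasDistinctChars and IsValid are invented here, and the first regex from the question is reused for the character and length check.

```csharp
using System;
using System.Text.RegularExpressions;

static class RepeatedCharLoop
{
    // Returns true as soon as a character differs from the first one.
    public static bool HasDistinctChars(string s)
    {
        for (int i = 1; i < s.Length; i++)
        {
            if (s[i] != s[0]) return true;
        }
        return false; // empty, single-character, or all-same strings end up here
    }

    // First the asker's original check, then the loop.
    public static bool IsValid(string s) =>
        Regex.IsMatch(s, "^[0-9A-Z]{0,25}$") && HasDistinctChars(s);

    static void Main()
    {
        foreach (var s in new[] { "111112", "AABD333434", "55555555", "555" })
            Console.WriteLine($"{s}: {IsValid(s)}");
    }
}
```

One difference from the lookahead regex: this sketch also rejects the empty string, since there is no second character to compare against. Whether that is desirable depends on the requirements.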


Is there a regular expression for matching a string that has no more than 2 repeating characters? [duplicate]

I want to match strings that do not contain more than 3 of the same character repeated in a row. So:
abaaaa [no match]
abawdasd [match]
abbbbasda [no match]
bbabbabba [match]
Yes, it would be much easier and neater to do a regex match for containing the consecutive characters, and then negate that in the code afterwards. However, in this case that is not possible.
I would like to open out the question to x consecutive characters so that it can be extended to the general case to make the question and answer more useful.
Negative lookahead is supported in this case.
Use a negative lookahead with back references:
^(?:(.)(?!\1\1))*$
See live demo using your examples.
(.) captures each character in group 1 and the negative look ahead asserts that the next 2 chars are not repeats of the captured character.
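To see this pattern against the four examples from the question, here is a small C# sketch (the class and method names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

static class NoTripleDemo
{
    // Each character is captured, and the lookahead rejects it
    // if the next two characters are copies of it.
    static readonly Regex NoTriple = new Regex(@"^(?:(.)(?!\1\1))*$");

    public static bool IsValid(string s) => NoTriple.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "abaaaa", "abawdasd", "abbbbasda", "bbabbabba" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // abawdasd and bbabbabba match; abaaaa and abbbbasda do not.
    }
}
```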
To match strings not containing a character repeated more than 3 times consecutively:
^((.)\2?(?!\2\2))+$
How it works:
^ Start of string
(
(.) Match any character (not a new line) and store it for back reference.
\2? Optionally match one more exact copy of that character.
(?! Make sure the upcoming characters are not further copies of the same character.
\2\2 Repeat '\2' for as many times as you need
)
)+ Do ad nauseam
$ End of string
So the number of \2 in your whole expression is the maximum number of times a character may be repeated consecutively; any more and you won't get a match.
E.g.
^((.)\2?(?!\2\2\2))+$ will match all strings that don't repeat a character more than 4 times in a row.
^((.)\2?(?!\2\2\2\2))+$ will match all strings that don't repeat a character more than 5 times in a row.
Please be aware that this solution uses negative lookahead, and not all regex flavors support it.
I'm answering this question :
Is there a regular expression for matching a string that has no more than 2 repeating characters?
which was marked as an exact duplicate of this question.
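The "x consecutive characters" generalization can be sketched by building the pattern from a number. The helper below is hypothetical (MaxRunPattern.Build is not from the answer), and it writes the repeated \2 inside the lookahead as \2{n-1} instead of spelling each one out, which is an assumption about how one might parameterize it.

```csharp
using System;
using System.Text.RegularExpressions;

static class MaxRunPattern
{
    // Hypothetical helper: allows a character to appear at most maxRun times in a row.
    // For maxRun = 3 this produces ^((.)\2?(?!\2{2}))+$, i.e. the pattern from the answer.
    public static Regex Build(int maxRun)
    {
        if (maxRun < 2) throw new ArgumentOutOfRangeException(nameof(maxRun));
        return new Regex(@"^((.)\2?(?!\2{" + (maxRun - 1) + "}))+$");
    }

    static void Main()
    {
        var max3 = Build(3);
        Console.WriteLine(max3.IsMatch("aaa"));   // True
        Console.WriteLine(max3.IsMatch("aaaa"));  // False
        var max4 = Build(4);
        Console.WriteLine(max4.IsMatch("aaaa"));  // True
        Console.WriteLine(max4.IsMatch("aaaaa")); // False
    }
}
```

Because the outer group uses +, the empty string does not match any of these patterns.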
It's much quicker to negate the match instead:
if (!Regex.Match("hello world", @"(.)\1{2}").Success) Console.WriteLine("No dups");

Regular expression for specific combination of alphabets and numbers

I am trying to create regular expression for following type of strings:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ), numerical digits only, and either no ‘Z’ or a ‘Z’ suffix.
For example, XD35Z should pass but XD01HW should not pass.
So far I have tried the following:
@"XD\d+Z?" - XD35Z passes but unfortunately it also works for XD01HW
@"XD\d+$Z" - XD01HW fails which is what I want but XD35Z also fails
I have also tried @"XD\d{1,}Z"? but it did not work
I need a single regex which will give me appropriate results for both types of strings.
Try this regex:
^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$
I'm using quantifying braces to explicitly limit the allowed numbers of each character/group. And the ^ and $ anchors make sure that the regex matches only the whole line (string).
Broken into logical pieces this regex checks
^(XI|YV|XD|YQ|XZ){1} Starts with exactly one of the allowed prefixes
\d+ Is followed by one or more digits
Z{0,1}$ Ends with between 0 and 1 Z
You're misusing the $ which represents the end of the string in the Regex
It should be: @"^XD\d+Z?$" (notice that it appears at the end of the Regex, after the Z?)
The regex following the behaviour you want is:
^(XI|YV|XD|YQ|XZ)\d+Z?$
Explanation:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ)
^(XI|YV|XD|YQ|XZ)
numerical digits only
\d+
either no ‘Z’ or a ‘Z’ suffix
Z?$
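A short C# sketch to check the anchored pattern against the two strings from the question. The class name PrefixCodeDemo and the extra inputs XD35 and XA35Z are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

static class PrefixCodeDemo
{
    // Allowed prefix, one or more digits, optional trailing Z, anchored at both ends.
    static readonly Regex Code = new Regex(@"^(XI|YV|XD|YQ|XZ)\d+Z?$");

    public static bool IsValid(string s) => Code.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "XD35Z", "XD35", "XD01HW", "XA35Z" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // XD35Z and XD35 pass; XD01HW (trailing letters) and XA35Z (unknown prefix) fail.
    }
}
```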

Matching a number preceded by a known string, followed by an unknown number of characters

[SOME_WORDS:200:1000]
Trying to match just the last 1000 part. Both numbers are variable and can contain an unknown number of characters (although they are expected to contain digits, I cannot rule out that they may also contain other characters). The SOME_WORDS part is known and does not change.
So I begin by doing a positive lookbehind for [SOME_WORDS: followed by a positive lookahead for the trailing ]
That gives us the pattern (?<=\[SOME_WORDS:).*(?=])
And captures the part 200:1000
Now because I don't know how many characters are after SOME_WORDS:, but I know that it ends with another : I use .*: to indicate any character any amount of time followed by :
That gives us the pattern (?<=\[SOME_WORDS:.*:).*(?=])
However at this point the pattern no longer matches anything and this is where I become confused. What am I doing wrong here?
If I assume that the first number will always be 3 characters long I can replace .* with ... to get the pattern (?<=\[SOME_WORDS:...:).*(?=]) and this correctly captures just the 1000 part. However I don't understand why replacing ... with .* makes the pattern not capture anything.
EDIT:
It seems like the online tool I was using to test the regex pattern wasn't working correctly. The pattern (?<=\[SOME_WORDS:.*:).*(?=]) matches the 1000 with no issues when actually done in .net
You usually cannot use a + or a * in a lookbehind, only in a lookahead.
If C# does allow these, then you could use .*? instead of .*, as the .* will eat the second :
Try this:
(?<=\[SOME_WORDS:)(?=\d+:(\d+)])
The match will be in the first capture group.
Quote from http://www.regular-expressions.info/lookaround.html
The bad news is that most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards. The regular expression engine needs to be able to figure out how many characters to step back before checking the lookbehind. When evaluating the lookbehind, the regex engine determines the length of the regex inside the lookbehind, steps back that many characters in the subject string, and then applies the regex inside the lookbehind from left to right just as it would with a normal regex.
As Robert Smit mentions, this is due to the * being a greedy operator. Greedy operators consume as many characters as they possibly can when they are matched first. They only give up characters if the match fails. If you make the greedy operator lazy (*?), then matching consumes as few characters as possible for the match to succeed, so the : is not consumed by *. You can also use [^:]*, which matches any character other than :.
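For what it's worth, the asker's EDIT can be reproduced in .NET, which does accept * inside a lookbehind. The sketch below runs the original pattern, the [^:]* variant, and a plain capture-group version that would also work in flavors without variable-length lookbehind. The class and method names are made up, and the closing ] is escaped here only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // The asker's pattern: variable-length lookbehind, as allowed by .NET.
    public static string WithLookbehind(string input) =>
        Regex.Match(input, @"(?<=\[SOME_WORDS:.*:).*(?=\])").Value;

    // Same idea, but the first field cannot run past a colon.
    public static string WithNegatedClass(string input) =>
        Regex.Match(input, @"(?<=\[SOME_WORDS:[^:]*:).*(?=\])").Value;

    // No lookaround at all: the wanted part is read from capture group 1.
    public static string WithGroup(string input) =>
        Regex.Match(input, @"\[SOME_WORDS:[^:]*:([^\]]*)\]").Groups[1].Value;

    static void Main()
    {
        const string input = "[SOME_WORDS:200:1000]";
        Console.WriteLine(WithLookbehind(input));   // 1000
        Console.WriteLine(WithNegatedClass(input)); // 1000
        Console.WriteLine(WithGroup(input));        // 1000
    }
}
```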

Assist me on building my own regex

I'm completely new in this area, I need a regex that follows these rules:
Only numbers and symbols are allowed.
Must start with a number and ends with a number.
Must not contain more than 1 symbol in a row (for example, 123+-4567 is not accepted but 12+345-67 is accepted).
I tried ^[0-9]*[+-*/][0-9]*$ but I think it's a stupid try.
You were close with your attempt. This one should work.
^[0-9]+([+*/-][0-9]+)*$
explanation:
^ matches beginning of the string
[0-9]+ matches 1 or more digits.
[+*/-] matches one from specified symbols
([+*/-][0-9]+)* matches group of symbol followed by at least one digit, repeated 0 or more times
$ matches end of string
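A minimal C# sketch that runs this pattern on the two examples from the question. The class name and the extra input "12+" are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExpressionDemo
{
    // Digits, then zero or more (symbol, digits) groups, anchored at both ends.
    static readonly Regex Expr = new Regex(@"^[0-9]+([+*/-][0-9]+)*$");

    public static bool IsValid(string s) => Expr.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "12+345-67", "123+-4567", "12+" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // 12+345-67 is accepted; 123+-4567 and 12+ are rejected.
    }
}
```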
We'll build that one from individual parts and then we'll see how we can be smarter about that:
Numbers
\d+
will match an integer. Not terribly fancy, but we need to start somewhere.
Must start with a number and end with a number:
^\d+.*\d+$
Pretty straightforward. We don't know anything about the part in between, though (also the last \d+ will only match a single digit; we might want to fix that eventually).
Only numbers and symbols are allowed. Depending on the complexity of the rest of the regex this might be easier by explicitly spelling it out or using a negative lookahead to make sure there is no non-(number|symbol) somewhere in the string. I'll go for the latter here because we need that again:
(?!.*[^\d+*/-])
Sticking this to the start of the regex makes sure that the regex won't match if there is any non-(number|symbol) character anywhere in the string. Also note that I put the - at the end of the character class. This is because it has a certain special meaning when used between two other characters in a character class.
Must not contain more than one symbol in a row. This is a variation on the one before. We just make sure that there never is more than one symbol by using a negative lookahead to disallow two in sequence:
(?!.*[+/*-]{2})
Putting it all together:
(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$
Testing it:
PS Home:\> '123+-4567' -match '(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$'
False
PS Home:\> '123-4567' -match '(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$'
True
However, I only literally interpreted your rules. If you're trying to match arithmetic expressions that can have several operands and operators in sequence (but without parentheses), then you can approach that problem differently:
Numbers again
\d+
Operators
[+/*-]
A number followed by an operator
\d+[+/*-]
Using grouping and repetition to match a number followed by any number of repetitions of an operator and another number:
\d+([+/*-]\d+)*
Anchoring it so we match the whole string:
^\d+([+/*-]\d+)*$
Generally, for problems where it works, this latter approach works better and leads to more understandable expressions. The former approach has its merits, but most often only in implementing password policies (apart from »cannot repeat any of your previous 30689 passwords«).
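The two approaches are not quite interchangeable, which a side-by-side run makes visible. The following C# sketch (names are illustrative) applies both patterns to the same inputs. They agree on the examples from the question, but the lookahead version needs at least two digits because of its separate leading and trailing \d+, so a lone "5" is rejected there and accepted by the grouped version.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoApproaches
{
    // Literal reading of the rules, built from lookaheads.
    static readonly Regex Literal =
        new Regex(@"(?!.*[^\d+*/-])(?!.*[+/*-]{2})^\d+.*\d+$");

    // Number followed by any number of (operator, number) pairs.
    static readonly Regex Grouped = new Regex(@"^\d+([+/*-]\d+)*$");

    public static bool MatchesLiteral(string s) => Literal.IsMatch(s);
    public static bool MatchesGrouped(string s) => Grouped.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "123-4567", "123+-4567", "5" })
            Console.WriteLine($"{s}: literal={MatchesLiteral(s)}, grouped={MatchesGrouped(s)}");
    }
}
```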

.NET Regular expression which check length and non-alphanumeric characters

I need a regex to validate that a string has a minimum length of 6 and contains at least one non-alphanumeric character, e.g. "eN%{S$u)", "h9YI!>4j", "{9YI!;4j", "eN%{S$usdf)", "dfh9YI!>4j", "ghffg{9YI!;4j".
This one works well: "^.*(?=.{6,})(?=.*\\d).*$", but when the string does not contain any digits (e.g. "eN%{S$u)") it does not match.
^(?=.{6})(.*[^0-9a-zA-Z].*)$
We use positive lookahead to assure there are at least 6 characters. Then we match the pattern that looks for at least one non-alphanumeric character ([^0-9a-zA-Z]). The .*'s match any number of any characters around this one non-alphanumeric character, but by the time we've reached here we've already checked that we're matching at least 6.
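Here is a small C# sketch of that pattern. The class name and the two negative inputs (abc123 and a!b) are added for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class MinLengthSymbolDemo
{
    // At least 6 characters (lookahead) and at least one non-alphanumeric character.
    static readonly Regex Rule = new Regex(@"^(?=.{6})(.*[^0-9a-zA-Z].*)$");

    public static bool IsValid(string s) => Rule.IsMatch(s);

    static void Main()
    {
        foreach (var s in new[] { "eN%{S$u)", "h9YI!>4j", "abc123", "a!b" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // The first two pass; abc123 has no symbol and a!b is too short.
    }
}
```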
"^.*(?=.{6,})(?=.*\\d).*$"
is the regex you tried. Here are some suggestions:
You don't need to match more than 6 characters in the lookahead. Matching only 6 here does not restrict the rest of the regular expression from matching more than 6.
\d matches a digit, and (?=.*\\d) is a lookahead for one of them. This is why you are experiencing the problems you mentioned with strings like eN%{S$u).
Even if the point above were not a problem and the regular expression were correct, you could combine the second lookahead with the .* that follows by just using .*\\d.*.
marcog's answer is pretty good, but I'd do it the other way around so that it's easier to add even more conditions (such as having at least one digit or whatever), and I'd use lazy quantifiers because they are cheaper for certain patterns:
^(?=.*?[^0-9a-zA-Z]).{6}
So if you were to add the mentioned additional condition, it would be like this:
^(?=.*?[^0-9a-zA-Z])(?=.*?[0-9]).{6}
As you can see, this pattern is easily extensible. Note that it is designed to be used for checking matches only; its capture is not useful.
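The extensibility could be sketched as a tiny builder that turns each required sub-pattern into one lazy lookahead. PasswordRules.Build is a hypothetical helper, not part of the answer. Called with the two character classes above, it produces exactly the two patterns shown.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PasswordRules
{
    // Hypothetical helper: every entry in mustContain becomes (?=.*?entry),
    // followed by .{minLength} to enforce the minimum length.
    public static Regex Build(int minLength, params string[] mustContain)
    {
        string lookaheads = string.Concat(mustContain.Select(p => "(?=.*?" + p + ")"));
        return new Regex("^" + lookaheads + ".{" + minLength + "}");
    }

    static void Main()
    {
        var symbolOnly = Build(6, "[^0-9a-zA-Z]");
        var symbolAndDigit = Build(6, "[^0-9a-zA-Z]", "[0-9]");

        Console.WriteLine(symbolOnly.IsMatch("eN%{S$u)"));     // True
        Console.WriteLine(symbolAndDigit.IsMatch("eN%{S$u)")); // False, no digit
        Console.WriteLine(symbolAndDigit.IsMatch("h9YI!>4j")); // True
    }
}
```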
Keep it easy.
// long enough and contains something that is not a digit or a letter
x.Length >= 6 && Regex.IsMatch(x, @"[^\da-zA-Z]")
Happy coding.
Edit, pure "regular expression":
This first asserts in the look-ahead that there are 6 characters of anything, and then ensures that within those 6 characters there is something that is not alphanumeric (it will "throw away" up to the first 5 characters trying to match).
(?=.{6}).{0,5}[^\da-zA-Z]
What about this (fixed): ^(?=.{6})(.*[^\w].*)$
Check out http://www.ultrapico.com/Expresso.htm; it is a cool tool that could help you a lot in learning regular expressions.
