Regex failing in max length - c#

I want a regex that allows the following formats:
1234567-8
123456B
If the user enters the second pattern, the input should be limited to a maximum of 7 characters, so
1234568B
123456V1
become invalid.
I have tried
[0-9]{7}-[0-9]|[[0-9]{6}[A-z]{1}]{7,7}
but this fails

For the sample input you provided, you may use ^([0-9]{7}-[0-9]|[0-9]{6}[A-Za-z])$.
A bit more contracted version: ^[0-9]{6}(?:[0-9]-[0-9]|[A-Za-z])$.
Note that 1234567-8 has 7 digits and a hyphen followed with a digit, so the whole string length cannot be limited to just 7 characters all in all.
In .NET and almost all other regex flavors, [A-z] is a mistake: it matches every character from A to z in the ASCII table, which includes [, \, ], ^, _ and ` as well as the letters.
Placing the quantifier {1} inside a character class turns it into three literal characters, so [{1}] matches either { or 1 or }.
The {7,7} (={7}) will not limit the whole string length to 7, as you do not have anchors (^ and $) around the expression and you "ruined" the preceding quantifiers by putting them into a character class.
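As a minimal sketch of the points above, the following C# program runs the contracted pattern against the sample inputs from the question and shows that [A-z] accepts a non-letter. The class and method names (IdFormatDemo, IsValid, LooseRangeMatches) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdFormatDemo
{
    // Anchored pattern from the answer: 7 digits, hyphen, digit OR 6 digits and a letter.
    private static readonly Regex Pattern =
        new Regex(@"^[0-9]{6}(?:[0-9]-[0-9]|[A-Za-z])$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    // Illustrates why [A-z] is too broad: the range also covers [ \ ] ^ _ `
    public static bool LooseRangeMatches(string input) => Regex.IsMatch(input, "^[A-z]$");

    public static void Main()
    {
        foreach (var s in new[] { "1234567-8", "123456B", "1234568B", "123456V1" })
            Console.WriteLine($"{s}: {IsValid(s)}");
        // 1234567-8: True
        // 123456B: True
        // 1234568B: False
        // 123456V1: False

        Console.WriteLine(LooseRangeMatches("_")); // True, although _ is not a letter
    }
}
```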

I think this is what you need:
^(\d{7}-\d|\d{6}[A-Z])$
Seven digits, a hyphen and a digit, OR six digits and one uppercase Latin letter.

^\d{6}(?:\d-\d|[A-Z])$
This matches both of your formats above:
1234567-8
123456B

Related

Regular expression for duplicate pattern

I am trying to write a regex to handle these cases
1) contains only alphanumeric characters, with a minimum of 2 alpha characters (numbers are optional).
2) only special character allowed is hyphen.
3) cannot be all same letter ignoring hyphen.
4) cannot be all hyphens
5) cannot be all numeric
My regex: (?=[^A-Za-z]*[A-Za-z]){2}^[\w-]{6,40}$
Above regex works for most of the scenarios except 1) & 3).
Can anyone suggest how to fix this? I am stuck.
Regards,
Sajesh
Rule 1 makes rules 4 and 5 redundant: a string with at least two letters can be neither all hyphens nor all digits.
/^(?=[a-z\d-]{6,40}$)[\d-]*([a-z]).*?(?!\1)[a-z].*$/i
(?=[a-z\d-]{6,40}$) look ahead for specified characters from 6 to 40
([a-z]).*?(?!\1)[a-z] checks for two letters and at least one different
See this demo at regex101
This pattern with the i flag considers A and a to be the "same" letter (caseless matching) and will require another, different letter. For case-sensitive matching, here is another demo at regex101.
You can use
^(?!\d+$)(?!-+$)(?=(?:[\d-]*[A-Za-z]){2})(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*$)[A-Za-z\d-]{6,40}$
See the regex demo. If you use it in C# or PHP, consider replacing ^ with \A and $ with \z to make sure you match the entire string even in case there is a trailing newline.
Details:
^ - start of string
(?!\d+$) - fail the match if the string only consists of digits
(?!-+$) - fail the match if the string only consists of hyphens
(?=(?:[\d-]*[A-Za-z]){2}) - there must be at least two ASCII letters after any zero or more digits or hyphens
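(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*$) - fail the match if every letter in the string is the same letter, i.e. the string is one letter repeated two or more times with only digits or hyphens around and between the repetitions (the + after (?:[\d-]*\1) requires at least one repetition of the captured letter)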
[A-Za-z\d-]{6,40} - six to forty alphanumeric or hyphen chars
$ - end of string. (\z might be preferable.)
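To see how the lookaheads combine, here is a small C# sketch that runs the pattern above against a few inputs chosen to hit each rule. The test strings and the names (DuplicatePatternDemo, IsValid) are assumptions for illustration, not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicatePatternDemo
{
    // Pattern from the answer, unchanged.
    private static readonly Regex Pattern = new Regex(
        @"^(?!\d+$)(?!-+$)(?=(?:[\d-]*[A-Za-z]){2})" +
        @"(?![\d-]*([A-Za-z])(?:[\d-]*\1)+[\d-]*$)[A-Za-z\d-]{6,40}$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("abc-123")); // True: two different letters, allowed chars
        Console.WriteLine(IsValid("aab123"));  // True: letters are not all the same
        Console.WriteLine(IsValid("a-a-a-a")); // False: same letter throughout, ignoring hyphens
        Console.WriteLine(IsValid("12345a"));  // False: only one letter
        Console.WriteLine(IsValid("123456"));  // False: all digits
        Console.WriteLine(IsValid("------"));  // False: all hyphens
        Console.WriteLine(IsValid("ab123"));   // False: shorter than 6 characters
    }
}
```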

Regular expression that stops at first letter encountered

I want my regex expression to stop matching numbers of length between 2 and 10 after it encounters a letter.
So far I've come up with (\d{2,10})(?![a-zA-Z]), but it continues to match even after letters are encountered.
2216101225 /ROC/PL FCT DIN 24.03.2022 PL ERBICIDE' - this is the text I've been testing the regex on, but it matches 24 03 and 2022 also.
This is tested and intended for C#.
Can you help ? Thanks
Another option is to anchor the pattern and to match any character except chars a-zA-Z or a newline, and then capture the 2-10 digits in a capture group.
Then get the capture group 1 value from the match.
^[^A-Za-z\r\n]*\b([0-9]{2,10})\b
Explanation
^ Start of string
[^A-Za-z\r\n]* Optionally match chars other than a-zA-Z or a newline
\b([0-9]{2,10})\b Capture 2-10 digits between word boundaries in group 1
See a regex demo.
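A minimal C# sketch of reading capture group 1 from that pattern might look like this. The names FirstNumberDemo and Extract are hypothetical, and returning null for "no match" is just one possible convention.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstNumberDemo
{
    private static readonly Regex Pattern =
        new Regex(@"^[^A-Za-z\r\n]*\b([0-9]{2,10})\b");

    // Returns the digits captured in group 1, or null when nothing matches.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        const string sample = "2216101225 /ROC/PL FCT DIN 24.03.2022 PL ERBICIDE'";
        Console.WriteLine(Extract(sample));            // 2216101225
        Console.WriteLine(Extract("12 34 abc"));       // 34
        Console.WriteLine(Extract("AB 1234") == null); // True
    }
}
```

Because the pattern is anchored and the leading [^A-Za-z\r\n]* is greedy, it yields a single match whose group holds the last 2-10 digit number before the first letter, which is why "12 34 abc" gives 34 rather than 12.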
Note that in .NET, \d matches any Unicode decimal digit, not only 0-9 (unless RegexOptions.ECMAScript is used), which is why the pattern above uses [0-9].
You can use the following .NET regex
(?<=^\P{L}*)(?<!\d)\d{2,10}(?!\d)
(?<=^[^a-zA-Z]*)(?<!\d)\d{2,10}(?!\d)
See the regex demo. Details:
(?<=^\P{L}*) - there must be no letters from the current position till the start of string ((?<=^[^a-zA-Z]*) only supports ASCII letters)
(?<!\d) - no digit immediately on the left is allowed.
\d{2,10} - two to ten digits
(?!\d) - no digit immediately on the right is allowed.
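The lookbehind version can return every qualifying number before the first letter, not just one. A short C# sketch, assuming the \P{L} variant and using made-up names (LeadingNumbersDemo, Find):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeadingNumbersDemo
{
    // Relies on .NET's support for unbounded quantifiers inside a lookbehind.
    private static readonly Regex Pattern =
        new Regex(@"(?<=^\P{L}*)(?<!\d)\d{2,10}(?!\d)");

    public static string[] Find(string input) =>
        Pattern.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string sample = "2216101225 /ROC/PL FCT DIN 24.03.2022 PL ERBICIDE'";
        Console.WriteLine(string.Join(", ", Find(sample)));        // 2216101225
        Console.WriteLine(string.Join(", ", Find("12 345 x 67"))); // 12, 345
    }
}
```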

Regex match certain amount of character and allow space inbetween

I am currently working on a regex which needs to match exactly 8 digits. But sometimes there are spaces or dots between those digits. This is the regex that I am currently using.
([0-9\ ?.?]{7,16})
It works fine most of the time, but the problem I am having is that it sometimes matches a number with a lot of spaces trailing it, so you will get something like 1234/s/s/s/s (/s stands for space). Or sometimes it matches only spaces.
What I want is a regex that always matches at least 8 digits and also allows spaces and dots, without matching fewer than 8 digits. I know it may be a stupid question, but I couldn't find what I need elsewhere.
Your ([0-9\ ?.?]{7,16}) expression matches 7 to 16 occurrences of any character that is a digit, a space, a ? or a dot. Yes, the ? inside [...] is a literal ?, not a quantifier.
You need to use an expression that will match a digit ([0-9]) and then exactly 7 sequences of a space or period ([ .]) followed with 1 digit, and to make sure you are not matching the digits in 123.156.78.146 you may use special boundaries:
(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)
if the space or . can only be 0 to 1 in between digits; or - if the space/dot can appear 0 or more times,
(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)
See the regex demo
The (?<!\d[ .]*) is a negative lookbehind that fails the match if it is preceded by a digit followed by zero or more dots or spaces, and the (?![ .]*\d) negative lookahead fails the match if the 8 digits you need are followed by zero or more dots or spaces and then another digit.
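As a rough C# sketch of how this might be used, the following finds each 8-digit group with the "zero or more separators" pattern and then strips the spaces and dots from the match. The names (EightDigitsDemo, Extract) and the sample strings other than 123.156.78.146 are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EightDigitsDemo
{
    private static readonly Regex Pattern =
        new Regex(@"(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)");

    // Returns each match with its spaces and dots removed.
    public static string[] Extract(string input) =>
        Pattern.Matches(input).Cast<Match>()
               .Select(m => Regex.Replace(m.Value, "[ .]", ""))
               .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Extract("ref 12 34 56 78 end"))); // 12345678
        Console.WriteLine(string.Join(", ", Extract("12.34.56.78")));         // 12345678
        Console.WriteLine(Extract("123.156.78.146").Length);                  // 0 (11 digits)
        Console.WriteLine(Extract("1234    ").Length);                        // 0 (only 4 digits)
    }
}
```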
To solve this, describe the problem to yourself. You want to match one digit followed by seven repetitions of an optional space or dot followed by a digit. This leads to a regular expression like \d([ .]?\d){7}. To avoid collecting the seven captures, add ?: after the (. To capture the whole string, enclose it in parentheses. Adding both changes gives the expression (\d(?:[ .]?\d){7}). If more than one space or dot is allowed between the digits, change the ? to a *.
To get just the eight digits out of the string I suggest using the string captured above and replacing any spaces or dots with nothing.

Using Regular Expression to find exact length match multiple times

I need a regular expression to find groups of exactly 8 numbers in a row. The closest I have gotten is:
[0-9]{8}
but it's not exactly what I need. If I had a number that was 9 long it will match the first 8 but I want it to ignore it if it's longer or shorter than 8.
Here are some examples
1234567890 <- no match, it's longer than 8
12345678 <- match: "12345678"
1234567809876543 <- match 1: "12345678", match 2: "09876543" (two groups of 8)
,,111-11-1234,12345678, <- match: "12345678"
To summarize, for every group of exactly 8 numbers make a match.
I'm working with some results of OCR (Optical Character Recognition) and I have to work with the shortcomings of the results so my input can be varied as in the examples above.
Here is some use case data: http://pastebin.com/uijF9K9n
You can use the following regex in .NET:
(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)
See regex demo
It is based on variable-width lookbehind and a lookahead.
Regex breakdown:
(?<=^|\D|(?:\d{8})+) - only if at the string start (^), or preceded with not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+)...
\d{8} - match 8 digits that are followed by...
(?=$|\D|(?:\d{8})+) - either end of string ($) or not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+).
IMPORTANT:
If I got a downvote for the "extra" complexity compared with another answer, note our solutions are different: my regex matches 8-digit number in ID12345678, and the other one does not due to the word boundary.
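A short C# sketch that runs this pattern over the examples from the question, plus the ID12345678 case mentioned above. The names ExactEightDemo and Find are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExactEightDemo
{
    private static readonly Regex Pattern =
        new Regex(@"(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)");

    public static string[] Find(string input) =>
        Pattern.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(Find("1234567890").Length);                            // 0
        Console.WriteLine(string.Join(", ", Find("12345678")));                  // 12345678
        Console.WriteLine(string.Join(", ", Find("1234567809876543")));          // 12345678, 09876543
        Console.WriteLine(string.Join(", ", Find(",,111-11-1234,12345678,")));   // 12345678
        Console.WriteLine(string.Join(", ", Find("ID12345678")));                // 12345678
    }
}
```

One caveat worth checking against your data: the (?:\d{8})+ alternatives are not themselves anchored to a non-digit, so a longer run whose length is not a multiple of 8 (17 digits, for example) can still produce matches.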
You can also try this regex
(?:\b|\G)\d{8}(?=(?:\d{8})*\b)
(?:\b|\G) \b match a word boundary | or \G continue where last match attempt ended
\d{8} matches 8 digits [0-9] followed by a lookahead (?=... to check
(?:\d{8})*\b if followed by any amount of {8 digits} until another word boundary
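It will match a group of 8 digits on its own, or each group of 8 within a longer run of digits, as long as the whole run sits between two word boundaries and its length is a multiple of 8.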
See demo at regexstorm
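For comparison, here is a small C# sketch of the \G-based pattern on the same kind of input. The names (BoundaryEightDemo, Find) are hypothetical. It illustrates the behavioural difference with word boundaries: digits glued to letters, as in ID12345678, are not matched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryEightDemo
{
    private static readonly Regex Pattern =
        new Regex(@"(?:\b|\G)\d{8}(?=(?:\d{8})*\b)");

    public static string[] Find(string input) =>
        Pattern.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Find("1234567809876543"))); // 12345678, 09876543
        Console.WriteLine(Find("1234567890").Length);                   // 0
        Console.WriteLine(Find("ID12345678").Length);                   // 0 (no word boundary before the 1)
    }
}
```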
\b[0-9]{8}\b this will give you what you want
For more details check this out
http://www.rexegg.com/regex-boundaries.html

alphanumeric with at least 1 character

Ok so I finally figured out that I need the following:
So the regular expression has to perform:
alphanumeric
at least 1 letter
must have between 4 and 8 characters (either a letter/digit).
i.e. an alphanumeric string, that must have at least 1 letter, and be between 4-8 in length. (so it can't be just all numbers or just all letters).
can all of this be done in a single regex?
I'm assuming you mean alphanumeric, at least one letter, and 4 to 8 characters long.
Try this:
(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}
(?= - we're using a lookahead, so we can check for something without affecting the rest of the match
.*[a-zA-Z] - match for anything followed by a letter, i.e. check we have at least one letter
[a-zA-Z0-9]{4,8} - This will match a letter or a number 4 to 8 times.
However, you say the intention is that "it can't be just all numbers or just all letters", but requirements 1, 2 and 3 don't accomplish this, since a string can be all letters and still meet all three requirements. It's possible you want this, with an extra lookahead to confirm there's at least one digit:
(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{4,8}
The use of a-zA-Z isn't very international friendly, so you may be better off using an escape code for "letter" if available in your flavour of Regular Expressions.
Also, I hope this isn't matching for acceptable passwords, as 4 characters probably isn't long enough.
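If the goal is to validate a whole input string, the patterns above would likely need anchors, since without them Regex.IsMatch succeeds as soon as any 4-8 character stretch qualifies. The sketch below adds ^ and $ as an assumption of mine (they are not in the answer's patterns), and the names (AlnumDemo, HasLetter, HasLetterAndDigit) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlnumDemo
{
    // Answer's patterns with ^ and $ added (assumption: whole-string validation is wanted).
    public static bool HasLetter(string s) =>
        Regex.IsMatch(s, "^(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}$");

    public static bool HasLetterAndDigit(string s) =>
        Regex.IsMatch(s, "^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{4,8}$");

    public static void Main()
    {
        Console.WriteLine(HasLetter("abcd"));              // True
        Console.WriteLine(HasLetterAndDigit("abcd"));      // False: no digit
        Console.WriteLine(HasLetterAndDigit("abc1"));      // True
        Console.WriteLine(HasLetterAndDigit("1234"));      // False: no letter
        Console.WriteLine(HasLetterAndDigit("abcdefgh1")); // False: 9 characters

        // The unanchored original accepts the 9-character string via its first 8 characters.
        Console.WriteLine(Regex.IsMatch("abcdefgh1",
            "(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]{4,8}")); // True
    }
}
```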
Numbers 2 and 3 seem to contradict each other. The following will match alphanumeric between 4 and 8:
/[0-9a-zA-Z]{4,8}/
?Regex.IsMatch("sdf", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
?Regex.IsMatch("sdfd", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
true
?Regex.IsMatch("1234", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
Warning on .* and .+:
// At least one letter does not match with .*
?Regex.IsMatch("1111", "(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}")
false
?Regex.IsMatch("1aaa", "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}")
true
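One difference between the two lookaheads that may be worth checking: .+ requires at least one character before the letter, so an input whose only letter is its first character passes with .* but not with .+. A minimal C# sketch, with made-up names (LookaheadQuantifierDemo, WithStar, WithPlus) and a test string of my own choosing:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadQuantifierDemo
{
    public static bool WithStar(string s) =>
        Regex.IsMatch(s, "(?=.*[a-zA-Z])[a-zA-Z0-9]{4,8}");

    public static bool WithPlus(string s) =>
        Regex.IsMatch(s, "(?=.+[a-zA-Z])[a-zA-Z0-9]{4,8}");

    public static void Main()
    {
        Console.WriteLine(WithStar("a123")); // True
        Console.WriteLine(WithPlus("a123")); // False: no letter after the first character
        Console.WriteLine(WithStar("1aaa")); // True
        Console.WriteLine(WithPlus("1aaa")); // True
    }
}
```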
[a-zA-Z0-9]{4,8}
The first part specifies alphanumeric, and the 2nd part specifies from 4 to 8 characters.
Try: [a-zA-Z0-9]{4,8}
