Regular expression that stops at first letter encountered - c#

I want my regex to match numbers of 2 to 10 digits, but to stop matching once it encounters a letter.
So far I've come up with (\d{2,10})(?![a-zA-Z]), but it continues to match even after letters are encountered.
2216101225 /ROC/PL FCT DIN 24.03.2022 PL ERBICIDE - this is the text I've been testing the regex on, but it also matches 24, 03 and 2022.
This is tested and intended for C#.
Can you help? Thanks

Another option is to anchor the pattern and to match any character except chars a-zA-Z or a newline, and then capture the 2-10 digits in a capture group.
Then get the capture group 1 value from the match.
^[^A-Za-z\r\n]*\b([0-9]{2,10})\b
Explanation
^ Start of string
[^A-Za-z\r\n]* Optionally match chars other than a-zA-Z or a newline
\b([0-9]{2,10})\b Capture 2-10 digits between word boundaries in group 1
See a regex demo.
Note that in .NET, \d matches any Unicode decimal digit, not only 0-9.
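A minimal C# sketch of the anchored pattern above, reading the digits from capture group 1. The class and method names (FirstNumber, BeforeFirstLetter) are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class FirstNumber
{
    // Anchored pattern from the answer; [0-9] restricts it to ASCII digits.
    private static readonly Regex Pattern =
        new Regex(@"^[^A-Za-z\r\n]*\b([0-9]{2,10})\b");

    // Returns the digits captured in group 1, or an empty string when no
    // 2-10 digit number occurs before the first ASCII letter.
    public static string BeforeFirstLetter(string input)
    {
        var m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

With the sample text from the question, FirstNumber.BeforeFirstLetter returns 2216101225. Because [^A-Za-z\r\n]* is greedy, the capture is the last qualifying number before the first letter when several are present, and the anchored pattern yields at most one match per string.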

You can use the following .NET regex
(?<=^\P{L}*)(?<!\d)\d{2,10}(?!\d)
(?<=^[^a-zA-Z]*)(?<!\d)\d{2,10}(?!\d)
See the regex demo. Details:
(?<=^\P{L}*) - there must be no letters from the current position till the start of string ((?<=^[^a-zA-Z]*) only supports ASCII letters)
(?<!\d) - no digit immediately on the left is allowed.
\d{2,10} - two to ten digits
(?!\d) - no digit immediately on the right is allowed.
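To collect every number that appears before the first letter, the lookbehind version can be used with Regex.Matches. This is only a sketch: NumbersBeforeLetter and Find are hypothetical names, and it relies on .NET's support for variable-length lookbehind.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumbersBeforeLetter
{
    // Lookbehind pattern from the answer: no letter between the start of the
    // string and the match, and no digit directly on either side of it.
    private static readonly Regex Pattern =
        new Regex(@"(?<=^\P{L}*)(?<!\d)\d{2,10}(?!\d)");

    // Returns all 2-10 digit numbers that occur before the first letter.
    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            result.Add(m.Value);
        return result;
    }
}
```

For the question's sample text this yields only 2216101225, while for "12 345 abc 67" it yields 12 and 345 and skips 67.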

Related

Regex to match 7 same digits in a number regardless of position

I want to match an 8-digit number. Currently, I have the following regex, but it is failing in some cases.
(\d+)\1{6}
It matches only when a number is different at the end such as 44444445 or 54444444. However, I am looking to match cases where at least 7 digits are the same regardless of their position.
It is failing in cases like
44454444
44544444
44444544
What modification is needed here?
It's probably a bad idea to use this in a performance-sensitive location, but you can use a capture reference to achieve this.
The Regex you need is as follows:
(\d)(?:.*?\1){6}
Breaking it down:
(\d) Capture group of any single digit
.*? means match any character, zero or more times, lazily
\1 means match the first capture group
We enclose that in a non-capturing group (?: ... )
And add a quantifier {6} to match six times
You can sort the digits before matching
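// requires: using System.Linq; using System.Text.RegularExpressions;
string input = "44444445 54444444 44454444 44544444 44444544";
string[] numbers = input.Split(' ');
foreach (var number in numbers)
{
    // Sorting puts identical digits next to each other
    string sorted = String.Concat(number.OrderBy(c => c));
    if (Regex.IsMatch(sorted, @"(\d+)\1{6}"))
    {
        // do something
    }
}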
Still not a good idea to use regex for this though
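As the remark above hints, counting may be simpler than a regex here. A possible non-regex sketch using LINQ follows; it is a swapped-in technique rather than anything from the answers, and the names DigitCount and HasSevenSameDigits are hypothetical. It assumes the input must be exactly 8 ASCII digits.

```csharp
using System.Linq;

public static class DigitCount
{
    // True when the text is exactly 8 ASCII digits and one digit
    // occurs at least 7 times, regardless of position.
    public static bool HasSevenSameDigits(string number)
    {
        if (number.Length != 8 || !number.All(c => c >= '0' && c <= '9'))
            return false;

        // Group identical characters and check the size of each group.
        return number.GroupBy(c => c).Any(g => g.Count() >= 7);
    }
}
```

This accepts 44454444, 44544444 and 44444544 from the question and rejects 44445544, which has only six identical digits.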
The pattern that you tried, (\d+)\1{6}, matches a captured run of digits followed by 6 repetitions of it, so for a single digit it needs 7 of the same digit in a row. If you want the 7 same digits to be spread across the number, you have to match optional digits in between.
Note that in .NET \d matches more digits than 0-9 only.
If you want to match only digits 0-9 using C# without matching other characters in between the digits:
([0-9])(?:[0-9]*?\1){6}
The pattern matches:
([0-9]) Capture group 1
(?: Non capture group
[0-9]*?\1 Match optional digits 0-9 and a backreference to group 1
){6} Close non capture group and repeat 6 times
See a .NET Regex demo
If you want to match only 8 digits, you can use a positive lookahead (?= to assert 8 digits and word boundaries \b
\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b
See another .NET Regex demo
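A short C# sketch wrapping the 8-digit pattern above in a helper; SevenOfEight and IsMatch are hypothetical names chosen for this example.

```csharp
using System.Text.RegularExpressions;

public static class SevenOfEight
{
    // Pattern from the answer: the lookahead asserts exactly 8 digits between
    // word boundaries, then one digit must recur 6 more times.
    private static readonly Regex Pattern =
        new Regex(@"\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b");

    // True when the input contains an 8-digit number with 7 identical digits.
    public static bool IsMatch(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

The failing cases from the question (44454444, 44544444, 44444544) all match, as do 44444445 and 54444444, while 12345678 and the 9-digit 444444444 do not.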

C# regex string that is not another string

I want to match an at least 3 letter word, preceded by any character from class [-_ :] any amount of times, that is not this specific 3 letter word string2.
Ex:
if string2="VER"
in
" ODO VER7"
matched " ODO"
or
"_::ATTPQ VER7"
matched "_::ATTPQ"
but if
" VER7"
it shouldn't match " VER"
so I thought about
Regex.Match(inputString, @"[-_:]*[A-Z]{3,}[^(VER)]", RegexOptions.IgnoreCase);
where
[-_:]* checks for any character in class, appearing 0 or more times
[A-Z] the range of letters that could form the word
{3,} the minimum amount of letters to form the word
[^(VER)] the grouping construct that shouldn't appear
I believe, however, that [A-Z]{3,} matches any letter at least 3 times (not what I want),
and I'm not sure what [^(VER)] is doing.
Using [^(VER)] means a negated character class where you would match any character except ( ) V E or R
For your example data, you could match 0+ spaces or tabs (or use \s to also match a newline).
Then use a negative lookahead before matching 3 or more times A-Z to assert what is on the right is not VER.
If that is the case, match 3 or more times A-Z followed by a space and VER itself.
^[ \t]*[-_:]*(?!VER)[A-Z]{3,} VER
Regex demo
^\s*[-_:]*(?!VER)[A-Z]{3,}
This regex matches, from the start of the string, optional whitespace and zero or more of your characters, followed by at least 3 letters. It uses a negative lookahead to make sure that the letters do not start with VER (or whatever you want).
Demo
This matches one or more of the class characters [-_ :] followed by a word of 3 or more letters/digits
that does not start with VER (as in the samples given):
[-_ :]+(?!VER)[^\W_]{3,}
https://regex101.com/r/wLw23I/1
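A sketch of how the excluded word could be passed in as a parameter, built on the ^\s*[-_:]*(?!VER)[A-Z]{3,} pattern shown earlier. PrefixWord and Find are hypothetical names, and Regex.Escape is used on the assumption that string2 should be treated as literal text.

```csharp
using System.Text.RegularExpressions;

public static class PrefixWord
{
    // Returns the leading [-_:] characters plus a word of 3 or more letters,
    // or an empty string when the word starts with the excluded text.
    public static string Find(string input, string excluded)
    {
        // Escape the excluded word so it is treated as literal text.
        string pattern =
            @"^\s*[-_:]*(?!" + Regex.Escape(excluded) + @")[A-Z]{3,}";
        var m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : string.Empty;
    }
}
```

With excluded set to "VER", the inputs " ODO VER7" and "_::ATTPQ VER7" give " ODO" and "_::ATTPQ", and " VER7" gives an empty string. Note that the lookahead also rejects longer words that merely start with the excluded text, such as VERB.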

Regular expression to match 1-5 symbols when start symbol letter and at least one number

I tried this expression:
/([a-z]+[0-9]+[a-z]*){1,5}$/
but it matches every word that starts with a letter and contains at least one number, whatever its length, for example "re1111e", which it is not supposed to match. What am I doing wrong?
One possible way to write your regex uses a positive lookahead to check for a number:
/(?=[^0-9]*[0-9])[a-z][a-z0-9]{0,4}/
This pattern says to:
(?=[^0-9]*[0-9]) assert that a single digit appears somewhere
[a-z] match an initial letter character
[a-z0-9]{0,4} then match zero to four letter or number characters
In your pattern, the quantifier {1,5} applies to the group, repeating the match of [a-z]+[0-9]+[a-z]* 1 to 5 times.
If I am not mistaken, you want to match [a-z] at the start followed by up to 4 chars, at least one of which is a digit, so the minimum number of characters is 2 and the maximum is 5.
You might use:
^(?=.{2,5}$)[a-z][a-z0-9]*[0-9][a-z0-9]*$
About the pattern
^ Start of string
(?=.{2,5}$) Assert string length 2 - 5 characters
[a-z] Match a-z
[a-z0-9]* Repeat 0+ times matching a-z 0-9
[0-9] Match a digit
[a-z0-9]* Repeat 0+ times matching a-z 0-9
$ End of string
Regex demo
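The question does not name a language, so the following is just a C# sketch of the anchored pattern above; ShortCode and IsValid are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class ShortCode
{
    // Pattern from the answer: 2-5 characters, a leading lowercase letter,
    // and at least one digit somewhere after it.
    private static readonly Regex Pattern =
        new Regex(@"^(?=.{2,5}$)[a-z][a-z0-9]*[0-9][a-z0-9]*$");

    public static bool IsValid(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

Here "a1" and "ab12c" are accepted, while "re1111e" (too long), "abcde" (no digit) and "1abc" (starts with a digit) are rejected.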

Regex match certain amount of character and allow space inbetween

I am currently working on a regex which needs to match exactly 8 digits. But sometimes there are spaces or dots between those digits. This is the regex that I am currently using.
([0-9\ ?.?]{7,16})
It works fine most of the time, but the problem I am having is that it sometimes matches a number with a lot of trailing spaces, so you get something like 1234/s/s/s/s (/s stands for space), or sometimes it matches only spaces.
What I want is a regex that always matches at least 8 digits and also allows spaces and dots, without matching fewer than 8 digits. I know it may be a stupid question, but I couldn't find what I need elsewhere.
Your ([0-9\ ?.?]{7,16}) expression matches 7 to 16 occurrences of any character that is either a digit, or a space, or a ?, or .. Yes, the ? inside [...] is a literal ?, not a quantifier.
You need to use an expression that will match a digit ([0-9]) and then exactly 7 sequences of a space or period ([ .]) followed with 1 digit, and to make sure you are not matching the digits in 123.156.78.146 you may use special boundaries:
(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)
if the space or . can only be 0 to 1 in between digits; or - if the space/dot can appear 0 or more times,
(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)
See the regex demo
The (?<!\d[ .]*) is a negative lookbehind that fails the match if the first digit is preceded by a digit followed by optional dots or spaces, and the (?![ .]*\d) negative lookahead fails the match if the 8 digits you need are followed by optional dots or spaces and then a digit.
To solve this, describe the problem to yourself. You want to match one digit followed by seven repetitions of space-or-dot followed by a digit. This leads to a regular expression like \d([ .]?\d){7}. To avoid collecting the seven captures, add ?: after the (. To capture the whole string, enclose it in parentheses. Adding both changes gives the expression (\d(?:[ .]?\d){7}). If more than one space or dot is allowed between the digits, then change the ? to a *.
To get just the eight digits out of the string I suggest using the string captured above and replacing any spaces or dots with nothing.
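Combining the two answers, a minimal C# sketch could match with the boundary-aware pattern and then strip the separators. EightDigits and Extract are hypothetical names, and the sketch uses the variant that allows at most one space or dot between digits.

```csharp
using System.Text.RegularExpressions;

public static class EightDigits
{
    // Pattern from the first answer: 8 digits with at most one space or dot
    // between them, not adjacent to further digits on either side.
    private static readonly Regex Pattern =
        new Regex(@"(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)");

    // Returns the 8 digits with separators removed, or an empty string.
    public static string Extract(string input)
    {
        var m = Pattern.Match(input);
        if (!m.Success)
            return string.Empty;

        // Drop the spaces and dots that were allowed between the digits.
        return m.Value.Replace(" ", "").Replace(".", "");
    }
}
```

For example, "ref 1234 56.78 end" gives 12345678, while "123.156.78.146" and a short number followed by spaces give an empty string.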

Using Regular Expression to find exact length match multiple times

I need a regular expression to find groups of exactly 8 numbers in a row. The closest I have gotten is:
[0-9]{8}
but it's not exactly what I need. If I had a number that was 9 long it will match the first 8 but I want it to ignore it if it's longer or shorter than 8.
Here are some examples
1234567890 <- no match, it's longer than 8
12345678 <- match: "12345678"
1234567809876543 <- match 1: "12345678", match 2: "09876543" (two groups of 8)
,,111-11-1234,12345678, <- match: "12345678"
To summarize, for every group of exactly 8 numbers make a match.
I'm working with some results of OCR (Optical Character Recognition) and I have to work with the shortcomings of the results so my input can be varied as in the examples above.
Here is some use case data: http://pastebin.com/uijF9K9n
You can use the following regex in .NET:
(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)
See regex demo
It is based on variable-width lookbehind and a lookahead.
Regex breakdown:
(?<=^|\D|(?:\d{8})+) - only if at the string start (^), or preceded with not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+)...
\d{8} - match 8 digits that are followed by...
(?=$|\D|(?:\d{8})+) - either end of string ($) or not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+).
IMPORTANT:
If I got a downvote for the "extra" complexity compared with another answer, note our solutions are different: my regex matches 8-digit number in ID12345678, and the other one does not due to the word boundary.
You can also try this regex
(?:\b|\G)\d{8}(?=(?:\d{8})*\b)
(?:\b|\G) \b match a word boundary | or \G continue where last match attempt ended
\d{8} matches 8 digits [0-9] followed by a lookahead (?=... to check
(?:\d{8})*\b if followed by any amount of {8 digits} until another word boundary
It will match a run of 8 digits, either on its own or as part of a sequence of such runs, as long as the whole sequence lies between two word boundaries.
See demo at regexstorm
\b[0-9]{8}\b this will give you what you want
For more details check this out
http://www.rexegg.com/regex-boundaries.html
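A small C# sketch that runs the \G-based pattern from the answers over an input and collects the groups. EightDigitGroups and Find are hypothetical names; the sketch relies on Regex.Matches starting each search where the previous match ended, which is what lets \G pick up the second group.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EightDigitGroups
{
    // Pattern from the answer: start at a word boundary or where the previous
    // match ended, and require only whole groups of 8 digits to follow.
    private static readonly Regex Pattern =
        new Regex(@"(?:\b|\G)\d{8}(?=(?:\d{8})*\b)");

    // Returns every group of exactly 8 digits found in the input.
    public static List<string> Find(string input)
    {
        var groups = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            groups.Add(m.Value);
        return groups;
    }
}
```

On the question's examples, 1234567890 gives no match, 12345678 gives one, 1234567809876543 gives 12345678 and 09876543, and ,,111-11-1234,12345678, gives 12345678. Unlike the lookbehind version, it does not match the digits in ID12345678, because there is no word boundary between the D and the 1.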
