Regex with mixed character set and different counts for each

I'm trying to find a number with a fixed length with a regex. The problem is that in some cases the number is split into portions divided by spaces or dashes. Examples are:
123456789
123 456 789
12 34567 89
123-456-789
12-345678-9
I think you get what I mean. The Regex I'm currently using would only get the first number:
(?<=^|\D)([0-9]{9})(?=$|\D)
When I add spaces and dashes to my character list like this:
(?<=^|\D)([0-9 -]{9})(?=$|\D)
I still don't get the desired results, as the "numbers" containing them are longer than 9 characters. If I take more characters I would end up with a lot of false results.
What I would need is a way to tell the regex to take numbers, spaces and dashes but with the following restrictions:
The number can only be 9 characters long (without spaces and dashes)
no two spaces or dashes or a mix of them should be in a row
Additionally it would be nice if the dashes and spaces weren't returned, but that's not that important

I suggest using
(?<!\d)\d(?:[ -]?\d){8}(?!\d)
See the regex demo. To only match ASCII digits, pass RegexOptions.ECMAScript option to the regex constructor.
Pattern details:
(?<!\d) - a negative lookbehind that fails the match if there is a digit symbol immediately to the left of the current location (same as (?<=^|\D)) (NOTE: to avoid matching 234.123 4567-89 replace this lookbehind with (?<!\d\.?))
\d - a digit
(?:[ -]?\d){8} - exactly 8 occurrences of an optional space or - followed by a digit
(?!\d) - a negative lookahead that fails the match if there is a digit immediately to the right of the current location (NOTE: to prevent matching 123456789.34, use (?!\.?\d) instead).
C# usage to extract matches:
var results = Regex.Matches(s, @"(?<!\d)\d(?:[ -]?\d){8}(?!\d)", RegexOptions.ECMAScript)
.Cast<Match>()
.Select(m => m.Value)
.ToList();
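
As a rough sketch of how this might be used end to end, the snippet below runs the pattern over the sample inputs from the question and strips the separators from each match, which covers the "nice to have" of not returning spaces and dashes. The helper name ExtractNineDigitNumbers is made up for illustration, and [0-9] is written out instead of combining \d with RegexOptions.ECMAScript; that is just one way to stay with ASCII digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): finds 9-digit numbers that may
// contain single spaces or dashes between digits and returns bare digits.
static List<string> ExtractNineDigitNumbers(string s)
{
    return Regex.Matches(s, @"(?<![0-9])[0-9](?:[ -]?[0-9]){8}(?![0-9])")
        .Cast<Match>()
        // Remove the separators so only the nine digits remain.
        .Select(m => m.Value.Replace(" ", "").Replace("-", ""))
        .ToList();
}

foreach (var input in new[] { "123456789", "123 456 789", "12 34567 89", "123-456-789", "12-345678-9" })
{
    // Each sample from the question yields the single value 123456789.
    Console.WriteLine(string.Join(",", ExtractNineDigitNumbers(input)));
}
```

A 10-digit run, or two separators in a row, produces no match with this pattern.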

This regex works:
(\d\s?-?){9}
It's looking for 9 groups of any digit followed by optional whitespace and optional hyphen character.
So it would match all of your examples, but would also match the following:
1 2 3 4 5 6 7 8 9
1 -2 -3 -4 -5 -6 -7 -8 -9 -
etc.
It's a simple regex, but it might not meet your requirements if you want to exclude matches with a trailing space or hyphen. Wiktor Stribiżew's answer provides a more complex regex which may suit your needs more thoroughly.
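
To make the trailing-separator difference concrete, here is a small sketch comparing the simple pattern with the lookaround-based one from the other answer. The sample text is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var simple = new Regex(@"(\d\s?-?){9}");
var strict = new Regex(@"(?<!\d)\d(?:[ -]?\d){8}(?!\d)");

// Made-up input: a 9-digit number followed by " - next".
string text = "id: 123-456-789 - next";

// The simple pattern also consumes the space and hyphen after the ninth digit.
string simpleMatch = simple.Match(text).Value;
// The lookaround pattern stops at the ninth digit.
string strictMatch = strict.Match(text).Value;

Console.WriteLine($"[{simpleMatch}]"); // [123-456-789 -]
Console.WriteLine($"[{strictMatch}]"); // [123-456-789]
```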

Related

Regex match certain amount of character and allow space inbetween

I am currently working on a regex which needs to match exactly 8 digits. But sometimes there are spaces or dots between those digits. This is the regex that I am currently using:
([0-9\ ?.?]{7,16})
It works fine most of the time, but the problem I am having is that it sometimes matches a number with a lot of trailing spaces, so you get something like 1234/s/s/s/s (/s stands for a space). Sometimes it matches only spaces.
What I want is a regex that always matches at least 8 digits and also allows spaces and dots, without matching fewer than 8 digits. I know it may be a stupid question, but I couldn't find what I need elsewhere.

Your ([0-9\ ?.?]{7,16}) expression matches 7 to 16 occurrences of any character that is a digit, a space, a ?, or a . (dot). Yes, the ? inside [...] is a literal ?, not a quantifier.
You need to use an expression that will match a digit ([0-9]) and then exactly 7 sequences of a space or period ([ .]) followed with 1 digit, and to make sure you are not matching the digits in 123.156.78.146 you may use special boundaries:
(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)
if at most one space or . may appear between digits; or, if the space/dot can appear 0 or more times,
(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)
See the regex demo
The (?<!\d[ .]*) is a negative lookbehind that will fail any match that is immediately preceded by a digit followed with zero or more .(s) or space(s), and the (?![ .]*\d) negative lookahead will fail the match if the 8 digits you need are followed with zero or more .(s) or space(s) and a digit.
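
A quick way to see what the boundaries do is to run the first pattern (at most one separator between digits) over a few strings. This is only a sketch; the Find helper and the sample strings other than 123.156.78.146 are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: 8 digits with at most one space or dot between
// neighbours, not adjacent (even across one separator) to another digit.
var eightDigits = new Regex(@"(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)");

// Hypothetical helper returning all matched substrings.
string[] Find(string s) =>
    eightDigits.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(Find("123.156.78.146").Length);            // 0: 11 digits chained by dots
Console.WriteLine(string.Join("|", Find("ref 1234 5678 ok"))); // 1234 5678
Console.WriteLine(string.Join("|", Find("12.34.56.78")));      // 12.34.56.78
```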

To solve this, describe the problem to yourself. You want to match one digit followed by seven repetitions of an optional space-or-dot followed by a digit. This leads to a regular expression like \d([ .]?\d){7}. To avoid collecting the seven captures add a ?: after the (. To capture the whole string, enclose it in parentheses. Adding both changes gives the expression (\d(?:[ .]?\d){7}). If more than one space or dot is allowed between the digits then change the ? to a *.
To get just the eight digits out of the string I suggest using the string captured above and replacing any spaces or dots with nothing.
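
The capture-then-strip idea might look like the following sketch. The function name NormalizeEightDigits and the sample input are made up for illustration, and the pattern uses a non-capturing group, (?:...), for the repeated part.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first 8-digit number in the input with
// spaces and dots removed, or null when nothing matches.
static string? NormalizeEightDigits(string input)
{
    // Group 1 captures the whole number, separators included.
    Match m = Regex.Match(input, @"(\d(?:[ .]?\d){7})");
    if (!m.Success) return null;

    // Replace any space or dot with nothing, leaving just the digits.
    return Regex.Replace(m.Groups[1].Value, @"[ .]", "");
}

Console.WriteLine(NormalizeEightDigits("call 12 34.56 78 now")); // 12345678
```

Note that this pattern has no boundary checks, so it will also pick the first 8 digits out of a longer run.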

Regex failing in max length

I want regex which will allow following format
1234567-8
123456B
If the user enters the second pattern, the input should be limited to a maximum of 7 characters, so
1234568B
123456V1
become invalid.
I have tried
[0-9]{7}-[0-9]|[[0-9]{6}[A-z]{1}]{7,7}
but this fails

For the sample input you provided, you may use ^([0-9]{7}-[0-9]|[0-9]{6}[A-Za-z])$.
A bit more contracted version: ^[0-9]{6}(?:[0-9]-[0-9]|[A-Za-z])$.
Note that 1234567-8 has 7 digits and a hyphen followed with a digit, so the whole string length cannot be limited to just 7 characters all in all.
In .NET and almost all other regex flavors [A-z] is a mistake, as it can match more than just letters.
Placing a quantifier {1} into a character class makes it a simple symbol combination, so [{1}] matches either { or 1 or }.
The {7,7} (={7}) will not limit the whole string length to 7, as you do not have anchors (^ and $) around the expression and you "ruined" the preceding quantifiers by putting them into a character class.
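
The points above can be checked with a short sketch. The IsValid helper name is made up; the inputs are the ones from the question plus 123456_ to show the [A-z] pitfall.

```csharp
using System;
using System.Text.RegularExpressions;

// Contracted pattern from the answer, anchored at both ends.
var valid = new Regex(@"^[0-9]{6}(?:[0-9]-[0-9]|[A-Za-z])$");

// Hypothetical helper wrapping the check.
bool IsValid(string s) => valid.IsMatch(s);

Console.WriteLine(IsValid("1234567-8")); // True
Console.WriteLine(IsValid("123456B"));   // True
Console.WriteLine(IsValid("1234568B"));  // False
Console.WriteLine(IsValid("123456V1"));  // False

// [A-z] also covers the ASCII characters between 'Z' and 'a', such as '_'.
bool underscoreAccepted = Regex.IsMatch("123456_", @"^[0-9]{6}[A-z]$");
Console.WriteLine(underscoreAccepted);   // True
```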

I think this is what you need:
^(\d{7}-\d|\d{6}[A-Z])$
7 digits, a dash and a digit, OR 6 digits and 1 uppercase Latin letter.

^\d{6}(?:\d-\d|[A-Z])$
It handles both of your formats above:
1234567-8
123456B

Using Regular Expression to find exact length match multiple times

I need a regular expression to find groups of exactly 8 numbers in a row. The closest I have gotten is:
[0-9]{8}
but it's not exactly what I need. If I had a number that was 9 long it will match the first 8 but I want it to ignore it if it's longer or shorter than 8.
Here are some examples
1234567890 <- no match, it's longer than 8
12345678 <- match: "12345678"
1234567809876543 <- match 1: "12345678", match 2: "09876543" (two groups of 8)
,,111-11-1234,12345678, <- match: "12345678"
To summarize, for every group of exactly 8 numbers make a match.
I'm working with some results of OCR (Optical Character Recognition) and I have to work with the shortcomings of the results so my input can be varied as in the examples above.
Here is some use case data: http://pastebin.com/uijF9K9n

You can use the following regex in .NET:
(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)
See regex demo
It is based on variable-width lookbehind and a lookahead.
Regex breakdown:
(?<=^|\D|(?:\d{8})+) - only if at the string start (^), or preceded with not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+)...
\d{8} - match 8 digits that are followed by...
(?=$|\D|(?:\d{8})+) - either end of string ($) or not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+).
IMPORTANT:
If I got a downvote for the "extra" complexity compared with another answer, note our solutions are different: my regex matches 8-digit number in ID12345678, and the other one does not due to the word boundary.
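
As a sketch, the examples from the question plus ID12345678 can be run against both the lookaround pattern and a word-boundary pattern such as \b\d{8}\b. The Groups helper is a made-up name for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var lookaround = new Regex(@"(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)");
var wordBoundary = new Regex(@"\b\d{8}\b");

// Hypothetical helper returning all matched substrings.
string[] Groups(Regex re, string s) =>
    re.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(Groups(lookaround, "1234567890").Length);                  // 0
Console.WriteLine(string.Join("|", Groups(lookaround, "1234567809876543"))); // 12345678|09876543
Console.WriteLine(string.Join("|", Groups(lookaround, ",,111-11-1234,12345678,"))); // 12345678

// The letter D and the digit 1 are both word characters, so \b finds no
// boundary between them and the word-boundary pattern misses this one.
Console.WriteLine(Groups(lookaround, "ID12345678").Length);   // 1
Console.WriteLine(Groups(wordBoundary, "ID12345678").Length); // 0
```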

You can also try this regex
(?:\b|\G)\d{8}(?=(?:\d{8})*\b)
(?:\b|\G) - matches a word boundary (\b) or, via \G, continues where the last match attempt ended
\d{8} - matches 8 digits [0-9], followed by a lookahead (?=...) that checks
(?:\d{8})*\b - whether any number of 8-digit groups follow, up to another word boundary
It will match a single group of 8 digits, or each group of 8 out of a sequence of such groups, if the whole run lies between two word boundaries.
See demo at regexstorm

\b[0-9]{8}\b this will give you what you want
For more details check this out
http://www.rexegg.com/regex-boundaries.html

Get a number with exactly x digits from string

I'm looking for a regex pattern which matches a number with a length of exactly x digits (say x is 2-4) and nothing else.
Examples:
"foo.bar 123 456789", "foo.bar 456789 123", " 123", "foo.bar123 " has to match only "123"
So. Only digits, no spaces, letters or other stuff.
How do I have to do it?
EDIT: I want to use the Regex.Matches() function in c# to extract this 2-4 digit number and use it in additional code.

Any pattern followed by {m,n} allows the pattern to occur m to n times. So in your case use \d{m,n} with the required values of m and n. If the count has to be exactly m, use \d{m}.
If you want to match 123 in x123y and not in 1234, use \d{3}(?=\D|$)(?<=(\D|^)\d{3})
It has a lookahead to ensure the character following the 3 digits is a non-digit or nothing at all, and a lookbehind to ensure that the character before the 3 digits is a non-digit or nothing at all.
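
For the 2-4 digit case with Regex.Matches, one possible variation on the same lookaround idea is to use negative lookarounds, (?<!\d) and (?!\d), around \d{2,4}. This is a sketch rather than the pattern given above, and the ShortNumbers helper name is made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// 2 to 4 digits that are neither preceded nor followed by another digit.
var shortNumber = new Regex(@"(?<!\d)\d{2,4}(?!\d)");

// Hypothetical helper returning all matched substrings.
string[] ShortNumbers(string s) =>
    shortNumber.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();

foreach (var input in new[] { "foo.bar 123 456789", "foo.bar 456789 123", " 123", "foo.bar123 " })
{
    // Each sample from the question yields only 123.
    Console.WriteLine(string.Join("|", ShortNumbers(input)));
}
```

The negative forms need no ^ or $ alternatives, because "no digit here" is also true at the start and end of the string.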

You can achieve this with basic RegEx:
\b(\d\d\d)\b or \b(\d{3})\b - for matching a number with exactly 3 digits
If you want variable digits: \b(\d{2,4})\b (explained demo here)
If you want to capture matches next to words: \D(\d{2,4})\D (explained demo here)
\b is a word boundary (it consumes no characters; it is a zero-width assertion)
\d matches only digits
\D matches any character that is NOT a digit
() everything in round brackets will capture a match

How to make this regex allow spaces c#

I have a phone number field with the following regex:
[RegularExpression(@"^[0-9]{10,10}$")]
This checks that the input is exactly 10 numeric characters. How should I change this regex to allow spaces, so that all the following examples validate?
1234567890
12 34567890
123 456 7890
cheers!

This works:
^(?:\s*\d\s*){10,10}$
Explanation:
^ - start line
(?: - start noncapturing group
\s* - any spaces
\d - a digit
\s* - any spaces
) - end noncapturing group
{10,10} - repeat exactly 10 times
$ - end line
This way of constructing this regex is also fairly extensible in case you will have to ignore any other characters.
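
A small sketch to check the pattern against the examples from the question; the IsPhone helper name and the negative samples are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Ten digits, each optionally surrounded by whitespace, and nothing else.
var tenDigits = new Regex(@"^(?:\s*\d\s*){10}$");

// Hypothetical helper wrapping the check.
bool IsPhone(string s) => tenDigits.IsMatch(s);

Console.WriteLine(IsPhone("1234567890"));   // True
Console.WriteLine(IsPhone("12 34567890"));  // True
Console.WriteLine(IsPhone("123 456 7890")); // True
Console.WriteLine(IsPhone("123 456 789"));  // False: only 9 digits
Console.WriteLine(IsPhone("123-456-7890")); // False: dashes are not allowed
```

Keep in mind that \s matches tabs and line breaks as well as spaces; if only the space character should be accepted, a literal space can be used in its place.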

Use this:
^([\s]*\d){10}\s*$
I cheated :) I just modified this regex here:
Regular expression to count number of commas in a string
I tested. It works fine for me.

Use this simple regex
var matches = Regex.Matches(inputString, @"([\s\d]{10})");
EDIT
var matches = Regex.Matches(inputString, @"^((?:\s*\d){10})$");
explain:
^ the beginning of the string
(?: ){10} group, but do not capture (10 times):
\s* whitespace (0 or more times, matching the most amount possible)
\d digits (0-9)
$ before an optional \n, and the end of the string

Depending on your problem, you might consider using a Match Evaluator delegate, as described in http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx
That would make short work of the issue of counting digits and/or spaces

Something like this, I think: ^\d{2}\s?\d\s?\d{3}\s?\d{4}$
It covers these variants: 10 digits, or 2 digits, space, 8 digits, or 3 digits, space, 3 digits, space, 4 digits.
But if you want only these 3 variants, use something like this (the alternatives are wrapped in one group so that ^ and $ apply to all of them):
^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$
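
One thing to watch with anchors and alternation: | has the lowest precedence, so in a pattern of the form ^A|B|C$ the ^ applies only to A and the $ only to C. The sketch below compares an ungrouped and a grouped version of the three-variant pattern; the variable names and sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Ungrouped: ^ binds to the first alternative only, $ to the last only.
var ungrouped = new Regex(@"^(?:\d{10})|(?:\d{2}\s\d{8})|(?:\d{3}\s\d{3}\s\d{4})$");
// Grouped: one outer group, so both anchors apply to every alternative.
var grouped = new Regex(@"^(?:\d{10}|\d{2}\s\d{8}|\d{3}\s\d{3}\s\d{4})$");

string trailingJunk = "1234567890 and more";
string embedded = "x 12 34567890 y";

Console.WriteLine(ungrouped.IsMatch(trailingJunk)); // True: ^\d{10} has no end anchor
Console.WriteLine(ungrouped.IsMatch(embedded));     // True: middle alternative has no anchors
Console.WriteLine(grouped.IsMatch(trailingJunk));   // False
Console.WriteLine(grouped.IsMatch(embedded));       // False
Console.WriteLine(grouped.IsMatch("123 456 7890")); // True
```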
