Regex pattern to match numbers in certain conditions - c#

First time posting, please forgive the formatting. Not really a programmer, I work in C# with the Revit and AutoCAD APi's. Important to note, as the Revit API is a bit of mess, so the same code may produce different results in a different API. So I have three basic string patterns where I want to return certain numbers depending on what their prefix & suffix. They could be surrounded by other text than what I show, and the actual numbers and positions within the string may vary.
String 1: (12) #4x2'-0 # 6 EF
String 2: (12) #4 # 6 EF
String 3: STAGGER 2'-0, SPCG AT 6 AT 12 SLAB
The code I'm using:
if (LengthAsString.IsMatch(remarkdata) == true)
{
Regex remarklength = new Regex(#"isnertRegexPatternhere");
if (remarklength.IsMatch(remarkdata))
{
remarkdata = remarklength.Replace(remarkdata, "${0}\u0022");
}
}
remarkdata is the strings from above, and im adding inch marks " after each number match.
The patterns ive tested and their returns:
String 1 String 2 String 3
\d+(?!['-]|([(\d+)])) 0,6 4,6 0,6,12
(?<![#])\d+ 12,2,0,6 12,6 2,9,6,12
\d+(?= #)|(?<=# )\d+ 0,6 6 no matches
expected results: 0,6 6 0,6,12
so im close, but no cigar. Thoughts?
Double Edit: looking for the numbers that aren't preceded by #, nor between (). Ignore # and x, they may or may not be there.

You seem to be looking for
(?<!#)\d+(?!.*(?:['-]|[#x]\d))
See the regex demo
Details
(?<!#) - a negative lookbehind that fails the match if there is a # immediately to the left of the current location
\d+ - 1 or more digits (or [0-9]+ to only match ASCII digits)
(?!.*(?:['-]|[#x]\d)) - a negative lookahead that fails the match once there are any 0+ chars other than newline (.*) followed with ', -, or #/x followed with a digit immediately to the right of the current location.
Note that in case your strings always have balanced non-nested parentheses, and you may have (123) substrings after # or x1, you may also want to add [^()]*\) into the lookahead
(?<!#)\d+(?!.*(?:['-]|[#x]\d)|[^()]*\))
to avoid matching digits inside the parentheses.
See another .NET demo.

Related

Is there a regular expression for matching a string that has no more than 2 repeating characters? [duplicate]

I want to match strings that do not contain more than 3 of the same character repeated in a row. So:
abaaaa [no match]
abawdasd [match]
abbbbasda [no match]
bbabbabba [match]
Yes, it would be much easier and neater to do a regex match for containing the consecutive characters, and then negate that in the code afterwards. However, in this case that is not possible.
I would like to open out the question to x consecutive characters so that it can be extended to the general case to make the question and answer more useful.
Negative lookahead is supported in this case.
Use a negative lookahead with back references:
^(?:(.)(?!\1\1))*$
See live demo using your examples.
(.) captures each character in group 1 and the negative look ahead asserts that the next 2 chars are not repeats of the captured character.
To match strings not containing a character repeated more than 3 times consecutively:
^((.)\2?(?!\2\2))+$
How it works:
^ Start of string
(
(.) Match any character (not a new line) and store it for back reference.
\2? Optionally match one more exact copies of that character.
(?! Make sure the upcoming character(s) is/are not the same character.
\2\2 Repeat '\2' for as many times as you need
)
)+ Do ad nauseam
$ End of string
So, the number of /2 in your whole expression will be the number of times you allow a character to be repeated consecutively, any more and you won't get a match.
E.g.
^((.)\2?(?!\2\2\2))+$ will match all strings that don't repeat a character more than 4 times in a row.
^((.)\2?(?!\2\2\2\2))+$ will match all strings that don't repeat a character more than 5 times in a row.
Please be aware this solution uses negative lookahead, but not all not all regex flavors support it.
I'm answering this question :
Is there a regular expression for matching a string that has no more than 2 repeating characters?
which was marked as an exact duplicate of this question.
Its much quicker to negate the match instead
if (!Regex.Match("hello world", #"(.)\1{2}").Success) Console.WriteLine("No dups");

Regular Expressions C#

I know there are many questions about making regular expressions, but they all seem to be about a single problem than the general usage. I, too, have a problem like to solve. I have tried to learn by reading about regular expressions, but it gets tricky quick. Here's my question:
C#
I need to validate two textboxes that exist on the same form. The math operations I've coded can handle any floating point number. For this particular application I know of three formats the numbers will be in or there is a mistake on the users behalf. I'd like to prevent those mistakes in example if an extra number is accidentally typed or if enter is hit too early, etc.
Here are the formats: "#.####" "##.####" "###.##" where the "#" represents a mandatory digit. The formats starting with a one or two digit whole number must have 4 trailing digits or more. I've capped it at 8, or so I tried to lol.The format starting with a three digit whole number should never be allowed to have more than two digits trailing the decimal.
Here's what I have tried thus far.
Regex acceptedInputRegex = new Regex(#"^\b[0-9]{3}.[0-9]{2}|[0-9]{1,2}.[0-9]{4,8}$");
Regex acceptedInputRegex = new Regex(#"^\b\d{3}.\d{2} | \d{1,2}.\d{4,8}$");
I have tried it in thinking a match was what I wanted to achieve and as if a match to my negated expression means there is a problem. I was unsuccessful in both attempts. This is the code:
if (acceptedInputRegex.IsMatch(txtMyTextBox1.Text) || acceptedInputRegex.IsMatch(txtMyTextBox2.Text))
{
} else
{
MessageBox.Show("Numbers are not in the right format", "Invalid Input!");
return;
}
Are regular expressions what I should be using to solve this problem?
If not, please tell me what you recommend. If so, please help me correct my regex.
Thanks.
You are close, you need to escape the dots and group the alternatives so that the ^ and $ anchors could be applied to both of them:
#"^(?:\d{3}\.\d{2}|\d{1,2}\.\d{4,8})$"
See the regex demo.
Details:
^ - start of string
(?: - start of a non-capturing group matching either of the two alternatives:
\d{3}\.\d{2} - 3 digits, . and 2 digits
| - or
\d{1,2}\.\d{4,8} - 1 or 2 digits, ., 4 to 8 digits
) - end of the non-capturing group
$ - end of string.
To make \d match only ASCII digits, use RegexOptions.ECMAScript option:
var isValid = Regex.IsMatch(s, #"^(?:\d{3}\.\d{2}|\d{1,2}\.\d{4,8})$", RegexOptions.ECMAScript);

Regex match certain amount of character and allow space inbetween

I am currently working on a regex which needs to match exactly 8 digits. But sometimes it occurs that there are spaces or dots between those numbers. This is the regex that i am currently using.
([0-9\ ?.?]{7,16})
It works fine most of the time, but the problem I am having is that it sometimes matches number with a lot of spaces tailing it so you will get something like 1234/s/s/s/s (/s stands for space) Or sometimes it is only matching spaces.
What i want is a regex that always matches at least 8 digits and also allows spaces and dots without detecting less then 8 digits. I know it may be stupid question, but I couldn't find anything I need elswhere.
Your ([0-9\ ?.?]{7,16}) expression matches 7 to 16 occurrences of any character that is either a digit, or a space, or a ?, or .. Yes, the ? inside [...] is a literal ?, not a quantifier.
You need to use an expression that will match a digit ([0-9]) and then exactly 7 sequences of a space or period ([ .]) followed with 1 digit, and to make sure you are not matching the digits in 123.156.78.146 you may use special boundaries:
(?<!\d[ .]?)\d(?:[. ]?\d){7}(?![ .]?\d)
if the space or . can only be 0 to 1 in between digits; or - if the space/dot can appear 0 or more times,
(?<!\d[ .]*)\d(?:[. ]*\d){7}(?![ .]*\d)
See the regex demo
The (?<!\d[ .]*) is a negative lookbehind that will fail any match if it starts with a digit that is followed with .(s) or space(s), and the (?![ .]*\d) negative lookahead will fail the match if the 7 digits you need are followed with .(s) or space(s) and a digit.
To solve this, describe the problem to yourself. You want to match one digit followed by seven repetitions of space-or-dot followed by a digit. This leads to a regular expression like \d([ .]?\d){7}. To avoid collecting the seven captures add a :? after the (. To capture the whole string, enclose it in brackets. Adding both changes gives the expression (\d(:?[ .]?\d){7}). If more than one space or dot is allowed between the digits then change the ? to a *.
To get just the eight digits out of the string I suggest using the string captured above and replacing any spaces or dots with nothing.

Using Regular Expression to find exact length match multiple times

I need a regular expression to find groups of exactly 8 numbers in a row. The closest I have gotten is:
[0-9]{8}
but it's not exactly what I need. If I had a number that was 9 long it will match the first 8 but I want it to ignore it if it's longer or shorter than 8.
Here are some examples
1234567890 <- no match, it's longer than 8
12345678 <- match: "12345678"
1234567809876543 <- match 1: "12345678", match 2: "09876543" (two groups of 8)
,,111-11-1234,12345678, <- match: "12345678"
To summarize, for every group of exactly 8 numbers make a match.
I'm working with some results of OCR (Optical Character Recognition) and I have to work with the shortcomings of the results so my input can be varied as in the examples above.
Here is some use case data: http://pastebin.com/uijF9K9n
You can use the following regex in .NET:
(?<=^|\D|(?:\d{8})+)\d{8}(?=$|\D|(?:\d{8})+)
See regex demo
It is based on variable-width lookbehind and a lookahead.
Regex breakdown:
(?<=^|\D|(?:\d{8})+) - only if at the string start (^), or preceded with not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+)...
\d{8} - match 8 digits that are followed by...
(?=$|\D|(?:\d{8})+) - either end of string ($) or not a digit (\D) or 1 or more sequences of 8 digits ((?:\d{8})+).
IMPORTANT:
If I got a downvote for the "extra" complexity compared with another answer, note our solutions are different: my regex matches 8-digit number in ID12345678, and the other one does not due to the word boundary.
You can also try this regex
(?:\b|\G)\d{8}(?=(?:\d{8})*\b)
(?:\b|\G) \b match a word boundary | or \G continue where last match attempt ended
\d{8} matches 8 digits [0-9] followed by a lookahead (?=... to check
(?:\d{8})*\b if followed by any amount of {8 digits} until another word boundary
It will match {8 digits} or out of a sequence of such if between two word boundaries.
See demo at regexstorm
\b[0-9]{8}\b this will give you what you want
For more details check this out
http://www.rexegg.com/regex-boundaries.html

C# Regex Phone Number Check

I have the following to check if the phone number is in the following format
(XXX) XXX-XXXX. The below code always return true. Not sure why.
Match match = Regex.Match(input, #"((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}");
// Below code always return true
if (match.Success) { ....}
The general complaint about regex patterns for phone numbers is that they require one to put in the truly optional characters as dashes and other items.
Why can't they be optional and have the pattern not care if they are there or not?
The below pattern makes dashes, periods and parenthesis optional for the user and focuses on the numbers as a result using named captures.
The pattern is commented (using the # and spans multiple lines) so use the Regex option IgnorePatternWhitespace unless one removes the comments. For that flag doesn't affect regex processing, it only allows for commenting of the pattern via the # character and line break .
string pattern = #"
^ # From Beginning of line
(?:\(?) # Match but don't capture optional (
(?<AreaCode>\d{3}) # 3 digit area code
(?:[\).\s]?) # Optional ) or . or space
(?<Prefix>\d{3}) # Prefix
(?:[-\.\s]?) # optional - or . or space
(?<Suffix>\d{4}) # Suffix
(?!\d) # Fail if eleventh number found";
The above pattern just looks for 10 numbers and ignores any filler characters such as a ( or a dash - or a space or a tab or even a .. Examples are
(555)555-5555 (OK)
5555555555 (ok)
555 555 5555(ok)
555.555.5555 (ok)
55555555556 (not ok - match failure - too many digits)
123.456.789 (failure)
Different Variants of same pattern
Pattern without comments no longer need to use IgnorePatternWhiteSpace:
^(?:\(?)(?<AreaCode>\d{3})(?:[\).\s]?)(?<Prefix>\d{3})(?:[-\.\s]?)(?<Suffix>\d{4})(?!\d)
Pattern when not using Named Captures
^(?:\(?)(\d{3})(?:[\).\s]?)(\d{3})(?:[-\.\s]?)(\d{4})(?!\d)
Pattern if ExplicitCapture option is used
^\(?(?<AreaCode>\d{3})[\).\s]?(?<Prefix>\d{3})[-\.\s](?<Suffix>\d{4})(?!\d)
It doesn't always match, but it will match any string that contains three digits, followed by a hyphen, followed by four more digits. It will also match if there's something that looks like an area code on the front of that. So this is valid according to your regex:
%%%%%%%%%%%%%%(999)123-4567%%%%%%%%%%%%%%%%%
To validate that the string contains a phone number and nothing else, you need to add anchors at the beginning and end of the regex:
#"^((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}$"
Alan Moore did a good explaining what your exp is actually doing. +1
If you want to match exactly "(XXX) XXX-XXXX" and absolutely nothing else, then what you want is
#"^\(\d{3}\) \d{3}-\d{4}$"
Here is the C# code I use. It is designed to get all phone numbers from a page of text. It works for the following patters: 0123456789, 012-345-6789, (012)-345-6789, (012)3456789 012 3456789, 012 345 6789, 012 345-6789, (012) 345-6789, 012.345.6789
List<string> phoneList = new List<string>();
Regex rg = new Regex(#"\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})");
MatchCollection m = rg.Matches(html);
foreach (Match g in m)
{
if (g.Groups[0].Value.Length > 0)
phoneList.Add(g.Groups[0].Value);
}
none of the comments above takes care of international numbers like +33 6 87 17 00 11 (which is a valid phone number for France for example).
I would do it in a two-step approach:
1. Remove all characters that are not numbers or '+' character
2. Check the + sign is at the beginning or not there. Check length (this can be very hard as it depends on local country number schemes).
Now if your number starts with +1 or you are sure the user is in USA, then you can apply the comments above.

Categories

Resources