I'm new to regex but I seem to have things going my way.
In https://regex101.com/r/Is8wZK/1, group 8 might have more than one word in it, separated by spaces. But as you can see, so does group 5, and I've already used up my one (.+).
How can I re-write my regex to detect group 8 in exactly the way group 5 is detected?
^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$
Link: https://regex101.com/r/v4mEJK/1
Pretty much all you need to do is match a run of alphabetic characters followed by zero or more groups of whitespace plus alphabetic characters. That captures names that may or may not have more than one word. This is done by using
((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)
for groups 5 and 8.
The rest of the regex could possibly be made more specific, but there isn't really any need to add more complexity unless your input text is significantly more complex than your test case.
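A minimal Python sketch of the same idea, for illustration only. Python's re module has no POSIX classes such as [[:alpha:]], so this assumes [A-Za-z] (ASCII letters only) as a stand-in. The sample line is invented, because the real test data only exists on regex101.

```python
import re

# Multi-word name group, as in the answer, but with [A-Za-z] standing in for
# [[:alpha:]] because Python's re module has no POSIX character classes.
NAME = r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)"

PATTERN = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+" + NAME
    + r"\s+(\S+)\s+(\S+)\s+" + NAME
    + r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$"
)

# Hypothetical sample line (the real input is only on regex101).
line = "01/02/2020 A123 45 67 John Smith 12ABC34 89 Jane Doe X 1.50 2.25 3.75"
m = PATTERN.match(line)
print(m.group(5))  # John Smith
print(m.group(8))  # Jane Doe
```

Group 8 first swallows the single letter X as another word, then backtracks to Jane Doe because four \S+ fields still have to follow it.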
FWIW:
It's far better to use \s+ instead of a raw space between groups so you can match other delimiting whitespace.
I redid your generic capture groups into this:
^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) (\d+) (.+) ([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$
Breaking that down:
(\d+\/\d+\/\d+): this matches the date
([A-Z]\d+): this matches a capital letter followed by one or more digits
(\d+): this matches a number
(\d+): this matches a number
(.+): this is the first general group
(\d+[A-Z]{3}\d+): this matches one or more digits, then exactly 3 capital letters, then one or more digits
(\d+): this matches a number
(.+): this is the second general group
([A-Z]): this matches a single capital letter
(\d+\.\d+): this matches a number with a decimal point
(\d+\.\d+): this matches a number with a decimal point
(\d+\.\d+): this matches a number with a decimal point
This should help you get what you want.
If you are only interested in groups 5 and 8, make all the other groups non-capturing (groups 5 and 8 then become groups 1 and 2):
^(?:\d+\/\d+\/\d+) (?:[A-Z]\d+) (?:\d+) (?:\d+) (.+) (?:\d+[A-Z]{3}\d+) (?:\d+) (.+) (?:[A-Z]) (?:\d+\.\d+) (?:\d+\.\d+) (?:\d+\.\d+)$
Or only group what you need:
^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) [A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$
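A quick Python check of the last pattern, using an invented sample line (the field values are assumptions, not the asker's data). Only the two free-text fields are captured, so they come back as groups 1 and 2.

```python
import re

# "Only group what you need" version: just the two free-text fields capture.
PATTERN = re.compile(
    r"^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) "
    r"[A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$"
)

# Hypothetical sample line with single spaces between fields.
line = "01/02/2020 A123 45 67 John Smith 12ABC34 89 Jane Doe X 1.50 2.25 3.75"
m = PATTERN.match(line)
print(m.groups())  # ('John Smith', 'Jane Doe')
```

Because this version uses literal single spaces, a tab between fields makes it fail, which is the trade-off behind the \s+ advice in the first answer.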
I'm trying to build a regex to check if a text input is valid.
The pattern is [NumberBetween1And999]['x'][NumberBetween1And999][','][White space Optional] repeated any number of times, with no comma after the last item.
I need this to make an order from a string: the first number is the product id and the second number is the quantity for the product.
Examples of valid text:
1x1
2x1,3x1
1x3, 4x1
Should not match:
1x1,
1,1, 1x1,
9999x1
1x1,99999x1
I'm stuck here: ^(([1-9][0-9]{0,2})x([1-9][0-9]{0,2}),)*$
Thanks for helping me
You can use
^[1-9][0-9]{0,2}x[1-9][0-9]{0,2}(?:,\s*[1-9][0-9]{0,2}x[1-9][0-9]{0,2})*$
The pattern matches:
^ Start of string
[1-9][0-9]{0,2}x[1-9][0-9]{0,2} Match a digit 1-9 followed by up to 2 optional digits 0-9, then x and the same digits part again
(?: Non capture group to repeat as a whole
,\s* Match a comma and optional whitespace chars
[1-9][0-9]{0,2}x[1-9][0-9]{0,2} Match the same pattern as at the beginning
)* Close the non capture group and optionally repeat it to also match a single part without a comma
$ End of string
Regex demo
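A small Python harness that runs the answer's pattern against the examples from the question. It uses re.fullmatch, since in Python $ also matches just before a trailing newline. parse_order is a hypothetical helper showing how the product id and quantity could be pulled out once the string is known to be valid.

```python
import re

# The answer's pattern: one "IDxQTY" item, then any number of ", item" parts.
ORDER = re.compile(
    r"^[1-9][0-9]{0,2}x[1-9][0-9]{0,2}(?:,\s*[1-9][0-9]{0,2}x[1-9][0-9]{0,2})*$"
)
# One item, with the product id and quantity captured separately.
ITEM = re.compile(r"([1-9][0-9]{0,2})x([1-9][0-9]{0,2})")


def is_valid_order(text):
    # fullmatch rejects a trailing "\n" that a bare $ would let through
    return ORDER.fullmatch(text) is not None


def parse_order(text):
    """Hypothetical helper: [(product_id, quantity), ...] or None if invalid."""
    if not is_valid_order(text):
        return None
    return [(int(pid), int(qty)) for pid, qty in ITEM.findall(text)]


good = ["1x1", "2x1,3x1", "1x3, 4x1"]
bad = ["1x1,", "1,1, 1x1,", "9999x1", "1x1,99999x1"]
for s in good + bad:
    print(repr(s), is_valid_order(s))
print(parse_order("1x3, 4x1"))  # [(1, 3), (4, 1)]
```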
I'm trying to come up with a regex to deal with a general scenario to capture a number from a string where the number may have one or more non-numeric characters pre/post-fixed to it.
The number can contain zero or one decimal point or comma.
If the string contains multiple "sets" of consecutive digits separated by non-digits I would like the regex to fail ("sets" is probably not the correct terminology).
As an example the following inputs would succeed with a match:
abc12.00xyz would match 12.00
0.1$ would be valid and match 0.1
.01 would be valid and match .01
123abc would be valid and match 123
abc123 would be valid and match 123
These inputs would fail to match:
abc12.00xyz322 would fail due to the second "set" of digits, 322 in this example
12t2 would fail due to having two separate "sets" of digits
I've tried many permutations and I'm not making much headway. This is the closest I've come to so far. It matches on the numbers correctly, excluding the non-digits from the match, but it includes all "sets" of numbers in the string.
([\d]*[.,])?[\d]+
Any suggestions would be appreciated.
You can use a capture group:
^[^0-9\r\n]*?([0-9]*[.,]?[0-9]+)[^0-9\r\n]*$
^ Start of string
[^0-9\r\n]*? Optionally match any char except a digit or a newline, as few as possible
([0-9]*[.,]?[0-9]+) Capture group 1, match optional digits, an optional dot or comma and 1+ digits
[^0-9\r\n]* Optionally match any char except a digit or a newline
$ End of string
See a .NET regex demo (Click on the Table tab to see the capture group values)
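A minimal Python version of the capture-group approach, checked against the examples in the question. It uses [.,]? for the separator so that either one dot or one comma is accepted, as the question asks. extract_number is just an illustrative name.

```python
import re

# Non-digits (lazy), then the number in group 1, then only non-digits to the end.
# Any digits the group cannot absorb make the trailing [^0-9\r\n]*$ part fail.
NUMBER = re.compile(r"^[^0-9\r\n]*?([0-9]*[.,]?[0-9]+)[^0-9\r\n]*$")


def extract_number(text):
    m = NUMBER.search(text)
    return m.group(1) if m else None


for s in ["abc12.00xyz", "0.1$", ".01", "123abc", "abc123", "12,5",
          "abc12.00xyz322", "12t2"]:
    print(repr(s), "->", extract_number(s))
```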
I have source text like this, with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
The regex has to match the whole string and capture all characters after the first "." as Group 1. I tried making subgroups and marking Group 3 as optional, but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How should it be changed to capture the desired values?
This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.
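A quick Python check of this pattern against the two sample strings from the question:

```python
import re

samples = [
    "GH22-O0-TFS-SFSD 00-1-006.19135",
    "GH22-O0-TFS-SFSD 00-1-006.1.19135",
]
# .*? lazily stops at the first period; (.*) then takes everything after it.
results = [re.match(r".*?\.(.*)", s).group(1) for s in samples]
print(results)  # ['19135', '1.19135']
```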
You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than an LF char, as few as possible (*? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
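Because this pattern splits the value into Group 2 and an optional Group 3, the caller has to join them again to get 1.19135. A Python sketch of that step; after_first_dot is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$")


def after_first_dot(text):
    """Hypothetical helper: rebuild the value after the first dot from groups 2/3."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # group 3 is None when the optional ".digits" part is absent
    head, tail = m.group(2), m.group(3)
    return head if tail is None else head + "." + tail


print(after_first_dot("GH22-O0-TFS-SFSD 00-1-006.19135"))    # 19135
print(after_first_dot("GH22-O0-TFS-SFSD 00-1-006.1.19135"))  # 1.19135
```

Note that the 0* parts drop leading zeros, so an input ending in .01.002 comes back as 1.2.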
If I understand your goal correctly, this should work:
.*?\.([\d.]+)
.*?\. - lazily match everything up to and including the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1
I want to match an 8-digit number. Currently I have the following regex, but it is failing in some cases.
(\d+)\1{6}
It matches only when the differing digit is at the start or the end, such as 44444445 or 54444444. However, I am looking to match cases where at least 7 digits are the same regardless of their position.
It is failing in cases like
44454444
44544444
44444544
What modification is needed here?
It's probably a bad idea to use this in a performance-sensitive location, but you can use a backreference to achieve this.
The Regex you need is as follows:
(\d)(?:.*?\1){6}
Breaking it down:
(\d) Capture group of any single digit
.*? means match any character, zero or more times, lazily
\1 means match the first capture group
We enclose that in a non-capturing group (?:
And add a quantifier {6} to match six times
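A short Python check of the backreference pattern on the numbers from the question, plus two made-up counterexamples. re.search is used because the pattern is not anchored.

```python
import re

# Capture one digit, then find that same digit 6 more times, skipping lazily.
SEVEN_SAME = re.compile(r"(\d)(?:.*?\1){6}")


def has_seven_same(s):
    return SEVEN_SAME.search(s) is not None


for s in ["44444445", "54444444", "44454444", "44544444", "44444544",
          "12345678", "44445555"]:
    print(s, has_seven_same(s))
```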
You can sort the digits before matching:
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "44444445 54444444 44454444 44544444 44444544";
foreach (var number in input.Split(' '))
{
    // Sort the digits so that equal digits end up next to each other
    string sorted = String.Concat(number.OrderBy(c => c));
    if (Regex.IsMatch(sorted, @"(\d+)\1{6}"))
    {
        // do something
    }
}
Still, regex is not really a good tool for this.
The pattern that you tried, (\d+)\1{6}, requires the captured part to repeat 6 more times directly after itself, so for a single digit it needs 7 identical digits in a row. If the same digits may be separated by other digits, you have to match optional digits in between.
Note that in .NET, \d matches any Unicode decimal digit, not only 0-9.
If you want to match only digits 0-9 using C# without matching other characters in between the digits:
([0-9])(?:[0-9]*?\1){6}
The pattern matches:
([0-9]) Capture group 1
(?: Non capture group
[0-9]*?\1 Match optional digits 0-9 and a backreference to group 1
){6} Close non capture group and repeat 6 times
See a .NET Regex demo
If you want to match only 8-digit numbers, you can use a positive lookahead (?= to assert exactly 8 digits, together with word boundaries \b:
\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b
See another .NET Regex demo
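The 8-digit version can be checked the same way. This is a Python sketch with invented test strings:

```python
import re

# Lookahead pins the token to exactly 8 digits; [0-9]* then skips to the
# digit that repeats 6 more times, and \d*\b consumes the rest of the token.
EIGHT = re.compile(r"\b(?=\d{8}\b)[0-9]*([0-9])(?:[0-9]*?\1){6}\d*\b")

for s in ["44454444", "id 44544444 end", "444544441", "4444444", "12345678"]:
    print(repr(s), bool(EIGHT.search(s)))
```

The 9-digit and 7-digit strings fail at the lookahead, while 12345678 fails because no digit repeats.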
I have a dataset where each line contains a number enclosed in either parentheses or square brackets, e.g.
Jim Bob Smith [1975]
Joe Bob Public (1955)
What I'm having problems with is creating a regex that matches the number (without the brackets or parentheses) in both cases.
I've tried
(?<=\[).+?(?=\]) and
(?<=\().+?(?=\))
So I need help finding a way to combine the two. Any assistance would be greatly appreciated.
You may use the following .NET regex:
(?:(\()|\[)(.*?)(?(1)\)|])
See the regex demo
Details
(?:(\()|\[) - a non-capturing group that matches and captures into Group 1 a ( char, else just matches a [ char
(.*?) - Group 2: any 0 or more chars other than a newline char, as few as possible (instead of .*?, you might want to use \d+ there to match 1 or more digits, \d{4} to match exactly four digits, or even (?:20|19)\d{2} to match a year from 1900 to 2099)
(?(1)\)|]) - a conditional construct: if Group 1 was matched, a ) is matched, else, a ] char.
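Python's re module also supports the (?(1)yes|no) conditional, so the .NET pattern can be tried there unchanged. A minimal sketch; bracketed is an illustrative name, and the mismatched input is invented to show the conditional at work.

```python
import re

# Group 1 captures "(" if present; the conditional then requires ")" instead of "]".
PATTERN = re.compile(r"(?:(\()|\[)(.*?)(?(1)\)|])")


def bracketed(text):
    m = PATTERN.search(text)
    return m.group(2) if m else None


print(bracketed("Jim Bob Smith [1975]"))   # 1975
print(bracketed("Joe Bob Public (1955)"))  # 1955
print(bracketed("Mixed Up [1975)"))        # None: "[" must be closed by "]"
```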
Try
.*?[[(](\d{4})[])]
See here
.*? - any characters, non-greedy
[[(] for either opening bracket
(\d{4}) - creates the 4 digit capture group you want.
[])] for either closing bracket
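A Python sketch of this character-class version. The [ inside the first class is escaped here because recent Python versions emit a FutureWarning ("possible nested set") for a class that starts with [[. Unlike the conditional approach, the two classes are independent, so a mismatched pair such as [1975) is also accepted.

```python
import re

# [\[(] = either opening bracket, (\d{4}) = the year, [])] = either closing bracket.
YEAR = re.compile(r".*?[\[(](\d{4})[])]")


def year_of(line):
    m = YEAR.match(line)
    return m.group(1) if m else None


print(year_of("Jim Bob Smith [1975]"))   # 1975
print(year_of("Joe Bob Public (1955)"))  # 1955
print(year_of("Mixed Up [1975)"))        # 1975: the classes do not pair up
```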