Regex C#: optional group in the middle

I have source text like this, with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
The desired value is '19135' in the first case and '1.19135' in the second.
The regex has to match the whole string and capture all characters after the first "." as Group 1. I tried making subgroups and marking Group 3 as optional, but it does not work.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How should it be changed to capture the desired values?

This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.
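A minimal C# sketch of this pattern, assuming each input is a single line; the AfterFirstDot class and its Extract method are hypothetical names used only for illustration. The lazy .*? is what makes \. stop at the first period rather than the last one:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class AfterFirstDot
{
    // .*? is lazy, so \. stops at the FIRST period; Group 1 takes the rest of the line.
    private static readonly Regex Pattern = new Regex(@".*?\.(.*)");

    // Returns everything after the first period, or null if there is no period.
    public static string Extract(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, AfterFirstDot.Extract("GH22-O0-TFS-SFSD 00-1-006.1.19135") should return "1.19135".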

You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than an LF char, as few as possible (*? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
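To illustrate how the optional group behaves in C#, here is a minimal sketch assuming the two sample strings from the question; OptionalMiddleGroup, Parse and the tuple element names Second and Third are hypothetical. When the optional part is absent, Group 3 is simply unsuccessful, which Group.Success reports:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the pattern from the answer above.
public static class OptionalMiddleGroup
{
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$");

    // Returns Group 2 and Group 3; Third is null when the optional part is absent.
    public static (string Second, string Third) Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return (null, null);

        // Group.Success distinguishes an unmatched optional group from an empty one.
        string third = m.Groups[3].Success ? m.Groups[3].Value : null;
        return (m.Groups[2].Value, third);
    }
}
```

With the second sample, Parse should yield "1" and "19135"; with the first, "19135" and null. If you need the combined value, join the two parts with a period yourself, keeping in mind that 0* drops leading zeros after each dot.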

I hope I understood your goal; this should work:
.*?\.([\d.]+)
.*?\. - lazily match everything up to and including the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1

Related

Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple, question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but it does create the 3 matches correctly.
But I also need the "Additional test" text in the second group.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*), but now I get only one huge match, because the second group takes everything until the end.
How can I match everything up to the next line that starts with a date, and start a new match from there?
If you are sure there is only one additional line to be matched, you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: zero or more chars other than a newline, as many as possible, then an optional second line (a newline followed by zero or more chars other than a newline, as many as possible).
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, except that \s is replaced with [\p{Zs}\t], which only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost position that is followed by a newline and a date string, or up to the end of the string.
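A minimal C# sketch of the multi-line pattern above, assuming \n line endings; LogEntries, Parse and the Stamp/Text tuple names are hypothetical. The pattern already carries its own inline (?m) and (?s) options, so no RegexOptions are needed:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper around the multi-line pattern from the answer above.
public static class LogEntries
{
    // (?m): ^ matches at every line start. (?s): from that point on, . also matches \n,
    // so the lazy (.*?) runs until the next line that starts with a date, or the end (\z).
    private static readonly Regex Pattern = new Regex(
        @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)");

    public static List<(string Stamp, string Text)> Parse(string input)
    {
        var entries = new List<(string Stamp, string Text)>();
        foreach (Match m in Pattern.Matches(input))
            entries.Add((m.Groups[1].Value, m.Groups[2].Value));
        return entries;
    }
}
```

With \r\n line endings the lookahead still finds the \n, but Group 2 of every entry except the last would keep a trailing \r, so trimming it (or using \r?\n in the lookahead) may be worthwhile.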
You are repeating the character class [,.:\w\s]* with the * quantifier, and since \s also matches newlines, it matches too much.
You can match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group: (.*\r?\n.*).
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date-like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (start of each line when multiline mode is on)
( Capture group 1
\d{2}\.\d{2}\.\d{4} Match a date-like pattern
) Close group 1
\s* Match 0+ whitespace chars (or match whitespace chars without newlines: [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat: match a newline and the whole next line, as long as that line does not start with a date-like pattern
) Close group 2
Regex demo
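The same idea in C#, as a sketch assuming \n line endings; LogEntriesByLine and its members are hypothetical names. In .NET, ^ only matches at the start of each line when RegexOptions.Multiline (or an inline (?m)) is set, so it is passed explicitly here. Note also that in this pattern Group 1 holds only the date, and the time ends up at the start of Group 2:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper around the pattern from the answer above.
public static class LogEntriesByLine
{
    // Without RegexOptions.Multiline, ^ could only match at the start of the whole
    // string, so at most one entry would be found.
    private static readonly Regex Pattern = new Regex(
        @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
        RegexOptions.Multiline);

    // Group 1 is only the date; the time stays at the start of Group 2.
    public static List<(string Date, string Rest)> Parse(string input)
    {
        var entries = new List<(string Date, string Rest)>();
        foreach (Match m in Pattern.Matches(input))
            entries.Add((m.Groups[1].Value, m.Groups[2].Value));
        return entries;
    }
}
```

With \r\n input the \r?\n handles the breaks between lines, but . also matches \r in .NET, so the last line of each Group 2 may keep a trailing \r.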

Trying to capture a decimal number out of a string

In a text file, I'm looking for the part of a document that contains a piece like min ISO 1133 0.2-0.35. I want to capture the decimal range in that piece of text (0.2-0.35). Since there are other decimal ranges in the document, I cannot simply use a regular expression that looks only for the range. So far I have come up with min.*(\d+)((?:\.)?)(\d*)-(\d+)((?:\.)?)(\d*), but the result is not correct and I'm stuck. Can anyone please help me with this?
Below, you can see the final result (yellow part):
Maybe the following would work for you?
\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)
See the online demo
The answer is currently based on the assumption (judging by your attempt) that you want the two bounds in separate groups. If not, it can easily be changed to capture the whole substring (or see @TheFourthBird's answer).
\b - Match word boundary.
min - Literally match 'min'.
\s - Match a whitespace character.
.*? - Match any character other than newline up to (lazy):
( - Open 1st capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 1st capture group.
- - Match a literal hyphen.
( - Open 2nd capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 2nd capture group.
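A minimal C# sketch of this pattern, assuming the range you want is the first one after the word min; MinRange and Find are hypothetical names, and the bounds are returned as strings (convert them with decimal.Parse and CultureInfo.InvariantCulture if you need numbers):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the pattern from the answer above.
public static class MinRange
{
    private static readonly Regex Pattern =
        new Regex(@"\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)");

    // Returns the lower and upper bound of the first range after "min", or null.
    public static (string Low, string High)? Find(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
            return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

For example, Find on "range 1.5-2.5, min ISO 1133 0.2-0.35" should give 0.2 and 0.35, because matching only starts at min; a word such as "minimum" is skipped since \s must follow min.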
You could get the decimal part by requiring 1+ digits in the optional part and making the quantifier non-greedy. The value is in capture group 1.
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9](?:\.[0-9]+)?)\b
Regex demo
Or a bit more specific pattern
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9]+(?:\.[0-9]+)?)\b
Regex demo
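The two patterns differ only after the hyphen: the first allows a single digit there, the second [0-9]+. A small C# sketch, assuming the sample text from the question plus a hypothetical upper bound of 10.5, shows where that matters; MinRangeSpecific and Range are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two patterns from the answer above.
public static class MinRangeSpecific
{
    // First pattern: a single digit before the upper bound's optional decimal part.
    public const string OneDigitUpper =
        @"\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9](?:\.[0-9]+)?)\b";

    // Second pattern: [0-9]+ allows any number of integer digits in the upper bound.
    public const string AnyDigitsUpper =
        @"\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9]+(?:\.[0-9]+)?)\b";

    // Returns capture group 1 (the whole range) or null when there is no match.
    public static string Range(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Both return "0.2-0.35" for the question's text, but only the second one finds "0.2-10.5".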

Regex to match number in a string enclosed within brackets or Parenthesis

I have a dataset where each line contains a number enclosed in either parentheses or brackets, e.g.:
Jim Bob Smith [1975]
Joe Bob Public (1955)
What I'm having problems with is creating a regex that matches the number (without the brackets or parentheses) in both cases.
I've tried
(?<=\[).+?(?=\]) and
(?<=\().+?(?=\))
So I need help finding a way to combine the two. Any assistance would be greatly appreciated.
You may use the following .NET regex:
(?:(\()|\[)(.*?)(?(1)\)|])
See the regex demo
Details
(?:(\()|\[) - a non-capturing group that either matches a ( char and captures it into Group 1, or else just matches a [ char
(.*?) - Group 2: any 0 or more chars other than a newline char, as few as possible (instead of .*?, you might want to use \d+ there to match 1 or more digits, \d{4} to match exactly four digits, or even (?:20|19)\d{2} to match a year in the 20th or 21st century)
(?(1)\)|]) - a conditional construct: if Group 1 was matched, a ) is matched, else, a ] char.
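A minimal C# sketch of the conditional approach, assuming you want digits only, so (.*?) is swapped for \d+ as suggested above; BracketedNumber and Extract are hypothetical names. A mismatched pair such as (1955] is rejected because the conditional demands the closer that matches the opener:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the conditional pattern from the answer above.
public static class BracketedNumber
{
    // Group 1 records whether "(" opened; (?(1)\)|]) then requires ")" or "]" accordingly.
    private static readonly Regex Pattern = new Regex(@"(?:(\()|\[)(\d+)(?(1)\)|])");

    // Returns the number without its brackets, or null if no matching pair is found.
    public static string Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[2].Value : null;
    }
}
```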
Try
.*?[[(](\d{4})[])]
See here
.*? - non greedy any char
[[(] - either opening bracket
(\d{4}) - creates the 4-digit capture group you want.
[])] - either closing bracket
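A C# sketch of this character-class approach; BracketedYear and Extract are hypothetical names, and the [ and ] inside the classes are escaped, which is an unambiguous way to write the same two classes. Unlike the conditional construct in the previous answer, this pattern does not check that the brackets pair up, so a line like (1955] still yields 1955:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; same classes as the answer, with [ and ] escaped inside them.
public static class BracketedYear
{
    // Any opening bracket, four digits captured in Group 1, any closing bracket.
    private static readonly Regex Pattern = new Regex(@".*?[\[(](\d{4})[\])]");

    public static string Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```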

Regular expression for updating version number

I have version numbers like the ones below:
020. 000. 1234. 43567 (please note the whitespace after the dot(.))
020,000,1234,43567
20.0.1234.43567
20,0,1234,43567
I want a regular expression that updates the numbers after the last two dots (.) to, for example, 1298 and 45678 (any numbers):
020. 000. 1298. 45678 (please note the whitespace after the dot(.))
020,000,1298,45678
20.0.1298.45678
20,0,1298,45678
resultString = Regex.Replace(subjectString,
@"(\d+) # any number
([.,]\s*) # dot or comma, optional whitespace
(\d+) # etc.
([.,]\s*)
\d+
([.,]\s*)
\d+",
"$1$2$3${4}1298${5}43568", RegexOptions.IgnorePatternWhitespace);
Note the ${4} instead of $4 because otherwise the following 1 would be interpreted as belonging to the group number ($41).
Also note the difference between (\d+) and (\d)+. While both match 1234, the first one will capture 1234 into the group created by the parentheses. The second one will capture only 4 because the previous captures will be overwritten by the next.
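A small C# sketch of the (\d+) versus (\d)+ difference; CaptureDemo and its methods are hypothetical. One nuance: in .NET the earlier iterations are not discarded entirely. Group.Value holds only the last one, but Group.Captures still lists every iteration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of one capture versus a repeated capture group.
public static class CaptureDemo
{
    // (\d+): one capture containing the whole run of digits.
    public static string WholeRun(string s) =>
        Regex.Match(s, @"(\d+)").Groups[1].Value;

    // (\d)+: the group is repeated, so Group.Value is only the last iteration...
    public static string LastIteration(string s) =>
        Regex.Match(s, @"(\d)+").Groups[1].Value;

    // ...while Group.Captures keeps one entry per iteration.
    public static int IterationCount(string s) =>
        Regex.Match(s, @"(\d)+").Groups[1].Captures.Count;
}
```

For "1234", WholeRun returns "1234", LastIteration returns "4", and IterationCount returns 4.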
To replace the last two version numbers with 1298 and 43568:
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");
var result = regex.Replace(source, "1298${seperator}43568");
This works because:
(?<=) doesn't include the group in the match but requires it to exist before the match
^ match the start of the string
(?:\d+[.,]\s*) non-capturing group: match at least one digit followed by a . or , followed by 0 or more spaces
{2} the previous group must occur twice
\d+ first part of the match, 1 or more digits
(?<seperator>[.,]\s*) capture the separator (a . or , followed by optional spaces) into a named group called seperator
\d+ match one or more digits
$ match the end of the string
In the replacement string you just provide the new version numbers and use ${seperator} to insert the original separator.
If you are not bothered about preserving the separator, you can just do
var regex = new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+[.,]\s*\d+$");
var result = regex.Replace(source, "1298.43568");
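A C# sketch comparing the two lookbehind patterns, assuming the sample version strings from the question; VersionTail, Keep and Fixed are hypothetical names, and the group name seperator is kept as spelled in the answer. The second variant always writes a period and drops any whitespace, even for comma-separated input:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two lookbehind patterns from the answer above.
public static class VersionTail
{
    // Keeps whatever separator (". ", ",", ...) sat between the last two numbers.
    private static readonly Regex KeepSeparator =
        new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+(?<seperator>[.,]\s*)\d+$");

    // Matches the same span but always writes "." as the last separator.
    private static readonly Regex FixedSeparator =
        new Regex(@"(?<=^(?:\d+[.,]\s*){2})\d+[.,]\s*\d+$");

    public static string Keep(string version) =>
        KeepSeparator.Replace(version, "1298${seperator}43568");

    public static string Fixed(string version) =>
        FixedSeparator.Replace(version, "1298.43568");
}
```

For "20,0,1234,43567", Keep gives "20,0,1298,43568" while Fixed gives "20,0,1298.43568".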

Regex matching recurring patterns

I want to validate input in a C# TextBox by using regular expressions. The expected input is in this format:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
So there are six groups of five characters, each followed by a hyphen, and one single character at the end.
Right now my regex matches any string of 5 to 255 characters: .{5,255}
How do I need to modify it in order to match the format mentioned above?
Update:
If you want to match any letter or digit in each position, you can use:
^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$
Explanation:
(?: // Non-capturing group
[a-zA-Z0-9]{5} // Match exactly 5 letters or digits
- // Followed by a `-`
){6} // Match the pattern 6 times (ABCD4-) -> 6 times
[a-zA-Z0-9] // At the end, match a single letter or digit.
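A minimal C# sketch of this validation, for example to call from a TextBox handler; KeyFormat and its methods are hypothetical. One .NET detail worth knowing: without RegexOptions.Multiline, $ matches at the very end of the string or just before a final \n, so the sketch also shows a variant ending in \z that rejects a trailing newline:

```csharp
using System.Text.RegularExpressions;

// Hypothetical validator built on the pattern from the answer above.
public static class KeyFormat
{
    // Answer's pattern: six "5 alphanumerics + hyphen" blocks, then one alphanumeric.
    private static readonly Regex Loose =
        new Regex(@"^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$");

    // Same pattern ending in \z, which only matches at the very end of the input.
    private static readonly Regex Strict =
        new Regex(@"^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]\z");

    public static bool IsValid(string input) => Strict.IsMatch(input);

    public static bool IsValidLoose(string input) => Loose.IsMatch(input);
}
```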
Note: the regex below only matches patterns like the one you posted, where each group of five repeats a single character:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
You can try this regex:
^(?:([a-zA-Z0-9])\1{4}-){6}\1$
Explanation:
(?: // Non-capturing group
( // First capture group
[a-zA-Z0-9] // Match any letter or digit, and capture it in group 1
)
\1{4} // Match the same character as in group 1, 4 more times
- // Followed by a `-`
){6} // Match the pattern 6 times (CCCCC-) -> 6 times
\1 // At the end, match the character captured by the last repetition of group 1.
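A C# sketch of the back-reference pattern; RepeatedCharFormat is a hypothetical name. Because group 1 is recaptured on every repetition of the outer group, the blocks may use different characters, and the final \1 refers to the character of the last block:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the back-reference pattern from the answer above.
public static class RepeatedCharFormat
{
    // ([a-zA-Z0-9]) captures one char, \1{4} repeats it; the outer group runs 6 times.
    private static readonly Regex Pattern =
        new Regex(@"^(?:([a-zA-Z0-9])\1{4}-){6}\1$");

    public static bool IsMatch(string input) => Pattern.IsMatch(input);
}
```

So "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE-FFFFF-F" is accepted, while the same string ending in "-A" is not.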
Untested, but I think this will work:
([A-Za-z0-9]{5}-){6}[A-Za-z0-9]
For your example, in general, replace C with the character class you want:
^(C{5}-){6}C$
^([a-z]{5}-){6}[a-z]$ # Letters only; use the case-insensitive modifier
^([a-z0-9]{5}-){6}[a-z0-9]$ # Letters and digits
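A C# sketch of the template idea, assuming you pass the character class in as a string; FormatTemplate and Build are hypothetical names. RegexOptions.IgnoreCase plays the role of the case-insensitive modifier mentioned above, and CultureInvariant is added so the case folding does not depend on the current culture:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: builds ^(C{5}-){6}C$ with C replaced by a character class.
public static class FormatTemplate
{
    public static Regex Build(string charClass) =>
        new Regex("^(" + charClass + "{5}-){6}" + charClass + "$",
                  RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}
```

For example, Build("[a-z]") produces the pattern ^([a-z]{5}-){6}[a-z]$ and accepts mixed-case letters, but rejects a digit anywhere.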
Try this:
^(C{5}-){6}C$
The ^ and $ denote the beginning and end of the string respectively, and make sure that no additional characters are entered.
