Trying to capture a decimal number out of a string - c#

In a text file, I'm looking for the part of a document that contains a piece like min ISO 1133 0.2-0.35. What I want to capture is the decimal range in that piece of text (0.2-0.35). Since the document contains other decimal ranges, I cannot simply use a regular expression that looks only for the range itself. So far I have come up with min.*(\d+)((?:\.)?)(\d*)-(\d+)((?:\.)?)(\d*) but the result is not correct and I'm stuck. Can anyone please help me with this?
Below, you can see the final result (yellow part):

Maybe the following would work for you?
\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)
See the online demo
The answer is currently based on the assumption (looking at your current attempt) that you want the two ends of the range in separate groups. If not, this answer can easily be adapted to capture the whole substring (or see @TheFourthBird's answer).
\b - Match word boundary.
min - Literally match 'min'.
\s - Match a whitespace character.
.*? - Match any character other than newline up to (lazy):
( - Open 1st capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 1st capture group.
- - Match a literal hyphen.
( - Open 2nd capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 2nd capture group.
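
As a minimal C# sketch of how the two groups described above could be read, assuming the sample line from the question (the variable names are illustrative, not part of the answer):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "min ISO 1133 0.2-0.35";

// Group 1 is the lower bound of the range, group 2 the upper bound.
Match m = Regex.Match(input, @"\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)");

// Fall back to an empty string when nothing matched.
string low = m.Success ? m.Groups[1].Value : "";
string high = m.Success ? m.Groups[2].Value : "";

Console.WriteLine(low + " to " + high); // prints "0.2 to 0.35"
```

If the values are needed as numbers, they could then be parsed with decimal.Parse and CultureInfo.InvariantCulture so that the dot is always read as the decimal separator.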

You could get the decimal part matching 1+ digit in the optional part and making the quantifier non greedy. The value is in capture group 1.
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9](?:\.[0-9]+)?)\b
Regex demo
Or a bit more specific pattern
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9]+(?:\.[0-9]+)?)\b
Regex demo

Related

Regex pattern infinite number of times except last one different

I'm trying to build a regex to check if a text input is valid.
The pattern is [NumberBetween1And999]['x'][NumberBetween1And999][','][White space Optional] repeated infinite times.
I need this to make an order from a string: the first number is the product id and the second number is the quantity for the product.
Examples of good texts:
1x1
2x1,3x1
1x3, 4x1
Should not catch:
1x1,
1,1, 1x1,
9999x1
1x1,99999x1
I'm stuck here: ^(([1-9][0-9]{0,2})x([1-9][0-9]{0,2}),)*$
Thanks for helping me
You can use
^[1-9][0-9]{0,2}x[1-9][0-9]{0,2}(?:,\s*[1-9][0-9]{0,2}x[1-9][0-9]{0,2})*$
The pattern matches:
^ Start of string
[1-9][0-9]{0,2}x[1-9][0-9]{0,2} Match a digit 1-9 and 2 optional digits 0-9, then x and again the digits part
(?: Non capture group to repeat as a whole
,\s* Match a comma and optional whitespace char
[1-9][0-9]{0,2}x[1-9][0-9]{0,2} Match the same pattern as at the beginning
)* Close the non capture group and optionally repeat it to also match a single part without a comma
$ End of string
Regex demo
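
A rough C# sketch of how this pattern might be used to validate the input and then build the order. ParseOrder is a hypothetical helper name, and returning null for invalid input is an assumption made for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): returns the parsed
// (product id, quantity) pairs, or null when the input is not a valid order.
List<(int ProductId, int Quantity)> ParseOrder(string text)
{
    // The validation pattern from the answer, unchanged.
    const string pattern =
        @"^[1-9][0-9]{0,2}x[1-9][0-9]{0,2}(?:,\s*[1-9][0-9]{0,2}x[1-9][0-9]{0,2})*$";
    if (!Regex.IsMatch(text, pattern))
        return null;

    var items = new List<(int ProductId, int Quantity)>();
    // The string is already validated, so a simple scan picks up every pair.
    foreach (Match m in Regex.Matches(text, @"([0-9]+)x([0-9]+)"))
        items.Add((int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value)));
    return items;
}

// Inputs taken from the question.
foreach (string s in new[] { "1x1", "2x1,3x1", "1x3, 4x1", "1x1,", "9999x1", "1x1,99999x1" })
{
    string verdict = ParseOrder(s) != null ? "valid" : "invalid";
    Console.WriteLine(s + " -> " + verdict);
}
```

Validating first and extracting afterwards keeps the extraction pattern trivial, at the cost of scanning the string twice.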

Regex C# - optional group in the middle

I have such source text with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
Regex has to match the whole string and select all characters after first "." - which is my Group 1. I tried to make subgroups and mark Group 3 as optional but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How it should be changed to capture desired values?
This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.
You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible (as *? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
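
With this pattern the desired value is spread over Group 2 and the optional Group 3, so they have to be joined again. A small C# sketch, assuming a hypothetical helper named ExtractSuffix that returns null when the input does not match:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: joins group 2 and the optional group 3 back together.
string ExtractSuffix(string input)
{
    Match m = Regex.Match(input, @"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$");
    if (!m.Success)
        return null;

    // Group 3 only participates when the second dotted part is present.
    return m.Groups[3].Success
        ? m.Groups[2].Value + "." + m.Groups[3].Value
        : m.Groups[2].Value;
}

Console.WriteLine(ExtractSuffix("GH22-O0-TFS-SFSD 00-1-006.19135"));   // 19135
Console.WriteLine(ExtractSuffix("GH22-O0-TFS-SFSD 00-1-006.1.19135")); // 1.19135
```

Note that the 0* parts drop leading zeros from each numeric part, which may or may not be what you want.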
I hope I understand your goals and this should work:
.*?\.([\d.]+)
.*?\. - loosely capture everything leading up to the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1

Regex - Extract second position digit from string

I have a regex:
var thisMatch = Regex.Match(result, @"(?-s).+(?=[\r\n]+The information appearing in this document)", RegexOptions.IgnoreCase);
This returns the line before "The information appearing in this document" just fine.
The output of my regex is
10 880 $10,000 $800 $25 $10
I need to extract 880, which will always be in second position (the number before 880 can vary, so \d{0,2} shouldn't be relied on).
How can I grab the second position number?
You can use something like
(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)
See the .NET regex demo. In C#:
var output = Regex.Match(result, @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)", RegexOptions.Multiline)?.Value;
Or, you could capture the number and grab it from a group with
^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document
See this regex demo. In C#:
var output = Regex.Match(result, @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document", RegexOptions.Multiline)?.Groups[1].Value;
Regex details:
(?<= - start of a positive lookbehind that requires its pattern to match immediately to the left of the current location:
^ - start of a line (due to the RegexOptions.Multiline)
\S+ - one or more non-whitespace chars
[\p{Zs}\t]+ - one or more horizontal whitespaces
) - end of the lookbehind
\d+ - one or more digits (use \S+ if you are sure this will always be the non-whitespace char streak)
(?= - start of a positive lookahead that requires its pattern to match immediately to the right of the current location:
.* - the rest of the line (as . does not match an LF char)
[\r\n]+ - one or more CR/LF chars
The information appearing in this document - literal text
) - end of the lookahead.
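
For a self-contained check of the capture-group variant, here is a short sketch. The sample text is partly made up: only the line with the numbers and the literal text come from the question, and the first line is added to show that RegexOptions.Multiline lets ^ match at a later line start.

```csharp
using System;
using System.Text.RegularExpressions;

// Partly made-up sample: the first line is an assumption for illustration.
string result =
    "Some earlier line\r\n" +
    "10 880 $10,000 $800 $25 $10\r\n" +
    "The information appearing in this document";

Match m = Regex.Match(
    result,
    @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document",
    RegexOptions.Multiline);

// Group 1 holds the number in second position on the matched line.
string second = m.Success ? m.Groups[1].Value : "";
Console.WriteLine(second); // 880
```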
If you insert
\d+\s(\d+)
this will capture a leading number (\d+), separated by a whitespace (\s) from the number you're looking for ((\d+)), captured in a capture group so you can easily access it.
Check the tab Split List in this online demo

Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping" but creates 3 matches correctly.
But I need the "Additional test" text in the second group as well.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I have only one huge match because the second group takes everything until the end.
How can I match everything until a new line with a date starts and create a new match from there on?
If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char as many as possible and then an optional line, a newline followed with any zero or more chars other than a newline char as many as possible.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
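
A brief C# sketch of how the second pattern might be applied with Regex.Matches. It assumes LF line endings and a shortened two-record version of the sample text; with CRLF input, group 2 would also contain the \r characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample from the question, with LF line endings (an assumption).
string log =
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test";

string pattern =
    @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*" +
    @"(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)";

MatchCollection records = Regex.Matches(log, pattern);

foreach (Match r in records)
{
    // Group 1: the datetime; group 2: all text up to the next dated line.
    string body = r.Groups[2].Value.Replace("\n", " | ");
    Console.WriteLine("[" + r.Groups[1].Value + "] " + body);
}
```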
The character class [,.:\w\s]* repeats \s with the * quantifier, and because \s also matches newlines, it matches too much.
Instead, you can match the rest of the line with .* (which does not match a newline), then a newline and the next line, all in the same group: (.*\r?\n.*)
^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
Regex demo

Fail validation if there is a period (.) not in the specific format?

I'm scanning a string and a period is allowed but if there is a period it has to be in the following format alphanumber.numeric or numeric.numeric. Here are some possible acceptable formats:
5555.1312
ajfdkd.555
Here is what i have so far:
private const string containsPeroidRegularExpress = @"([a-zA-Z]+\.[0-9]+)|([0-9]+\.[0-9]+)";
validator.RuleFor(x => x.myString)
.Matches(containsPeroidRegularExpress)
.When(x => x.myString.Contains("."), ApplyConditionTo.CurrentValidator)
When you have an example like this it works fine:
This is my example 1 555.1212
But in this example it does not
This is my example 2 555.1212 .
You can see the extra period at the end of the 2nd example. It should fail validation because the extra period is not in the format specified above. The 1st example should pass validation. However, both currently pass.
Your pattern is still capturing exactly what you want, however it doesn't "know" that it needs to keep going.
private const string containsPeroidRegularExpress =
@"^([a-zA-Z]+\.[0-9]+)$|^([0-9]+\.[0-9]+)$";
The $ tells it to check right up until the end of the line (I also added ^ to tell it to start at the beginning for completeness so that ". 555.1212" doesn't pass as well).
I definitely won't say this is the best solution. As others mention, you can definitely simplify it. However regex isn't my forte...
I also noticed you mention that the pattern could be alphanumber.numeric. Your pattern does not allow both alpha and numeric characters mixed in the first part. You could use the following:
private const string containsPeroidRegularExpress =
@"^([a-zA-Z0-9]+\.[0-9]+)$|^([0-9]+\.[0-9]+)$";
You might check that after matching the value, there is no space followed by a dot on the right.
You can shorten the pattern a bit by matching either 1+ digits or 1+ chars a-zA-Z, and then matching a dot and 1+ digits
(?<!\.[^\S\r\n]+)\b(?:[a-zA-Z]+|[0-9]+)\.[0-9]+\b(?![^\S\r\n]+\.)
The pattern matches
(?<! Negative lookbehind, assert what is on the left is not
\.[^\S\r\n]+ Match a dot and 1+ whitespace chars without a newline
) Close lookbehind
\b Word boundary
(?: Non capture group
[a-zA-Z]+|[0-9]+ Match either 1+ chars a-zA-Z or 1+ digits
) Close group
\.[0-9]+ Match a dot and 1+ digits 0-9
\b Word boundary
(?! Negative lookahead, assert that on the right is not
[^\S\r\n]+\. Match 1+ whitespaces without newlines followed by a dot
) Close lookahead
Regex demo
If you want to match mixed char a-zA-Z and digits:
(?<!\.[^\S\r\n]+)\b[a-zA-Z0-9]+\.[0-9]+\b(?![^\S\r\n]+\.)
Regex demo
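
To try the mixed-character pattern outside of the validator, a minimal sketch could look like this. IsValid is a hypothetical helper, and treating strings without a dot as valid is an assumption that mirrors the .When(...) condition in the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a string without a dot is accepted as-is;
// otherwise the lookaround pattern has to find a match.
bool IsValid(string s)
{
    if (!s.Contains("."))
        return true;
    return Regex.IsMatch(
        s,
        @"(?<!\.[^\S\r\n]+)\b[a-zA-Z0-9]+\.[0-9]+\b(?![^\S\r\n]+\.)");
}

Console.WriteLine(IsValid("This is my example 1 555.1212"));   // True
Console.WriteLine(IsValid("This is my example 2 555.1212 .")); // False
Console.WriteLine(IsValid(". 555.1212"));                      // False
```

Keep in mind that the lookarounds only reject a stray dot that is separated from the value by horizontal whitespace. A stray dot elsewhere in the string, for example attached to the end of another word, would still pass.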
