Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping" but correctly creates 3 matches.
But I also need the "Additional test" text in the second group.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*) but now I get only one huge match because the second group takes everything until the end.
How can I match everything until a new line starting with a date, and begin a new match from there?

If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char as many as possible and then an optional line, a newline followed with any zero or more chars other than a newline char as many as possible.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
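As a minimal sketch of how the second pattern could be used from C#, the snippet below collects each entry's timestamp and text. The class name LogEntryParser and the method Parse are hypothetical names chosen for illustration, and the input is assumed to use \n line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogEntryParser
{
    // Pattern from the answer above: a date/time at the start of a line, then
    // (lazily) everything up to the next line starting with a date, or the end of input.
    private static readonly Regex EntryRegex = new Regex(
        @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)");

    // Returns one (Timestamp, Text) pair per entry found in the input.
    public static List<(string Timestamp, string Text)> Parse(string input)
    {
        var entries = new List<(string Timestamp, string Text)>();
        foreach (Match m in EntryRegex.Matches(input))
        {
            entries.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return entries;
    }

    public static void Main()
    {
        // Sample text from the question, joined with \n (an assumption).
        string input = string.Join("\n",
            "17.11.2020 15:32 typical Pat. seems sleeping",
            "Additional test",
            "17.11.2020 15:32 typical Pat. seems sleeping",
            "Additional test",
            "17.11.2020 15:32 typical Pat. seems sleeping",
            "Additional test");

        foreach (var entry in Parse(input))
        {
            Console.WriteLine("[" + entry.Timestamp + "] " + entry.Text.Replace("\n", " | "));
        }
    }
}
```

If the text uses \r\n line endings, Group 2 will end with a trailing \r, so trimming the captured text may be needed.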

Your character class [,.:\w\s]* is repeated with the * quantifier, and since \s also matches newlines, it matches too much.
You can instead match the rest of the line, a newline, and the next line in the same group using (.*\r?\n.*), since . does not match a newline.
^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (or of each line when the multiline option is enabled, which is needed to match every entry here)
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
Regex demo
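The negative lookahead variant can be tried out in C# roughly as follows. This is only a sketch: DatedBlockParser and GetBlocks are made-up names, RegexOptions.Multiline is added so that ^ matches at every line start, and \n line endings are assumed. Note that with this pattern the time ends up in Group 2, because Group 1 only matches the date.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DatedBlockParser
{
    // Pattern from the answer above: a date, then the rest of the line plus
    // every following line that does not itself start with a date.
    private static readonly Regex BlockRegex = new Regex(
        @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
        RegexOptions.Multiline);

    // Returns the Group 2 text of every dated block in the input.
    public static List<string> GetBlocks(string input)
    {
        var blocks = new List<string>();
        foreach (Match m in BlockRegex.Matches(input))
        {
            blocks.Add(m.Groups[2].Value);
        }
        return blocks;
    }

    public static void Main()
    {
        string input = string.Join("\n",
            "17.11.2020 15:32 typical Pat. seems sleeping",
            "Additional test",
            "17.11.2020 15:32 typical Pat. seems sleeping",
            "Additional test");

        foreach (string block in GetBlocks(input))
        {
            Console.WriteLine(block.Replace("\n", " | "));
        }
    }
}
```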

Related

Regex C# - optional group in the middle

I have such source text with an optional group in the middle:
GH22-O0-TFS-SFSD 00-1-006.19135
GH22-O0-TFS-SFSD 00-1-006.1.19135
Desired value in the first case will be '19135' and in the second '1.19135'.
Regex has to match the whole string and select all characters after first "." - which is my Group 1. I tried to make subgroups and mark Group 3 as optional but it is not working.
Regex:
.*\.0*(([0-9])(\.0*([0-9]+)))
How it should be changed to capture desired values?

This should work for you:
.*?\.(.*)
This will match the whole string and include everything after the first period in capture group 1 regardless of character type.

You can use
^(.*?)\.0*(\d+)(?:\.0*(\d+))?$
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than an LF char as few as possible (as *? is a lazy quantifier)
\. - a dot
0* - zero or more zeros
(\d+) - Group 2: any one or more digits
(?:\.0*(\d+))? - an optional occurrence of ., zero or more zeros, then Group 3 capturing one or more digits
$ - end of string.
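Since this pattern splits the value across Group 2 and the optional Group 3, the desired string has to be reassembled in code. A possible sketch is shown below; OptionalGroupDemo and ExtractValue are hypothetical names, and returning an empty string for non-matching input is an arbitrary choice made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Pattern from the answer above; Group 3 only participates when a second dot is present.
    private static readonly Regex ValueRegex = new Regex(@"^(.*?)\.0*(\d+)(?:\.0*(\d+))?$");

    // Returns "Group2" or "Group2.Group3"; an empty string if the input does not match.
    public static string ExtractValue(string input)
    {
        Match m = ValueRegex.Match(input);
        if (!m.Success)
        {
            return "";
        }
        // Group.Success tells us whether the optional group took part in the match.
        return m.Groups[3].Success
            ? m.Groups[2].Value + "." + m.Groups[3].Value
            : m.Groups[2].Value;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractValue("GH22-O0-TFS-SFSD 00-1-006.19135"));   // 19135
        Console.WriteLine(ExtractValue("GH22-O0-TFS-SFSD 00-1-006.1.19135")); // 1.19135
    }
}
```

Because of the 0* parts, leading zeros right after each dot are not included in the captured groups.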

I hope I understand your goals and this should work:
.*?\.([\d.]+)
.*?\. - lazily match everything leading up to the first period
([\d.]+) - capture the remaining digits and periods into capture group #1
https://regex101.com/r/0t9Ijy/1

Regex - Extract second position digit from string

I have a regex:
var thisMatch = Regex.Match(result, @"(?-s).+(?=[\r\n]+The information appearing in this document)", RegexOptions.IgnoreCase);
This returns the line before "The information appearing in this document" just fine.
The output of my regex is
10 880 $10,000 $800 $25 $10
I need to extract 880, which will always be in second position (the number before 880 can vary, so \d{0,2} shouldn't be allowed).
How can I grab the second position number?

You can use something like
(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)
See the .NET regex demo. In C#:
var output = Regex.Match(result, @"(?<=^\S+[\p{Zs}\t]+)\d+(?=.*[\r\n]+The information appearing in this document)", RegexOptions.Multiline)?.Value;
Or, you could capture the number and grab it from a group with
^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document
See this regex demo. In C#:
var output = Regex.Match(result, @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document", RegexOptions.Multiline)?.Groups[1].Value;
Regex details:
(?<= - start of a positive lookbehind that requires its pattern to match immediately to the left of the current location:
^ - start of a line (due to the RegexOptions.Multiline)
\S+ - one or more non-whitespace chars
[\p{Zs}\t]+ - one or more horizontal whitespaces
) - end of the lookbehind
\d+ - one or more digits (use \S+ if you are sure this will always be the non-whitespace char streak)
(?= - start of a positive lookahead that requires its pattern to match immediately to the right of the current location:
.* - the rest of the line (as . does not match an LF char)
[\r\n]+ - one or more CR/LF chars
The information appearing in this document - literal text
) - end of the lookahead.
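For a quick check of the capturing-group variant, a self-contained sketch could look like this. SecondNumberExtractor and Extract are illustrative names, and the sample document text is invented for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondNumberExtractor
{
    // Capturing-group pattern from the answer above; Multiline lets ^ match at each line start.
    private static readonly Regex LineRegex = new Regex(
        @"^\S+[\p{Zs}\t]+(\d+).*[\r\n]+The information appearing in this document",
        RegexOptions.Multiline);

    // Returns the second whitespace-separated number on the line that precedes the marker text,
    // or an empty string when nothing matches (a failed match yields empty group values).
    public static string Extract(string text)
    {
        return LineRegex.Match(text).Groups[1].Value;
    }

    public static void Main()
    {
        // Invented sample input; only the middle line comes from the question.
        string text = string.Join("\n",
            "Summary of charges",
            "10 880 $10,000 $800 $25 $10",
            "The information appearing in this document is confidential.");

        Console.WriteLine(Extract(text)); // 880
    }
}
```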

If you insert
\d+\s(\d+)
this will capture a leading number (\d+), separated by a whitespace (\s) from the number you're looking for ((\d+)), captured in a capture group so you can easily access it.
Check the tab Split List in this online demo

Fail validation if there is a period (.) not in the specific format?

I'm scanning a string and a period is allowed but if there is a period it has to be in the following format alphanumber.numeric or numeric.numeric. Here are some possible acceptable formats:
5555.1312
ajfdkd.555
Here is what i have so far:
private const string containsPeroidRegularExpress = @"([a-zA-Z]+\.[0-9]+)|([0-9]+\.[0-9]+)";
validator.RuleFor(x => x.myString)
.Matches(containsPeroidRegularExpress)
.When(x => x.myString.Contains("."), ApplyConditionTo.CurrentValidator);
When you have an example like this it works fine:
This is my example 1 555.1212
But in this example it does not
This is my example 2 555.1212 .
You can see the extra period at the end of the 2nd example. It should fail validation because the extra period is not in the format specified above. The 1st example should pass validation. Both pass validation, though.

Your pattern is still capturing exactly what you want, however it doesn't "know" that it needs to keep going.
private const string containsPeroidRegularExpress =
@"^([a-zA-Z]+\.[0-9]+)$|^([0-9]+\.[0-9]+)$";
The $ tells it to check right up until the end of the line (I also added ^ to tell it to start at the beginning for completeness so that ". 555.1212" doesn't pass as well).
I definitely won't say this is the best solution. As others mention, you can definitely simplify it. However regex isn't my forte...
I also noticed you mention that the pattern could be alphanumber.numeric. Your pattern does not allow both alpha and numeric characters mixed in the first part. You could use the following:
private const string containsPeroidRegularExpress =
@"^([a-zA-Z0-9]+\.[0-9]+)$|^([0-9]+\.[0-9]+)$";

You might check that after matching the value, there is no space followed by a dot on the right.
You can shorten the pattern a bit by matching either 1+ chars a-zA-Z or 1+ digits, and then matching a dot and 1+ digits
(?<!\.[^\S\r\n]+)\b(?:[a-zA-Z]+|[0-9]+)\.[0-9]+\b(?![^\S\r\n]+\.)
The pattern matches
(?<! Negative lookbehind, assert what is on the left is not
\.[^\S\r\n]+ Match a dot and 1+ whitespace chars without a newline
) Close lookbehind
\b Word boundary
(?: Non capture group
[a-zA-Z]+|[0-9]+ Match either 1+ chars a-zA-Z or 1+ digits
) Close group
\.[0-9]+ Match a dot and 1+ digits 0-9
\b Word boundary
(?! Negative lookahead, assert that on the right is not
[^\S\r\n]+\. Match 1+ whitespaces without newlines followed by a dot
) Close lookahead
Regex demo
If you want to match mixed char a-zA-Z and digits:
(?<!\.[^\S\r\n]+)\b[a-zA-Z0-9]+\.[0-9]+\b(?![^\S\r\n]+\.)
Regex demo
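To see how the lookaround pattern behaves on the two examples from the question, here is a small sketch using Regex.IsMatch directly instead of the validator. PeriodFormatCheck and HasWellFormedPeriod are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PeriodFormatCheck
{
    // Mixed letters/digits variant from the answer above. The lookbehind and lookahead
    // reject a token that has a stray dot separated from it only by horizontal whitespace.
    private static readonly Regex TokenRegex = new Regex(
        @"(?<!\.[^\S\r\n]+)\b[a-zA-Z0-9]+\.[0-9]+\b(?![^\S\r\n]+\.)");

    // True when the input contains at least one token accepted by the pattern.
    public static bool HasWellFormedPeriod(string input)
    {
        return TokenRegex.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(HasWellFormedPeriod("This is my example 1 555.1212"));   // True
        Console.WriteLine(HasWellFormedPeriod("This is my example 2 555.1212 .")); // False
    }
}
```

This only inspects the text directly around the matched token. A stray period that is not next to the token would not be caught by this pattern on its own.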

Trying to capture a decimal number out of a string

In a text file, I'm looking for a part of a document that contains a piece like min ISO 1133 0.2-0.35. What I want to capture is the ranged decimal part of that piece of text (0.2-0.35). Since there are other ranged decimal numbers, I cannot simply use a regular expression to look only for the ranged part. Till now, I could make min.*(\d+)((?:\.)?)(\d*)-(\d+)((?:\.)?)(\d*) but the result is not correct and I'm stuck. Can anyone please help me with this?

Maybe the following would work for you?
\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)
See the online demo
The answer is currently based on the assumption (looking at your current attempt) that you'd want these ranges in separate groups. However, if not, this answer can be swiftly transformed to capture the whole substring (or see @TheFourthBird's answer).
\b - Match word boundary.
min - Literally match 'min'.
\s - Match a whitespace character.
.*? - Match any character other than newline up to (lazy):
( - Open 1st capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 1st capture group.
- - Match a literal hyphen.
( - Open 2nd capture group
\d+ - At least a single digit.
(?: - Open non-capturing group.
\.\d+ - Match a literal dot and at least a single digit.
)? - Close non-capturing group and make it optional.
) - Close 2nd capture group.
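The two groups described above can be read and parsed in C# along these lines. This is a sketch under assumptions: RangeExtractor and TryGetRange are hypothetical names, the bounds are parsed as decimal with the invariant culture, and the sample line is made up around the fragment from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RangeExtractor
{
    // Pattern from the answer above: the first "number-number" range after the word "min".
    private static readonly Regex RangeRegex = new Regex(
        @"\bmin\s.*?(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)");

    // Tries to read the lower and upper bound of the range that follows "min" on the same line.
    public static bool TryGetRange(string text, out decimal low, out decimal high)
    {
        low = 0m;
        high = 0m;
        Match m = RangeRegex.Match(text);
        if (!m.Success)
        {
            return false;
        }
        // Invariant culture so that "." is always treated as the decimal separator.
        low = decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        high = decimal.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return true;
    }

    public static void Main()
    {
        // Invented sample line containing a second range that should be ignored.
        string line = "max ISO 1133 1.5-2.0, min ISO 1133 0.2-0.35";
        if (TryGetRange(line, out decimal low, out decimal high))
        {
            Console.WriteLine(low.ToString(CultureInfo.InvariantCulture) + " to " +
                              high.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```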

You could get the decimal part matching 1+ digit in the optional part and making the quantifier non greedy. The value is in capture group 1.
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9](?:\.[0-9]+)?)\b
Regex demo
Or a bit more specific pattern
\bmin [A-Z]+ [0-9]+ ([0-9]+(?:\.[0-9]+)?-[0-9]+(?:\.[0-9]+)?)\b
Regex demo

Regex matching recurring patterns

I want to validate input in a C# TextBox by using regular expressions. The expected input is in this format:
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
So I've got six elements of five separated characters and one separated character at the end.
Now my regex matches any character between five and 255 chars: .{5,255}
How do I need to modify it in order to match the format mentioned above?

Update: -
If you want to match any letter or digit, then you can use: -
^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$
Explanation: -
(?: // Non-capturing group
[a-zA-Z0-9]{5} // Match exactly 5 letters or digits
- // Followed by a `-`
){6} // Match the pattern 6 times (ABCD4-) -> 6 times
[a-zA-Z0-9] // At the end match a single letter or digit.
Note: - The below regex will only match pattern like you posted: -
CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C
You can try this regex: -
^(?:([a-zA-Z0-9])\1{4}-){6}\1$
Explanation: -
(?: // Non-capturing group
( // First capture group
[a-zA-Z0-9] // Match any letter or digit, and capture it in group 1
)
\1{4} // Match the same character as in group 1 - 4 times
- // Followed by a `-`
){6} // Match the pattern 6 times (CCCCC-) -> 6 times
\1 // At the end match the character last captured in group 1 once more.
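For the TextBox validation itself, the first pattern (any letters or digits) could be wrapped in a small helper like the one below. KeyFormatValidator and IsValidKey are made-up names for this sketch, and wiring it to an actual TextBox event is left out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyFormatValidator
{
    // First pattern from the answer above: six groups of five letters/digits,
    // each followed by a hyphen, then one final letter or digit.
    private static readonly Regex KeyRegex = new Regex(
        @"^(?:[a-zA-Z0-9]{5}-){6}[a-zA-Z0-9]$");

    // True when the whole input has the CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-CCCCC-C shape.
    public static bool IsValidKey(string input)
    {
        return KeyRegex.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidKey("ABCD4-EFGH5-IJKL6-MNOP7-QRST8-UVWX9-Z")); // True
        Console.WriteLine(IsValidKey("ABCD4-EFGH5-IJKL6"));                     // False
    }
}
```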

Untested, but I think this will work:
([A-Za-z0-9]{5}-){6}[A-Za-z0-9]

For your example, in general replace C to the character class you want:
^(C{5}-){6}C$
^([a-z]{5}-){6}[a-z]$ # Just letter, use case insensitive modifier
^([a-z0-9]{5}-){6}[a-z0-9]$ # Letters and digits..

Try this:
^(C{5}-){6}C$
The ^ and $ denote the beginning and end of the string respectively and make sure that no additional characters are entered.
