Regex: Check if there are more than x line breaks - c#

I need to validate a string based on the number of line breaks it contains.
The input is okay if there are no more than, say, 6 line breaks.
The input is not okay if there are more than, say, 6 line breaks.
Other characters can (but do not have to) occur between the line breaks.
I need to solve this solely within the regular expression because I cannot add any additional code.
I thought about something like this:
/^(\r\n|\r|\n){0,6}$/ // not working :[

You can use
Regex.IsMatch(input, @"^.*(?:\n.*){0,6}\z")
Or, if your line endings can also be a lone CR or a lone LF, bear in mind that in a .NET regex, . (without the RegexOptions.Singleline option) matches any char except LF, so it does match CR. In that case, you will need something like
Regex.IsMatch(input, @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z")
The regex matches
^ - start of string
.* - any zero or more chars other than a line feed (\n) char, as many as possible (i.e. one line)
(?:\n.*){0,6} - zero to six consecutive occurrences of an LF char followed by any zero or more chars other than an LF char, as many as possible
\z - the very end of string.
The second pattern matches
^ - start of string
[^\r\n]* - zero or more chars other than LF and CR as many as possible
(?:(?:\r\n?|\n)[^\r\n]*){0,6} - zero to six occurrences of
(?:\r\n?|\n) - either CRLF, or CR, or LF
[^\r\n]* - zero or more chars other than LF and CR as many as possible
\z - the very end of string.
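A minimal C# sketch of both patterns in use, assuming .NET 5 or later (top-level statements); the helper names AtMostSixLf and AtMostSixAnyEol are hypothetical. It shows that the first pattern does not count a lone CR as a line break, while the second one does:

```csharp
using System;
using System.Text.RegularExpressions;

// First pattern: only LF counts as a line break ('.' matches CR, so CRLF text still works).
// AtMostSixLf is a hypothetical helper name.
bool AtMostSixLf(string s) => Regex.IsMatch(s, @"^.*(?:\n.*){0,6}\z");

// Second pattern: CRLF, a lone CR or a lone LF each count as one line break.
// AtMostSixAnyEol is a hypothetical helper name.
bool AtMostSixAnyEol(string s) =>
    Regex.IsMatch(s, @"^[^\r\n]*(?:(?:\r\n?|\n)[^\r\n]*){0,6}\z");

string sixCrlf = "1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7";  // 6 CRLF line breaks
string sevenLf = new string('\n', 7);               // 7 LF line breaks, nothing else
string sevenCr = "a\rb\rc\rd\re\rf\rg\rh";          // 7 lone-CR line breaks

Console.WriteLine(AtMostSixLf(sixCrlf));      // True
Console.WriteLine(AtMostSixLf(sevenLf));      // False
Console.WriteLine(AtMostSixLf(sevenCr));      // True: a lone CR is not counted
Console.WriteLine(AtMostSixAnyEol(sevenCr));  // False
```

Using \z rather than $ matters here: without RegexOptions.Multiline, $ also matches before a final LF, so a $-based version would accept one extra trailing line break.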

Related

Repetitive pattern but the last one is different - Regex c#

I have this pattern:
^([a-zA-Z0-9]+ )+$
It is supposed to match sentences like:
sfjgsjsg_sbskdf_dsjkfshfsh
sdfhs_skjhsijdgh_dsnjbkg_sdkfsbk_nasjksdj_nsdjkfs
I don't know the word size nor how many words will be in each line.
The problem is that the pattern above only matches sentences like:
sfjgsjsg_sbskdf_dsjkfshfsh_
sdfhs_skjhsijdgh_dsnjbkg_sdkfsbk_nasjksdj_nsdjkfs_
(where _ stands for a space)
You can use
^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$
Or, if any whitespace is meant:
^[a-zA-Z0-9]+(?:\s[a-zA-Z0-9]+)*$
If exactly one horizontal whitespace char is allowed between words:
^[a-zA-Z0-9]+(?:[\p{Zs}\t][a-zA-Z0-9]+)*$
and if there can be more than one:
^[a-zA-Z0-9]+(?:[\p{Zs}\t]+[a-zA-Z0-9]+)*$
Note that support for leading/trailing whitespace can be added by placing " *" (a space followed by *), [\p{Zs}\t]* or \s* right after the ^ anchor and right before the $ anchor.
Details:
^ - start of string
[a-zA-Z0-9]+ - one or more ASCII alphanumeric chars
" " - a literal space ([\p{Zs}\t] in the other variants matches any whitespace other than line break chars, \s matches any whitespace)
(?: [a-zA-Z0-9]+)* - zero or more repetitions of a space and one or more ASCII alphanumeric chars
$ - end of string.
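A small C# sketch (assuming .NET 5+ top-level statements) comparing the question's pattern with the first pattern from the answer on the sample sentence, with the underscores replaced by real spaces:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern: every word must be followed by a space, so a trailing space is required.
const string QuestionPattern = @"^([a-zA-Z0-9]+ )+$";
// The answer's pattern: one word, then zero or more "space + word" pairs.
const string AnswerPattern = @"^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$";

string sentence = "sfjgsjsg sbskdf dsjkfshfsh";

Console.WriteLine(Regex.IsMatch(sentence, QuestionPattern));        // False
Console.WriteLine(Regex.IsMatch(sentence + " ", QuestionPattern));  // True
Console.WriteLine(Regex.IsMatch(sentence, AnswerPattern));          // True
Console.WriteLine(Regex.IsMatch(sentence + " ", AnswerPattern));    // False
```

Because the space is placed in front of each repeated word rather than after it, the last word no longer needs a trailing space.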

Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple, question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
only matches up to sleeping, but correctly creates 3 matches.
But I need the Additional test text in the second group as well.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*), but now I get only one huge match because the second group consumes everything up to the end.
How can I match everything up to the next line that starts with a date, and start a new match from there?
If you are sure there is only one additional line to be matched, you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline, as many as possible, and then an optional second line (a newline followed by any zero or more chars other than a newline, as many as possible).
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
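A minimal C# sketch (assuming .NET 5+ and LF line endings in the input) that runs the "any amount of lines" pattern over the sample text from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question; joining with "\n" is an assumption about the line endings.
string text = string.Join("\n",
    "17.11.2020 15:32 typical Pat. seems sleeping",
    "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping",
    "Additional test",
    "17.11.2020 15:32 typical Pat. seems sleeping",
    "Additional test");

const string Pattern =
    @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)";

MatchCollection matches = Regex.Matches(text, Pattern);
foreach (Match m in matches)
{
    // Group 2 spans every line up to the newline that precedes the next date (or the end).
    Console.WriteLine($"[{m.Groups[1].Value}] {m.Groups[2].Value.Replace("\n", " | ")}");
}
```

Each of the three matches should print as [17.11.2020 15:32] typical Pat. seems sleeping | Additional test.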
You are repeating the character class [,.:\w\s] with the * quantifier, and since \s also matches newlines, it matches too much.
You can instead match the rest of the line with .* (which does not match a newline), then match a newline and the next line in the same group: (.*\r?\n.*). In .NET, add (?m) (or use RegexOptions.Multiline) so that ^ matches at the start of every line:
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date-like pattern.
(?m)^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
(?m)^ Start of a line (the inline (?m) flag, like RegexOptions.Multiline, makes ^ match after every newline)
( Capture group 1
\d{2}\.\d{2}\.\d{4} Match a date-like pattern
) Close group 1
\s* Match 0+ whitespace chars (or match whitespace chars without newlines with [^\S\r\n]*)
( Capture group 2
.* Match the rest of the line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat: a newline and a whole line, as long as that line does not start with a date-like pattern
) Close group 2
Regex demo
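A C# sketch of the last pattern, assuming .NET 5+; the sample log (with one entry spanning three lines) is made up for illustration. RegexOptions.Multiline is passed explicitly because in .NET, ^ otherwise only matches at the very start of the input:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: the second entry spans three lines (LF line endings assumed).
string text =
    "17.11.2020 15:32 typical Pat. seems sleeping\n" +
    "Additional test\n" +
    "17.11.2020 15:40 Pat. awake\n" +
    "Line two\n" +
    "Line three\n" +
    "17.11.2020 15:45 typical Pat. seems sleeping";

// Pattern from the answer; Multiline lets ^ match at the start of every line.
var regex = new Regex(
    @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
    RegexOptions.Multiline);

MatchCollection matches = regex.Matches(text);
foreach (Match m in matches)
{
    // Group 2 holds the time, the rest of the line and every following non-date line.
    Console.WriteLine($"{m.Groups[1].Value}: {m.Groups[2].Value.Replace("\n", " | ")}");
}
```

Without RegexOptions.Multiline, the same pattern finds only the first entry.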

Get string after the last comma or the last number using Regex in C#

How can I get the string after the last comma or last number using regex for these examples:
"Flat 1, Asker Horse Sports" -- get the string after ",", result: "Asker Horse Sports"
"9 Walkers Barn" -- get the string after "9", result: "Walkers Barn"
I need a regex that supports both cases, or two different regexes, one per case.
I tried /,[^,]*$/ and (.*),[^,]*$ to get the string after the last comma, but with no luck.
You can use
[^,\d\s][^,\d]*$
See the regex demo (and a .NET regex demo).
Details
[^,\d\s] - any char but a comma, digit and whitespace
[^,\d]* - any char but a comma and digit
$ - end of string.
In C#, you may also tell the regex engine to search for the match from the end of the string with the RegexOptions.RightToLeft option (this can make matching more efficient, although it might not be necessary here if the input strings are short):
var output = Regex.Match(text, @"[^,\d\s][^,\d]*$", RegexOptions.RightToLeft).Value;
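A minimal C# sketch (assuming .NET 5+) that applies the first pattern to both examples from the question, with and without RegexOptions.RightToLeft:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: a char that is not a comma, digit or whitespace,
// then any chars other than commas and digits, up to the end of the string.
const string Pattern = @"[^,\d\s][^,\d]*$";

string[] inputs = { "Flat 1, Asker Horse Sports", "9 Walkers Barn" };
foreach (string text in inputs)
{
    string ltr = Regex.Match(text, Pattern).Value;
    string rtl = Regex.Match(text, Pattern, RegexOptions.RightToLeft).Value;
    // Both directions find the same substring; Value is "" when there is no match.
    Console.WriteLine($"{ltr} | {rtl}");
}
```

Regex.Match never returns null, so checking Success (or comparing Value with "") is enough to detect a failed match.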
You were on the right track with the capture group in (.*),[^,]*$, but the group should capture the part that you are looking for.
If there has to be a comma or digit present, you could match until the last occurrence of either of them, and capture what follows in the capturing group.
^.*[\d,]\s*(.+)$
^ Start of string
.* Match any char except a newline 0+ times
[\d,] Match either , or a digit
\s* Match 0+ whitespace chars
(.+) Capture group 1, match any char except a newline 1+ times
$ End of string
.NET regex demo | C# demo
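A C# sketch of the capture-group approach (assuming .NET 5+): the greedy .* backtracks to the last digit or comma, and group 1 holds what follows.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^.*[\d,]\s*(.+)$");

foreach (string text in new[] { "Flat 1, Asker Horse Sports", "9 Walkers Barn", "Walkers Barn" })
{
    Match m = regex.Match(text);
    // Without a comma or digit in the input there is no match at all.
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}
```

This prints Asker Horse Sports, Walkers Barn and (no match), which reflects the "there has to be a comma or digit present" assumption of this answer.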

Regex Conditional Values

If I have a string like one of the following, which can have two possible values (although the value JB37 can vary):
String One\r\nString Two\r\n
String One\r\nJB37\r\n
And I only want to capture the string if the value following String One\r\n does NOT equal String Two\r\n, how would I code that in Regex?
Without any condition, this is what I would normally use:
String One\r\n(.+?)\r\n
With regex, you may resort to a negative lookahead:
String One\r\n(?!String Two(?:\r\n|$))(.*?)(?:\r\n|$)
See the regex demo
You may also use [^\r\n] instead of .:
String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)
If you use RegexOptions.Multiline, you will also be able to use
(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$
See yet another demo.
Details
(?m) - a RegexOptions.Multiline option that makes ^ match start of a line and $ end of line positions
String One\r\n - String One text followed with a CRLF line ending
(?!String Two\r?$) - a negative lookahead that fails the match if immediately to the right of the current location, there is String Two at the end of the line
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible, up to the leftmost occurrence of
\r?$ - an optional CR and end of the line (note that in a .NET regex, $ matches only in front of LF, not CR, in the multiline mode, thus, \r? is necessary).
C# demo:
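// Requires: using System; using System.Text.RegularExpressions;
var s = "String One\r\nJB37\r\n"; // sample input
var m = Regex.Match(s, @"(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$");
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // prints JB37
}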
If CR can be missing, add ? after each \r in the pattern.
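For comparison, a minimal C# sketch (assuming .NET 5+) of the option-free lookahead variant that uses [^\r\n]* instead of .; the third input is made up to show why the lookahead also checks for a line end after String Two:

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead pattern from the answer (no RegexOptions needed).
const string Pattern = @"String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)";

string[] inputs =
{
    "String One\r\nString Two\r\n",     // excluded value: no match
    "String One\r\nJB37\r\n",           // captured: JB37
    "String One\r\nString Twofold\r\n", // not followed by a line end after "String Two", so captured
};

foreach (string s in inputs)
{
    Match m = Regex.Match(s, Pattern);
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}
```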

Regex for special case

I need to create a regex expression for the following scenario.
It may contain only digits and at most one dot or comma.
First part can have one to three digits.
The second part can be a dot or a comma.
The third part can have one to two digits.
The valid inputs are:
123,12
123.12
123,1
123
12,12
12.12
1,12
1.12
1,1
1.1
1
So far I have come up with this expression:
\d{1,3}(?:[.,]\d{1,2})?
but it doesn't work well. For example, the input 11:11 is marked as valid.
You need to put anchors around your expression:
^\d{1,3}(?:[.,]\d{1,2})?$
^ will match the start of the string
$ will match the end of the string
Without those anchors, the pattern can match just part of your string: since the last part is optional, on "11:11" it matches the digits before the colon, and a second match covers the digits after the colon.
Try using ^ and $:
^\d{1,3}(?:[.,]\d{1,2})?$
^ The match must start at the beginning of the string or line.
$ The match must occur at the end of the string or before \n at the end of the line or string.
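A C# sketch (assuming .NET 5+) comparing the pattern with and without anchors on a few inputs; as the $ description above says, a single trailing \n is still accepted, and using \z instead of $ would reject it:

```csharp
using System;
using System.Text.RegularExpressions;

const string Unanchored = @"\d{1,3}(?:[.,]\d{1,2})?";
const string Anchored = @"^\d{1,3}(?:[.,]\d{1,2})?$";

foreach (string input in new[] { "123,12", "1.1", "1", "11:11", "1234" })
{
    bool partial = Regex.IsMatch(input, Unanchored);  // true if ANY substring matches
    bool whole = Regex.IsMatch(input, Anchored);      // true only if the whole input matches
    Console.WriteLine($"{input}: unanchored={partial}, anchored={whole}");
}

// $ also matches before a final \n, so "123\n" passes; \z instead of $ rejects it.
Console.WriteLine(Regex.IsMatch("123\n", Anchored));                        // True
Console.WriteLine(Regex.IsMatch("123\n", @"^\d{1,3}(?:[.,]\d{1,2})?\z"));  // False
```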
