Regex option "Multiline"

I have a regex that should match a comma-separated list of dates in the format
yyyy/mm/dd or yyyy/mm
For example:
2016/09/02,2016/08,2016/09/30
My code:
string data="21535300/11/11\n";
Regex reg = new Regex(@"^(20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|30|31))?,?)*$",
RegexOptions.Multiline);
if (!reg.IsMatch(data))
"Error".Dump();
else
"True".Dump();
I use the Multiline option.
If the string data contains "\n", any string matches this regex.
For example:
string data="test\n"
string data="2100/1/1"
I find option definition in MSDN. It says:
It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string.
I don't understand why this happens.
Can anyone explain it?
Thanks.

Your regex can match an empty line that you get once you add a newline at the end of the string. "test\n" contains 2 lines, and the second one gets matched.
See your regex pattern in a free-spacing mode:
^ # Matches the start of a line
( # Start of Group 1
20\d{2}/
(0[1-9]|1[012])
(/
(0[1-9]|[12]\d|30|31)
)?,?
)* # End of group 1 - * quantifier makes it match 0+ times
$ # End of line
If you do not want it to match an empty line, replace the last )* with )+.
An alternative is to use a more unrolled pattern like
^20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?(,20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?)*$
See the regex demo. Inside the code, it is advisable to use a block and build the pattern dynamically:
string date = @"20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?";
Regex reg = new Regex(string.Format("^{0}(,{0})*$", date), RegexOptions.Multiline);
As you can see, the first block (after the start of the line ^ anchor) is obligatory here, and thus an empty line will never get matched.
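A minimal sketch of the difference, assuming a plain console program with top-level statements (the names loose and strict are illustrative, not from the original post). It runs the original pattern and the unrolled pattern against the inputs from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern: the whole group is quantified with *, so an empty line satisfies ^...$.
var loose = new Regex(
    @"^(20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|30|31))?,?)*$",
    RegexOptions.Multiline);

// Unrolled pattern: one date is mandatory, every further date must follow a comma.
string date = @"20\d{2}/(0[1-9]|1[012])(/(0[1-9]|[12]\d|3[01]))?";
var strict = new Regex(string.Format("^{0}(,{0})*$", date), RegexOptions.Multiline);

foreach (var input in new[] { "test\n", "21535300/11/11\n", "2016/09/02,2016/08,2016/09/30" })
{
    // Show the newline as a visible escape so the output stays on one line per input.
    Console.WriteLine("{0,-35} loose={1} strict={2}",
        input.Replace("\n", "\\n"), loose.IsMatch(input), strict.IsMatch(input));
}
```

The loose pattern accepts "test\n" only because of the empty line after the trailing newline; the strict pattern has nothing optional at the start of the line, so that empty line no longer counts as a match.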

Related

Regex start new match at specific pattern

Hello, I'm fairly new to regex and have a small, maybe simple, question.
I have the given text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but correctly creates 3 matches.
But I also need the "Additional test" text in the second group.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*), but now I get only one huge match because the second group takes everything until the end.
How can I match everything until a new line starting with a date and begin a new match from there?

If you are sure there is only one additional line to be matched you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char as many as possible and then an optional line, a newline followed with any zero or more chars other than a newline char as many as possible.
If there can be any amount of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, just \s is replaced with [\p{Zs}\t] that only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - now, . will match any chars including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline, followed with a date string, or up to the end of string.
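As a rough illustration, the second pattern can be run against the sample text from the question. This sketch assumes the text uses \n line endings and a console program with top-level statements; the variable names are made up for the example:

```csharp
using System;
using System.Text.RegularExpressions;

string text =
    "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test\n" +
    "17.11.2020 15:32 typical Pat. seems sleeping\nAdditional test";

// Group 1: the datetime; Group 2: everything up to the next line that starts with a date.
var entry = new Regex(
    @"(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)");

MatchCollection entries = entry.Matches(text);
foreach (Match m in entries)
{
    // Replace the embedded newline only for display purposes.
    Console.WriteLine("[{0}] {1}", m.Groups[1].Value, m.Groups[2].Value.Replace("\n", " | "));
}
```

Each entry ends up as its own match, with the continuation line kept inside Group 2.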

You are repeating \s with the * quantifier in the character class [,.:\w\s]*. Since \s also matches newlines, the class matches too much.
You can match the rest of the line, a newline and the next line in the same group using (.*\r?\n.*), because .* does not match a newline by itself.
^(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
Regex demo
If multiple lines can follow, match all following lines that do not start with a date like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ Start of the string (or of each line when the Multiline option is enabled, which is needed to match every entry)
( Capture group1
\d{2}\.\d{2}\.\d{4} Match a date like pattern
) Close group 1
\s* Match 0+ whitespace chars (Or match whitespace chars without newlines [^\S\r\n]*)
( Capture group 2
.* Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* Optionally repeat matching the whole line if it does not start with a date like pattern
) Close group 2
Regex demo
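A small sketch of the multi-line variant in C#, assuming RegexOptions.Multiline is enabled so that ^ matches at every line start, and using made-up sample data with \n line endings:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample: the first entry has two continuation lines, the second has none.
string log = "17.11.2020 15:32 first entry\nAdditional test\nMore detail\n17.11.2020 16:05 second entry";

// Group 2 keeps consuming following lines as long as they do not start with a date.
var block = new Regex(
    @"^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)",
    RegexOptions.Multiline);

MatchCollection blocks = block.Matches(log);
foreach (Match m in blocks)
{
    Console.WriteLine("date: {0}", m.Groups[1].Value);
    Console.WriteLine("body: {0}", m.Groups[2].Value.Replace("\n", " | "));
}
```

Note that Group 1 of this pattern holds only the date, so the time ends up at the start of Group 2.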

find terminate word using regular expression

I want to check whether a word ends with 's, 'm or 're using a regular expression in C#.
if (Regex.IsMatch(word, "/$'s|$'re|$'m/"))
textbox1.text=word;

The /$'s|$'re|$'m/ .NET regex matches 3 alternatives:
/$'s - / at the end of a string after which 's should follow (this will never match as there can be no text after the end of a string)
$'re - end of string and then 're must follow (again, will never match)
$'m/ - end of string with 'm/ to follow (again, will never match).
In a .NET regex, regex delimiters are not used, thus the first and last / are treated as literal chars that the engine tries to match.
The $ anchor signals the end of a string, and putting anything after it makes the pattern match no string (well, unless you have a trailing \n after it, but that is an edge case that rarely causes any trouble). Just FYI: to match the very end of string in a .NET regex, use \z.
What you attempted to write was
Regex.IsMatch(word, "'(?:s|re|m)$")
Or, if you put single character alternatives into a single character class:
Regex.IsMatch(word, "'(?:re|[sm])$")
See the regex demo.
Details
' - a single quote
(?: - start of a non-capturing group:
re - the re substring
| - or
[sm] - a character class matching s or m
) - end of the non-capturing group
$ - end of string.
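A quick way to check the corrected pattern against the original one. EndsWithContraction is a hypothetical helper name used only for this sketch:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the word ends with 's, 're or 'm.
bool EndsWithContraction(string word) => Regex.IsMatch(word, @"'(?:re|[sm])$");

foreach (var word in new[] { "it's", "they're", "I'm", "its", "rock'n'roll" })
{
    // The second column uses the pattern from the question, which can never match.
    Console.WriteLine("{0,-12} fixed={1} original={2}",
        word, EndsWithContraction(word), Regex.IsMatch(word, "/$'s|$'re|$'m/"));
}
```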

Regex Conditional Values

If I have a string like the following that can have two possible values (although the value JB37 can be variable)
String One\r\nString Two\r\n
String One\r\nJB37\r\n
And I only want to capture the string if the value following String One\r\n does NOT equal String Two\r\n, how would I code that in Regex?
So normally without any condition, this is what I want:
String One\r\n(.+?)\r\n

With regex, you may resort to a negative lookahead:
String One\r\n(?!String Two(?:\r\n|$))(.*?)(?:\r\n|$)
See the regex demo
You may also use [^\r\n] instead of .:
String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)
If you use RegexOptions.Multiline, you will also be able to use
(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$
See yet another demo.
Details
(?m) - a RegexOptions.Multiline option that makes ^ match start of a line and $ end of line positions
String One\r\n - String One text followed with a CRLF line ending
(?!String Two\r?$) - a negative lookahead that fails the match if immediately to the right of the current location, there is String Two at the end of the line
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible, up to the leftmost occurrence of
\r?$ - an optional CR and end of the line (note that in a .NET regex, $ matches only in front of LF, not CR, in the multiline mode, thus, \r? is necessary).
C# demo:
var m = Regex.Match(s, @"(?m)String One\r\n(?!String Two\r?$)(.*?)\r?$");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If CR can be missing, add ? after each \r in the pattern.
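The variant without RegexOptions.Multiline can be exercised in a similar way. This is only a sketch: TryCaptureAfterStringOne is a hypothetical helper name, and the inputs are the two strings from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: captures the line after "String One" unless that line is "String Two".
bool TryCaptureAfterStringOne(string s, out string value)
{
    Match m = Regex.Match(s, @"String One\r\n(?!String Two(?:\r\n|$))([^\r\n]*)");
    value = m.Success ? m.Groups[1].Value : "";
    return m.Success;
}

foreach (var s in new[] { "String One\r\nString Two\r\n", "String One\r\nJB37\r\n" })
{
    if (TryCaptureAfterStringOne(s, out string captured))
        Console.WriteLine("captured: {0}", captured);
    else
        Console.WriteLine("no match");
}
```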

Can't understand why regex doesn't work with start/end markers of the string

Here is my test regex with options IgnoreCase and Singleline :
^\s*((?<test1>[-]?\d{0,10}.\d{3})(?<test2>\d)?(?<test3>\d)?){1,}$
and input data:
24426990.568 128364695.70706 -1288.460
If I omit ^ (match start of line) and $ (match end of line)
\s*((?<test1>[-]?\d{0,10}.\d{3})(?<test2>\d)?(?<test3>\d)?){1,}
then everything works perfectly.
Why doesn't it work with the string start/end markers (^/$)?
Thanks in advance.

The start and end is literally the start and end of the input string when in single line mode.
It only means the start of the line and the end of the line in multiline mode.
Please note that this means the entire input string.
So if you use:
24426990.568 128364695.70706 -1288.460
as your input string, then the beginning is the first white space and the end of the string will be the 0
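The group in your pattern matches one number, and although it is repeated with {1,}, the \s* sits in front of the group, so the repetitions have no way to consume the space in front of -1288.460. With ^ and $ the pattern has to cover the entire input string, so the match fails; without the anchors, the regex is free to match parts of the input separately.
You have two options:
Remove the ^ and $
Change the pattern so that the repeated group can also match the whitespace between the values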
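One possible adjustment, offered as an assumption about the intent rather than the only fix, is to move \s* inside the repeated group and escape the dot so that it only matches a decimal point. A minimal sketch with the options from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "24426990.568 128364695.70706 -1288.460";

// Pattern from the question: \s* is outside the repeated group.
var original = new Regex(
    @"^\s*((?<test1>[-]?\d{0,10}.\d{3})(?<test2>\d)?(?<test3>\d)?){1,}$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Adjusted pattern: each repetition may start with whitespace, and the dot is escaped.
var adjusted = new Regex(
    @"^(?:\s*(?<test1>-?\d{0,10}\.\d{3})(?<test2>\d)?(?<test3>\d)?)+$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Console.WriteLine("original matches: {0}", original.IsMatch(input));

Match m = adjusted.Match(input);
Console.WriteLine("adjusted matches: {0}", m.Success);

// A repeated named group keeps every repetition in its Captures collection.
foreach (Capture c in m.Groups["test1"].Captures)
{
    Console.WriteLine(c.Value);
}
```

Because the group repeats, m.Groups["test1"].Value alone would hold only the last repetition; the Captures collection is what exposes all three values.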

.NET's Regex class and newline

Why doesn't .NET regex treat \n as end of line character?
Sample code:
string[] words = new string[] { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
Regex regex = new Regex("^[a-z0-9]+$");
foreach (var word in words)
{
Console.WriteLine("{0} - {1}", word, regex.IsMatch(word));
}
And this is the response I get:
ab1 - True
ab2
- True
ab3
- False
- False
ab5
- False
ab6
- False
Why does the regex match ab2\n?
Update:
I don't think Multiline is a good solution: I want to validate a login so that it contains only the specified characters, and it must be a single line. If I pass the Multiline option to the constructor, ab1, ab2, ab3 and ab6 match the expression, while ab4 and ab5 don't.

If the string ends with a line break, RegexOptions.Multiline will not help. The $ anchor still matches before the final line break, since there is nothing after it.
If you want to match only at the very end of the string, without skipping a trailing line break, use \z
Regex regex = new Regex(@"^[a-z0-9]+\z", RegexOptions.Multiline);
\z behaves the same with or without RegexOptions.Multiline and RegexOptions.Singleline.
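A short comparison of $ and \z on some of the inputs from the question, written as a sketch with illustrative variable names and no regex options set:

```csharp
using System;
using System.Text.RegularExpressions;

var withDollar = new Regex(@"^[a-z0-9]+$");   // $ also matches before a final \n
var withZ = new Regex(@"^[a-z0-9]+\z");       // \z matches only at the very end

foreach (var word in new[] { "ab1", "ab2\n", "ab3\n\n", "ab5\r\n" })
{
    // Escape control characters so each result stays on one output line.
    string shown = word.Replace("\r", "\\r").Replace("\n", "\\n");
    Console.WriteLine("{0,-10} $={1} \\z={2}", shown, withDollar.IsMatch(word), withZ.IsMatch(word));
}
```

For a login check, the \z version is the one that rejects "ab2\n".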

The .NET regex engine does treat \n as end-of-line. And that's a problem if your string has Windows-style \r\n line breaks. With RegexOptions.Multiline turned on $ matches between \r and \n rather than before \r.
$ also matches at the very end of the string just like \z. The difference is that \z can match only at the very end of the string, while $ also matches before a trailing \n. When using RegexOptions.Multiline, $ also matches before any \n.
If you're having trouble with line breaks, a trick is to first do a search-and-replace that replaces all \r with nothing, so that all your lines end with \n only.
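The effect of \r\n line breaks under RegexOptions.Multiline can be seen by counting matches. This sketch uses a made-up three-line string and compares the plain pattern, a pattern with an optional \r before $, and the search-and-replace trick:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample with Windows-style line breaks.
string crlfText = "ab1\r\nab2\r\nab3";

// $ matches before \n, so the \r left on the first two lines blocks the match there.
int plain = Regex.Matches(crlfText, @"^[a-z0-9]+$", RegexOptions.Multiline).Count;

// Allowing an optional \r before $ lets every line match.
int optionalCr = Regex.Matches(crlfText, @"^[a-z0-9]+\r?$", RegexOptions.Multiline).Count;

// Alternatively, strip every \r first so that all lines end with \n only.
string lfOnly = crlfText.Replace("\r", "");
int normalized = Regex.Matches(lfOnly, @"^[a-z0-9]+$", RegexOptions.Multiline).Count;

Console.WriteLine("plain={0} optionalCr={1} normalized={2}", plain, optionalCr, normalized);
```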

From RegexOptions:
Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
So basically, if you pass RegexOptions.Multiline to the Regex constructor, you are instructing that instance to let $ match before every newline character, not only at the end of the string itself.

Use regex options, System.Text.RegularExpressions.RegexOptions:
string[] words = new string[] { "ab1", "ab2\n", "ab3\n\n", "ab4\r", "ab5\r\n", "ab6\n\r" };
foreach (var word in words)
{
// The IsMatch overload that takes a pattern and options is static, so it is called on the Regex type.
Console.WriteLine("{0} - {1}", word,
Regex.IsMatch(word, "^[a-z0-9]+$",
System.Text.RegularExpressions.RegexOptions.Singleline |
System.Text.RegularExpressions.RegexOptions.IgnoreCase |
System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace));
}

Could be the usual Windows/Linux line ending differences. But it's still strange that \n\n gets a false this way... Did you try with the RegexOptions.Multiline flag set?

Just to add more detail to Smazy's answer. This is an extract from:
Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. Copyright 2009 Jan Goyvaerts and Steven Levithan, 978-0-596-2068-7
The difference between ‹\Z› and ‹\z› comes into play when the last character in your subject text is a line break. In that case, ‹\Z› can match at the very end of the subject text, after the final line break, as well as immediately before that line break. The benefit is that you can search for ‹omega\Z› without having to worry about stripping off a trailing line break at the end of your subject text. When reading a file line by line, some tools include the line break at the end of the line, whereas others don’t; ‹\Z› masks this difference. ‹\z› matches only at the very end of the subject text, so it will not match text if a trailing line break follows. The anchor ‹$› is equivalent to ‹\Z›, as long as you do not turn on the “^ and $ match at line breaks” option. This option is off by default for all regex flavors except Ruby. Ruby does not offer a way to turn this option off. Just like ‹\Z›, ‹$› matches at the very end of the subject text, as well as before the final line break, if any.
Of course, I wouldn't have found it without Smazy's answer.
