Regex match through multiple lines

For example, we have the following string:
Something
AnotherThing
Something AnotherThing
If I use RegexOptions.Singleline with the pattern Something.+?AnotherThing, I get two matches, when I want to match the first and second lines only. I want to use something like FirstLine#endofline##startofline#AnotherLine. So I use:
var regex = new Regex(@"Something$^AnotherThing", RegexOptions.Multiline);
but it doesn't work. I know that I can use some hack with Singleline to match the first two lines (and not the last one), but the question is: is it even possible to match exactly two texts on exactly two lines without the Singleline specifier, with the Multiline option only? And why does it behave like this?

How about:
Something\r?\nAnotherThing
\r? in case the string doesn't come from Windows.
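The reason Something$^AnotherThing doesn't work with the RegexOptions.Multiline option is that ^ and $ match positions next to line breaks, not the line-break characters themselves, so the pattern still has to consume the break. In .NET, $ only matches immediately before \n (never before \r), so the optional \r has to come before the $:
new Regex(@"Something\r?$\n^AnotherThing", RegexOptions.Multiline);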
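To see the difference in practice, here is a minimal sketch using the sample text from the question. The class and method names (TwoLineDemo, CountMatches) are made up for illustration, and the line endings in the two test strings are assumptions about what the input might look like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoLineDemo
{
    // \r? tolerates Windows (CRLF) endings; \n consumes the actual line break.
    private static readonly Regex Pattern =
        new Regex(@"Something\r?\nAnotherThing");

    // Counts how many times the two-line sequence occurs in the input.
    public static int CountMatches(string input) => Pattern.Matches(input).Count;

    public static void Main()
    {
        string unix = "Something\nAnotherThing\nSomething AnotherThing";
        string windows = "Something\r\nAnotherThing\r\nSomething AnotherThing";

        Console.WriteLine(CountMatches(unix));    // 1
        Console.WriteLine(CountMatches(windows)); // 1

        // Anchors consume nothing, so nothing ever steps over the line break.
        var anchorsOnly = new Regex(@"Something$^AnotherThing", RegexOptions.Multiline);
        Console.WriteLine(anchorsOnly.IsMatch(unix)); // False
    }
}
```

The third line ("Something AnotherThing") is never matched, because the pattern requires a line break between the two words.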

Try to match using the carriage return and line break characters, e.g.
Something\r?\nAnotherThing
Basically, carriage returns are causing you trouble (you're not alone). Do you know which OS your text is coming from? If it's from Windows then there will be a \r before the \n, which you need to account for.

Related

How do I select all including sensitive case (regex) in c#?

I have a problem with a regex command,
I have a file with tons of lines and a lot of sensitive characters,
this is an Example with all sensitive case 0123456789/*-+.&é"'(-è_çà)=~#{[|`\^#]}²$*ù^%µ£¨¤,;:!?./§<>AZERTYUIOPMLKJHGFDSQWXCVBNazertyuiopmlkjhgfdsqwxcvbn
I tried many regex commands but never get the expected result,
I have to select everything from Example to the end
I tried this command on https://www.regextester.com/ :
\sExample(.*?)+
And when I tried it in C# the only result I get was : Example
I don't understand why --'

Here's a quick chat about greedy and pessimistic (more commonly called lazy) quantifiers:
Here is test data:
Example word followed by another word and then more
Here are two regex:
Example.*word
Example.*?word
The first is greedy. Regex will match Example, then .* consumes everything all the way to the END of the string and then works backwards, giving back one character at a time, trying to make the match succeed. It succeeds when Example word followed by another word is matched, the .* having matched word followed by another (and the spaces at either end).
The second is pessimistic; it nibbles forwards along the string one character at a time, trying to match. Regex will match Example, then it takes one more character into the .*? wildcard, then checks whether it found word, which it did. So the pessimistic .*? matches only a single space, and the full match in pessimistic mode is Example word
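The two behaviours can be checked with a small sketch like the following, which runs both patterns against the test data above. GreedyLazyDemo and Find are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Returns the text of the first match of pattern in input.
    public static string Find(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    public static void Main()
    {
        string input = "Example word followed by another word and then more";

        // Greedy: .* runs to the end, then backs off to the LAST "word".
        Console.WriteLine(Find(input, "Example.*word"));
        // prints: Example word followed by another word

        // Lazy: .*? grows one character at a time and stops at the FIRST "word".
        Console.WriteLine(Find(input, "Example.*?word"));
        // prints: Example word
    }
}
```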
Because you say you want the whole string after Example, I recommend a greedy quantifier, so it immediately takes the whole remaining string and declares a match rather than nibbling forwards one character at a time (slow).
This, then, will match (and capture) everything after Example:
\sExample(.*)
The brackets make a capture group. In c# we can name the group using ?<namehere> at the start of the brackets and then everything that .* matches can be retrieved with:
Regex r = new Regex(@"\sExample(?<x>.*)"); // verbatim string, so \s reaches the regex engine
Match m = r.Match(" Exampleblahblah");     // the leading space is required by \s
Console.WriteLine(m.Groups["x"].Value); //prints: blahblah
Note that if your data contains newlines, . doesn't match a newline unless you enable RegexOptions.Singleline when you create the regex.
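A rough sketch of what the Singleline option changes, assuming a two-line input invented for this example (SinglelineDemo and Tail are hypothetical names too):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns what the named group "x" captured after "Example", or null if no match.
    public static string Tail(string input, RegexOptions options)
    {
        Match m = Regex.Match(input, @"\sExample(?<x>.*)", options);
        return m.Success ? m.Groups["x"].Value : null;
    }

    public static void Main()
    {
        string input = "an Example first line\nsecond line";

        // Default: . stops at the newline, so only the rest of the first line is captured.
        Console.WriteLine(Tail(input, RegexOptions.None));

        // Singleline: . also matches \n, so the capture runs to the end of the input.
        Console.WriteLine(Tail(input, RegexOptions.Singleline));
    }
}
```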

RegEx to find non-existence of white space prefix but not include the character in the match?

So I have the following RegEx for the purpose of finding and adding whitespace:
(\S)(\()
For a string like "SomeText(Somemoretext)" I want to update it to "SomeText (Somemoretext)". The pattern matches "t(", so my replace eliminates the "t" from the string, which is not good. I also do not know what the character could be; I'm merely trying to find the non-existence of whitespace.
Is there a better expression to use, or is there a way to exclude the found character from the match returned so that I can safely replace without catching characters I do not want to replace?
Thanks

I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.

If you need to detect the absence of space before a specific character (such as bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect words (runs of [a-zA-Z0-9_] characters) that are followed by a bracket without a space.
(If I got your problem correctly) you can replace the full match with " (" (a space followed by the bracket) and get exactly what you need.
In case you need to look for the absence of spaces before a symbol (like a bracket) in any kind of text (as in, the text may be non-word, such as punctuation), you might want to use the following instead.
^(?:\S*)(\()(?:\S*)$
When using this, your result will be in group 1, instead of just full match (which now contains the whole line, if a line is matched).
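For completeness, here is a sketch of the lookaround route the question asks about. It uses a lookbehind, (?<=\S), which is a different pattern from the ones in the answers above: it checks that a non-whitespace character precedes the bracket without making that character part of the match. ParenSpacing and AddSpace are names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenSpacing
{
    // (?<=\S) asserts a non-whitespace character sits just before "(",
    // but does not consume it, so only "(" itself is replaced.
    public static string AddSpace(string input) =>
        Regex.Replace(input, @"(?<=\S)\(", " (");

    public static void Main()
    {
        Console.WriteLine(AddSpace("SomeText(Somemoretext)")); // SomeText (Somemoretext)
        Console.WriteLine(AddSpace("Already (spaced)"));       // Already (spaced)
    }
}
```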

Prevent Regex from devouring optional part of the match

I've searched extensively but I can't find a simple answer to this and my Regex experience is limited. I'd appreciate a simple solution that is explained, please.
I have a very large string and I need to substitute certain words in it as follows:
Example: wherever you find the string "LINK-ABC" make it "LINK_ABC".
I wrote my Regex Match and Replace strings:
@"LINK-ABC", @"LINK_ABC" and it worked.
But there were a couple of things I had not recognized.
There COULD be words in the file like this:
LINK-ABC-DEF LINK-ABC-GHI-JKL ... and so on.
So I get "LINK_ABC-DEF" etc. (which is NOT what I want; this should have remained intact...)
Once I realized the problem it seemed that what I REALLY wanted was to recognize ONLY the word being matched and leave any cases where it was in combination with something else, unchanged. It seemed to me that if I checked for a space or period on the Match word, that should do it, so...
@"LINK-ABC[ |\\.]", @"LINK_ABC"
... and now I have stumbled.
Sample string:
link-xxx link-aaa-sss link-xxx-bbb link-xxx link-xxx.
Match/Replace string:
link-xxx[ |\\.],link_xxx
Result string:
link_xxxlink-aaa-sss link-xxx-bbb link_xxxlink_xxx
The replacements are correct, BUT the trailing space or period has been "devoured" and so the result string is wrong.
Is there a way that I can match so that if it matches on space, the replacement will have a space and if it matches on a period, the replacement will have a period? I s'pose I could do 2 separate matches but I'd like to increase my understanding of Regex and do it more elegantly if it is possible.

You should be able to achieve the behavior you want with "capture groups":
var matchstring = @"link-xxx([ \.]|$)";
var fixstr = @"link_xxx$1";
The parentheses around the last part of the matchstring retain whatever matched inside them, and the $1 in the fixstr substitutes whatever was captured by that group.
I've also modified your punctuation section a little bit, presuming you want to replace a match if it happens to be the last word in the input (by adding the |$). A | inside a character class [] is a literal | character, so I removed it assuming you don't actually expect that in your input.
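Put together, this could look like the following sketch, run against the sample string from the question. LinkRenamer and Rename are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRenamer
{
    // Group 1 captures the space, the period, or the empty end-of-input,
    // and $1 writes it back so the delimiter is not lost.
    public static string Rename(string input) =>
        Regex.Replace(input, @"link-xxx([ \.]|$)", "link_xxx$1");

    public static void Main()
    {
        string sample = "link-xxx link-aaa-sss link-xxx-bbb link-xxx link-xxx.";
        Console.WriteLine(Rename(sample));
        // prints: link_xxx link-aaa-sss link-xxx-bbb link_xxx link_xxx.
    }
}
```

link-xxx-bbb is left alone because the character after link-xxx is a hyphen, which the group does not accept.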

How to match a whole line with or without '\n' using regex

I want to match all comments in a text file and I use the following regex to match single line comment:
//(.*?)\r?\n
But it could not match the last line if the last line is a single comment line such as:
// test
So, how do I write a single regex in C# that matches a whole line with or without a trailing '\n'? Thanks!

You could write your regex as:
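//(.*?)\r?$
With RegexOptions.Multiline, the $ sign matches at the end of every line as well as at the end of the input. Without that option it only matches at the very end of the input (or just before a final newline).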

You could consider this instead:
//(.*?)\s*$
The lazy (.*?) followed by \s*$ excludes any trailing whitespace (including newlines) from your capture group; a greedy (.*) would swallow trailing spaces and tabs itself. I assume you wouldn't care about capturing trailing spaces.
The limitations of a regex to match comments are important to keep in mind: it will match some non-comment items in code. For example, consider this line:
get_web_page('http://www.foo.com');
If you only want to match comments on their own line, you could do this:
^\s*//(.*?)\s*$
If you need to match comments that come after code, as well, the above problem can't be overcome easily with a regex.
Update: I am assuming that your code iterates through the file line by line, the most common case. However, if you have the entire file in a string and are matching the regex against that, you can enable multi-line mode for this to work.

^\s*?//(.*?)\s*?$
A whole line starts at '^' and ends at '$', with optional whitespace characters ('\s') around the comment.

You don't need to specify the newline, because . matches any character except a newline.
//(.*)

The last line does not contain a newline at the end, that is why your //(.*?)\r?\n regex fails.
You need to use a non-capturing group with a $ anchor as an alternative:
//(.*?)(?:\r?\n|$)
       ^^^^^^^^^^^
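A small sketch of that pattern in use, assuming the whole file has been read into one string. The input text and the names CommentDemo and Comments are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // Collects the text after each "//", whether the line ends in
    // \r\n, in \n, or at the end of the input.
    public static List<string> Comments(string source)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(source, @"//(.*?)(?:\r?\n|$)"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // The last line has no trailing newline, which is the case from the question.
        string source = "// first\r\nint x = 1;\n// last";
        foreach (string comment in Comments(source))
        {
            Console.WriteLine("[" + comment + "]");
        }
        // prints: [ first]
        //         [ last]
    }
}
```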

Regex match matches one too many characters

I have a need to perform a somewhat strange regular expression replacement. I've just about got it worked out, but not quite.
I need to remove multiple substrings from a string where the substrings to remove are surrounded by square braces [] except where the square braces are followed by two hashtags []##.
For instance, if the original string is:
[phase]This is []a test [I]## of the emergency broadcast system. [28]##[test]xyz
Then the expected output after the regex replace would be:
This is a test [I]## of the emergency broadcast system. [28]##xyz
So far, I've tried a few things, but the closest regex pattern string I've come up with is "\[[^\]]*\][^##]". The problem with this is that it matches one more character than it should. For instance, using the test string above, and doing the regex replace with an empty string, it returns:
his is test [I]## of the emergency broadcast system. [28]##yz
What is the regex pattern string for which I'm searching?

Your problem is that your additional part,
[^##]
will match the next character that is not a # character. You need a negative lookahead:
\[[^\]]*\](?!##)
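Here is a minimal sketch of the lookahead version applied to the test string from the question. BracketStripper and Strip are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // (?!##) rejects a bracket pair that is immediately followed by "##",
    // without consuming any character after the closing bracket.
    public static string Strip(string input) =>
        Regex.Replace(input, @"\[[^\]]*\](?!##)", "");

    public static void Main()
    {
        string input = "[phase]This is []a test [I]## of the emergency broadcast system. [28]##[test]xyz";
        Console.WriteLine(Strip(input));
        // prints: This is a test [I]## of the emergency broadcast system. [28]##xyz
    }
}
```

Because the lookahead is zero-width, the character after the bracket is never part of the match, which is what caused the missing "T" and "x" in the original attempt.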

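Replace your "\[[^\]]*\][^##]" with "(\[[^\]]*\])([^#]{2})".
First, your [^##] does not follow the rule "except where the square braces are followed by two hashtags []##": a character class matches a single character, so [^##] is the same as [^#]. It has to be changed to two non-sharp-sign chars, captured in a group so they can be put back.
Second, try replacing like this:
var s = Regex.Replace(input, pattern, "$2");
Note that this only removes brackets that are followed by at least two non-# characters, so a bracket at the very end of the input is left alone.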
