Regex in .NET across multiple lines - C#

I have these examples:
{I18n.get("Testing 123...")}
{I18n.get('Testing 123...')}
{I18n.get(
"Testing 123..."
)}
{I18n.get("Testing 123..."
)}
{I18n.get(
"Testing 123...")}
I want to extract the text Testing 123... in .NET using a C# Regex. This is what I did:
Regex r = new Regex(@"(?:I18n.get\(""(.+?)""\))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var matches = r.Matches(txt)
.Select(xx=> xx.Groups)
.Select(xx=> xx.Last().Value)
.ToList();
When the call is on a single line it works perfectly, but when it spans multiple lines it fails...
Also, how could a single Regex match the text whether it is wrapped in double quotes (") or single quotes (')?

You may use
var r = new Regex(@"I18n\.get\(\s*(""|')(.*?)\1\s*\)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var results = r.Matches(txt).Cast<Match>().Select(x => x.Groups[2].Value).ToList();
See the regex demo.
Details
I18n\.get\( - a literal I18n.get( text
\s* - 0+ whitespaces, line breaks included (this is what lets the multiline variants match)
("|') - Group 1: " or '
(.*?) - Group 2: any char, 0 or more occurrences, as few as possible
\1 - same value as captured in Group 1
\s* - 0+ whitespaces
\) - a ) char.
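A minimal, self-contained sketch of the answer above, assuming the input is the question's five sample calls; the class and method names (I18nExtractor, Extract) are hypothetical. It also shows what the \1 backreference buys: an apostrophe inside a double-quoted argument does not end the match.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class I18nExtractor
{
    // Group 1 captures the opening quote (" or '); \1 then demands the same quote
    // as the closing delimiter. \s* absorbs spaces or line breaks around the
    // literal, and Singleline lets . match a newline inside the literal too.
    private static readonly Regex Pattern = new Regex(
        @"I18n\.get\(\s*(""|')(.*?)\1\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the text between the quotes (Group 2) for every I18n.get(...) call.
    public static List<string> Extract(string txt) =>
        Pattern.Matches(txt)
               .Cast<Match>()
               .Select(m => m.Groups[2].Value)
               .ToList();
}
```

Called on the five sample lines, Extract should return "Testing 123..." five times, and on I18n.get("it's") it should return it's.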

I contend it doesn't matter which opening/closing quote is in the regex when you don't intend to actually parse the argument as a quoted string, right?
I mean, with all the embedded escapes, etc.
Use what you know as the delimiters, i.e. the text literals:
I18n.get( stuff here )
You could add a sub-validation that there is an inner quote, but since you're
not parsing quotes, it's not going to be strictly valid anyway.
Here, the quotes are only used for trimming, so that they are not matched and don't become part of the value.
Here it is; the whole match is the value you're looking for, so just toss each match into an array:
@"(?s)(?<=I18n\s*\.\s*get\s*\(\s*['""]\s*).*?(?=\s*['""]\s*\))"
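A rough sketch of this lookaround-only variant, under the same assumption about the input; LookaroundExtractor is a hypothetical name. Because the lookbehind and lookahead consume nothing, each Match.Value is already the text, so the matches can be copied straight into an array. Note that this pattern does not pair the quotes, so I18n.get("abc') would also match.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundExtractor
{
    // .NET allows variable-length lookbehind, so \s* may appear inside (?<=...).
    // The lookarounds are zero-width: only the text between the quotes is matched.
    private static readonly Regex Pattern = new Regex(
        @"(?s)(?<=I18n\s*\.\s*get\s*\(\s*['""]\s*).*?(?=\s*['""]\s*\))");

    public static string[] Extract(string txt)
    {
        MatchCollection matches = Pattern.Matches(txt);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Value; // whole match == the wanted value
        return result;
    }
}
```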

Related

Regex match text not preceded by quotation mark (ignore whitespaces)

I have following text:
SELECT
U_ArrObjJson(
s."Description", s."DateStart", sp.*
) as "Result"
FROM "Supplier" s
OUTER APPLY(
SELECT
U_ArrObjJson,
'U_ArrObjJson(',
' <- THE PROBLEM IS HERE
U_ArrObjJson(
p."Id", p."Description", p."Price"
) as "Products"
FROM "Products" p
WHERE p."SupplierId" = s."Id"
) sp
What I need to do is find instances of the U_ArrObjJson function that are not preceded by a quotation mark. I ended up with the following expression:
(?<!\')\bU_ArrObjJson\b[\n\r\s]*[\(]+
The problem is that the last occurrence of U_ArrObjJson is preceded by a single quotation mark, but there are spaces and newline characters between the quotation mark and the name I am looking for.
I need to use this expression with the .NET Regex class in my method:
var matches = new Regex(@"(?<!\')\bU_ArrObjJson\b[\n\r\s]*[\(]+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant).Matches(template);
How can I modify my expression to ignore preceded spaces?
Since .NET's regex supports non-fixed width Lookbehinds, you can just add \s* to the Lookbehind:
(?<!\'\s*)\bU_ArrObjJson\s*\(+
Demo.
Notes:
[\n\r\s] can be replaced with just \s here because the latter matches any whitespace character (including EOL). So, \n\r is redundant here.
As indicated by Wiktor Stribiżew in the comments, the second \b is also redundant because the function name will either be followed by a whitespace or a ( character. In both cases, a word boundary is implicitly required.
Unless you actually want to match the function name followed by multiple ( characters, you probably should also remove the + at the end.
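The difference can be checked with a small C# sketch. The sample text used to exercise it is a simplified, made-up stand-in for the SQL in the question (one plain call, one call directly after a quote, one call after a quote plus a line break and spaces, and one call with a space before the parenthesis), and FunctionCallFinder is a hypothetical name. On that text the original pattern should find three calls and the revised one two, because the call after the quote and line break is now excluded as well.

```csharp
using System.Text.RegularExpressions;

public static class FunctionCallFinder
{
    // The question's pattern: the lookbehind only inspects the single char before the name.
    public static readonly Regex Original = new Regex(
        @"(?<!\')\bU_ArrObjJson\b[\n\r\s]*[\(]+",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Revised pattern: \s* inside the (variable-length) lookbehind means a quote
    // followed by any spaces, CRs or LFs before the name also blocks the match.
    // The trailing + is dropped, as suggested in the notes.
    public static readonly Regex Revised = new Regex(
        @"(?<!'\s*)\bU_ArrObjJson\s*\(",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}
```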

How to remove a pattern that may or may not exist at the end using regex

I want to capture a string without a certain pattern (anything in parentheses) that may or may not appear at its end. I want to capture everything but the string "(exclude)" in the following 3 examples:
aaaaaa
bbbbbb (exclude)
cccccc
I tried the following regex:
(.+)(?:\(.+\)){0,1}
You may use your matching approach with
^(.+?)(?:\(.*\))?$
See the regex demo. Basically, you need to add anchors to your pattern and use a lazy quantifier with the first dot matching pattern.
Details
^ - start of the string
(.+?) - Group 1: one or more chars other than newline, as few as possible (+? lets the regex engine test the next optional subpattern first, and only expand this one when that fails)
(?:\(.*\))? - an optional sequence of
\( - a ( char
.* - any 0+ chars other than newline as many as possible
\) - a ) char
$ - end of string.
In C#:
var m = Regex.Match(s, @"^(.+?)(?:\(.*\))?$");
var result = string.Empty;
if (m.Success) {
result = m.Groups[1].Value;
}
You may also remove a substring in parentheses at the end of the string if it has no other parentheses inside using
var res = Regex.Replace(s, @"\s*\([^()]*\)\s*$", "");
See another demo. Here, \s*\([^()]*\)\s*$ matches 0+ whitespaces, a (, any 0+ chars other than ( and ) ([^()]*), a ), and then 0+ whitespaces at the end of the string.
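Both approaches side by side, as a sketch; TrailingParens and its method names are hypothetical. One detail worth noticing: the capturing version keeps the space before "(exclude)" (it returns "bbbbbb " for the second example), while the Regex.Replace version also removes that whitespace.

```csharp
using System.Text.RegularExpressions;

public static class TrailingParens
{
    // Approach 1: lazy Group 1 plus an optional "( ... )" anchored at the end.
    public static string CaptureBefore(string s)
    {
        var m = Regex.Match(s, @"^(.+?)(?:\(.*\))?$");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Approach 2: delete a final "( ... )" with no nested parentheses,
    // together with any whitespace around it.
    public static string RemoveTrailing(string s) =>
        Regex.Replace(s, @"\s*\([^()]*\)\s*$", "");
}
```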

C# Regex Match NOT inside self defined tags

I use tags in the form of
[[MyTag]]Some Text[[/MyTag]]
To find these tags within the whole text I use the following expression (this is not related to this question here, but for info):
\[\[(?<key>.*\w)]\](?<keyvalue>.*?)\[\[/\1\]\]
Now I'd like to match and replace only text (MYSEARCHTEXT) that is NOT inside these self-defined tags.
Example:
[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]
But here MYSEARCHTEXT (1) should be replaced. And here MYSEARCHTEXT (2) needs to be replaced too.
[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]
And here I need to replace MYSEARCHTEXT (3) also.
MYSEARCHTEXT is a word or phrase and needs to be found 3 times in this example.
Maybe this can work? If I understood the problem correctly, this will match MYSEARCHTEXT outside of your tags, and your matches will be in the groups. It uses a positive lookahead:
https://regex101.com/r/C8Kuiz/2
(?:\[\[Tag1.*?\/Tag1\]\])\n?(?:.*)(?=(MYSEARCHTEXT))
I have an idea that can simplify this. Use the following regular expression to match the tagged text:
\[.+?\][^\[\]]*?MYSEARCHTEXT[^\[\]]*?\[.+?\]\]
Then replace the MYSEARCHTEXT within the string preserving the captured groups.
You may use the following solution that uses your pattern version with an added alternative in a Regex.Replace method where a match evaluator is used as the replacement argument:
var pat = @"(?s)(\[\[(\w+)]].*?\[\[/\2]])|MYSEARCHTEXT";
var s = "[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]\nBut here MYSEARCHTEXT (1) should be replaced. And here MYSEARCHTEXT (2) needs to be replaced too.\n[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]\nAnd here I need to replace MYSEARCHTEXT (3) also.";
var res = Regex.Replace(s, pat, m =>
m.Groups[1].Success ? m.Groups[1].Value : "NEW_VALUE");
Console.WriteLine(res);
See the C# demo
Result:
[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]
But here NEW_VALUE (1) should be replaced. And here NEW_VALUE (2) needs to be replaced too.
[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]
And here I need to replace NEW_VALUE (3) also.
Pattern details
(?s) - a RegexOptions.Singleline inline modifier option (a . matches any char now)
(\[\[(\w+)]].*?\[\[/\2]]) - Group 1:
\[\[ - a [[ substring
(\w+) - Group 2: one or more word chars
]] - a ]] substring
.*? - any 0+ chars, as few as possible
\[\[/ - a [[/ substring
\2 - same text as captured into Group 2
]] - a literal ]] substring
| - or
MYSEARCHTEXT - some pattern to replace.
When Group 1 matches (m.Groups[1].Success ?) this value is put back, else the NEW_VALUE is inserted into the resulting string.
The best way is to match both separately as positive matches,
then decide which to replace and which to write back based on which one
matched. (Someone posted this solution already, so I won't duplicate it.)
The alternative is to forgo that entirely and qualify the text
with a lookahead after MYSEARCHTEXT.
This shows how to do it that way.
var pat = @"(?s)MYSEARCHTEXT(?=(?:(?!\[\[/?\w+\]\]).)*?(?:\[\[\w+\]\]|$))";
var res = Regex.Replace(s, pat, "NEW_VALUE");
Demo: https://ideone.com/KOtNik
Formatted:
(?s) # Dot-all modifier
MYSEARCHTEXT
(?= # Qualify the text with an assertion
(?: # Get non-tag characters
(?! \[\[ /? \w+ \]\] )
.
)*?
(?: # Up to -
\[\[ \w+ \]\] # An open tag
| $ # or, end of string
)
)
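Both approaches, run against the sample text from the question, as a hedged sketch (OutsideTags and its method names are hypothetical). The second method uses the free-spacing layout shown above, which requires RegexOptions.IgnorePatternWhitespace so that spaces are ignored and # starts a comment. On this input both should produce the result listed earlier.

```csharp
using System.Text.RegularExpressions;

public static class OutsideTags
{
    // Approach A: match a whole tagged block OR the search text; the evaluator
    // writes tagged blocks back unchanged and replaces the bare search text.
    public static string ReplaceWithEvaluator(string s, string replacement) =>
        Regex.Replace(s, @"(?s)(\[\[(\w+)]].*?\[\[/\2]])|MYSEARCHTEXT",
            m => m.Groups[1].Success ? m.Groups[1].Value : replacement);

    // Approach B: the lookahead version in free-spacing form.
    private const string LookaheadPattern = @"
        (?s)                      # Dot-all modifier
        MYSEARCHTEXT
        (?=                       # Qualify the text with an assertion
          (?:                     # Get non-tag characters
            (?! \[\[ /? \w+ \]\] )
            .
          )*?
          (?:                     # Up to -
            \[\[ \w+ \]\]         #   an open tag
          | $                     #   or end of string
          )
        )";

    public static string ReplaceWithLookahead(string s, string replacement) =>
        Regex.Replace(s, LookaheadPattern, replacement, RegexOptions.IgnorePatternWhitespace);
}
```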

Regex - how to match multiple properly quoted substrings

I am trying to use a Regex to extract quote-wrapped strings from within a (C#) string which is a comma-separated list of such strings. I need to extract all properly quoted substrings and ignore those that are missing a quote mark.
E.g., given this string:
"animal,dog,cat","ecoli, verification,"streptococcus"
I need to extract "animal,dog,cat" and "streptococcus".
I've tried various regex solutions in this forum but they all seem to find the first substring only, or incorrectly match "ecoli, verification," and ignore "streptococcus"
Is this solvable?
TIA
Try this:
string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
string pattern = "\"([^\"]+?[^,])\"";
var matches = Regex.Matches(input, pattern);
foreach (Match m in matches)
Console.WriteLine(m.Groups[1].Value);
P.S. But I agree with the commentators: fix the source.
I suggest this:
"(?>[^",]*(?>,[^",]+)*)"
Explanation:
" # Match a starting quote
(?> # Match in an atomic (non-capturing) group to avoid catastrophic backtracking:
[^",]* # - any number of characters except commas or quotes
(?> # - optionally followed by another (atomic) group:
, # - which starts with a comma
[^",]+ # - and contains at least one character besides comma or quotes.
)* # - (as said above, that group is optional but may occur many times)
) # End of the outer atomic group
" # Match a closing quote
Test it live on regex101.com.
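A quick way to compare the two answers is to run both patterns on the sample string; QuotedItems and its methods are hypothetical names for this sketch. The first pattern returns the captured content without quotes, the second returns the whole match including the quotes, and both should skip the malformed "ecoli, verification, item.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedItems
{
    // Answer 1: Group 1 is the content; its last char must not be a comma.
    public static List<string> WithLazyClass(string input)
    {
        var list = new List<string>();
        foreach (Match m in Regex.Matches(input, "\"([^\"]+?[^,])\""))
            list.Add(m.Groups[1].Value);
        return list;
    }

    // Answer 2: atomic groups; a comma is only allowed when followed by at
    // least one character that is neither a comma nor a quote.
    public static List<string> WithAtomicGroups(string input)
    {
        var list = new List<string>();
        foreach (Match m in Regex.Matches(input, "\"(?>[^\",]*(?>,[^\",]+)*)\""))
            list.Add(m.Value);
        return list;
    }
}
```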

Regex for matching C# string literals

I am trying to write a regular expression that will match a string that contains name-value pairs of the form:
<name> = <value>, <name> = <value>, ...
Where <value> is a C# string literal. I already know the <name>s that I need to find via this regular expression. So far I have the following:
regex = new Regex(fieldName + @"\s*=\s*""(.*?)""");
This works well, but of course it fails when the <value> I am trying to match contains an escaped quote. I am struggling to work out how to solve this; I think I need a lookahead, but I need a few pointers. As an example, I would like to be able to match the value of the 'difficult' name below:
difficult = "\\\a\b\'\"\0\f \t\v", easy = "one"
I would appreciate a decent explanation with your answers, I want to learn, rather than copy ;-)
Try this to capture the key and value:
(\w+)\s*=\s*(@"(?:[^"]|"")*"|"(?:\\.|[^\\"])*")
As a bonus, it also works on verbatim strings.
C# examples: https://dotnetfiddle.net/vQP4rn
Here's an annotated version:
string pattern = @"
(\w+)\s*=\s* # key =
( # Capturing group for the string
@"" # verbatim string - match literal at-sign and a quote
(?:
[^""]|"""" # match a non-quote character, or two quotes
)* # zero times or more
"" #literal quote
| #OR - regular string
"" # string literal - opening quote
(?:
\\. # match an escaped character,
|[^\\""] # or a character that isn't a quote or a backslash
)* # zero times or more
"" # string literal - closing quote
)";
MatchCollection matches = Regex.Matches(s, pattern,
RegexOptions.IgnorePatternWhitespace);
Note that the regular-string alternative allows any character to be escaped (unlike C#, which only accepts a fixed set of escape sequences) and allows newlines. It should be easy to tighten if you need validation, but it should be fine for parsing.
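The compact pattern can be exercised on the question's 'difficult' example with a short sketch; LiteralPairs and Parse are hypothetical names, and a Dictionary is just one convenient way to collect the key/value pairs (a repeated key would keep its last value). Parsing v = @"say ""hi""" as a second input exercises the verbatim-string branch, where "" is the escaped quote.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiteralPairs
{
    // Group 1 = key; Group 2 = either a verbatim literal (@"..." with "" as the
    // escaped quote) or a regular literal ("..." where \ escapes the next char).
    private static readonly Regex Pattern = new Regex(
        @"(\w+)\s*=\s*(@""(?:[^""]|"""")*""|""(?:\\.|[^\\""])*"")");

    public static Dictionary<string, string> Parse(string s)
    {
        var pairs = new Dictionary<string, string>();
        foreach (Match m in Pattern.Matches(s))
            pairs[m.Groups[1].Value] = m.Groups[2].Value; // literal kept with its quotes
        return pairs;
    }
}
```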
This should match only the string literal part (you can tack on whatever else you want to the beginning/end):
Regex regex = new Regex("\"((\\\\.)|[^\\\\\"])*\"");
and if you want a pattern that doesn't allow "multi-line" string literals (regular C# string literals cannot span lines):
Regex regex = new Regex("\"((\\\\[^\n\r])|[^\\\\\"\n\r])*\"");
You can use this:
@" \s* = \s* (?<!\\)"" (.* ) (?<!\\)"""
It's almost like yours, but instead of using "", I used (?<!\\)"" so that a quote only matches when it is not preceded by a \, which means escaped quotes are skipped.
