Regex match text not preceded by a quotation mark (ignoring whitespace)

I have the following text:
SELECT
U_ArrObjJson(
s."Description", s."DateStart", sp.*
) as "Result"
FROM "Supplier" s
OUTER APPLY(
SELECT
U_ArrObjJson,
'U_ArrObjJson(',
' <- THE PROBLEM IS HERE
U_ArrObjJson(
p."Id", p."Description", p."Price"
) as "Products"
FROM "Products" p
WHERE p."SupplierId" = s."Id"
) sp
What I need to do is find the instances of the U_ArrObjJson function that are not preceded by a quotation mark. I ended up with the following expression:
(?<!\')\bU_ArrObjJson\b[\n\r\s]*[\(]+
The problem is that the last occurrence of U_ArrObjJson is preceded by a single quotation mark, but there are spaces and newline characters between the quotation mark and the name I am looking for.
I need to use this expression with the .NET Regex class in my method:
var matches = new Regex(@"(?<!\')\bU_ArrObjJson\b[\n\r\s]*[\(]+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant).Matches(template);
How can I modify my expression to ignore the preceding whitespace?

Since .NET's regex engine supports variable-width lookbehinds, you can just add \s* to the lookbehind:
(?<!\'\s*)\bU_ArrObjJson\s*\(+
Notes:
[\n\r\s] can be replaced with just \s here because the latter matches any whitespace character (including EOL). So, \n\r is redundant here.
As indicated by Wiktor Stribiżew in the comments, the second \b is also redundant because the function name will either be followed by a whitespace or a ( character. In both cases, a word boundary is implicitly required.
Unless you actually want to match the function name followed by multiple ( characters, you probably should also remove the + at the end.
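
As a rough illustration of the lookbehind with \s*, here is a minimal C# sketch. The class name, the helper method, and the shortened sample template are assumptions made for this example, and the trailing + has been dropped as the last note suggests.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareFinder
{
    // Pattern from the answer, without the trailing "+" after the parenthesis.
    private static readonly Regex CallPattern = new Regex(
        @"(?<!'\s*)\bU_ArrObjJson\s*\(",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the start index of every call that is not preceded by a single
    // quote, even when whitespace or newlines separate the quote from the name.
    public static int[] FindCallOffsets(string template)
    {
        MatchCollection matches = CallPattern.Matches(template);
        var offsets = new int[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            offsets[i] = matches[i].Index;
        return offsets;
    }

    public static void Main()
    {
        // Hypothetical, shortened template: one real call, one quoted literal,
        // and one call that follows a quote plus a newline and spaces.
        string template = "SELECT U_ArrObjJson(a), 'U_ArrObjJson(', '\n  U_ArrObjJson(b)";
        foreach (int offset in FindCallOffsets(template))
            Console.WriteLine(offset);
    }
}
```

Only the first call in the sample is reported. Note that this approach also skips a call that follows a closing quote and whitespace, so it is a heuristic rather than a real SQL parser.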

Related

Regexp: Match value if condition occurs

I have a string like
Value = ('1 OR 2') OR Value = ('THREE OR FOUR')
and I want to split it by OR (only the one that is not inside quotes).
How can I do it with a regexp? It has to match only if there is an even number of quotes before OR.
Is it possible?
I tried [\w\W]*?'[\w\W]*(\sOR\s), but it works incorrectly: it takes only the last OR, even if it is inside quotes.

[\w\W] matches any character, including '.
You could use lookarounds with an unbounded quantifier, which C# supports, to match optional pairs of single quotes.
If you want all single quotes in the whole string to be paired, you can also assert the pairs to the right.
If you don't want the match to cross newlines, you can use [^'\r\n]* instead of [^']*.
(?<=^(?:[^']*'[^']*')*[^']*)\bOR\b(?=(?:[^']*'[^']*')*[^']*$)
(?<= Positive lookbehind
^(?:[^']*'[^']*')*[^']* Match optional pairs of single quotes from the start of the string
) Close lookbehind
\bOR\b Match OR between word boundaries
(?= Positive lookahead
(?:[^']*'[^']*')*[^']*$ Match optional pairs of quotes till the end of the string
) Close lookahead
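
The pattern above can be handed to Regex.Split to perform the split the question asks for. The following is a small sketch under that assumption; the class and method names are invented for the example, and the pieces are trimmed because the pattern itself does not consume the surrounding spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnquotedOrSplitter
{
    // The pattern from the answer: OR must have complete quote pairs on both sides.
    private const string Pattern =
        @"(?<=^(?:[^']*'[^']*')*[^']*)\bOR\b(?=(?:[^']*'[^']*')*[^']*$)";

    // Splits on every OR that sits outside single quotes and trims each piece.
    public static string[] Split(string input) =>
        Regex.Split(input, Pattern).Select(part => part.Trim()).ToArray();

    public static void Main()
    {
        string input = "Value = ('1 OR 2') OR Value = ('THREE OR FOUR')";
        foreach (string part in Split(input))
            Console.WriteLine(part);
    }
}
```

Because the pattern contains only non-capturing groups, Regex.Split returns just the pieces between the separators.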

Using a positive lookbehind ensures that OR is only matched if it is preceded by an even number of single quotes (and surrounded by whitespace as in your regex).
(?<=^(?:[^']*'[^']*')*[^']*)\sOR\s

How about matching everything that is valid instead, and using Regex.Matches to get all the substrings?
var splitRE = new Regex(@"([^'OR]+|O[^R]|'[^']*'|(?<!O)R|(?<=\w)OR|OR(?=\w))+", RegexOptions.Compiled);
var ans = splitRE.Matches(s);
Basically, the pattern matches any run of characters other than a single quote, O, or R; or an O followed by a character that is not R; or a single-quoted string; or an R not preceded by an O; or an OR preceded by a word character; or an OR followed by a word character.

Regex .net multiple line

I have these examples:
{I18n.get("Testing 123...")}
{I18n.get('Testing 123...')}
{I18n.get(
"Testing 123..."
)}
{I18n.get("Testing 123..."
)}
{I18n.get(
"Testing 123...")}
I want to extract the text Testing 123... in .NET using a C# Regex. What I did was:
Regex r = new Regex(@"(?:I18n.get\(""(.+?)""\))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var matches = r.Matches(txt)
    .Select(xx => xx.Groups)
    .Select(xx => xx.Last().Value)
    .ToList();
When the call is on a single line it works perfectly, but when it spans multiple lines it fails.
And how would it be possible to match, in a single regex, both the case where the text is in double quotes " and the case where it is in single quotes '?

You may use
var r = new Regex(@"I18n\.get\(\s*(""|')(.*?)\1\s*\)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var results = r.Matches(txt).Cast<Match>().Select(x => x.Groups[2].Value).ToList();
Details
I18n\.get\( - a literal I18n.get( text
\s* - 0+ whitespaces
("|') - Group 1: " or '
(.*?) - Group 2: any char, 0 or more occurrences, as few as possible
\1 - same value as captured in Group 1
\s* - 0+ whitespaces
\) - a ) char.
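
To see the backreference and the optional whitespace working together, here is a short sketch that runs the pattern over single-line, single-quoted, and multi-line calls. The class name, helper method, and sample strings are assumptions made for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class I18nExtractor
{
    // Group 1 captures the opening quote; \1 requires the same quote to close.
    private static readonly Regex Pattern = new Regex(
        @"I18n\.get\(\s*(""|')(.*?)\1\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the text between the quotes (Group 2) for every call found.
    public static string[] Extract(string txt) =>
        Pattern.Matches(txt).Cast<Match>().Select(m => m.Groups[2].Value).ToArray();

    public static void Main()
    {
        string txt =
            "{I18n.get(\"Double quoted\")}\n" +
            "{I18n.get('Single quoted')}\n" +
            "{I18n.get(\n\"Spans lines\"\n)}";
        foreach (string value in Extract(txt))
            Console.WriteLine(value);
    }
}
```

A quote of the other kind inside the text (for example an apostrophe inside a double-quoted string) is kept, because only the quote captured in Group 1 can end the match.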

I contend that it doesn't matter which opening and closing quotes appear in the regex when you don't intend to actually parse the text as a quoted string, right?
I mean, with all the embedded escapes and so on.
Use what you know as the delimiters, the literal text:
I18n.get( stuff here )
You could add a sub-validation that there is an inner quote, but since you are
not parsing quotes, it is not going to be strictly valid anyway.
Here, the quotes are only used for trimming, so they are not matched and do not become part of the element.
Here it is; the whole match is the value you are looking for, so just toss it into an array.
@"(?s)(?<=I18n\s*\.\s*get\s*\(\s*['""]\s*).*?(?=\s*['""]\s*\))"

What's the regex for literals intermixed with variables containing literals?

I have a custom C# app (.NET 4.7.1) that needs to evaluate various and sundry text strings. As one of many cases, I have the following string in the midst of other text:
OR S:D00Q0600 ) OR
I need to find these precise situations (each string segment will be surrounded by a single space, or be at the beginning or end of a line) in which there is an OR followed by a string containing a :, followed by a ), followed by another OR. The ORs are literal and the : within the string is literal, and the ) is literal -- but the D00Q0600 is variable and will be different every time.
And when that precise situation occurs I need to replace the string with:
OR S:D00Q0600 OR
(Simply remove the ) - from that little snippet only - not the whole string)
So to break it down a little cleaner:
Find an OR (always uppercase)
...followed by a space followed by a string with a :
...followed by a space followed by a )
...followed by a space followed by an OR
When found, remove the ) in that position
Do not remove any other )s which will often exist in the entire string
In many cases, the ) is correct and must remain; only in the case described above should it be removed.
S:D00Q0600 can be of variable length. It could also be (for example) S:D00Q or S:D00Q0600XYZ, etc.
How can I construct the type of C# regex that would solve this?

You can use this regex and replace each match with the contents of group 1 and group 2. This ensures that the replacement occurs only where the regex matches.
(OR [A-Z]:[A-Z0-9]+ )\) (OR)
Check here,
https://regex101.com/r/0EZiu6/1/
Edit 1:
I modified your C# code, and now it works.
string pattern = @"(OR [A-Z]:[A-Z0-9]+ )\) (OR)";
string substitution = @"$1$2";
string input = @"OR S:D00Q0600 ) OR ok sir how )r u OR S:D11Q06 ) OR i ()am fine OR D:D67Q06S0A23DR ) OR";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.WriteLine("Before Replace: " + input);
Console.WriteLine("After Replace: " + result);
I just replaced \1 and \2 with $1 and $2 and added print statements at the end to show the input before the replacement and the result after it.
The following is the output of this program, which is exactly what you wanted.
Before Replace: OR S:D00Q0600 ) OR ok sir how )r u OR S:D11Q06 ) OR i ()am fine OR D:D67Q06S0A23DR ) OR
After Replace: OR S:D00Q0600 OR ok sir how )r u OR S:D11Q06 OR i ()am fine OR D:D67Q06S0A23DR OR

For the single example of
OR S:D00Q0600 ) OR
... this regex works:
(\bOR S:........ )\)( OR\b)
with the replacement being groups $1 and $2.
The regex assumes that the middle string after S: will always be eight characters long. If you have more/different input data, please update your question with examples where this regex fails.
Explanation
(\bOR S:........ )\)( OR\b)
\b assert position at a word boundary (transition from non-word to word, or from word to non-word)
OR S: matches the characters literally (case sensitive)
. matches any character (except for line terminators)
(space) matches the space character literally
\) matches the character ) literally
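
Since the question says the token after OR can vary in length, the fixed-width dots can be swapped for a variable-length match. The following is only a sketch of that variation, not the answer's own pattern: it assumes the token is any run of non-whitespace characters containing a colon, and it uses a lookahead for the trailing OR so that back-to-back occurrences are all handled. The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrayParenRemover
{
    // Assumption: the middle token is non-whitespace text with a ':' in it.
    // The trailing " OR" is only asserted, not consumed, so the same OR can
    // start the next match.
    private static readonly Regex Pattern =
        new Regex(@"(\bOR \S*:\S*) \)(?= OR\b)");

    // Removes the ")" (and the space before it) in "OR token ) OR" only.
    public static string Fix(string input) => Pattern.Replace(input, "$1");

    public static void Main()
    {
        Console.WriteLine(Fix("OR S:D00Q ) OR x OR S:D00Q0600XYZ ) OR"));
        Console.WriteLine(Fix("( a OR b ) OR c"));
    }
}
```

The second call leaves its input unchanged, because the token before the ) contains no colon.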

C# Regex Match NOT inside self defined tags

I use tags in the form of
[[MyTag]]Some Text[[/MyTag]]
To find these tags within the whole text I use the following expression (this is not related to this question here, but for info):
\[\[(?<key>.*\w)]\](?<keyvalue>.*?)\[\[/\1\]\]
Now I would like to match and replace only the text (MYSEARCHTEXT) that is NOT inside these self-defined tags.
Example:
[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]
But here MYSEARCHTEXT (1) should be replaced. And here MYSEARCHTEXT (2) needs to be replaced too.
[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]
And here I need to replace MYSEARCHTEXT (3) also.
MYSEARCHTEXT is a word or phrase and needs to be found 3 times in this example.

Maybe this can work? If I understood the problem correctly, this will match MYSEARCHTEXT outside of your tags, and your matches will be in the capture groups. It uses a positive lookahead:
https://regex101.com/r/C8Kuiz/2
(?:\[\[Tag1.*?\/Tag1\]\])\n?(?:.*)(?=(MYSEARCHTEXT))

I have an idea that can simplify this. Use the following regular expression to match the tagged text:
\[.+?\][^\[\]]*?MYSEARCHTEXT[^\[\]]*?\[.+?\]\]
Then replace the MYSEARCHTEXT within the string preserving the captured groups.

You may use the following solution that uses your pattern version with an added alternative in a Regex.Replace method where a match evaluator is used as the replacement argument:
var pat = @"(?s)(\[\[(\w+)]].*?\[\[/\2]])|MYSEARCHTEXT";
var s = "[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]\nBut here MYSEARCHTEXT (1) should be replaced. And here MYSEARCHTEXT (2) needs to be replaced too.\n[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]\nAnd here I need to replace MYSEARCHTEXT (3) also.";
var res = Regex.Replace(s, pat, m =>
m.Groups[1].Success ? m.Groups[1].Value : "NEW_VALUE");
Console.WriteLine(res);
Result:
[[Tag1]]Here I don't want to replace MYSEARCHTEXT[[/Tag1]]
But here NEW_VALUE (1) should be replaced. And here NEW_VALUE (2) needs to be replaced too.
[[AnotherTag]]Here I don't want to replace MYSEARCHTEXT[[/AnotherTag]]
And here I need to replace NEW_VALUE (3) also.
Pattern details
(?s) - a RegexOptions.Singleline inline modifier option (a . matches any char now)
(\[\[(\w+)]].*?\[\[/\2]]) - Group 1:
\[\[ - a [[ substring
(\w+) - Group 2: one or more word chars
]] - a ]] substring
.*? - any 0+ chars, as few as possible
\[\[/ - a [[/ substring
\2 - same text as captured into Group 2
]] - a literal ]] substring
| - or
MYSEARCHTEXT - some pattern to replace.
When Group 1 matches (m.Groups[1].Success ?) this value is put back, else the NEW_VALUE is inserted into the resulting string.

The best way is to match both separately, each as a positive match,
then decide which to replace and which to write back based on which one
matched. (Someone has already posted this solution, so I won't duplicate it.)
The alternative is to forgo that entirely and qualify the text
with a lookahead after the search text.
This shows how to do it that way.
var pat = @"(?s)MYSEARCHTEXT(?=(?:(?!\[\[/?\w+\]\]).)*?(?:\[\[\w+\]\]|$))";
var res = Regex.Replace(s, pat, "NEW_VALUE");
Demo: https://ideone.com/KOtNik
Formatted:
(?s) # Dot-all modifier
MYSEARCHTEXT
(?= # Qualify the text with an assertion
(?: # Get non-tag characters
(?! \[\[ /? \w+ \]\] )
.
)*?
(?: # Up to -
\[\[ \w+ \]\] # An open tag
| $ # or, end of string
)
)
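
For completeness, here is a self-contained sketch that wraps the lookahead pattern in a small helper so it can be tried directly. The class name, method name, and the short sample strings are assumptions for this example. It also assumes the tags are well formed and not nested.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OutsideTagReplacer
{
    // MYSEARCHTEXT qualifies only if, scanning forward over non-tag text,
    // the next thing found is an opening tag or the end of the string.
    private const string Pattern =
        @"(?s)MYSEARCHTEXT(?=(?:(?!\[\[/?\w+\]\]).)*?(?:\[\[\w+\]\]|$))";

    public static string Replace(string s) =>
        Regex.Replace(s, Pattern, "NEW_VALUE");

    public static void Main()
    {
        string s = "[[Tag1]]keep MYSEARCHTEXT[[/Tag1]]\nswap MYSEARCHTEXT here";
        Console.WriteLine(Replace(s));
    }
}
```

An occurrence inside a tag is always followed by a closing tag before any opening tag, so the lookahead fails there and the text is left alone.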

Regular expression in .net to exclude a particular word and multiple whitespaces and line breaks

I need a regex for finding a substring like
from xyzTableName with ( index =...
and
from xyzTableName ( index =...
If the with keyword is not there, it should return a match; if with exists after the from keyword and before the (, there should be no match. All other words between from and ( must be ignored.
I have tried the expression below:
@"\bfrom.*[\s\t\n]+(?<!with)[\s\t\n]([\s\t\n]+index"
and some variants of it. I was able to make it work when there are only normal single spaces, but when I tried it with multiple whitespace characters and line breaks, it failed.

Try this pattern: \bfrom\b(?!.+\bwith\b)[^(]+\(\s*index
string input = @"from xyzTableName
with ( index =...";
string pattern = @"\bfrom\b(?!.+\bwith\b)[^(]+\(\s*index";
bool result = Regex.IsMatch(input, pattern,
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
The above returns false. Change the input to remove the word "with" and it will return true. By using RegexOptions.Singleline the . metacharacter will match all characters, including newlines (\n).
Pattern breakdown:
\bfrom\b: exactly matches the word "from" and uses word-boundary metacharacters
(?!.+\bwith\b): negative lookahead that checks for the word "with" anywhere later in the input; the match fails if it is found
[^(]+: negated character class that matches any character other than an opening parenthesis, at least once.
\(\s*index: match an opening parenthesis (note that it has to be escaped), any whitespace, then the word "index"
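
One thing to keep in mind with the breakdown above: under RegexOptions.Singleline, .+ can run to the end of the input, so a "with" that appears after the parenthesis also blocks the match. The sketch below compares the answer's pattern with a hypothetical variant that limits the lookahead to the text before the first (. The variant, the class name, and the sample inputs are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndexHintFinder
{
    private const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // Pattern from the answer: rejects "with" anywhere after "from".
    public const string Original = @"\bfrom\b(?!.+\bwith\b)[^(]+\(\s*index";

    // Hypothetical variant: rejects "with" only before the first "(".
    public const string Scoped = @"\bfrom\b(?![^(]*\bwith\b)[^(]+\(\s*index";

    public static bool IsMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern, Options);

    public static void Main()
    {
        string noWith = "from xyzTableName\n   ( index = 1";
        string withBefore = "from xyzTableName\n  with ( index = 1";
        string withAfter = "from t ( index = 1 ) with ( nolock )";

        foreach (string input in new[] { noWith, withBefore, withAfter })
            Console.WriteLine(IsMatch(input, Original) + " / " + IsMatch(input, Scoped));
    }
}
```

Both patterns agree on the first two inputs. They differ only on the third, where "with" follows the index hint.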
