Regexp: Match value if condition occurs - c#

I have a string like
Value = ('1 OR 2') OR Value = ('THREE OR FOUR')
and I want to split it by OR (that one is not in quotes).
How can I do it with regexp? It has to match only if I have an even number of quotes before OR.
Is it possible?
I tried using [\w\W]*?'[\w\W]*(\sOR\s) but it works incorrectly: it matches only the last OR, even if that one is inside quotes.

Using [\w\W] can match any character including '
You could make use of lookaround with an infinite quantifier in C# and match optional pairs of single quotes.
If you want all pairs of single quotes in the whole string, you can also assert them to the right.
If you don't want to cross matching newline, you can use [^'\r\n]* instead of [^']*
(?<=^(?:[^']*'[^']*')*[^']*)\bOR\b(?=(?:[^']*'[^']*')*[^']*$)
(?<= Positive lookbehind
^(?:[^']*'[^']*')*[^']* Match optional pairs of single quotes from the start of the string
) Close lookbehind
\bOR\b Match OR between word boundaries
(?= Positive lookahead
(?:[^']*'[^']*')*[^']*$ Match optional pairs of quotes till the end of the string
) Close lookahead
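As a minimal sketch of how the pattern above could be used to split the sample string from the question: the class name QuoteAwareSplit and the method SplitOnOr are made up for illustration, and trimming the parts is my own assumption about the desired output.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteAwareSplit
{
    // OR matches only when the text before it and the text after it
    // each contain complete pairs of single quotes.
    private static readonly Regex OutsideQuotesOr = new Regex(
        @"(?<=^(?:[^']*'[^']*')*[^']*)\bOR\b(?=(?:[^']*'[^']*')*[^']*$)");

    // Hypothetical helper: split on the unquoted OR and trim each part.
    public static string[] SplitOnOr(string input) =>
        OutsideQuotesOr.Split(input).Select(part => part.Trim()).ToArray();

    public static void Main()
    {
        foreach (var part in SplitOnOr("Value = ('1 OR 2') OR Value = ('THREE OR FOUR')"))
            Console.WriteLine(part);
    }
}
```

The pattern contains no capturing groups, so Regex.Split returns only the pieces between matches.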

Using a positive lookbehind ensures that OR is only matched if it is preceded by an even number of single quotes (and surrounded by whitespace as in your regex).
(?<=^(?:[^']*'[^']*')*[^']*)\sOR\s

How about trying to match everything that is valid and use Regex.Matches to get all the sub-strings?
var splitRE = new Regex(@"([^'OR]+|O[^R]|'[^']*'|(?<!O)R|(?<=\w)OR|OR(?=\w))+", RegexOptions.Compiled);
var ans = splitRE.Matches(s);
Basically, the pattern matches any run of characters other than a single quote, O, or R; or an O followed by something other than R; or a single-quoted string; or an R not preceded by an O; or an OR preceded by a word character; or an OR followed by a word character.
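Here is a small sketch of what that Regex.Matches approach might look like end to end. The class name MatchValidParts and the method Clauses are hypothetical, and trimming each match is an assumption on my part, since the raw matches keep the spaces around the standalone OR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchValidParts
{
    // Matches everything that is NOT a standalone, unquoted OR.
    private static readonly Regex SplitRE = new Regex(
        @"([^'OR]+|O[^R]|'[^']*'|(?<!O)R|(?<=\w)OR|OR(?=\w))+",
        RegexOptions.Compiled);

    // Hypothetical helper: collect each match and trim surrounding spaces.
    public static List<string> Clauses(string s) =>
        SplitRE.Matches(s).Cast<Match>().Select(m => m.Value.Trim()).ToList();

    public static void Main()
    {
        foreach (var clause in Clauses("Value = ('1 OR 2') OR Value = ('THREE OR FOUR')"))
            Console.WriteLine(clause);
    }
}
```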

Related

Regex Expression for Finding Words Surrounded By {{ }}

I have a string that has variables inserted into it. They are surrounded by double curly braces, i.e. {{VARIABLE}}.
What regex could be used to return the variable names within the double curly braces?
You can use lookahead and lookbehind assertions to match text that comes after and before certain patterns. You can also use a negative character class to match characters that aren't }, so that your matched string isn't too greedy.
(?<=\{\{)[^}]+(?=\}\})
You can see this pattern in action here
You could also use a capturing group:
\{\{(.+?)}}
If there can not be anything before or after the placeholder and the placeholder itself can contain a { or } you might use:
(?<!\S)\{\{(.+?)}}(?!\S)
Explanation
(?<!\S) Assert what is on the left is not a non whitespace char
\{\{ Match {{
(.+?) Capture in group 1 matching any char 1+ times non greedy
}} Match literally
(?!\S) Assert what is on the right is not a non whitespace char
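For what it's worth, here is a rough C# sketch of pulling the names out with the capturing-group pattern \{\{(.+?)}}. The class name PlaceholderNames, the method Extract and the sample template are all invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderNames
{
    // Group 1 holds the text between {{ and the nearest following }}.
    private static readonly Regex Placeholder = new Regex(@"\{\{(.+?)}}");

    // Hypothetical helper: return every variable name found in the template.
    public static List<string> Extract(string template) =>
        Placeholder.Matches(template)
                   .Cast<Match>()
                   .Select(m => m.Groups[1].Value)
                   .ToList();

    public static void Main()
    {
        foreach (var name in Extract("Dear {{NAME}}, order {{ORDER_ID}} has shipped."))
            Console.WriteLine(name);
    }
}
```

Reading Groups[1] returns the name without the braces, so no lookarounds are needed.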

How to remove a pattern that may or may not exist at the end using regex

I want to capture without including a certain pattern (anything in parenthesis) that may or may not exist at the end of the string. I want to capture everything but the string "(exclude)" in the following 3 examples:
**aaaaaa**
**bbbbbb** (exclude)
**cccccc**
I tried the following regex:
(.+)(?:\(.+\)){0,1}
You may use your matching approach with
^(.+?)(?:\(.*\))?$
See the regex demo. Basically, you need to add anchors to your pattern and use a lazy quantifier with the first dot matching pattern.
Details
^ - start of the string
(.+?) - Group 1: one or more chars other than newline, as few as possible (+? makes the regex engine test the next optional subpattern first, and only expand this one upon no match)
(?:\(.*\))? - an optional sequence of
\( - a ( char
.* - any 0+ chars other than newline as many as possible
\) - a ) char
$ - end of string.
In C#:
var m = Regex.Match(s, @"^(.+?)(?:\(.*\))?$");
var result = string.Empty;
if (m.Success) {
result = m.Groups[1].Value;
}
You may also remove a substring in parentheses at the end of the string if it has no other parentheses inside using
var res = Regex.Replace(s, @"\s*\([^()]*\)\s*$", "");
See another demo. Here, \s*\([^()]*\)\s*$ matches 0+ whitespaces, a ( char, any 0+ chars other than ( and ) ([^()]*), a ) char, and then 0+ whitespaces at the end of the string.
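To compare the two approaches side by side, here is a small sketch. The names TrailingParens, ViaGroup and ViaReplace are made up, and the inputs are simplified versions of the samples from the question. One thing it shows is that the capture-group version keeps the space that precedes the parenthesis, while the Replace version removes it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingParens
{
    // Approach 1: anchored match, lazy group 1, optional "(...)" at the end.
    public static string ViaGroup(string s)
    {
        var m = Regex.Match(s, @"^(.+?)(?:\(.*\))?$");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Approach 2: delete a trailing "(...)" that has no nested parentheses.
    public static string ViaReplace(string s) =>
        Regex.Replace(s, @"\s*\([^()]*\)\s*$", "");

    public static void Main()
    {
        foreach (var s in new[] { "aaaaaa", "bbbbbb (exclude)", "cccccc" })
            Console.WriteLine("[" + ViaGroup(s) + "] [" + ViaReplace(s) + "]");
    }
}
```

If the trailing space matters, calling TrimEnd() on the group value is one option.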

Parsing text between quotes with .NET regular expressions

I have the following input text:
@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
@"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
Unquoted must have at least one character and can contain any letter, number, underscore, or hyphen.
Quoted value can contain any character including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
See demo
Note that you cannot debug .NET-specific regexes at regex101.com; you need to test them in a .NET-compliant environment.
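A quick sketch of how the suggested pattern might be run in .NET follows. The class name NameValueParser and the method Parse are hypothetical. Since only one of the two same-named value groups participates in any given match, reading Groups["value"] gives whichever alternative matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameValueParser
{
    // The suggested pattern, written as a verbatim string ("" is one quote).
    private const string Pattern =
        @"(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

    // Hypothetical helper: return the name/value pairs in order of appearance.
    public static List<KeyValuePair<string, string>> Parse(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => new KeyValuePair<string, string>(
                 m.Groups["name"].Value, m.Groups["value"].Value))
             .ToList();

    public static void Main()
    {
        var text = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
        foreach (var pair in Parse(text))
            Console.WriteLine(pair.Key + " => " + pair.Value);
    }
}
```

Note that the name part as written still starts with \w+, so it needs at least two characters and would accept a leading digit; tightening it to match the first rule is left to the reader.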
Use string methods.
Split
string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
string[] nameValues = myLongString.Split('#');
From there either use Split function with "=" or use IndexOf("=").

Regular expression in .net to exclude a particular word and multiple whitespaces and line breaks

I need a regex for finding a substring like
from xyzTableName with ( index =...
and
from xyzTableName ( index =...
If with keyword is not there then it should return a match and if with exists after FROM keyword and before ( then there should be no match. All the other words between from and ( must be ignored.
I have tried with below expression :
@"\bfrom.*[\s\t\n]+(?<!with)[\s\t\n]([\s\t\n]+index"
And some variants of same. I was able to work it out when there are only normal/single whitespaces. But when I tried with multiple white-spaces and line-breaks, It failed.
Try this pattern: \bfrom\b(?!.+\bwith\b)[^(]+\(\s*index
string input = @"from xyzTableName
with ( index =...";
string pattern = @"\bfrom\b(?!.+\bwith\b)[^(]+\(\s*index";
bool result = Regex.IsMatch(input, pattern,
RegexOptions.Singleline | RegexOptions.IgnoreCase);
The above returns false. Change the input to remove the word "with" and it will return true. By using RegexOptions.Singleline the . metacharacter will match all characters, including newlines (\n).
Pattern breakdown:
\bfrom\b: exactly matches the word "from" and uses word-boundary metacharacters
(?!.+\bwith\b): negative look-ahead that checks for the word "with" anywhere after "from"; the match fails if it is found
[^(]+: negative character class to match any character that is not an opening parenthesis, at least once.
\(\s*index: match an opening parenthesis (note that it has to be escaped), any whitespace, then the word "index"
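A short sketch exercising the pattern above on inputs with and without "with" is below. The class and method names are invented. Because .+ under Singleline scans the rest of the input, a later "with" (for example a table hint on a joined table) also blocks the match. ScopedVariant is my own variation, not part of the answer above: it swaps .+ for [^(]* so that only the text before the first parenthesis is checked.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromWithoutWith
{
    private const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // The pattern from the answer: rejects "with" anywhere after "from".
    public static bool AnswerPattern(string input) =>
        Regex.IsMatch(input, @"\bfrom\b(?!.+\bwith\b)[^(]+\(\s*index", Options);

    // Assumed variation: rejects "with" only between "from" and the first "(".
    public static bool ScopedVariant(string input) =>
        Regex.IsMatch(input, @"\bfrom\b(?![^(]*\bwith\b)[^(]+\(\s*index", Options);

    public static void Main()
    {
        var withKeyword = "from xyzTableName\n   with ( index =...";
        var withoutKeyword = "from xyzTableName\n   ( index =...";
        var laterWith = "from t ( index = 1 ) join u with ( nolock )";

        Console.WriteLine(AnswerPattern(withKeyword));
        Console.WriteLine(AnswerPattern(withoutKeyword));
        Console.WriteLine(AnswerPattern(laterWith));
        Console.WriteLine(ScopedVariant(laterWith));
    }
}
```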

Using Regular Expression Match a String that contains numbers letters and dashes

I need to match this string 011Q-0SH3-936729 but not 345376346 or asfsdfgsfsdf
It has to contain characters AND numbers AND dashes
Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 and I want it to be able to match anyone of those. Reason for this is that I don't really know if the format is fixed and I have no way of finding out either so I need to come up with a generic solution for a pattern with any number of dashes and the pattern recurring any number of times.
Sorry this is probably a stupid question, but I really suck at Regular expressions.
TIA
foundMatch = Regex.IsMatch(subjectString,
@"^ # Start of the string
(?=.*\p{L}) # Assert that there is at least one letter
(?=.*\p{N}) # and at least one digit
(?=.*-) # and at least one dash.
[\p{L}\p{N}-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace);
should do what you want. If by letters/digits you meant "only ASCII letters/digits" (and not international/Unicode letters, too), then use
foundMatch = Regex.IsMatch(subjectString,
@"^ # Start of the string
(?=.*[A-Z]) # Assert that there is at least one letter
(?=.*[0-9]) # and at least one digit
(?=.*-) # and at least one dash.
[A-Z0-9-]* # Match a string of letters, digits and dashes
$ # until the end of the string.",
RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
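To check the ASCII version against the samples from the question, something like the following sketch could be used. The class name KeyValidator and the method IsKey are hypothetical, and the pattern is the same one collapsed onto a single line without the comments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValidator
{
    // Three lookaheads require a letter, a digit and a dash; the body then
    // allows only letters, digits and dashes up to the end of the string.
    private static readonly Regex KeyPattern = new Regex(
        @"^(?=.*[A-Z])(?=.*[0-9])(?=.*-)[A-Z0-9-]*$",
        RegexOptions.IgnoreCase);

    public static bool IsKey(string candidate) => KeyPattern.IsMatch(candidate);

    public static void Main()
    {
        var samples = new[]
        {
            "011Q-0SH3-936729", "000-222-AAAA", "345376346", "asfsdfgsfsdf"
        };
        foreach (var s in samples)
            Console.WriteLine(s + ": " + IsKey(s));
    }
}
```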
EDIT:
this will match any of the key provided in your comments:
^[0-9A-Z]+(-[0-9A-Z]+)+$
This means the key starts with a digit or letter and has at least one dash symbol.
Without more info about the regularity of the dashes or otherwise, this is the best we can do:
Regex.IsMatch(input, @"[A-Z0-9\-]+\-[A-Z0-9]")
Although this will also match -A-0
Most naive implementation EVER (might get you started):
([0-9]|[A-Z])+(-)([0-9]|[A-Z])+(-)([0-9]|[A-Z])+
Tested with Regex Coach.
EDIT:
That matches only three groups; here is another, slightly better one:
([0-9A-Z]+\-)+([0-9A-Z]+)
Are you applying the regex to a whole string (i.e., validating or filtering)? If so, Tim's answer should put you right. But if you're plucking matches from a larger string, it gets a bit more complicated. Here's how I would do that:
string input = @"Pattern could be 011Q-0SH3-936729 or 011Q-0SH3-936729-SDF3 or 000-222-AAAA or 011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729 but not 345-3763-46 or ASFS-DFGS-FSDF or ASD123FGH987.";
Regex pluckingRegex = new Regex(
@"(?<!\S) # start of 'word'
(?=\S*\p{L}) # contains a letter
(?=\S*\p{N}) # contains a digit
(?=\S*-) # contains a hyphen
[\p{L}\p{N}-]+ # gobble up letters, digits and hyphens only
(?!\S) # end of 'word'
", RegexOptions.IgnorePatternWhitespace);
foreach (Match m in pluckingRegex.Matches(input))
{
Console.WriteLine(m.Value);
}
output: 011Q-0SH3-936729
011Q-0SH3-936729-SDF3
000-222-AAAA
011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729-011Q-0SH3-936729
The negative lookarounds serve as 'word' boundaries: they ensure the matched substring starts either at the beginning of the string or after a whitespace character ((?<!\S)), and ends either at the end of the string or before a whitespace character ((?!\S)).
The three positive lookaheads work just like Tim's, except they use \S* to skip whatever precedes the first letter/digit/hyphen. We can't use .* in this case because that would allow it to skip to the next word, or the next, etc., defeating the purpose of the lookahead.
