Regular expression, match anything not enclosed in - c#

Given the string foobarbarbarfoobar, I want everything between the two occurrences of foo. I used this expression for that, and the result is barbarbar. It's working great.
(?<=foo).*(?=foo)
Now I also want the opposite. So given the string foobarbarbarfoobar I want everything that is not enclosed by foo. I tried the following regular expression:
(?<!foo).*(?!foo)
I expected bar as result but instead it returns a match for foobarbarbarfoobar. It doesn't make sense to me. What am I missing?
The explanation from: https://regex101.com/ looks good to me?
(?<!foo).*(?!foo)
(?<!foo) Negative Lookbehind - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?!foo) Negative Lookahead - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
Any help is really appreciated

I'm hoping someone finds a better approach, but this abomination may do what you want:
(.*)foo(?<=foo).*(?=foo)foo(.*)
The text before the first foo is in capture group 1 (empty for your example), and the text after the last foo is in capture group 2 ('bar' in this case).
If you want the 'foo's included on either end, use this instead: (.*)(?<=foo).*(?=foo)(.*). This would result in 'foo' in group 1, and 'foobar' in group 2.
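As a quick check of the two patterns above, here is a minimal C# sketch run against the example string from the question. The variable names are my own, and it assumes C# 9 or later because it uses top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "foobarbarbarfoobar";

// First pattern: both foo delimiters are consumed outside the groups.
Match outside = Regex.Match(input, @"(.*)foo(?<=foo).*(?=foo)foo(.*)");
string before = outside.Groups[1].Value; // text before the first foo
string after = outside.Groups[2].Value;  // text after the last foo
Console.WriteLine($"before='{before}', after='{after}'"); // before='', after='bar'

// Second pattern: the foo delimiters end up inside the groups.
Match inclusive = Regex.Match(input, @"(.*)(?<=foo).*(?=foo)(.*)");
string head = inclusive.Groups[1].Value;
string tail = inclusive.Groups[2].Value;
Console.WriteLine($"head='{head}', tail='{tail}'"); // head='foo', tail='foobar'
```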

I found a solution for it:
^((?!foo).)+
Explanation from regex101
^ assert position at start of the string
1st Capturing group ((?!foo).)+
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
(?!foo) Negative Lookahead - Assert that it is impossible to match the regex below
foo matches the characters foo literally (case sensitive)
. matches any character (except newline)
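It may help to see why the original attempt matched everything, and what this pattern does on a couple of inputs. The following is only a sketch: the string "barfoobar" is a made-up input, not taken from the question, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "foobarbarbarfoobar";

// Original attempt: at index 0 nothing precedes the position, so (?<!foo) holds.
// .* then runs to the end of the string, where (?!foo) holds as well.
string whole = Regex.Match(input, @"(?<!foo).*(?!foo)").Value;
Console.WriteLine(whole); // foobarbarbarfoobar

// Anchored pattern: consumes characters from the start until "foo" begins.
string prefix = Regex.Match("barfoobar", @"^((?!foo).)+").Value;
Console.WriteLine(prefix); // bar

// The question's string starts with "foo", so the + cannot complete
// a single iteration at the start and the match fails.
bool found = Regex.Match(input, @"^((?!foo).)+").Success;
Console.WriteLine(found); // False
```

So the anchored pattern returns the text that precedes the first foo; on the question's own string, which begins with foo, it finds nothing.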

Related

Regular Expression get text between braces including other braces

I have a "main"-string like:
((Gripper|Open==true OR RIT|Turning==false) AND Robot|PosX >=3 OR (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)))
I want to get three sub strings in the best case:
1: (Gripper|Open==true OR RIT|Turning==false)
2: Robot|PosX >=3
3: (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false))
Getting only two (the ones in braces, 1 and 3) would be fine too, since they can be replaced in the main string, leaving the third (2) as a result.
Ideally with the help of regex.
All the sub strings go into a class as children so I can apply the regex for each child and get their sub strings as well.
1: Test|Close==false
2: (Gripper|Open==false AND RIT|Turning==false)
These are for child number three (where the first result, the one without braces, would be optional again).
I tried something similar to Regular expression to extract text between braces and putting positions of the matches onto a stack, but not with the expected results.
The best regex I found so far is
([^()]+(?:[^()]+)+) or
([^()]+(?:)+)
(seriously, regex is powerful, but I have no idea what the above statements really do) which gives me
1. Gripper|Open == true OR RIT|Turning==false
2. AND Robot|PosX >=3 OR
3. Test|Close==false OR
4. Gripper|Open==false AND RIT|Turning==false
But still, 3+4 should be in only one group as
Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)
Does anyone know how to achieve this?
It seems like you are looking for balanced parentheses, where each match starts with two words separated by a pipe, followed by a comparison operator such as == or >=.
In C# you might match either the balanced parentheses or a pattern that does not contain them, using an alternation.
(?:\(\w+\|\w+\s*[<>!=]{1,2}[^()]*(?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)|\w+\|\w+\s*[<>!=]{1,2}\S+)
(?: Non capture group
\(\w+\|\w+\s* Match ( then 2 words divided by a pipe and 0+ whitespace chars
[<>!=]{1,2}[^()]* Match any of the operators and match any char except ()
(?> Atomic group
[^()]+ Match 1+ times any char except ()
| Or
(?<o>)\( Add to stack
| Or
(?<-o>)\) Remove from stack
)* Close atomic group and repeat 0+ times
(?(o)(?!)|)\) Conditional: if group o still holds a capture (an unclosed parenthesis), fail via (?!); otherwise match the closing )
| Or
\w+\|\w+\s*[<>!=]{1,2}\S+ Match 2 words divided by a pipe, 0+ whitespace chars, the operator, then 1+ non-whitespace chars
) Close non capture group
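To try the pattern on the main string from the question, a minimal C# sketch could look like this. The variable names are mine, and the code assumes C# 9 or later (top-level statements); the pattern itself is copied unchanged from above.

```csharp
using System;
using System.Text.RegularExpressions;

string input =
    "((Gripper|Open==true OR RIT|Turning==false) AND Robot|PosX >=3 OR " +
    "(Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)))";

// Balancing group "o" tracks nested parentheses inside each match.
string pattern =
    @"(?:\(\w+\|\w+\s*[<>!=]{1,2}[^()]*(?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)" +
    @"|\w+\|\w+\s*[<>!=]{1,2}\S+)";

MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
// (Gripper|Open==true OR RIT|Turning==false)
// Robot|PosX >=3
// (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false))
```

Each parenthesised match can then be stripped of its outer parentheses and fed through the same pattern again to get its children.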
You may try this:
(?<=\))(?!\()[^()]+|\((?!\()[^)]+\)
Explanation:
(?<=\))(?!\()[^()]+ OR \((?!\()[^)]+\)
The first part before 'OR' basically matches AND Robot|PosX >=3 OR
(?<=\)) positive lookbehind: the match must start right after a )
(?!\() negative lookahead: the next character must not be (
[^()]+ matches one or more characters that are neither ( nor ).
The last part after OR matches anything that starts with ( and ends with ) while ignoring any opening braces inside it.

Matching only first occurrence in C# regex

I want to match only the first Hash="" in this.
Hash="123"Hash="AEBB1247209BC9E10EA2054F1813DFD7BB9EEF23FEF7C867FCFCEC69CA0C2A6D"Hash="1"
I tried with regex (Hash="[0-9,A-F]+")?
But it always matches all 3 Hash="". Is there any way to fix this? I am using C# Regex library.
Just use Match(), not Matches().
The correct answer would be to use Regex.Match() instead of Regex.Matches().
The first returns only the first match; the second returns all matches.
However, there is a regex that does the job and returns only the first occurrence, whether you call Match() or Matches():
(?<!Hash="[0-9A-F]+".*)Hash="[0-9A-F]+"
Note: this works in C#, but won't work in most other languages (like Java), because it relies on a lookbehind of unbounded length.
(?<!Hash="[0-9A-F]+".*) is a negative lookbehind: it must not be possible to match Hash="[0-9A-F]+".* to the left of the current position (the part already read). In other words, it only matches the first Hash.
Two things regarding your current regex : (Hash="[0-9,A-F]+")?
The , between 0-9 and A-F is not a separator between the two ranges. It just adds , to the set of valid characters, which is probably not your goal (you'll accept hashes like 0123,456).
The final ? indicates that the whole group is optional. So, if it cannot be matched, it's still a success. As a consequence, strings like abcd will match (and matches() will return 5 matches of length 0: before a, between a and b, between b and c, etc...)
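The points above can be checked with a short C# sketch. The variable names are mine, the input is the string from the question, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string input =
    @"Hash=""123""Hash=""AEBB1247209BC9E10EA2054F1813DFD7BB9EEF23FEF7C867FCFCEC69CA0C2A6D""Hash=""1""";
string pattern = @"Hash=""[0-9A-F]+""";

// Match() stops at the first occurrence.
string first = Regex.Match(input, pattern).Value;
Console.WriteLine(first); // Hash="123"

// Matches() returns every occurrence.
int all = Regex.Matches(input, pattern).Count;
Console.WriteLine(all); // 3

// With the negative lookbehind, even Matches() finds only the first one.
int guarded = Regex.Matches(input, @"(?<!Hash=""[0-9A-F]+"".*)" + pattern).Count;
Console.WriteLine(guarded); // 1

// The optional group from the question succeeds with an empty match everywhere.
int empties = Regex.Matches("abcd", @"(Hash=""[0-9,A-F]+"")?").Count;
Console.WriteLine(empties); // 5
```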
You can use the below regex to achieve your functionality:
(Hash="([0-9,A-F]+)[^"]*).*"
Explanation:
1st Capturing Group (Hash="([0-9,A-F]+)[^"]*)
Hash=" matches the characters Hash=" literally (case sensitive)
2nd Capturing Group ([0-9,A-F]+):
Match a single character present in the list below [0-9,A-F]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
, matches the character , literally (case sensitive)
A-F a single character in the range between A (index 65) and F (index 70) (case sensitive)
Match a single character not present in the list below [^"]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" matches the character " literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" matches the character " literally (case sensitive)
Working example: https://regex101.com/r/XTqhyU/2

Regex match a string that is not part of a larger word [duplicate]

I'm stumped on how to even go about this.
I am trying to match the string "ashi" but not if the word containing it is in a small list of known false positives like "flashing", "lashing", "smashing". The false-positive words can appear in the string as well; as long as "ashi" also appears on its own (not as part of one of the false-positive words), it should return true.
I'm using C# and I was trying to go about it without using regular expressions, but I am having no luck.
These strings should return true
...somethingashisomething...
...something2!ashi*&something...
... something ashi something flashing...
These strings should return false
...somethingflashingsomething...
...smashingthesomething...
...the lashings are too tight...
Another option might be a negative lookbehind with a nested lookahead. It rejects a position that is preceded by fl (at a word boundary) when ashing follows, so ashi is matched except inside flashing.
(?<!\bfl(?=ashing\b))ashi
Explanation
(?<! Negative lookbehind, assert what is directly on the left is not
\bfl Word boundary, match fl
(?= Positive lookahead, assert what is directly on the right is
ashing\b Match ashing and word boundary
) Close positive lookahead
) Close negative lookbehind
ashi Match literally
Update
If you also want to exclude the updated values (smashing, lashing), you could use an alternation (?:sm|f?l) in the negative lookbehind to match sm, or an optional f followed by l:
(?<!(?:sm|f?l)(?=ashing))ashi
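Here is a small C# sketch that runs the updated pattern over the six sample strings from the question with Regex.IsMatch(). The array and variable names are my own, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"(?<!(?:sm|f?l)(?=ashing))ashi";

string[] shouldMatch =
{
    "...somethingashisomething...",
    "...something2!ashi*&something...",
    "... something ashi something flashing...",
};
string[] shouldNotMatch =
{
    "...somethingflashingsomething...",
    "...smashingthesomething...",
    "...the lashings are too tight...",
};

// True when every string in the first list contains a standalone "ashi".
bool allHit = Array.TrueForAll(shouldMatch, s => Regex.IsMatch(s, pattern));
// True when no string in the second list produces a match.
bool noneHit = !Array.Exists(shouldNotMatch, s => Regex.IsMatch(s, pattern));

Console.WriteLine($"allHit={allHit}, noneHit={noneHit}"); // allHit=True, noneHit=True
```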
You can make use of a capturing group:
(flashing)|ashi
If the first group is not empty, you matched flashing literally
The following will match ashi but not within flashing. I interpreted "word" loosely, so flashing is not required to be isolated as a separate word with space/punctuation delimiters.
(?<=(?<prefix>fl)|)ashi(?(prefix)(?!ng))
It is sufficient to return true/false over the entire pattern and won't require checking specific capture groups. In other words, it is usable with Regex.IsMatch().
Pattern details:
(?<= # Zero-width positive lookbehind: match but don't consume characters
(?<prefix>fl) # Named capture group to match "fl" at start of "flashing"
| # Alternate blank capture - will succeed if "fl" is not present
) # End lookbehind
ashi # match literal "ashi"
(?(prefix) # Conditional: Only match if named group prefix has successful capture (i.e. "fl" was matched)
(?!ng) # Zero-width negative lookahead: Fail match if "ng" follows
) # Close conditional (there is no false part, so match succeeds if "fl" was not present)
If flashing is only excluded as an isolated word, just add word boundary operators. This will match something like flashingwithnospace, whereas the first pattern would fail on that string:
(?<=(?<prefix>\bfl)|)ashi(?(prefix)(?!ng\b))
(FYI, the pattern will work in isolation, but if it is combined within another pattern, especially inside a repeating construction, it may not work due to the conditional on the named capture group. Once the named capture group has succeeded, the conditional will remain true while matching the larger pattern, even if it were to encounter another occurrence of ashi.)
The question gives the examples
...somethingashisomething...
...something2!ashi*&something...
... something ashi something...
The second and third examples can be found by including the word boundary \b in the search, i.e. search for \bashi\b. Finding the first example requires more knowledge of what the two enclosing somethings are. If they are alphanumeric then you need to specify the problem in much more detail.

RegEx : Find match based on 1st two chars

I am new to RegEx and have a question about it. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible strings I get are:
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what I need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now if I want to try a new matching condition like this -
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?
You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group combined with . (any single character) matches a third character after the two-letter prefix, and the {2} quantifier then requires that whole three-character sequence to occur twice.
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
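A short C# sketch of both checks, for illustration only. The input "XXZF44DT508755" is a made-up non-matching value, the variable names are mine, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs = { "XYZF44DT508755", "ABZF44DT508755", "PQZF44DT508755", "XXZF44DT508755" };

// Prefix XY, AB or PQ, then Z as 3rd and F as 4th character.
foreach (string s in inputs)
{
    Console.WriteLine($"{s}: {Regex.IsMatch(s, "^(?:XY|AB|PQ)ZF")}");
}
// XYZF44DT508755: True
// ABZF44DT508755: True
// PQZF44DT508755: True
// XXZF44DT508755: False

// The pattern from the question needs prefix + any char, twice in a row.
bool original = Regex.IsMatch("XYZF44DT508755", @"^((XY|AB|PQ).){2}");
bool doubled = Regex.IsMatch("XYKPQL", @"^((XY|AB|PQ).){2}");
Console.WriteLine($"{original} {doubled}"); // False True
```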
You want to test for just ^(XY|AB|PQ). Your RegEx means: Search for either XY, AB or PQ, then a random character, and repeat the whole sequence twice, for example "XYKPQL" would match your RegEx.
This is a screenshot of the matches on regex101:
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.
You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)

Regular expression match text between tag

I need help with a regular expression, as I do not have good knowledge of them.
I have regular expression as:
Regex myregex = new Regex("testValue=\"(.+?)\"");
What does (.+?) indicate?
The string it matches is testValue="123e4567" and it returns 123e4567 as output.
Now I need help in regular expression to match a string "<helpMe>123e4567</helpMe>" where I need 123e4567 as output. How do I write a regular expression for it?
This means:
( Begin captured group
. Match any character
+ One or more times
? Non-greedy quantifier
) End captured group
In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
For your "helpMe" example, you'd want this regex:
<helpMe>(.+?)</helpMe>
Given this string:
<div>Something<helpMe>ABCDE</helpMe></div>
You'd get this match:
ABCDE
The value of the non-greedy quantifier is evident in this variation:
Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>
The greedy capture would look like this:
ABCDE</helpMe><helpMe>FGHIJ
There are some useful interactive tools to play with these variations:
Regex Tester
Regex Pal
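The greedy and non-greedy variations can also be compared directly in C#. This is a minimal sketch using the sample string above; the variable names are mine, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>";

// Non-greedy: the group stops at the first closing tag.
string lazy = Regex.Match(input, "<helpMe>(.+?)</helpMe>").Groups[1].Value;
Console.WriteLine(lazy); // ABCDE

// Greedy: the group runs to the last closing tag on the line.
string greedy = Regex.Match(input, "<helpMe>(.+)</helpMe>").Groups[1].Value;
Console.WriteLine(greedy); // ABCDE</helpMe><helpMe>FGHIJ

// Non-greedy with Matches() yields one match per tag pair.
MatchCollection each = Regex.Matches(input, "<helpMe>(.+?)</helpMe>");
foreach (Match m in each)
{
    Console.WriteLine(m.Groups[1].Value); // ABCDE, then FGHIJ
}
```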
Ken Redler has a great answer regarding your first question. For the second question try:
<(helpMe)>(.*?)</\1>
Using the back reference \1 you can find values between the set of matching tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference re-uses the first group's match (in this case the tag name).
Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.
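A minimal sketch of the named-group version, assuming C# 9 or later (top-level statements); the variable names are mine. In .NET, unnamed groups are numbered before named ones, so \1 still refers to the tag name.

```csharp
using System;
using System.Text.RegularExpressions;

Match match = Regex.Match("<helpMe>123e4567</helpMe>", @"<(helpMe)>(?<value>.*?)</\1>");

string tag = match.Groups[1].Value;         // first (unnamed) group: the tag name
string value = match.Groups["value"].Value; // named group: the content

Console.WriteLine($"{tag}: {value}"); // helpMe: 123e4567
```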
What does (.+?) indicate?
It means match any character (.) one or more times, as few times as possible (+?).
A simple regex to match your second string would be
<helpMe>([a-z0-9]+)<\/helpMe>
This will match any lowercase letter a-z and any digit between <helpMe> and </helpMe>.
The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.
