Regexp group matching one more character than it should? - c#

I'm guessing that I'm confused about how groups work in regexp.
My regexp replaces more characters than it should.
Here is my string:
...test - Copy\asd.test2\asd.keke
And here is my pattern:
.?(asd\.)
It matches "\asd." but I want it to match just "asd."
What am I getting wrong here?

What are you trying to achieve using .? if you don't want to match it?
To check for characters outside of your match, you can use a lookaround assertion. E.g., to check for a backslash before the match, you'd use
(?<=\\)asd\.
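A minimal sketch of the difference, using the string from the question. The class and method names (LookbehindDemo, RemoveAfterBackslash) are made up for illustration, and replacing with an empty string is only an assumption about what the asker wants to do with the match:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Removes every "asd." that directly follows a backslash.
    // The lookbehind (?<=\\) only checks for the backslash; it is not part of the match.
    public static string RemoveAfterBackslash(string input)
    {
        return Regex.Replace(input, @"(?<=\\)asd\.", "");
    }

    public static void Main()
    {
        string input = @"...test - Copy\asd.test2\asd.keke";

        // Original pattern: .? consumes the backslash, so it is removed too.
        Console.WriteLine(Regex.Replace(input, @".?(asd\.)", ""));
        // prints: ...test - Copytest2keke

        // Lookbehind pattern: the backslash stays in place.
        Console.WriteLine(RemoveAfterBackslash(input));
        // prints: ...test - Copy\test2\keke
    }
}
```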

Related

RegEx to find non-existence of white space prefix but not include the character in the match?

So I have the following RegEx for the purpose of finding and adding whitespace:
(\S)(\()
So for a string like "SomeText(Somemoretext)" I want to update this to "SomeText (Somemoretext)". It matches "t(", so my replace eliminates the "t" from the string, which is not good. I also do not know what the character could be; I'm merely trying to find the non-existence of whitespace.
Is there a better expression to use, or is there a way to exclude the found character from the match returned so that I can safely replace without catching characters I do not want to replace?
Thanks
I find lookarounds hard to read and would prefer using substitutions in the replacement string instead:
var s = Regex.Replace("test1() test2()", @"(\S)\(", "$1 (");
Debug.Assert(s == "test1 () test2 ()");
$1 inserts the first capture group from the regex into the replacement string which is the non-space character before the opening parenthesis (.
If you need to detect the absence of space before a specific character (such as bracket) after a word, how about the following?
\b(?=[^\s])\(
This will detect words (characters in [a-zA-Z0-9_]) that are followed by a bracket without a space in between.
If I understood your problem correctly, you can replace the full match with " (" (a space followed by the bracket) and get exactly what you need.
In case you need to look for the absence of spaces before a symbol (like a bracket) in any kind of text (the text may be non-word, such as punctuation), you might want to use the following instead.
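Here is a rough sketch of that word-boundary replacement in C#. The names (BracketSpacing, AddSpaceBeforeBracket) are hypothetical, and the pattern is taken as written above, so it only fires when the character before the bracket is a word character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketSpacing
{
    // Inserts a space before "(" when it directly follows a word character.
    // Only the bracket itself is matched, so nothing before it is lost.
    public static string AddSpaceBeforeBracket(string input)
    {
        return Regex.Replace(input, @"\b(?=[^\s])\(", " (");
    }

    public static void Main()
    {
        Console.WriteLine(AddSpaceBeforeBracket("SomeText(Somemoretext)"));
        // prints: SomeText (Somemoretext)

        // Already spaced: there is no word boundary between " " and "(", so no change.
        Console.WriteLine(AddSpaceBeforeBracket("SomeText (Somemoretext)"));
        // prints: SomeText (Somemoretext)
    }
}
```

Note that a bracket preceded by punctuation, as in "x)(y", is left alone by this pattern, since there is no word boundary there.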
^(?:\S*)(\()(?:\S*)$
When using this, your result will be in group 1, instead of just full match (which now contains the whole line, if a line is matched).

Regex and taking care of possible whitespace

I need help with my regex expression. I am trying to match keyword this but only when it is in parenthesis (this). So far I have:
(\()\bthis\b(\))
But looks like it also matches the parenthesis wrapping the word, while I only need to grab the word itself. Another issue is that it won't work if there are whitespaces inside the parenthesis: ( this )
What about capturing the word in a group, with a regex like this:
\((this)\)
Also if you want to match when there are white spaces (spaces, tabs, etc.):
\(\s*(this)\s*\)
Try it out here : Regex101. All the details about each character I'm using in the regex are detailed on that site.
You can retrieve the this value matched in the group by code. Please, check out the documentation related to the language you're using for that.
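Assuming the language is C# (the question does not say), retrieving the group could look roughly like this. ParenthesizedKeyword and IndexOfKeyword are illustrative names only:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenthesizedKeyword
{
    // Returns the position of "this" when it is wrapped in parentheses
    // (optionally padded with whitespace), or -1 when there is no such match.
    public static int IndexOfKeyword(string input)
    {
        Match m = Regex.Match(input, @"\(\s*(this)\s*\)");
        // Groups[0] is the whole match including the parentheses;
        // Groups[1] is only the captured word.
        return m.Success ? m.Groups[1].Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(IndexOfKeyword("(this)"));       // prints: 1
        Console.WriteLine(IndexOfKeyword("a ( this ) b")); // prints: 4
        Console.WriteLine(IndexOfKeyword("this"));         // prints: -1
    }
}
```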

Regex - Get matches of #[SomeText] in a string

I want to get all matches of #[SomeText] pattern in a string.
For example, for this string:
here is #[text1] some text #[text2]
I want #[text1] and #[text2].
I'm using Regex Hero to check my pattern matching online,
and my pattern works fine when there's one expression to match,
For example:
here is #[text1] text
but with more than one, I get both matches with the text in the middle.
This is my regex:
#\[.*\]
I would appreciate assistance in isolating the occurrences.
The problem here is that you are using a greedy quantifier (*). To capture each occurrence separately, you should use a lazy quantifier (*?) with a global modifier:
/(#\[.*?\])/g
Take a look here https://regex101.com/r/pH0gA5/1
This should work :
#\[(.*?)\]
Details :
(.*?) : match everything in a non-greedy way and capture it.
Because the *? quantifier is lazy (non-greedy), it matches as few characters as possible to allow the overall match attempt to succeed, i.e. text1. For the match attempt that starts at a given position, a lazy quantifier gives you the shortest match.
.* is greedy by default, so it only finds one match, treating "text1] some text #[text2" as the text between the two square brackets.
If you add a question mark after the .* then it will find the minimum number of characters before reaching a ].
So the regex #\[.*?\] does what you want.
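For comparison, a small C# sketch running both the greedy and the lazy pattern against the example string. Regex.Matches already returns every match, so no global modifier is needed in .NET; QuantifierDemo and FindAll are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of every match of the pattern in the input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "here is #[text1] some text #[text2]";

        // Greedy: .* runs to the last "]" on the line, giving a single match.
        Console.WriteLine(string.Join(" | ", FindAll(input, @"#\[.*\]")));
        // prints: #[text1] some text #[text2]

        // Lazy: .*? stops at the first "]" it can, giving two matches.
        Console.WriteLine(string.Join(" | ", FindAll(input, @"#\[.*?\]")));
        // prints: #[text1] | #[text2]
    }
}
```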

Regular expression match text between tag

I need a help with regular expression as I do not have good knowledge in it.
I have regular expression as:
Regex myregex = new Regex("testValue=\"(.+?)\"");
What does (.+?) indicate?
The string it matches is testValue="123e4567" and it returns 123e4567 as output.
Now I need help in regular expression to match a string "<helpMe>123e4567</helpMe>" where I need 123e4567 as output. How do I write a regular expression for it?
This means:
( Begin captured group
. Match any character
+ One or more times
? Non-greedy quantifier
) End captured group
In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
For your "helpMe" example, you'd want this regex:
<helpMe>(.+?)</helpMe>
Given this string:
<div>Something<helpMe>ABCDE</helpMe></div>
You'd get this match:
ABCDE
The value of the non-greedy quantifier is evident in this variation:
Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>
The greedy capture would look like this:
ABCDE</helpMe><helpMe>FGHIJ
There are some useful interactive tools to play with these variations:
Regex Tester
Regex Pal
Ken Redler has a great answer regarding your first question. For the second question try:
<(helpMe)>(.*?)</\1>
Using the back reference \1 you can find values between the set of matching tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference re-uses the first group's match (in this case the tag name).
Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.
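A short sketch of the named-group variant, assuming you only need the first occurrence. TagValueDemo and ExtractValue are illustrative names. In .NET, unnamed groups are numbered before named ones, so \1 still refers to (helpMe):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagValueDemo
{
    // Returns the text between <helpMe> and </helpMe>, or "" if the tag is not found.
    public static string ExtractValue(string input)
    {
        Match match = Regex.Match(input, @"<(helpMe)>(?<value>.*?)</\1>");
        return match.Success ? match.Groups["value"].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(ExtractValue("<helpMe>123e4567</helpMe>"));
        // prints: 123e4567

        // The lazy .*? stops at the first closing tag.
        Console.WriteLine(ExtractValue("<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>"));
        // prints: ABCDE
    }
}
```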
What does (.+?) indicate?
It means match any character (.) one or more times, but as few times as possible (+?)
A simple regex to match your second string would be
<helpMe>([a-z0-9]+)<\/helpMe>
This will match any lowercase letter a-z and any digit between <helpMe> and </helpMe>.
The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.

Regex replace/search using values/variables in search text

What is the regex syntax to use part of a matched expression in the subsequent part of the search?
So, for example, if I have:
"{marker=1}some text{/marker=1}"
or
"{marker=2}some text{/marker=2}"
I want to use the first digit found in the pattern to find the second digit. So in
"{marker=1}{marker=2}some text{/marker=2}{/marker=1}"
the regex would match the 1's and then the 2's.
So far I've come up with {marker=(\d)}(.*?){/marker=(\d)} but don't know how to specify the second \d to refer to the value found in the first \d.
I'm doing this in C#.
try:
{marker=(\d)}(.*?){/marker=(\1)}
Numbered backreference is just \n, so \1 should work here:
Regex re = new Regex(@"\{marker=(\d)\}(.*?)\{/marker=(\1)\}");
// expect to work
Console.WriteLine(re.IsMatch(@"{marker=1}some text{/marker=1}"));
// expect to fail (end marker is different)
Console.WriteLine(re.IsMatch(@"{marker=1}some text{/marker=2}"));
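To see what the backreference does with the nested example from the question, here is a sketch that returns the text captured between a matching pair of markers. MarkerDemo and InnerText are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerDemo
{
    // Returns the text between {marker=N} and the {/marker=N} with the same digit,
    // or "" when no matching pair exists.
    public static string InnerText(string input)
    {
        Match m = Regex.Match(input, @"\{marker=(\d)\}(.*?)\{/marker=\1\}");
        return m.Success ? m.Groups[2].Value : "";
    }

    public static void Main()
    {
        // The outer pair (1) is matched first; the inner pair ends up inside group 2.
        Console.WriteLine(InnerText("{marker=1}{marker=2}some text{/marker=2}{/marker=1}"));
        // prints: {marker=2}some text{/marker=2}

        // Mismatched digits: \1 requires the same digit, so there is no match.
        Console.WriteLine(InnerText("{marker=1}some text{/marker=2}").Length);
        // prints: 0
    }
}
```

Running InnerText again on the returned text would then pick up the inner pair of 2's.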
