Regex replace/search using values/variables in search text

Regex replace/search using values/variables in search text - c#

What is the regex syntax to use part of a matched expression in the subsequent part of the search?
So, for example, if I have:
"{marker=1}some text{/marker=1}"
or
"{marker=2}some text{/marker=2}"
I want to use the first digit found in the pattern to find the second digit. So in
"{marker=1}{marker=2}some text{/marker=2}{/marker=1}"
the regex would match the 1's and then the 2's.
So far I've come up with {marker=(\d)}(.*?){/marker=(\d)} but don't know how to specify the second \d to refer to the value found in the first \d.
I'm doing this in C#.

try:
{marker=(\d)}(.*?){/marker=(\1)}

Numbered backreference is just \n, so \1 should work here:
Regex re = new Regex(#"\{marker=(\d)\}(.*?)\{/marker=(\1)\}");
// expect to work
Console.WriteLine(re.IsMatch(#"{marker=1}some text{/marker=1}"));
// expect to fail (end marker is different)
Console.WriteLine(re.IsMatch(#"{marker=1}some text{/marker=2}"));

Related

Regular expression for specific combination of alphabets and numbers

I am trying to create regular expression for following type of strings:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ), numerical digits only, and either no ‘Z’ or a ‘Z’ suffix.
For example, XD35Z should pass but XD01HW should not pass.
So far I tried following:
#"XD\d+Z?" - XD35Z passes but unfortunately it also works for XD01HW
#"XD\d+$Z" - XD01HW fails which is what I want but XD35Z also fails
I have also tried #"XD\d{1,}Z"? but it did not work
I need a single regex which will give me appropriate results for both types of strings.

Try this regex:
^(XI|YV|XD|YQ|XZ){1}\d+Z{0,1}$
I'm using quantifying braces to explicitly limit the allowed numbers of each character/group. And the ^ and $ anchors make sure that the regex matches only the whole line (string).
Broken into logical pieces this regex checks
^(XI|YV|XD|YQ|XZ){1} Starts with exactly one of the allowed prefixes
\d+ Is follow by one or more digits
Z{0,1}$ Ends with between 0 and 1 Z

You're misusing the $ which represents the end of the string in the Regex
It should be : #"^XD\d+Z?$" (notice that it appears at the end of the Regex, after the Z?)

The regex following the behaviour you want is:
^(XI|YV|XD|YQ|XZ)\d+Z?$
Explanation:
combination of the prefix (XI/ YV/ XD/ YQ/ XZ)
^(XI|YV|XD|YQ|XZ)
numerical digits only
\d+
‘Z’ or a ‘Z’ suffix
Z?$

Regex - Get matches of #[SomeText] in a string

I want to get all matches of #[SomeText] pattern in a string.
For example, for this string:
here is #[text1] some text #[text2]
I want #[text1] and #[text2].
I'm using Regex Hero to check my pattern matching online,
and my pattern works fine when there's one expression to match,
For example:
here is #[text1] text
but with more then one, I get both matches with the text in the middle.
This is my regex:
#\[.*\]
I would appreciate assistance in isolating the occurrences.

The problem here is that you are using greedy quantifier (*). To capture all you need, you should use lazy quantifier (*?) with a global modifier:
/(#\[.*?\])/g
Take a look here https://regex101.com/r/pH0gA5/1

This should work :
#\[(.*?)\]
Details :
(.*?) : match everything in a non-greedy way and capture it.
Because the *? quantifier is lazy (non-greedy), it matches as few characters as possible to allow the overall match attempt to succeed, i.e. text1. For the match attempt that starts at a given position, a lazy quantifier gives you the shortest match.

.* is greedy by default, so it only finds one match, treating "text1] and #[text2" as the text between the two square brackets.
If you add a questions mark after the .* then it will find the minimum number of characters before reaching a ].
So the regex \#[.*?] do what you want.

RegEx : Find match based on 1st two chars

I am new to RegEx and thus have a question on RegEx. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible combination of strings i get are,
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what i need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now if i want to try a new matching condition like this -
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?

You can modify you expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
The outer capturing group in conjuction with . (any single character) is trying to match a third character and then repeat the sequence twice because of the range quantifier {2} ...
According to your updated edit, you can simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")

You want to test for just ^(XY|AB|PQ). Your RegEx means: Search for either XY, AB or PQ, then a random character, and repeat the whole sequence twice, for example "XYKPQL" would match your RegEx.
This is a screenshot of the matches on regex101:
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.

You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)
Debuggex Demo

Regular Expression to not allow 3 consecutive characters

I have the following regex:
Regex pattern = new Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}/(.)$");
(?=.*\d) //should contain at least one digit
(?=.*[a-z]) //should contain at least one lower case
(?=.*[A-Z]) //should contain at least one upper case
[a-zA-Z0-9]{8,20} //should contain at least 8 characters and maximum of 20
My problem is I also need to check if 3 consecutive characters are identical. Upon searching, I saw this solution:
/(.)\1\1/
However, I can't make it to work if I combined it to my existing regex, still no luck:
Regex(#"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/(.)\1\1/");
What did I missed here? Thanks!

The problem is that /(.)\1\1/ includes the surrounding / characters which are used to quote literal regular expressions in some languages (like Perl). But even if you don't use the quoting characters, you can't just add it to a regular expression.
At the beginning of your regex, you have to say "What follows cannot contain a character followed by itself and then itself again", like this: (?!.*(.)\1\1). The (?! starts a zero-width negative lookahead assertion. The "zero-width" part means that it does not consume any characters in the input string, and the "negative lookahead assertions" means that it looks forward in the input string to make sure that the given pattern does not appear anywhere.
All told, you want a regex like this:
new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$")

I solved by using trial and error:
Regex pattern = new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");

Regular expression match text between tag

I need a help with regular expression as I do not have good knowledge in it.
I have regular expression as:
Regex myregex = new Regex("testValue=\"(.+?)\"");
What does (.+?) indicate?
The string it matches is "testValue=123e4567" and returns 123e4567 as output.
Now I need help in regular expression to match a string "<helpMe>123e4567</helpMe>" where I need 123e4567 as output. How do I write a regular expression for it?

This means:
( Begin captured group
. Match any character
+ One or more times
? Non-greedy quantifier
) End captured group
In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
For your "helpMe" example, you'd want this regex:
<helpMe>(.+?)</helpMe>
Given this string:
<div>Something<helpMe>ABCDE</helpMe></div>
You'd get this match:
ABCDE
The value of the non-greedy quantifier is evident in this variation:
Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>
The greedy capture would look like this:
ABCDE</helpMe><helpMe>FGHIJ
There are some useful interactive tools to play with these variations:
Regex Tester
Regex Pal

Ken Redler has a great answer regarding your first question. For the second question try:
<(helpMe)>(.*?)</\1>
Using the back reference \1 you can find values between the set of matching tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference re-uses the first group's match (in this case the tag name).
Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.

What does (.+?) indicate?
It means match any character (.) one or more times (+?)
A simple regex to match your second string would be
<helpMe>([a-z0-9]+)<\/helpMe>
This will match any character of a-z and any digit inside <helpme> and </helpMe>.
The pharanteses are used to capture a group. This is useful if you need to reference the value inside this group later.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex replace/search using values/variables in search text - c#

try: {marker=(\d)}(.*?){/marker=(\1)}

Related

Regular expression for specific combination of alphabets and numbers

Regex - Get matches of #[SomeText] in a string

RegEx : Find match based on 1st two chars

Regular Expression to not allow 3 consecutive characters

Regular expression match text between tag

Categories

Resources