Why is Regex.Replace giving me weird result for last group

A simple example:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2})", "$1-$2-$3 $4")
This outputs to:
123-456-789 10999999999
But why? I have specifically set the group index I need, and that group contains the exact value (checked in the debugger).
Here is a fiddle:
https://dotnetfiddle.net/dkAPx3

Match the rest of the string with .* so that the replacement discards it:
Regex.Replace("12345678910999999999", @"^(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4")
I'd also add ^ at the start to anchor the match to the beginning of the string.

Your regex matched and replaced only the first part of the string; the unmatched remainder is copied to the output unchanged. Add .* to the end of the pattern:
Regex.Replace("12345678910999999999", @"(\d{3})(\d{3})(\d{3})(\d{2}).*", "$1-$2-$3 $4"); // results in "123-456-789 10"
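To make the difference concrete, here is a minimal sketch that runs the question's input through both variants. The class name ReplaceDemo and the helper Format are hypothetical names chosen for illustration; only Regex.Replace and the patterns come from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Formats the first 11 digits. When consumeRest is true, a trailing .*
    // is appended so the match also swallows the remaining characters.
    public static string Format(string input, bool consumeRest)
    {
        string pattern = @"^(\d{3})(\d{3})(\d{3})(\d{2})" + (consumeRest ? ".*" : "");
        return Regex.Replace(input, pattern, "$1-$2-$3 $4");
    }

    public static void Main()
    {
        const string input = "12345678910999999999";
        Console.WriteLine(Format(input, false)); // 123-456-789 10999999999
        Console.WriteLine(Format(input, true));  // 123-456-789 10
    }
}
```

Regex.Replace only substitutes the text that the pattern actually matched, so anything the pattern does not consume is kept as-is in the result.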

Related

C# regex nth character not in list or string end

I'm trying to check if the 4th letter in a string is not s or S using the following regular expression.
Regex rx = new Regex(@"A[2-6][025][^sS].*");
In addition, I want the corresponding three-character strings to match (e.g. "A30").
Unfortunately, the Match check returns false for those.
Does someone know what I'm doing wrong and how I can alter my regex?
rx.Match(test).Success

This should do what you want:
^A[2-6][025](?:[^sS].*|)$
Note the non-capturing group part:
(?:[^sS].*|)
This matches either a character that is not s or S followed by any number of characters, or the empty string.
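As a quick check of the pattern above, the following sketch tests a few sample inputs. The class name FourthCharDemo, the helper IsValid, and the sample strings other than "A30" are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FourthCharDemo
{
    // Pattern from the answer: the optional tail is either "a non-s/S
    // character followed by anything" or nothing at all.
    private static readonly Regex Rx = new Regex(@"^A[2-6][025](?:[^sS].*|)$");

    public static bool IsValid(string s) => Rx.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid("A30"));   // True  (three characters only)
        Console.WriteLine(IsValid("A30x"));  // True  (fourth character is not s/S)
        Console.WriteLine(IsValid("A30s"));  // False (fourth character is s)
        Console.WriteLine(IsValid("A30S1")); // False (fourth character is S)
        Console.WriteLine(IsValid("A90"));   // False (9 is outside [2-6])
    }
}
```

The original pattern failed on "A30" because [^sS] always requires a fourth character to be present.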

First, check whether the fourth character is s or S with the following regex (a match means the string should be rejected):
^...[sS]
In a second step, check for the A followed by the two digits, which your own approach already handles:
A[2-6][025]

C# RegEx to match specific strings

I need to match (using regex) strings that can be like this:
required: custodian_{number 1 - 9}_{fieldType either txt or ssn}
optional: _{fieldLength 1-999}
So for example:
custodian_1_ssn_1 is valid
custodian_1_ssn_1_255 is valid
custodian or custodian_ or custodian_1 or custodian_1_ or custodian_1_ssn or custodian_1_ssn_ or custodian_1_ssn_1_ are not valid
Currently I am working with this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?[0-9]?[0-9]?)?
as my regex and my api is working to pick up:
custodian_1_txt_1
custodian_1_ssn_1
custodian_1_txt_1_255 <---- not matching the last "5"
any thoughts?

You may use pattern:
^custodian(?:_[a-z0-9]+)+$
^ Assert position beginning of line.
custodian Match literal substring custodian.
(?:_[a-z0-9]+)+ Non capturing group. Multiple sequence of _ followed by alphanumerics.
$ Assert position end of line.
You can extend the pattern to also accept signer by listing it in a non-capturing group:
^(?:custodian|signer)(?:_[a-z0-9]+)+$

I suggest using \d for the trailing digits instead of your repeated character classes. Try this:
(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9]?\d*)?
I only changed the end of your pattern to \d* so that it matches all of the trailing digits.

You could use anchors to assert the start ^ and the end $ of the string, and for the last part make at least the first [1-9] mandatory, or else the pattern would also match an underscore at the end:
^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$
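A small sketch of how the anchored pattern above behaves on the question's examples. The names FieldNameDemo and IsValid are hypothetical; the sample strings are taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldNameDemo
{
    // Anchored pattern from the answer above; the optional length suffix
    // must start with a digit 1-9, so a dangling underscore is rejected.
    private static readonly Regex FieldName = new Regex(
        @"^(?:custodian|signer)_[1-9]?[0-9]_(?:txt|ssn)_[1-9][0-9]?(_[1-9][0-9]?[0-9]?)?$");

    public static bool IsValid(string name) => FieldName.IsMatch(name);

    public static void Main()
    {
        string[] samples =
        {
            "custodian_1_ssn_1",     // True
            "custodian_1_txt_1_255", // True
            "custodian_1_ssn",       // False
            "custodian_1_ssn_1_",    // False
            "custodian_"             // False
        };

        foreach (var s in samples)
        {
            Console.WriteLine($"{s}: {IsValid(s)}");
        }
    }
}
```

Without the ^ and $ anchors, IsMatch would succeed as soon as any valid prefix is found, which is why the trailing characters appeared to be ignored.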

If you're only interested in the last digits, this super generic regex will do:
(?:.+)_(\d+)
If you do need to match the whole string, this worked:
^(?:custodian|signer)_\d+_(?:txt|ssn)(?:_\d+)?_(\d+)$

Matching pattern to end of line

I am trying to get some in-line comments from a text file and need some help with the expression.
this comes before selection
this is on the same line %% this is the first group and it can have any character /[{3$5!+-p
here is some more text in the middle
this stuff is also on a line with a comment %% this is the second group of stuff !##%^()<>/~`
this goes after the selections
I am trying to get everything that follows %%\s+. Here is what I tried:
%%\s+(.*)$
But that matches all text following the first %%. Not sure where to go from here.

Most engines default to a dot that does not match newlines and to non-multiline mode, in which $ matches only at the end of the string.
That means %%\s+(.*)$ should not match unless it finds %% on the last line of the string.
Instead of fighting the external settings, use inline modifiers (?...), which override them.
Use (?-s)%%\s+(.*), which turns off dot-all mode.
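The effect of these options can be seen in a short sketch. The sample text, the class name CommentDemo, and the helper Extract are assumptions for illustration; RegexOptions.Multiline, RegexOptions.Singleline, and the inline (?-s) modifier are standard .NET regex features.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // Hypothetical sample input: two lines carry a %% comment.
    public const string Text =
        "before\n" +
        "same line %% first comment\n" +
        "middle\n" +
        "another line %% second comment\n" +
        "after";

    // Returns the Group 1 value of every match.
    public static string[] Extract(string pattern, RegexOptions options) =>
        Regex.Matches(Text, pattern, options)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        // Default options: dot stops at the newline, one match per comment.
        Console.WriteLine(Extract(@"%%\s+(.*)", RegexOptions.None).Length);            // 2
        // Default $ only matches at the end of the string, so nothing matches here.
        Console.WriteLine(Extract(@"%%\s+(.*)$", RegexOptions.None).Length);           // 0
        // Multiline lets $ match at the end of every line.
        Console.WriteLine(Extract(@"%%\s+(.*)$", RegexOptions.Multiline).Length);      // 2
        // Singleline (dot-all) makes the first match run to the end of the text.
        Console.WriteLine(Extract(@"%%\s+(.*)", RegexOptions.Singleline).Length);      // 1
        // The inline (?-s) switches dot-all back off despite the external option.
        Console.WriteLine(Extract(@"(?-s)%%\s+(.*)", RegexOptions.Singleline).Length); // 2
    }
}
```

The Singleline case reproduces the symptom described in the question, where everything after the first %% ends up in one match.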

Since . matches any character but a newline by default, you needn't use $:
%%\s+(.*)
Explanation:
%% - two literal % symbols
\s+ - 1 or more whitespace
(.*) - 0 or more any characters other than a newline (captured into Group 1)
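C# demo:
// requires: using System; using System.Linq; using System.Text.RegularExpressions;
var s = "THE_STRING";
var result = Regex.Matches(s, @"%%\s+(.*)")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value) // Group 1 holds the comment text
    .ToList();
Console.WriteLine(string.Join("\n", result));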

RegEx : Find match based on 1st two chars

I am new to RegEx and thus have a question on RegEx. I am writing my code in C# and need to come up with a regex to find matching strings.
The possible combination of strings i get are,
XYZF44DT508755
ABZF44DT508755
PQZF44DT508755
So what i need to check is whether the string starts with XY or AB or PQ.
I came up with this one and it doesn't work.
^((XY|AB|PQ).){2}
Note: I don't want to use regular string StartsWith()
UPDATE:
Now if i want to try a new matching condition like this -
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
How to write the RegEx for that?

You can modify your expression to the following and use the IsMatch() method.
Regex.IsMatch(input, "^(?:XY|AB|PQ)")
In your pattern, the outer capturing group combined with . (any single character) matches a third character after the prefix, and the {2} quantifier then requires that whole three-character sequence to occur twice.
For your update, simply place "ZF" after the grouping construct.
Regex.IsMatch(input, "^(?:XY|AB|PQ)ZF")
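The sketch below compares the two suggested patterns with the original one from the question. The class and method names are hypothetical, and "XYKPQL" and "CDZF44DT508755" are extra sample inputs used only to show the contrast.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Starts with XY, AB or PQ.
    public static bool HasPrefix(string s) => Regex.IsMatch(s, "^(?:XY|AB|PQ)");

    // Starts with XY, AB or PQ, followed by Z and F.
    public static bool HasPrefixThenZF(string s) => Regex.IsMatch(s, "^(?:XY|AB|PQ)ZF");

    // The pattern from the question: prefix + any character, twice in a row.
    public static bool OriginalPattern(string s) => Regex.IsMatch(s, "^((XY|AB|PQ).){2}");

    public static void Main()
    {
        Console.WriteLine(HasPrefix("XYZF44DT508755"));       // True
        Console.WriteLine(HasPrefixThenZF("ABZF44DT508755")); // True
        Console.WriteLine(HasPrefix("CDZF44DT508755"));       // False
        Console.WriteLine(OriginalPattern("XYZF44DT508755")); // False
        Console.WriteLine(OriginalPattern("XYKPQL"));         // True
    }
}
```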

You want to test for just ^(XY|AB|PQ). Your RegEx means: match either XY, AB or PQ, then any single character, and require that whole sequence twice in a row; for example, "XYKPQL" would match your RegEx.
Breaking the suggested pattern down:
^ forces the start of line,
(...) creates a matching group and
XY|AB|PQ matches either XY, AB or PQ.
If you want the next two characters to be ZF, just append ZF to the RegEx so it becomes ^(XY|AB|PQ)ZF.
Check out regex101, a great way to test your RegExes.

You were on the right track. ^(XY|AB|PQ) should match your string correctly.
The problem with ^((XY|AB|PQ).){2} is following the entire group with {2}. This means exactly 2 occurrences. That would be 2 occurrences of your first 2 characters, plus . (any single character), meaning this would match strings like XY_AB_. The _ could be anything.
It may have been your intention with the . to match a larger string. In this case you might try something along the lines of ^((XY|AB|PQ)\w*). The \w* will match 0 or more occurrences of "word characters", so this should match all of XYZF44DT508755 up to a space, line break, punctuation, etc., and not just the XY at the beginning.
There are some good tools out there for understanding regexes, one of my favorites is debuggex.
UPDATE
To answer your updated question:
If string starts with "XY" or "AB" or "PQ" and 3rd character is "Z" and 4th character is "F"
The regex would be (assuming you want to match the entire "word").
^((XY|AB|PQ)ZF\w*)

Regular expression match text between tag

I need help with a regular expression, as I do not know them well.
I have this regular expression:
Regex myregex = new Regex("testValue=\"(.+?)\"");
What does (.+?) indicate?
The string it matches is "testValue=123e4567" and returns 123e4567 as output.
Now I need help in regular expression to match a string "<helpMe>123e4567</helpMe>" where I need 123e4567 as output. How do I write a regular expression for it?

This means:
( Begin captured group
. Match any character
+ One or more times
? Non-greedy quantifier
) End captured group
In the case of your regex, the non-greedy quantifier ? means that your captured group will begin after the first double-quote, and then end immediately before the very next double-quote it encounters. If it were greedy (without the ?), the group would extend to the very last double-quote it encounters on that line (i.e., "greedily" consuming as much of the line as possible).
For your "helpMe" example, you'd want this regex:
<helpMe>(.+?)</helpMe>
Given this string:
<div>Something<helpMe>ABCDE</helpMe></div>
You'd get this match:
ABCDE
The value of the non-greedy quantifier is evident in this variation:
Regex: <helpMe>(.+)</helpMe>
String: <div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>
The greedy capture would look like this:
ABCDE</helpMe><helpMe>FGHIJ
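The greedy and non-greedy captures can be compared side by side with a short sketch. The class name GreedyDemo and the helper Capture are hypothetical; the patterns and the input string are the ones used in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    public const string Input =
        "<div>Something<helpMe>ABCDE</helpMe><helpMe>FGHIJ</helpMe></div>";

    // Returns Group 1 of the first match for the given pattern.
    public static string Capture(string pattern) =>
        Regex.Match(Input, pattern).Groups[1].Value;

    public static void Main()
    {
        // Lazy: stops at the first closing tag.
        Console.WriteLine(Capture("<helpMe>(.+?)</helpMe>")); // ABCDE
        // Greedy: runs to the last closing tag on the line.
        Console.WriteLine(Capture("<helpMe>(.+)</helpMe>"));  // ABCDE</helpMe><helpMe>FGHIJ
    }
}
```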
There are some useful interactive tools to play with these variations:
Regex Tester
Regex Pal

Ken Redler has a great answer regarding your first question. For the second question try:
<(helpMe)>(.*?)</\1>
Using the back reference \1 you can find values between the set of matching tags. The first group finds the tag name, the second group matches the content itself, and the \1 back reference re-uses the first group's match (in this case the tag name).
Also, in C# you can use named groups, like: <(helpMe)>(?<value>.*?)</\1> where now match.Groups["value"].Value contains your value.
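A minimal sketch of the named-group variant, assuming the input string from the question. The class name TagDemo and the helper ExtractValue are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagDemo
{
    // (helpMe) is group 1, reused by the \1 backreference in the closing tag;
    // the content is captured lazily into the named group "value".
    private static readonly Regex TagRx = new Regex(@"<(helpMe)>(?<value>.*?)</\1>");

    // Returns the tag content, or null when the input has no matching tag pair.
    public static string ExtractValue(string input)
    {
        Match match = TagRx.Match(input);
        return match.Success ? match.Groups["value"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractValue("<helpMe>123e4567</helpMe>")); // 123e4567
    }
}
```

Checking match.Success before reading the group avoids silently treating an empty string as a valid value.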

What does (.+?) indicate?
It means match any character (.) one or more times, but as few times as possible (+?).
A simple regex to match your second string would be
<helpMe>([a-z0-9]+)<\/helpMe>
This will match any lowercase letter a-z and any digit between <helpMe> and </helpMe>.
The parentheses are used to capture a group. This is useful if you need to reference the value inside this group later.
