How to use Regex.Matches with a start index AND RegexOptions

There doesn't seem to be a way to specify both RegexOptions and a start index when using Regex.Matches.
According to the docs, there is a way to do both individually, but not together.
In the example below, I want matches to contain only the second hEllo in the string text:
string pattern = @"\bhello\b";
string text = "hello world. hEllo";
Regex r = new Regex(pattern);
MatchCollection matches;
// matches nothing: the search starts at index 5, but it is case-sensitive
matches = r.Matches(text, 5);
// matches both occurrences, including the first one
matches = Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
Is there a different way to accomplish this?

I don't believe you can. You should instead instantiate Regex using the desired options:
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
and then you can simply use your existing code from the first sample, which should now match since we're using the IgnoreCase option:
matches = r.Matches(text, 5);
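Putting the two pieces together, a minimal sketch might look like the following. The class and method names (StartIndexDemo, FindFrom) are made up for illustration; only the Regex constructor and the instance Matches(string, int) overload come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StartIndexDemo
{
    // Hypothetical helper: case-insensitive search that begins at startAt.
    public static MatchCollection FindFrom(string text, int startAt)
    {
        // The options go to the constructor; the start index goes to the instance method.
        var regex = new Regex(@"\bhello\b", RegexOptions.IgnoreCase);
        return regex.Matches(text, startAt);
    }

    public static void Main()
    {
        foreach (Match m in FindFrom("hello world. hEllo", 5))
        {
            // Prints: hEllo at index 13
            Console.WriteLine($"{m.Value} at index {m.Index}");
        }
    }
}
```

Match.Index is still relative to the whole string, not to the start index. If constructing the Regex with options is inconvenient, the inline option (?i) at the start of the pattern should have the same effect as RegexOptions.IgnoreCase.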

Related

Find all occurrences of regex pattern in string

I have the following code:
var varFormula = "IIF(ABCCF012HIZ3000=0,0,ABCCF012HCZ3000/ABCCF012HIZ3000)";
MatchCollection match = Regex.Matches(varFormula, @"^AB([CC|DD|EE]+[F|G])[0-9]{3}(HI|IC|HC)Z[A-Z0-9]{4}$", RegexOptions.IgnoreCase);
From above, I want to extract the following from varFormula but I'm not able to get any matches/groups:
ABCCF012HCZ3000
ABCCF012HIZ3000

Your regex uses ^ and $, which anchor the match to the start and end of the input (or of each line when RegexOptions.Multiline is set). Because the identifiers sit in the middle of varFormula, the anchored pattern can never match. Try the following:
var varFormula = @"IIF(ABCCF012HIZ3000=0,0,ABCCF012HCZ3000/ABCCF012HIZ3000)";
MatchCollection match = Regex.Matches(varFormula, @"AB(?:[CC|DD|EE]+[F|G])[0-9]{3}(?:HI|IC|HC)Z[A-Z0-9]{4}", RegexOptions.IgnoreCase);
Should give you three matches:
Match 1 ABCCF012HIZ3000
Match 2 ABCCF012HCZ3000
Match 3 ABCCF012HIZ3000
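One thing worth noting is that [CC|DD|EE] is a character class (any one of C, D, E or |), not an alternation. The sketch below assumes the intent was "CC, DD or EE, followed by F or G" and rewrites those parts accordingly; that reading, and the names FormulaTokens and Extract, are assumptions rather than something stated in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormulaTokens
{
    // Assumption: (?:CC|DD|EE) and [FG] express what [CC|DD|EE]+ and [F|G] were meant to.
    private static readonly Regex Token = new Regex(
        @"AB(?:CC|DD|EE)[FG][0-9]{3}(?:HI|IC|HC)Z[A-Z0-9]{4}",
        RegexOptions.IgnoreCase);

    // Returns every occurrence in order of appearance, duplicates included.
    public static string[] Extract(string formula)
    {
        return Token.Matches(formula).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        var formula = "IIF(ABCCF012HIZ3000=0,0,ABCCF012HCZ3000/ABCCF012HIZ3000)";
        foreach (var token in Extract(formula))
        {
            Console.WriteLine(token);
        }
    }
}
```

If only the two distinct identifiers are wanted, appending .Distinct() before .ToArray() would drop the repeated ABCCF012HIZ3000.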

Regex Match any Word in single or multiple lines [\r\n]

We need to find "test.property" and replace it with "test1.field", whether the text is on a single line or split across several lines.
A word boundary does not ignore \r\n, so the pattern also has to find:
test.\r\nproperty
How can I ignore the \r\n between the words in a C# regex?
Example:
Input sources:
int c = test\r\n.Property\r\n["Test"];
needed output :
int c = test1\r\n.Field\r\n["Test"];
My current output :
int c =test1.Field.["test"]
The regex I'm using:
Regex regex = new Regex(@"\btest\s*\.\s*property\b", RegexOptions.IgnoreCase | RegexOptions.Singleline);
replacementLine = regex.Replace(sourceLine, "test1.field");
We need to replace only the words, not the line breaks. Any suggestions?

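Try this. It captures the dot and any surrounding whitespace (including line breaks) in a named group and puts it back in the replacement:
Regex regex = new Regex(@"\btest(?'sep'\s*\.\s*)property\b", RegexOptions.IgnoreCase | RegexOptions.Singleline);
var replacementLine = regex.Replace(sourceLine, "test1${sep}Field");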

You need a lookbehind that also accepts the possible whitespace. Here is an example that leaves the first occurrence alone and changes the second and third:
var data = @"testNO.property['LeaveThis'];
test
.property
['Test'];
test.property['Test2'];";
var pattern = @"(?<=test[\r\n\s]*\.)property";
var result = Regex.Replace(data, pattern, "field");
Result of Replace
testNO.property['LeaveThis'];
test
.field
['Test'];
test.field['Test2'];
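To check the lookbehind idea against the input from the question (with explicit \r\n and mixed case), a small sketch could look like this. The names LookbehindRename and RenameProperty are hypothetical, and the added \b anchors, the \s* after the dot, and RegexOptions.IgnoreCase are assumptions layered on top of the pattern above. It renames only the member after the dot and leaves "test" itself untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindRename
{
    // .NET allows variable-length lookbehind, so \s* is legal inside (?<= ... ).
    private static readonly Regex Member = new Regex(
        @"(?<=\btest\s*\.\s*)property\b",
        RegexOptions.IgnoreCase);

    // Only "property" is consumed; the line breaks sit in the lookbehind and are never replaced.
    public static string RenameProperty(string source)
    {
        return Member.Replace(source, "Field");
    }

    public static void Main()
    {
        string source = "int c = test\r\n.Property\r\n[\"Test\"];";
        Console.WriteLine(RenameProperty(source));
    }
}
```

Because the whitespace is matched inside a zero-width assertion, there is nothing to put back in the replacement string.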

regex pattern for tags needed

Howzit,
I need help with the following please.
I need to find tags in a string. These tags start with {{ and end with }}, there will be multiple tags in the string I receive.
So far I have this, but it doesn't find any matches, what am I missing here?
List<string> list = new List<string>();
string pattern = "{{*}}";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
while (m.Success)
{
list.Add(m.Groups[0].Value);
m = m.NextMatch();
}
return list;
even tried string pattern = "{{[A-Za-z0-9]}}";
thanx
PS. I know close to nothing about regex.

Not only do you want to use {{.+?}} as your regex, you also need to pass RegexOptions.Singleline if a tag can span more than one line. That option treats your entire string as a single line, so the . also matches \n (which it normally does not).

Try {{.+?}}. The .+ means there has to be at least one character inside the tag, and the ? makes it lazy so that each tag is matched separately (a greedy {{.+}} would run from the first {{ to the last }}).
EDIT:
To capture only the text inside your tags you can use {{(.+?)}} and then read Groups[1] of each match.

I would recommend trying something like the following:
List<string> list = new List<string>();
string pattern = "{{(.*?)}}";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
while (m.Success)
{
list.Add(m.Groups[1].Value);
m = m.NextMatch();
}
return list;
the regex specifies:
{{ # match {{ literally
( # begin capturing into group #1
.*? # match any characters, from zero to infinite, but be lazy*
) # end capturing group
}} # match }} literally
"Lazy" means the engine first tries to match what comes next (the "}}"); only when that fails does it backtrack to the .*? and reluctantly add one more character to the capturing group. Hope that made sense.
I changed your code by modifying the regex and by extracting the first capturing group from the match object (m.Groups[1].Value) instead of the entire match.

{{.*?}} or
{{.+?}}
. - means any character (except a newline, by default)
? - after * or +, means lazy (match as few characters as possible before the next part of the pattern)
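The difference between the greedy and lazy forms is easiest to see on a string with two tags. The following is only a sketch: TagFinder, FindTags and CountGreedy are illustrative names, and the braces are escaped as \{ and \} purely as a precaution.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFinder
{
    // Lazy .+? stops at the first }}; Singleline lets . cross line breaks.
    public static List<string> FindTags(string text)
    {
        var tags = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\{\{(.+?)\}\}", RegexOptions.Singleline))
        {
            tags.Add(m.Groups[1].Value);
        }
        return tags;
    }

    // Greedy .+ for comparison: it runs from the first {{ to the last }}.
    public static int CountGreedy(string text)
    {
        return Regex.Matches(text, @"\{\{(.+)\}\}", RegexOptions.Singleline).Count;
    }

    public static void Main()
    {
        string text = "Hi {{name}}, your order {{id}} has shipped";
        Console.WriteLine(string.Join(", ", FindTags(text))); // name, id
        Console.WriteLine(CountGreedy(text));                 // 1
    }
}
```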

C# Regular Expressions

I have a string that has multiple regular expression groups, and some parts of the string that aren't in the groups. I need to replace a character, in this case ^ only within the groups, but not in the parts of the string that aren't in a regex group.
Here's the input string:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
Here's what the output string should look like:
STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~
I need to do it using C# and can use regular expressions.
I can match the string into groups of those that should and shouldn't be replaced, but am struggling on how to return the final output string.

I'm not sure I get exactly what you're having trouble with, but it didn't take long to come up with this result:
string strRegex = @"STARTREPLACEME(.+)ENDREPLACEME";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
string strReplace = "STARTREPLACEMEENDREPLACEME";
return myRegex.Replace(strTargetString, strReplace);
By using my favorite online Regex tool: http://regexhero.net/tester/
Is that helpful?

Regex rgx = new Regex(
@"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");
string s1 = rgx.Replace(s0, String.Empty);
Explanation: Each time a ^ is found, the lookahead scans ahead for an ending delimiter (ENDREPLACEME). If it finds one without seeing any of the other delimiters first, the match must have occurred inside a REPLACEME group. If the lookahead reports failure, it indicates that the ^ was found either between groups or within a DONTREPLACEME group.
Because lookaheads are zero-width assertions, only the ^ will actually be consumed in the event of a successful match.
Be aware that this will only work if delimiters are always properly balanced and groups are never nested within other groups.
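As a quick check, the lookahead pattern can be run against the sample input from the question. This is a sketch under the same assumptions stated above (balanced, non-nested delimiters); CaretStripper and Strip are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretStripper
{
    // Same pattern as above: a ^ that can reach ENDREPLACEME without crossing another delimiter.
    private static readonly Regex CaretInsideGroup = new Regex(
        @"\^(?=(?>(?:(?!(?:START|END)(?:DONT)?REPLACEME).)*)ENDREPLACEME)");

    public static string Strip(string input)
    {
        return CaretInsideGroup.Replace(input, string.Empty);
    }

    public static void Main()
    {
        string input = "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEME^ENDREPLACEME~"
                     + "STARTREPLACEME^BLAH^ENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
        string expected = "STARTDONTREPLACEME^ENDDONTREPLACEME~STARTREPLACEMEENDREPLACEME~"
                        + "STARTREPLACEMEBLAHENDREPLACEME~STARTDONTREPLACEME^BLAH^ENDDONTREPLACEME~";
        Console.WriteLine(Strip(input) == expected); // True
    }
}
```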

If you are able to separate into groups that should be replaced and those that shouldn't, then instead of providing a single replacement string, you should be able to use a MatchEvaluator (a delegate that takes a Match and returns a string) to make the decision of which case it is currently dealing with and return the replacement string for that group alone.
You may also use an additional regex inside the MatchEvaluator. This solution produces the expected output:
Regex outer = new Regex(@"STARTREPLACEME.+?ENDREPLACEME", RegexOptions.Compiled);
Regex inner = new Regex(@"\^", RegexOptions.Compiled);
string replaced = outer.Replace(start, m =>
{
return inner.Replace(m.Value, String.Empty);
});

Regex Pattern Help

I will have the following possible strings:
12_3
or
12_3+14_1+16_3-400_2
The numbers could be different, but what I'm looking for are the X_X numeric patterns. However, I need to do a replace that will search for 2_3 and NOT return 12_3 as a valid match.
The +/- signs are arithmetic symbols and can be any valid operator. They also AREN'T required (as in the first example), so I might want to check a string that just contains 12_3, and if I pass in 2_3, it should NOT return a match. Only if I pass in 12_3.
This is for a C# script.
Many thanks for any help!! I'm regex stupid.

OK, suppose we have, for example:
2_3+12_3+14_1+16_3-400_2+2_3
regexp #1:
Regex r1 = new Regex(@"\d+(\d_\d)");
MatchCollection mc = r1.Matches(sourcestring);
Matches Found:
[0][0] = 12_3
[0][1] = 2_3
[1][0] = 14_1
[1][1] = 4_1
[2][0] = 16_3
[2][1] = 6_3
[3][0] = 400_2
[3][1] = 0_2
regexp #2:
Regex r2 = new Regex(@"\d+\d_\d");
mc = r2.Matches(sourcestring);
Matches Found:
[0][0] = 12_3
[1][0] = 14_1
[2][0] = 16_3
[3][0] = 400_2
Is that what you were looking for?

\b\d+_\d+\b.
\d is digit and \b is a zero-width word boundary. See this C# regex cheat sheet.
UPDATE: I just looked for a "C# regex cheat sheet" to verify that \b was a word boundary (it's \< and \> in grep). That was the first result. I didn't actually verify the text. Anyway, I now link to a different cheat sheet.
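For the specific case in the question (search for 2_3 without hitting 12_3), the word-boundary idea can be applied to one concrete token. The sketch below is an illustration only: TokenReplace and ReplaceToken are hypothetical names, and it assumes tokens consist solely of digits and underscores, so \b falls exactly at their edges.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenReplace
{
    // Replaces whole occurrences of token; 2_3 inside 12_3 is skipped because
    // there is no word boundary between the 1 and the 2.
    public static string ReplaceToken(string expression, string token, string replacement)
    {
        string pattern = @"\b" + Regex.Escape(token) + @"\b";
        return Regex.Replace(expression, pattern, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceToken("12_3+2_3-400_2", "2_3", "X")); // 12_3+X-400_2
        Console.WriteLine(ReplaceToken("12_3", "2_3", "X"));           // 12_3
    }
}
```

The replacement argument is a regex replacement pattern, so a literal $ in it would need to be written as $$.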
