Regex Pattern Help - c#

I will have the following possible strings:
12_3
or
12_3+14_1+16_3-400_2
The numbers could be different, but what I'm looking for are the X_X numeric patterns. However, I need to do a replace that will search for 2_3 and NOT return the 12_3 as a valid match.
The +/- signs are arithmetic symbols and can be any valid operator. They also AREN'T required (as in the first example), so I might check a string that contains just 12_3, and if I pass in 2_3, it should NOT return a match. It should match only if I pass in 12_3.
This is for a C# script.
Many thanks for any help!! I'm regex stupid.

OK, suppose we have, for example:
2_3+12_3+14_1+16_3-400_2+2_3
regexp #1:
Regex r1 = new Regex(@"\d+(\d_\d)");
MatchCollection mc = r1.Matches(sourcestring);
Matches Found:
[0][0] = 12_3
[0][1] = 2_3
[1][0] = 14_1
[1][1] = 4_1
[2][0] = 16_3
[2][1] = 6_3
[3][0] = 400_2
[3][1] = 0_2
regexp #2:
Regex r2 = new Regex(@"\d+\d_\d");
MatchCollection mc = r2.Matches(sourcestring);
Matches Found:
[0][0] = 12_3
[1][0] = 14_1
[2][0] = 16_3
[3][0] = 400_2
Is this what you were looking for?

Use the pattern \b\d+_\d+\b
\d is digit and \b is a zero-width word boundary. See this C# regex cheat sheet.
UPDATE: I just looked for a "C# regex cheat sheet" to verify that \b was a word boundary (it's \< and \> in grep). That was the first result. I didn't actually verify the text. Anyway, I now link to a different cheat sheet.
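To search for one specific token such as 2_3 without also hitting 12_3, the same word-boundary idea can be applied to the token itself. The following is only a sketch: the class and method names (TokenMatcher, ContainsToken, ReplaceToken) are made up for illustration and do not come from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class TokenMatcher
{
    // Builds \b<token>\b; Regex.Escape guards against metacharacters in the token.
    private static Regex BuildPattern(string token)
    {
        return new Regex(@"\b" + Regex.Escape(token) + @"\b");
    }

    // True only when the token appears as a whole X_X item.
    public static bool ContainsToken(string expression, string token)
    {
        return BuildPattern(token).IsMatch(expression);
    }

    // Replaces whole-token occurrences only; 12_3 is left alone when the token is 2_3.
    public static string ReplaceToken(string expression, string token, string replacement)
    {
        return BuildPattern(token).Replace(expression, replacement);
    }
}
```

This works because digits and the underscore are all word characters, so there is no \b between the 1 and the 2 in 12_3. Note that a replacement string containing $ would be treated as a substitution pattern by Regex.Replace.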

Related

How to use Regex.Matches with a start index AND RegexOptions

There doesn't seem to be a way to specify both RegexOptions and a start index when using Regex.Matches.
According to the docs, there is a way to do both individually, but not together.
In the example below, I want matches to contain only the second hEllo in the string text
string pattern = @"\bhello\b";
string text = "hello world. hEllo";
Regex r = new Regex(pattern);
MatchCollection matches;
// matches nothing
matches = r.Matches(text, 5);
// also matches the first occurrence
matches = Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
Is there a different way to accomplish this?
I don't believe you can. You should instead instantiate Regex using the desired options:
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
and then you can simply use your existing code from the first sample, which should now match since we're using the IgnoreCase option:
matches = r.Matches(text, 5);
Applicable constructor docs
Try it online
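Put together, the approach from this answer might look like the sketch below. The wrapper class and method (StartIndexDemo, FindFrom) are hypothetical names used only to make the snippet self-contained.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the approach described in the answer.
public static class StartIndexDemo
{
    // Options go into the constructor; the start index goes into the instance Matches call.
    public static MatchCollection FindFrom(string text, string pattern, int startAt)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return regex.Matches(text, startAt);
    }
}
```

Calling StartIndexDemo.FindFrom("hello world. hEllo", @"\bhello\b", 5) should return a single match, the second hEllo, because the search starts after the first word.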

How to check if a text contains a set of char using regex in C#

I would like to understand if my message includes a fixed set of chars like
<number>:<number>
What I am trying is:
if (!loggingEvent.RenderedMessage.Contains("[0-9]:[0-9]"))
{
...
}
but it is not working as I want. How can I fix it? It is in C#.
EDIT
The whole string is like:
The server IP is -> 127.1.2.35:9001!
A regex approach will look like
if (!Regex.IsMatch(loggingEvent.RenderedMessage, "[0-9]+:[0-9]+"))
Note that String.Contains does not support regex. Also, [0-9] matches 1 digit while you probably want to allow 1 or more (that is what + ensures).
See the online C# demo also extracting that substring:
var s = "The server IP is -> 127.1.2.35:9001!";
var result = Regex.Match(s, @"[0-9]+:[0-9]+");
if (result.Success)
    Console.WriteLine(result.Value);
else
    Console.WriteLine("No match!");
Regex regex = new Regex(@"[0-9]:[0-9]");
Match match = regex.Match("<number>:<number>"); // placeholder: pass the actual message text here
if (match.Success)
{
Console.WriteLine(match.Value);
}

Regex to get values from a string using C#

I posted this earlier but did not give clear information on what I was trying to achieve.
I am trying to get values from a string using Regex in C#. I do not understand why I can get some values but not others using a similar approach.
Please find the code snippet below.
Kindly let me know what i am missing.
Thanks in advance.
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
// to get the value 20160409 from the above text
// this code works fine
Regex pattern = new Regex(@"([0][*]MAO[-][0][0][1].*?[*](?<Value>\d+)[*])");
Match match = pattern.Match(text);
string Value = match.Groups["Value"].Value.ToString();
// to get the value ENC000200800400120160407 from the above text
// this does not work and gives me nothing
Regex pattern2 = new Regex(@"([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\d+)[*])");
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
It looks like your data is '*'-delimited.
You can use a single regex to capture all the values.
Try using
((?<values>[^\*]+)\*)
as your pattern.
Each field is then captured in the values group of its own match.
----Update: added C# code-----
string text = "0*MAO-001*20160409*20160408*Encounter Data Duplicates Report * *ENC000200800400120160407*PRO*PROD*";
Regex pattern = new Regex(@"(?<values>[^\*]+)\*");
var matches = pattern.Matches(text);
string Value = matches[2].Groups["values"].Value;  // third field: 20160409
string Value2 = matches[6].Groups["values"].Value; // seventh field: ENC000200800400120160407
You need to use this for the 2nd regex:
([0][*]MAO[-][0][0][1].*?[*].*?[*].*?[*].*?[*].*?[*](?<Value2>\w+)[*])
\w matches word characters (for ASCII text, the set [A-Za-z0-9_]). You were using \d, which matches only digits [0-9], but the field you want also contains letters.
C# Code
In your second try at using the regex, you are matching with pattern and not pattern2.
Match match2 = pattern.Match(text);
string Value2 = match.Groups["Value2"].Value.ToString();
You are also using the Groups from match and not match2.
This is why it is important to give your variables names that mean something for what they represent. Yes, it may be a "pattern", but what does that pattern represent? Vaguely named variables create issues like these.
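Combining the two fixes suggested so far (match with the second pattern and read the groups from the second match, and accept letters as well as digits in the field), a corrected version might look like the sketch below. The names ReportFields, ExtractEncounterId, encounterPattern and encounterMatch are illustrative assumptions, and the pattern is a slightly condensed variant that assumes the target is always the seventh '*'-delimited field.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; names are chosen to say what the pattern represents.
public static class ReportFields
{
    // Skips the four fields after "0*MAO-001*" and captures the next one.
    public static string ExtractEncounterId(string text)
    {
        var encounterPattern = new Regex(@"0\*MAO-001\*(?:[^*]*\*){4}(?<Value2>\w+)\*");
        Match encounterMatch = encounterPattern.Match(text);

        // Read the group from the match produced by this pattern, not from an earlier one.
        return encounterMatch.Success ? encounterMatch.Groups["Value2"].Value : string.Empty;
    }
}
```

For the sample text in the question, this should yield ENC000200800400120160407; an empty string is returned when nothing matches.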
You almost got it, but the field you're looking for contains letters and digits.
This is your second regex kind of fixed.
([0][*]MAO[-][0][0][1].*?[*](?:.*?[*]){4}(?<Value2>.*?)[*])
( # (1 start)
[0] [*] MAO [-] [0] [0] [1] .*? [*]
(?: .*? [*] ){4}
(?<Value2> .*? ) # (2)
[*]
) # (1 end)
To make it a little less busy, this might be better
(0\*MAO-001.*?\*(?:[^*]*\*){4}(?<Value2>[^*]*)\*)

Regex Matchcollection groups

I have spent two days trying to solve this problem: I have a MatchCollection, the pattern contains a group, and I want a list of the values captured by that group (there are two or more matches).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
    GroupCollection groups = match.Groups;
    string s = groups[1].Value;
    Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter
The first thing to correct in the code is Datum2.Text = s; which overwrites the text in Datum2 on every iteration, so only the last match remains when there is more than one.
Now, about your regex,
^ forces a match at the beginning of the line, so there is really only 1 match. If you remove it, it'll match twice.
I can't seem to understand what was intended with $? all over the pattern (just take them out).
[' '] matches either a quote or a space (there is no need to repeat characters in a character class).
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. A dot matches any character otherwise.
[0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d). The same applies to the day part, [0-3][1-9], which would miss days 10, 20 and 30.
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s = "";
foreach (Match match in mc2)
{
    GroupCollection groups = match.Groups;
    s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
DEMO
You should know that regex is not the tool to parse HTML. It'll work for simple cases, but for real cases do consider using HTML Agility Pack
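Since the question asks for a list rather than one concatenated string, the group values could also be collected into a List<string>. This is a minimal sketch under a few assumptions: the names RowNumbers and Extract are hypothetical, and the pattern is the corrected one from this answer with the [' ] class simplified to a plain space.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that gathers group 1 of every match.
public static class RowNumbers
{
    public static List<string> Extract(string html)
    {
        var rowPattern = new Regex(
            @"<tr><td>[D-M][i-r], [0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>");

        var values = new List<string>();
        foreach (Match match in rowPattern.Matches(html))
        {
            // Groups[1] is the first capturing group: the number in the second cell.
            values.Add(match.Groups[1].Value);
        }
        return values;
    }
}
```

The resulting list can then be added item by item to a list box, or joined with string.Join(" ", values) for a text box.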

Regex.split, how to read left of the matched pattern

I am trying to convert a Perl script to a C# 3.5 routine.
The perl code I have is:
if($work =~ /\<[0-9][0-9][0-9]\>/){
$left = $`;
$match = $&;
$work = $';
}
In C# I wrote the following code:
string[] sSplit = Regex.Split(work, @"\<[0-9][0-9][0-9]\>");
if (sSplit.Length > 2)
{
    left = sSplit[0];
    match = sSplit[1];
    work = sSplit[2];
}
However the above is not giving me the matched pattern in sSplit[1], but the content to the right of the matched string instead.
Regex.Split is not what you need. The equivalent to =~ /.../ is Regex.Match.
However, Regex.Match has no equivalent to Perl’s $` or $', so you need to use a workaround, but I think it’s a fair one:
var m = Regex.Match(work, @"^(.*?)(\<[0-9][0-9][0-9]\>)(.*)$", RegexOptions.Singleline);
if (m.Success)
{
    // Groups[0] is the whole match, so the captures start at index 1.
    left = m.Groups[1].Value;
    match = m.Groups[2].Value; // perhaps with Convert.ToInt32()?
    work = m.Groups[3].Value;
}
Alternatively, you can use the match index and length to get the stuff:
// No ^ anchor here: the match may start anywhere in the string.
var m = Regex.Match(work, @"\<[0-9][0-9][0-9]\>");
if (m.Success)
{
    left = work.Substring(0, m.Index);
    match = m.Value; // perhaps with Convert.ToInt32()?
    work = work.Substring(m.Index + m.Length);
}
When trying out regular expressions, I always recommend RegexHero, which is an online tool that visualizes your .NET regular expressions. In this case, use Regex.Match and use Groups. That'll give what you want.
Note that the backslashes in \< and \> are not needed in C# (nor in Perl, btw).
Also note that $`, $& and $' have equivalents in C# when used in a replacement expression. If that's what you need in the end, you can use these "magic variables", but only in Regex.Replace.
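As a small illustration of those substitutions inside Regex.Replace, consider the sketch below. The class and method names (SubstitutionDemo, Annotate) and the bracketed output format are invented for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of the $`, $& and $' substitutions in a replacement string.
public static class SubstitutionDemo
{
    public static string Annotate(string work)
    {
        // $` = text before the match, $& = the match itself, $' = text after the match.
        return Regex.Replace(work, "<[0-9]{3}>", "[$`|$&|$']");
    }
}
```

For the input abc<123>def, the match <123> is replaced by [abc|<123>|def], giving abc[abc|<123>|def]def.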
A split is usually asking to throw away the delimiters. Perl acts just the same way (without the verboten $& type variables.)
You capture delimiters in Perl by putting parens around them:
my @parts = split /(<[0-9][0-9][0-9]>)/; # includes the delimiter
my @parts = split /<[0-9][0-9][0-9]>/; # doesn't include the delimiter
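.NET's Regex.Split behaves similarly: text captured by a group in the split pattern is included in the resulting array. The sketch below assumes a single delimiter in the input, and the names SplitDemo and SplitKeepingDelimiter are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: a capturing group keeps the delimiter in the Split result.
public static class SplitDemo
{
    public static string[] SplitKeepingDelimiter(string work)
    {
        // The parentheses make Regex.Split return the delimiter as its own element.
        return Regex.Split(work, "(<[0-9]{3}>)");
    }
}
```

With the input abc<123>def this should give three elements (abc, <123>, def), which lines up with the left/match/work variables in the question.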
