c# regex for number/number/string pattern

I am trying to find {a number} / { a number } / {a string} patterns. I can get number / number to work, but when I add / string it does not.
Example of what I'm trying to find:
15/00969/FUL
My regex:
Regex reg = new Regex(@"\d/\d/\w");

You should use the + quantifier, which means "one or more times" and applies to the pattern immediately preceding it. I would also add word boundaries (\b) so that only whole words are matched:
\b\d+/\d+/\w+\b
C# code (using a verbatim string literal, so that regular expressions can be copied from testing tools or services without escaping backslashes):
var rx = new Regex(@"\b\d+/\d+/\w+\b");
If you want to specify the exact number of characters a pattern should match, you can use the {n} quantifier:
\b\d{2}/\d{5}/\w{3}\b
And, finally, if the string part contains only letters, you can use the Unicode category class \p{L} (or \p{Lu} to match only uppercase letters):
\b\d{2}/\d{5}/\p{L}{3}\b
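To see why the quantifiers matter, here is a small sketch that runs the asker's pattern and the patterns above through Regex.IsMatch. The class name PatternCheck and its members are made up for illustration. It also shows one reason to prefer \p{L}: \w{3} accepts "123" as the string part, while \p{L}{3} does not.

```csharp
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The asker's pattern: exactly one digit, '/', one digit, '/', one word character.
    public const string Original = @"\d/\d/\w";

    // One or more digits / one or more digits / one or more word characters, as a whole word.
    public const string Flexible = @"\b\d+/\d+/\w+\b";

    // Exactly 2 digits / 5 digits / 3 word characters (\w also accepts digits and '_').
    public const string FixedWord = @"\b\d{2}/\d{5}/\w{3}\b";

    // Exactly 2 digits / 5 digits / 3 Unicode letters.
    public const string FixedLetters = @"\b\d{2}/\d{5}/\p{L}{3}\b";

    // Same argument order as Regex.IsMatch: input first, then pattern.
    public static bool IsMatch(string input, string pattern) => Regex.IsMatch(input, pattern);
}
```

With these definitions, PatternCheck.IsMatch("15/00969/FUL", PatternCheck.Original) is false: after "5/" the pattern expects a single digit followed by "/", but the input continues with "00969".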
Sample code (also featuring capturing groups introduced with unescaped ( and )):
var rx = new Regex(@"\b(\d{2})/(\d{5})/(\p{L}{3})\b");
var res = rx.Matches("15/00969/FUL").OfType<Match>()
.Select(p => new
{
first_number = p.Groups[1].Value,
second_number = p.Groups[2].Value,
the_string = p.Groups[3].Value
}).ToList();
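The result is a list containing one item, with first_number = "15", second_number = "00969" and the_string = "FUL".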

Regex reg = new Regex(@"\d+/\d+/\w+");
Complete example:
Regex r = new Regex(@"(\d+)/(\d+)/(\w+)");
string input = "15/00969/FUL";
var m = r.Match(input);
if (m.Success)
{
string a = m.Groups[1].Value; // 15
string b = m.Groups[2].Value; // 00969
string c = m.Groups[3].Value; // FUL
}

You are missing the quantifiers in your Regex
If you want to match 1 or more items you should use the +.
If you already know the number of items you need to match, you can specify it using {x} or {x,y} for a range (x and y being two numbers)
So your regex would become:
Regex reg = new Regex(@"\d+/\d+/\w+");
For example, if all the elements you want to match have the format {2 digits}/{5 digits}/{3 letters}, you could write:
Regex reg = new Regex(@"\d{2}/\d{5}/\w{3}");
And that would match 15/00969/FUL
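The {x,y} range form can be sketched the same way. This assumes, purely for illustration, that the first number may have 2 to 4 digits and that the string part is exactly three uppercase letters; ReferenceFormat and IsReference are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class ReferenceFormat
{
    // Hypothetical format: 2 to 4 digits / exactly 5 digits / exactly 3 uppercase letters.
    // {2,4} is a range quantifier; {5} and {3} are exact counts.
    private static readonly Regex Reference = new Regex(@"\b\d{2,4}/\d{5}/\p{Lu}{3}\b");

    public static bool IsReference(string input) => Reference.IsMatch(input);
}
```

IsReference("2015/00969/FUL") returns true, while "5/00969/FUL" and "12345/00969/FUL" are rejected because the first number falls outside the 2 to 4 digit range.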

bool match = new Regex(@"[\d]+[/][\d]+[/][\w]+").IsMatch("15/00969/FUL"); //true
Regular Expression:
[\d]+ //one or more digits
[\w]+ //one or more word characters (letters, digits or underscore)
[/] // '/'-character

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
Each * stands for any characters and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in C#?
I can't use string.Split, because there isn't really a delimiter.

You can use Regex.Split to accomplish this:
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the data between ISA and ESA itself contains the sequence you are splitting on (ESA|ISA), it will be split there too, and you will need a smarter way around it.
To explain the regex a bit:
(?<=ESA) Lookbehind assertion: the | must be preceded by ESA. It is checked but not consumed, so ESA stays in the result.
(?=ISA) Lookahead assertion: the | must be followed by ISA. It is also checked but not consumed.
Using these lookaround assertions, only the | characters between records are used for splitting.
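A variation on the same idea, sketched under the assumption that the separator after ESA is always exactly one character (the "?" in the question): if the pattern consists only of lookarounds, the split point has zero width and nothing is removed, so each piece keeps its trailing | as the question asked. The same caveat applies: data that itself contains ESA?ISA would be split too. EdiSplitter is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class EdiSplitter
{
    // Zero-width split point: preceded by "ESA" plus one arbitrary character,
    // and followed by "ISA". Because nothing is consumed, no text is lost.
    private static readonly Regex Boundary = new Regex(@"(?<=ESA.)(?=ISA)");

    public static string[] SplitSegments(string input) => Boundary.Split(input);
}
```

For the sample input this yields "ISA~this is date~ESA|" and "ISA~this is more data~ESA|".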

Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take whateverString.Substring(0, x + 1) and whateverString.Substring(x + 1), so that the separator character stays with the first part.
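A C# sketch of this IndexOf/Substring approach. Instead of searching for "?ISA" with a known separator, it looks for the second occurrence of "ISA" (starting at index 1 to skip the leading one), which assumes that the data between ISA and ESA never contains the text ISA. SubstringSplitter and SplitAtSecondIsa are hypothetical names.

```csharp
using System;

public static class SubstringSplitter
{
    // Assumes the data between ISA and ESA never contains the text "ISA".
    public static (string First, string Second) SplitAtSecondIsa(string input)
    {
        // Start at index 1 so the leading "ISA" is skipped.
        int x = input.Length > 1
            ? input.IndexOf("ISA", 1, StringComparison.Ordinal)
            : -1;
        if (x < 0)
        {
            throw new ArgumentException("No second ISA record found.", nameof(input));
        }

        // The first record keeps its separator character (the one after ESA).
        return (input.Substring(0, x), input.Substring(x));
    }
}
```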
Edit:
If ? is not known, we can use a regex Pattern and Matcher instead (Java):
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
// the first find() locates the first record, the second find() the next one
if (matcher.find() && matcher.find()) {
int x = matcher.start();
}
Here x is the start index of the second match, i.e. the position to split at.
Edit: I mistakenly read it as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy .*? so that each record is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}

RegEx will probably be the best tool for this.
The pattern would be:
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you two groups with the data you need:
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
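Following the suggestion to use Regex.Matches, here is a sketch that uses a single record as the pattern and collects the data of every match, so it is not limited to exactly two records. It assumes, as in the question, that each record ends with ESA followed by exactly one character; EdiRecords and ExtractData are illustrative names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EdiRecords
{
    // One record: "ISA", lazily captured data, "ESA", one arbitrary trailing character.
    private static readonly Regex Record = new Regex(@"ISA(?<data>.*?)ESA.");

    public static List<string> ExtractData(string input)
    {
        var result = new List<string>();
        foreach (Match m in Record.Matches(input))
        {
            // Each match continues where the previous record (including its separator) ended.
            result.Add(m.Groups["data"].Value);
        }
        return result;
    }
}
```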

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"

Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times (lazily), with a positive lookahead on the string ESA (basically, match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two ways to split a string by another string:
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
The difference is that the first removes empty entries, while the second does not.
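A quick sketch comparing the two calls on a shortened version of the question's input (the sample data and the class name SplitComparison are made up). Note that both calls also drop the ISA delimiter itself from the results.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // String.Split with a string[] separator; RemoveEmptyEntries drops "" items.
    public static string[] WithStringSplit(string input) =>
        input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

    // Regex.Split keeps empty items, such as the one before a leading "ISA".
    public static string[] WithRegexSplit(string input) =>
        Regex.Split(input, "ISA");
}
```

For "ISA~a~ESA|ISA~b~ESA|", String.Split returns the two record bodies, while Regex.Split returns a leading empty string followed by the same two bodies, because the input starts with ISA.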

Regex Matchcollection groups

I've already spent two days trying to solve this problem. I have a MatchCollection; the pattern contains a group, and I want a list of that group's values (there are two or more matches).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
string s = groups[1].Value;
Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter

The first thing to correct in the code is that Datum2.Text = s; overwrites the text in Datum2 on every iteration, so when there is more than one match only the last one remains.
Now, about your regex:
^ forces a match at the beginning of the string, so there can be only one match. If you remove it, it matches twice.
I can't see what was intended with $? all over the pattern ($ is an end-of-string anchor, so an optional $? does nothing useful here); just take them out.
[' '] matches either a quote or a space (there is no need to repeat a character in a character class).
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped; otherwise a dot matches any character.
[0-1][1-9] matches all months except "10" (and [0-3][1-9] likewise misses days 10, 20 and 30). The second character should be [0-9] (or \d).
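The last two points can be checked in isolation. This sketch applies only the date part of the pattern to a few sample dates, anchored with ^ and $ (an assumption made for testing it on its own); DatePartCheck is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class DatePartCheck
{
    // Date part as in the question: unescaped dots, second digits restricted to 1-9.
    public const string Original = @"^[0-3][1-9].[0-1][1-9].[0-9][0-9]$";

    // Date part as corrected in the answer: literal dots, second digits 0-9.
    public const string Corrected = @"^[0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]$";

    public static bool IsMatch(string date, string pattern) => Regex.IsMatch(date, pattern);
}
```

The original pattern rejects "09.10.15" and "10.09.15" but accepts "09-09-15"; the corrected one does the opposite.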
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s= "";
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
You should know that regex is not the right tool for parsing HTML. It works for simple cases, but for real-world input consider using HTML Agility Pack.
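Since the question asked for a list of the group values rather than one concatenated string, here is a sketch that collects group 1 of every match into a List&lt;string&gt;. It reuses the corrected pattern from the answer, written as a verbatim string so the dots need only one backslash; GroupCollector is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupCollector
{
    // Corrected pattern from the answer, as a verbatim string literal.
    private static readonly Regex Row = new Regex(
        @"<tr><td>[D-M][i-r],[' ][0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>");

    // One entry per match: the value captured by group 1.
    public static List<string> GroupValues(string input) =>
        Row.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}
```

For the sample input, string.Join(" ", values) gives "1 2", and the list itself can be handed to a list control instead of a TextBox.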

In Perl you use brackets to extract your matches; what is the equivalent of that in C#?

For instance in Perl I can do
$x=~/$(\d+)\s/ which basically says: in variable x, find a number preceded by a $ sign and followed by a whitespace character. Now $1 is equal to the number.
In C# I tried
Regex regex = new Regex(@"$(\d+)\s");
if (regex.IsMatch(text))
{
// need to access matched number here?
}

First off, your regex there $(\d+)\s actually means: find a number after the end of the string. It can never match. You have to escape the $ since it's a metacharacter.
Anyway, the equivalent C# for this is:
var match = Regex.Match(text, @"\$(\d+)\s");
if (match.Success)
{
var number = match.Groups[1].Value;
// ...
}
And, for better maintainability, groups can be named:
var match = Regex.Match(text, @"\$(?<number>\d+)\s");
if (match.Success)
{
var number = match.Groups["number"].Value;
// ...
}
And in this particular case you don't even have to use groups in the first place:
var match = Regex.Match(text, @"(?<=\$)\d+(?=\s)");
if (match.Success)
{
var number = match.Value;
// ...
}
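The closest C# counterpart to reading Perl's $1 is Match.Groups[1], as shown above. If you prefer the $1 notation itself, Match.Result expands a replacement pattern against a successful match. A minimal sketch; PerlStyleCapture and FirstNumber are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class PerlStyleCapture
{
    // Returns what Perl would put in $1 for /\$(\d+)\s/, or "" when there is no match.
    public static string FirstNumber(string text)
    {
        Match m = Regex.Match(text, @"\$(\d+)\s");
        // Match.Result expands "$1" to the text captured by group 1.
        return m.Success ? m.Result("$1") : string.Empty;
    }
}
```

FirstNumber("price $42 today") returns "42". Returning an empty string for a failed match is a choice made for this sketch; since \d+ needs at least one digit, it cannot be confused with a real result.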

To get a matched result, use Match instead of IsMatch.
var regex = new Regex("^[^@]*@(?<domain>.*)$");
// accessible via the group name
var domain = regex.Match("foo@domain.com").Groups["domain"].Value;
// or via its index (it is the only group, so group 1)
var sameDomain = regex.Match("foo@domain.com").Groups[1].Value;

Use the Match method instead of IsMatch. You also need to escape $ to match it literally, because it is a special character meaning "end of string".
Match m = Regex.Match(s, @"\$(\d+)\s");
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}

Extracting Numbers from String RegEx

I am really struggling with Regular Expressions and can't seem to extract the number from this string
"id":143331539043251,
I've tried this, but I'm getting compilation errors:
var regex = new Regex(@""id:"\d+,");
Note that the full string contains other numbers I don't want. I want the number between "id": and the trailing comma.

Try this code:
var match = Regex.Match(input, @"\""id\"":(?<num>\d+)");
var yourNumber = match.Groups["num"].Value;
Then use the extracted number yourNumber as a string, or parse it into a numeric type.
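For the "parse it into a numeric type" step: 143331539043251 is larger than int.MaxValue, so a 64-bit long is needed. A sketch using long.TryParse; IdExtractor and TryGetId are hypothetical names, and the pattern uses [0-9]+ so that only ASCII digits reach the parser.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Matches "id": followed by ASCII digits, captured in the named group "num".
    private static readonly Regex IdPattern = new Regex(@"""id"":(?<num>[0-9]+)");

    // Returns false if there is no "id": value or it does not fit in a long.
    public static bool TryGetId(string input, out long id)
    {
        id = 0;
        Match match = IdPattern.Match(input);
        return match.Success
            && long.TryParse(match.Groups["num"].Value, NumberStyles.None,
                             CultureInfo.InvariantCulture, out id);
    }
}
```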

If all you need is the digits, just match on that:
[0-9]+
Note that I am not using \d, because in the .NET regex engine it matches any Unicode decimal digit (such as the Arabic-Indic digits ٠ to ٩), not just 0 to 9.
Update, following comments on the question and on this answer - the following regex will match the pattern and place the matched numbers in a capturing group:
@"""id"":([0-9]+),"
Used as:
Regex.Match(@"""id"":143331539043251,", @"""id"":([0-9]+),").Groups[1].Value
Which returns 143331539043251.
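The point about \d versus [0-9] can be checked directly. This sketch uses the Arabic-Indic digits ١٢٣ (U+0661 to U+0663) as test input; RegexOptions.ECMAScript is a documented way to restrict \d to ASCII digits in .NET. DigitClasses is an illustrative name.

```csharp
using System.Text.RegularExpressions;

public static class DigitClasses
{
    // Default .NET behavior: \d matches any Unicode decimal digit (category Nd).
    public static bool UnicodeDigits(string s) => Regex.IsMatch(s, @"^\d+$");

    // [0-9] matches only the ASCII digits 0 to 9.
    public static bool AsciiDigits(string s) => Regex.IsMatch(s, @"^[0-9]+$");

    // With RegexOptions.ECMAScript, \d is restricted to [0-9].
    public static bool EcmaDigits(string s) => Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
}
```

All three accept "123"; only UnicodeDigits accepts "\u0661\u0662\u0663".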

If you are open to using LINQ, try the following (C#):
string stringVariable = "123cccccbb---556876---==";
var f = (from a in stringVariable.ToCharArray() where Char.IsDigit(a) == true select a);
var number = String.Join("", f);

Why does my \p{L} return an underscore?

I have the following code to parse by Regex:
const string patern = @"^(\p{L}+)_";
var rgx = new Regex(patern);
var str1 = "library_log_12312_12.log";
var m = rgx.Matches(str1);
It returns only one match and it is "library_". I have read a lot of resources and it should not contain underscore, should it?

Your pattern includes the _, so the match does too. If you only want the group, you need to specify that. It'll be in group 1 (as group 0 is always the whole match):
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
var regex = new Regex(@"^(\p{L}+)_");
var input = "library_log_12312_12.log";
var matches = regex.Matches(input);
var match = matches[0];
Console.WriteLine(match.Groups[0]); // library_
Console.WriteLine(match.Groups[1]); // library
}
}

Your regex ends with _ so basically, it matches on one or more Unicode letters, followed by an underscore (which is not a Unicode letter).
The captured group will not contain the _.
Works as expected.

It should contain the underscore as it is in your regular expression.
If you only want to have library as the result, you need to access the first sub-group in the result:
var m = rgx.Matches(str1).Cast<Match>().Select(x => x.Groups[1].Value);
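If you want the whole match, rather than a group, to exclude the underscore, a lookahead can require the _ without consuming it. A minimal sketch; LeadingWord and GetPrefix are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class LeadingWord
{
    // (?=_) checks for the underscore without consuming it,
    // so match.Value contains only the letters.
    private static readonly Regex Prefix = new Regex(@"^\p{L}+(?=_)");

    public static string GetPrefix(string fileName)
    {
        Match m = Prefix.Match(fileName);
        return m.Success ? m.Value : string.Empty;
    }
}
```

GetPrefix("library_log_12312_12.log") returns "library".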
