Two Capturing Groups in Regex

I have strings such as
(1)ABC(Some other text)
(2343)DEFGHIJ
(99)Q
I wanted a regex that would capture these strings into two groups, like so:
1st: (1) 2nd: ABC(Some other text)
1st: (2343) 2nd: DEFGHIJ
1st: (99) 2nd: Q
So I wrote this regex:
var regex = new Regex("^\\((\\d+)(.*)\\)");
Match match = regex.Match(str);
But instead of the two groups I expected, I get three groups.
In the first example I get
(1)ABC(Some other text)
1
)ABC(Some other text
What's wrong?

The regex you are looking for is probably
@"^(\(\d+\))(.*)"
You put the escaped parentheses \( and \) outside the first capturing group instead of inside it, and the closing \) ended up after (.*). Note that there will still be three groups because, as someone pointed out, group 0 is always the entire matched text. So
string str = "(1)ABC(Some other text)";
var regex = new Regex(@"^(\(\d+\))(.*)");
Match match = regex.Match(str);
if (match.Success)
{
    string gr1 = match.Groups[1].Value; // (1)
    string gr2 = match.Groups[2].Value; // ABC(Some other text)
}
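To check that pattern against all three sample strings from the question, here is a minimal sketch. The class and method names (TwoGroupsDemo, SplitPrefix) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoGroupsDemo
{
    // Splits "(digits)rest" into the parenthesized number and the remainder.
    // Returns null when the input does not start with "(digits)".
    public static string[] SplitPrefix(string input)
    {
        Match match = Regex.Match(input, @"^(\(\d+\))(.*)");
        if (!match.Success) return null;
        // Groups[0] is the whole match, so match.Groups.Count is 3 here.
        return new[] { match.Groups[1].Value, match.Groups[2].Value };
    }

    public static void Main()
    {
        foreach (string s in new[] { "(1)ABC(Some other text)", "(2343)DEFGHIJ", "(99)Q" })
        {
            string[] parts = SplitPrefix(s);
            Console.WriteLine($"1st: {parts[0]} 2nd: {parts[1]}");
        }
    }
}
```

Only the groups you wrote yourself start at index 1; index 0 is always there, which is why the count looks one too high.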

Related

Regex get nth value separated with slash

I have generic regex code that uses the Groups[1] value to extract the result. It's easy to extract SN and Ref with a pattern such as sn=(.*?)\. but it's much harder to get, for example, PKXC and V928. I have to use Groups[1] because the users of this application choose which value to display. It can be NC339 or PKXC.
//var source = "SN=1395939213.#variable/OGT84/PKXC/Undetermined.Thank You#{customer}";
//sometimes like this
var source = "SN=8029758034.Ref=BFO7Y95B3KN5#resolved/NC339/V928/ClearenceBBF.Brief#{supervisor}/verified";
var value = Regex.Match(source, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;

You can use
^(?:[^/]*/){2}([^/]+)
See the regex demo.
Details
^ - start of a string
(?:[^/]*/){2} - two occurrences of any chars other than / and then a /
([^/]+) - Group 1: one or more chars other than /.
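Since the users pick which value to display, the repeat count in that pattern can be made a parameter. The following is only a sketch: NthSegmentDemo, PatternFor and Extract are hypothetical names, and segments are counted from 1, so the first "segment" is everything before the first slash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthSegmentDemo
{
    // Builds a pattern whose Group 1 is the n-th slash-separated segment
    // (1-based). For n = 3 this yields ^(?:[^/]*/){2}([^/]+).
    public static string PatternFor(int n)
    {
        return "^(?:[^/]*/){" + (n - 1) + "}([^/]+)";
    }

    // Same call shape as in the question: the result is read from Groups[1].
    public static string Extract(string source, int n)
    {
        return Regex.Match(source, PatternFor(n),
            RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;
    }

    public static void Main()
    {
        string source = "SN=8029758034.Ref=BFO7Y95B3KN5#resolved/NC339/V928/ClearenceBBF.Brief#{supervisor}/verified";
        Console.WriteLine(Extract(source, 2)); // NC339
        Console.WriteLine(Extract(source, 3)); // V928
    }
}
```

If the requested segment does not exist, the match fails and Groups[1].Value is an empty string.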

Try the following:
string pattern = @"SN=(?'sn'\d+).*\{(?'supervisor'[^}]+)";
string input = "SN=1395939213.#variable/OGT84/PKXC/Undetermined.Thank You#{customer}";
Match match = Regex.Match(input, pattern);
string sn = match.Groups["sn"].Value;
string supervisor = match.Groups["supervisor"].Value;

Extract substring with Regex

I'm trying to extract a substring with a regex, but I'm having some trouble.
The string is built from columns of strings and I need only the 4th column.
string stringToExtractFrom = @"289 120 00001110 ??
4Control@SimApi@@QAEAAV01@ABV01@@Z = ??4Control@SimApi@@QAEAAV01@ABV01@@Z
(public: class SimApi::Control & __thiscall SimApi::Control::operator=(class
SimApi::Control const &))";
string pattern = @"\s+\d+\s+\d+\s+\S+\s(.*)\=";
RegexOptions options = RegexOptions.Multiline;
Regex regX = new Regex(pattern, options);
Match m = regX.Match(stringToExtractFrom);
while (m.Success)
{
    Group g = m.Groups[1];
    defData += g + "\n";
    m = m.NextMatch();
}
This is the wanted string: ??
4Control@SimApi@@QAEAAV01@ABV01@@Z
With the string below it worked, and I got the substring I want as a group:
1 0 00002E00 ??0ADOFactory@SimApiEx@@QAE@ABV01@@Z =
??0ADOFactory@SimApiEx@@QAE@ABV01@@Z (public: __thiscall
SimApiEx::ADOFactory::ADOFactory(class SimApiEx::ADOFactory const &))

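If the second string works for you and the first one does not, you might start the match at the digits (one or more for each of the first two columns), use \S+ for the third column, and let \s+ absorb whatever whitespace comes before the fourth. Then capture with a negated character class, which matches any character except an equals sign (including a line break):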
\d+\s+\d+\s+\S+\s+([^=]+) =
See a .NET regex demo | C# Demo
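A small sketch of that pattern in use, assuming the line break after ?? in the question's first string is a real newline inside the data. The sample line is modeled on that string, and FourthColumnDemo and FourthColumn are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FourthColumnDemo
{
    // Returns the 4th column (everything before " ="), or null if no match.
    // [^=]+ also matches line breaks, so a wrapped column is still captured.
    public static string FourthColumn(string line)
    {
        Match m = Regex.Match(line, @"\d+\s+\d+\s+\S+\s+([^=]+) =");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string wrapped = "289 120 00001110 ??\n4Control@SimApi@@QAEAAV01@ABV01@@Z = ??4Control@SimApi@@QAEAAV01@ABV01@@Z";
        string flat = "1 0 00002E00 ??0ADOFactory@SimApiEx@@QAE@ABV01@@Z = ??0ADOFactory@SimApiEx@@QAE@ABV01@@Z";

        // The captured value keeps the embedded newline; strip it if unwanted.
        Console.WriteLine(FourthColumn(wrapped).Replace("\n", ""));
        Console.WriteLine(FourthColumn(flat));
    }
}
```

The original pattern used (.*), and . does not match a newline by default, which would explain why only the unwrapped string worked.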

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * stands for any characters, of any length.
The ? stands for any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in C#?
I can't use string.Split, because there isn't really a delimiter.

You can use Regex.Split to accomplish this:
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
    Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the text between ISA and ESA can itself contain the pattern we are looking for, you will have to find some smart way around it.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. It requires ESA immediately before the separator, but ESA is not consumed, so it stays in the first item.
(?=ISA) Look-ahead assertion. It requires ISA immediately after the separator, but ISA is not consumed, so it stays in the second item.
With these look-around assertions, only the | character that sits between ESA and ISA is used for splitting.
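If the separator character is not known in advance (the ? in the question), the look-arounds can carry the whole pattern, so the split point is an empty position and nothing is removed. This is a variation on the answer above, not what the answer itself does. EdiSplitDemo and SplitInterchanges are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdiSplitDemo
{
    // Splits at the empty position that has "ESA" plus one arbitrary
    // character behind it and "ISA" ahead of it. Both assertions are
    // zero-width, so every character of the input survives in the result.
    // Note: '.' does not match a newline unless RegexOptions.Singleline is set.
    public static string[] SplitInterchanges(string input)
    {
        return Regex.Split(input, @"(?<=ESA.)(?=ISA)");
    }

    public static void Main()
    {
        string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        foreach (string item in SplitInterchanges(input))
        {
            Console.WriteLine(item);
        }
    }
}
```

The same caveat applies: if the data between ISA and ESA can contain ESA, one character, then ISA, this splits in the wrong place.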

Simply use
int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here
and then take two substrings: from 0 to that index, and from that index to the end.
Edit:
If ? is not known, we can use the regex Pattern and Matcher instead.
Matcher matcher = Pattern.compile("ISA.*?ESA").matcher(whateverString);
if (matcher.find()) {
    int x = matcher.start(); // start index of the first match
}
Here x gives the start index of that match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy, so each ISA...ESA block is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
    Console.WriteLine(m.Value);
    m = m.NextMatch(); // more matches
}

A regex will probably be the best tool for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you two groups with the data you need:
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
    var data1 = match.Groups["data1"].Value;
    var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
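For the Regex.Matches route, a minimal sketch could look like this. It uses the simpler pattern ISA.*?ESA. (one block plus its trailing character) so that it works for any number of blocks. The names EdiMatchesDemo and Segments are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EdiMatchesDemo
{
    // Collects every "ISA ... ESA" block together with the single
    // character that follows it. The lazy .*? stops at the first ESA.
    public static List<string> Segments(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"ISA.*?ESA."))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        foreach (string segment in Segments(input))
        {
            Console.WriteLine(segment);
        }
    }
}
```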

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart is "ISA~this is date~ESA|"
// secondPart is "ISA~this is more data~ESA|"

Use a regex like ISA(.+?)ESA and select the first group:
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
    RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
    string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
    string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.

Regex Matchcollection groups

I have been trying for two days to solve this problem. I have a MatchCollection, and the pattern contains a group. I want a list of the values captured by that group (there are two or more of them).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
string s = groups[1].Value;
Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use something like a ListBox, but Groups[1].Value is a string...
Thanks for your help and time.
Dieter

The first thing you need to correct in the code is Datum2.Text = s; which overwrites the text in Datum2 on every iteration, so with more than one match only the last one would remain.
Now, about your regex:
^ forces a match at the beginning of the line, so there is really only 1 match. If you remove it, it'll match twice.
I can't work out what was intended with the $? scattered all over the pattern (just take them out).
[' '] matches either a quote, a space or a quote (there is no need to repeat characters in a character class).
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. Otherwise a dot matches any character.
[0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d). The same goes for the day, where [0-3][1-9] misses 10, 20 and 30.
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s = "";
foreach (Match match in mc2)
{
    GroupCollection groups = match.Groups;
    s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
You should know that regex is not the right tool for parsing HTML. It'll work for simple cases, but for real cases do consider using HTML Agility Pack.
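Since the question asks for a list of the group values rather than one concatenated string, here is a sketch that collects them into a List<string>. It reuses the corrected pattern from this answer. MatchListDemo and GroupValues are hypothetical names, and how the list is then shown (for example in a ListBox) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchListDemo
{
    // Returns the value of capturing group 1 for every match in the input.
    public static List<string> GroupValues(string input)
    {
        string pattern = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
        return Regex.Matches(input, pattern)
            .Cast<Match>()                    // MatchCollection is non-generic on older frameworks
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr>"
                     + "<tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
        Console.WriteLine(string.Join(", ", GroupValues(input))); // 1, 2
    }
}
```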

regex to grab 2 strings from line

Is this the correct way to handle this?
string item = "strawb bana .93";
string itemPattern = @"\w*";
string pricePattern = @"\d*\.\d*";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
var match2 = Regex.Match(item, pricePattern, RegexOptions.IgnoreCase);
if (match.Success && match2.Success)
{
    Console.WriteLine("match");
    Console.WriteLine(match.Value);
    Console.WriteLine(match2.Value);
}
else
    Console.WriteLine("no match");
Is there a more concise way, perhaps? Actually, I'm not grabbing the item correctly. Basically, I want to grab the item and the price.

Just change the line to this and it should match your item even if it contains spaces:
string itemPattern = @"[a-z\s]*";
UPDATE: A better approach is to use groups:
string item = "strawb bana as .93";
string itemPattern = @"([a-z\s]*)(\d*\.*\d*)";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
if (match.Success)
{
    Console.WriteLine("match");
    Console.WriteLine("name: " + match.Groups[1].Value);
    Console.WriteLine("Price: " + match.Groups[2].Value);
}
else
    Console.WriteLine("no match");
Console.Read();

([a-zA-Z\s]+).*?(\d*\.\d{2}) //item in group 1, price in group 2
*case insensitive, matches prices of .93 or 11.93 (digits preceding the decimal point are optional), and will also match a slightly weirder string like "Strawb bana-11.98"
updated: to match items with numbers in them:
([\w\s]+?).?(\d*\.\d{2}) //matches 'item42 Bananas .55'
(clearly you can keep dreaming up inputs and making the pattern progressively more complicated, but maybe I should just go to bed :)

You likely want to combine these into one regex and use grouping to capture the parts you want (check my C#, as I'm not a C# developer!)
string item = "strawb bana .93";
string pattern = @"([a-z\s]+)(\d*\.\d{2})";
var match = Regex.Match(item, pattern, RegexOptions.IgnoreCase);
if (match.Success) {
    string itemName = match.Groups[1].Value;
    string price = match.Groups[2].Value;
}
As for the regex itself:
([a-z\s]+) matches one or more (the + sign) letters or spaces and captures them as group 1
(\d*\. starts group two and matches zero or more digits followed by a period
\d{2}) matches exactly two digits and closes the second group
Edit: There seems to be some disagreement about how ? is interpreted by C#. I've changed \d+? to \d* which means "0 or more digits"
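Putting that pattern to work, a sketch along these lines trims the captured name and turns the price into a decimal. ItemPriceDemo and TryParseLine are made-up names, and parsing with the invariant culture is an assumption about the input (a dot as the decimal separator).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ItemPriceDemo
{
    // Splits a line such as "strawb bana .93" into a name and a price.
    // Group 1 keeps the space before the price, hence the Trim().
    public static bool TryParseLine(string line, out string name, out decimal price)
    {
        name = string.Empty;
        price = 0m;
        Match m = Regex.Match(line, @"([a-z\s]+)(\d*\.\d{2})", RegexOptions.IgnoreCase);
        if (!m.Success) return false;
        name = m.Groups[1].Value.Trim();
        return decimal.TryParse(m.Groups[2].Value, NumberStyles.Number,
            CultureInfo.InvariantCulture, out price);
    }

    public static void Main()
    {
        if (TryParseLine("strawb bana .93", out string name, out decimal price))
        {
            Console.WriteLine($"name: {name}, price: {price.ToString(CultureInfo.InvariantCulture)}");
        }
        else
        {
            Console.WriteLine("no match");
        }
    }
}
```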

Try this regex:
[\w\s]*\d*\.\d*
Using grouping constructs, your code would look like this:
string item = "strawb bana .93";
foreach (Match match in Regex.Matches(item, @"(?<str>[\w\s]*)(?<num>\d*\.\d+)"))
{
    string str = match.Groups["str"].Value;
    string num = match.Groups["num"].Value;
}
Explanation:
(?<name>subexpression)
Captures the matched subexpression into a named group.
