regex to grab 2 strings from line

regex to grab 2 strings from line - c#

is this the correct way to handle this?
string item = "strawb bana .93";
string itemPattern = #"\w*";
string pricePattern = #"\d*\.\d*";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
var match2 = Regex.Match(item, pricePattern, RegexOptions.IgnoreCase);
if (match.Success & match2.Success)
{
Console.WriteLine("match");
Console.WriteLine(match.Value);
Console.WriteLine(match2.Value);
}
else
Console.WriteLine("no match");
is there a more concise way perhaps? Actually, I'm not grabbing the item correctly. Basically, I want to grab the item and price.

Just change the line to this and it should match your item even if it contains spaces:
string itemPattern = #"[a-z\s]*";
UPDATE: A better approach is to use groups:
string item = "strawb bana as .93";
string itemPattern = #"([a-z\s]*)(\d*\.*\d*)";
var match = Regex.Match(item, itemPattern, RegexOptions.IgnoreCase);
if (match.Success)
{
Console.WriteLine("match");
Console.WriteLine("name: "+match.Groups[1].Value);
Console.WriteLine("Price: "+match.Groups[2].Value);
}
else
Console.WriteLine("no match");
Console.Read();

([a-zA-Z\s]+).*?(\d*\.\d{2}) //item in group 1, price in group 2
*case insensitive, matches prices of .93 or 11.93 (digits preceding decimals are optional), also will match a slighter weirder string like "Strawb bana-11.98"
updated: to match items with numbers in them:
([\w\s]+?).?(\d*\.\d{2}) //matches 'item42 Bananas .55'
(clearly you can keep dreaming up inputs and making the pattern progressively more complicated, but maybe I should just go to bed :)

You likely want to combine to one regex and use grouping to capture the parts you want (check my C# as I'm not a C# developer!)
string item = "strawb bana .93";
string pattern = #"([a-z\s]+)(\d*\.\d{2})"
var match = Regex.Match(item,pattern,RegexOptions.IgnoreCase);
if( match.Success ) {
item_name = match.Groups[1].value;
price = match.Groups[2].value;
}
As for the regex itself:
([a-z\s]+) matches one or more (the + sign) letters or spaces and captures them as group 1
(\d*\. starts group two and optionally matches one or more numbers followed by a period
\d{2}) matches exactly two digits and closes the second group
Edit: There seems to be some disagreement about how ? is interpreted by C#. I've changed \d+? to \d* which means "0 or more digits"

Try this regex:
[\w\s]*\d*\.\d*
and by using Grouping constructs, your code should be like this:
string item = "strawb bana .93";
foreach(Match match in Regex.Matches(item, #"(?<str>[\w\s]*)(?<num>\d*\.\d+)"))
{
String str = match.Groups["str"].Value;
String nuum = match.Groups["num"].Value;
}
explain:
(?< name > subexpression)
Captures the matched subexpression into a named group.

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.split, because it doesn't really have a delimeter.

You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($#"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if your string between the ISA and ESA have the same pattern that we are looking for, then you will have to find some smart way around it.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. This portion is not captured but still matched
(?=ISA) Look-ahead assertion. This portion is not captured but still matched
Using these look-around assertions you can find the correct | character for splitting

Simply use the
int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here
and then just use the substring from 0 to that indexOf, indexOf to length.
Edit:
If ? is not known,
can we just use the regex Pattern and Matcher.
Matcher matcher = Patter.compile("ISA.*ESA").match(whateverString);
if(matcher.find()) {
matcher.find();
int x = matcher.start();
}
Here x would give that start index of that match.
Edit: I mistakenly saw it as java one, for C#
string pattern = #"ISA.*ESA";
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.writeLine(m.value);
m = m.NextMatch(); // more matches
}

RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, #"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches If you need multiple matches found, and specify different RegexOptions if needed.

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
firstPart = "ISA~this is date~ESA|"
secondPart = "ISA~this is more data~ESA|";

Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, #"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(#"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.

Regex Matchcollection groups

I already tried two days to solve the Problem, that I have a MatchCollection. In the patter is a Group and I want to have a list with the Solutions of the Group (there were two or more Solutions).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
string s = groups[1].Value;
Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter

First thing you need to correct in the code is Datum2.Text = s; would overwrite the text in Datum2 if it were more than one match.
Now, about your regex,
^ forces a match at the begging of the line, so there is really only 1 match. If you remove it, it'll match twice.
I can't seem to understand what was intended with $? all over the pattern (just take them out).
[' '] matches "either a quote, a space or a quote (no need to repeat characters in a character class.
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. A dot matches any character otherwise.
[0-1][1-9] matches all months except "10". The second character shoud be [0-9] (or \d).
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s= "";
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
DEMO
You should know that regex is not the tool to parse HTML. It'll work for simple cases, but for real cases do consider using HTML Agility Pack

Two Capturing Groups in Regex

I have string such as
(1)ABC(Some other text)
(2343)DEFGHIJ
(99)Q
I wanted a regex that would capture these strings into two groups like so
ist: (1) 2nd: ABC(Some other text)
1st: (2343) 2nd: DEFGHIJ
ist: (99) 2nd: Q
So I wrote this Regex
var regex new Regex("^\\((\\d+)(.*)\\)");
Match match = regex.Match(str);
But instead of the two groups I expected I get three groups
In the first example I get
(1)ABC(Some other text)
1
)ABC(Some other text
What's wrong?

The regex you are looking for is probably
#"^(\(\d+\))(.*)"
You reversed the order of the (. Note that the groups will be 3, because as someone pointed out, the group 0 is all the matched text. So
string str = "(1)ABC(Some other text)";
var regex = new Regex(#"^(\(\d+\))(.*)");
Match match = regex.Match(str);
if (match.Success)
{
string gr1 = match.Groups[1].Value; // (1)
string gr2 = match.Groups[2].Value; // (Some other text)
}

Get partial string from string

I have the following string:
This isMyTest testing
I want to get isMyTest as a result. I only have two first characters available("is"). The rest of the word can vary.
Basically, I need to select a first word delimeted by spaces which starts with chk.
I've started with the following:
if (text.contains(" is"))
{
text.LastIndexOf(" is"); //Should give me index.
}
now I cannot find the right bound of the word since I need to match on something like

You can use regular expressions:
string pattern = #"\bis";
string input = "This isMyTest testing";
return Regex.Matches(input, pattern);

You can use IndexOf to get the index of the next space:
int startPosition = text.LastIndexOf(" is");
if (startPosition != -1)
{
int endPosition = text.IndexOf(' ', startPosition + 1); // Find next space
if (endPosition == -1)
endPosition = text.Length - 1; // Select end if this is the last word?
}

What about using a regex match? Generally if you are searching for a pattern in a string (ie starting with a space followed by some other character) regex are perfectly suited to this. Regex statements really only fall apart in contextually sensitive areas (such as HTML) but are perfect for a regular string search.
// First we see the input string.
string input = "/content/alternate-1.aspx";
// Here we call Regex.Match.
Match match = Regex.Match(input, #"[ ]is[A-z0-9]*", RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}

C# regular expression trouble

Problem!
I Have the following input (rules) from a flat file (talking about numeric input):
Input might be a natural number (below 1000): 1, 10, 100, 999, ...
Input might be a comma separated number surrounded by quotes (above 1000): "1,000", "2,000", "3,000", "10,000", ...
I Have the following regular expression to validate the input: (?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22), So for an input like 10 I'm expecting in the first matching group 10, which is exactly what I got. But when I got an input like "10,000" I'm expecting in the first matching group 10,000, but it is stored at the second matching group.
Example
string text1 = "\"" + "10,000" + "\"";
string text2 = "50";
string pattern = #"(\d+)|\x22([0-9]+(?:,[0-9]+){0,})\x22";
Match match1 = Regex.Match(text1, pattern);
Match match2 = Regex.Match(text2, pattern);
if (match1.Success)
{
Console.WriteLine("Match#1 Group#1: " + match1.Groups[1].Value);
Console.WriteLine("Match#1 Group#2: " + match1.Groups[2].Value);
# Outputs
# Match#1 Group#1:
# Match#1 Group#2: 10,000
}
if (match2.Success)
{
Console.WriteLine("Match#2 Group#1: " + match2.Groups[1].Value);
Console.WriteLine("Match#2 Group#2: " + match2.Groups[2].Value);
# Outputs
# Match#2 Group#1: 50
# Match#2 Group#2:
}
Expected Result
Both results on the same matching group, in this case 1
Questions?
What am I doing wrong? I'm just getting bad grouping from the regular expression matches.
Also, I'm using filehelpers .NET to parse the file, is there any other way to resolve this problem. Actualy I'm trying to implement a custom converter.
Object File
[FieldConverter(typeof(OOR_Quantity))]
public Int32 Quantity;
OOR_Quantity
internal class OOR_Quantity : ConverterBase
{
public override object StringToField(string from)
{
string pattern = #"(?:(\d+)|\x22([0-9]+(?:,[0-9]+)*)\x22)";
Regex regex = new Regex(pattern);
if (regex.IsMatch(from))
{
Match match = regex.Match(from);
return int.Parse(match.Groups[1].Value);
}
throw new ...
}
}

Group numbers are assigned purely on the basis of their positions in the regex--specifically, the relative position of the opening bracket, (. In your regex, (\d+) is the first group and ([0-9]+(?:,[0-9]+)*) is the second.
If you want to refer to them both with the same identifier, use named groups and give them both the same name:
#"(?:(?<NUMBER>\d+)|\x22(?<NUMBER>[0-9]+(?:,[0-9]+)*)\x22)"
Now you can retrieve the captured value as match.Groups["NUMBER"].Value.

I tested the regex below with Ruby:
text1 = "\"10,000\""
text2 = "50"
regex = /"?([0-9]+(?:,[0-9]+){0,})"?/
text1 =~ regex
puts "#$1"
text2 =~ regex
puts "#$1"
The result is:
10,000
50
I think you can rewrite in C#. Isn't it enough for you?

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

regex to grab 2 strings from line - c#

Related

How to split string by another string

Regex Matchcollection groups

Two Capturing Groups in Regex

Get partial string from string

C# regular expression trouble

Categories

Resources