Regex Named Capture for multiple numbers in string - c#

Given a string
var testData = "1234 test string 987 more test";
I want to be able to use a regex to pull out 1234 and 987. As far as I could tell using
var reg = new Regex(@"(?<numbers>\d+)");
should do what I want but when I say
var match = reg.Match(testData);
I would think that
Assert.AreEqual(match.Groups["numbers"].Captures.Count, 2);
but it's only 1. What am I doing wrong? Intuition tells me that the
?<group>
means there can only be 0 or 1 of these values. Should I not be using a named group?
*<group>
doesn't seem to work in the regex builder in visual studio but I did not try it in my tests.

Why not use the pattern below:
Regex reg = new Regex(@"\d+");
and then get the numbers with:
MatchCollection matches = reg.Matches(testData);
Regex.Match returns only the first match, whereas Regex.Matches returns all of them. After that, the matches variable contains 2 Match values, which represent 1234 and 987.
You can also write the assert as:
Assert.AreEqual(2, matches.Count);
Hope it helps!
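The difference between the two calls can be sketched as follows. This is only an illustration built on the question's input: the class NumberExtractor and both method names are made up for the example. The first method reads the named group from each match returned by Regex.Matches. The second shows one way to get several captures inside a single match, which only happens when the group itself is repeated by the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Hypothetical helper: one Match per number, read through the named group.
    public static List<string> ExtractNumbers(string input)
    {
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<numbers>\d+)"))
        {
            numbers.Add(m.Groups["numbers"].Value);
        }
        return numbers;
    }

    // Alternative: a single Match in which the group repeats,
    // so its Captures collection holds every number.
    public static int CountCapturesInOneMatch(string input)
    {
        Match m = Regex.Match(input, @"(?:\D*(?<numbers>\d+))+");
        return m.Groups["numbers"].Captures.Count;
    }

    public static void Main()
    {
        var testData = "1234 test string 987 more test";
        foreach (var n in ExtractNumbers(testData))
        {
            Console.WriteLine(n); // one line per number found
        }
        Console.WriteLine(CountCapturesInOneMatch(testData));
    }
}
```

For the question's input this should print 1234 and 987, followed by a capture count of 2. Iterating over Regex.Matches is usually the simpler of the two to read.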

try {
Regex regexObj = new Regex(@"([\d]+)", RegexOptions.IgnoreCase);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
for (int i = 1; i < matchResults.Groups.Count; i++) {
Group groupObj = matchResults.Groups[i];
if (groupObj.Success) {
// matched text: groupObj.Value
// match start: groupObj.Index
// match length: groupObj.Length
}
}
matchResults = matchResults.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}

Related

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.Split, because it doesn't really have a delimiter.
You can use Regex.Split to accomplish this:
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the text between ISA and ESA can itself contain the pattern we are splitting on, you will have to find a way around that.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. ESA must appear immediately before the split character, but it is not consumed by the match.
(?=ISA) Look-ahead assertion. ISA must appear immediately after the split character, but it is not consumed by the match.
Using these look-around assertions you can find the correct | character to split on.
Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substring from 0 to that index, and the substring from that index to the end.
Edit:
If ? is not known,
we can just use the regex Pattern and Matcher.
Matcher matcher = Pattern.compile("ISA.*ESA").matcher(whateverString);
if (matcher.find()) {
int x = matcher.start();
}
Here x gives the start index of that match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy quantifier, so each ISA...ESA block is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}
Regex will probably be the best for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This gives you 2 named groups with the data you need:
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.", RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need to find multiple matches, and specify different RegexOptions if needed.
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA", RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
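A small sketch of that difference, using the sample input from the question. The class and method names are hypothetical. Because the input starts with the delimiter, Regex.Split returns a leading empty string, while String.Split with RemoveEmptyEntries drops it. Note that both variants remove the "ISA" delimiter itself from the results.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparison
{
    // String.Split with RemoveEmptyEntries: empty pieces are discarded.
    public static string[] SplitWithString(string input)
    {
        return input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex.Split: empty pieces (such as the one before a leading delimiter) are kept.
    public static string[] SplitWithRegex(string input)
    {
        return Regex.Split(input, "ISA");
    }

    public static void Main()
    {
        string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        Console.WriteLine(SplitWithString(input).Length); // 2 pieces
        Console.WriteLine(SplitWithRegex(input).Length);  // 3 pieces, the first one empty
    }
}
```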

Get only wild card value using regular expression

I want to extract only the wildcard tokens using regular expressions in .NET (C#).
For example, if I use a pattern like Book_* (so that it works like a directory wildcard), I want to extract the values that match the *.
For example:
For a string "Book_1234" and pattern "Book_*"
I want to extract "1234"
For a string "Book_1234_ABC" and pattern "Book_*_*"
I should be able to extract 1234 and ABC
This should do it:
string input = "Book_1234_ABC";
MatchCollection matches = Regex.Matches(input, @"_([A-Za-z0-9]*)");
foreach (Match m in matches)
if (m.Success)
Console.WriteLine(m.Groups[1].Value);
The approach for your scenario would be to:
Get the list of literal strings that appear between the wildcards (*).
Join that list with the regex alternation operator (|).
Replace everything the resulting expression matches with a character you do not expect in your string (a space should be adequate here).
Trim the result and split it on the character used in the previous step, which returns the list of wildcard values.
var str = "Book_1234_ABC";
var inputPattern = "Book_*_*";
var patterns = inputPattern.Split('*');
if (patterns.Last().Equals(""))
patterns = patterns.Take(patterns.Length - 1).ToArray();
string expression = string.Join("|", patterns);
var wildCards = Regex.Replace(str, expression, " ").Trim().Split(' ');
I would first convert the '*' wildcard into an equivalent regex, i.e.:
* becomes \w+
and then use this regex to extract the matches.
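One way this conversion could look is sketched below. It is an assumption-laden illustration rather than the answerer's code: the class WildcardExtractor and the method ExtractWildcards are invented names, each * is turned into a lazy capturing group (\w+?) instead of a bare \w+, the literal pieces are passed through Regex.Escape, and the whole pattern is anchored with ^ and $.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardExtractor
{
    // Hypothetical helper: returns the text matched by each '*' in wildcardPattern,
    // or an empty list when the input does not fit the pattern.
    public static List<string> ExtractWildcards(string input, string wildcardPattern)
    {
        // Escape the literal pieces so characters such as '.' are not treated as regex syntax.
        var pieces = wildcardPattern.Split('*').Select(p => Regex.Escape(p));

        // Each '*' becomes a capturing group; "Book_*_*" turns into ^Book_(\w+?)_(\w+?)$
        string regexPattern = "^" + string.Join(@"(\w+?)", pieces) + "$";

        var values = new List<string>();
        Match m = Regex.Match(input, regexPattern);
        if (m.Success)
        {
            // Group 0 is the whole match, so start at 1.
            for (int i = 1; i < m.Groups.Count; i++)
            {
                values.Add(m.Groups[i].Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ExtractWildcards("Book_1234", "Book_*")));
        Console.WriteLine(string.Join(", ", ExtractWildcards("Book_1234_ABC", "Book_*_*")));
    }
}
```

Since \w also matches the underscore, the anchors and the lazy quantifier are what keep each group limited to one token in this sketch. If the wildcard values may contain other characters, the group would need a different character class.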
When I run this code using your input strings:
using System;
using System.Text.RegularExpressions;
namespace SampleApplication
{
public class Test
{
static Regex reg = new Regex(@"Book_([^_]+)_*(.*)");
static void DoMatch(String value) {
Console.WriteLine("Input: " + value);
foreach (Match item in reg.Matches(value)) {
for (int i = 0; i < item.Groups.Count; ++i) {
Console.WriteLine(String.Format("Group: {0} = {1}", i, item.Groups[i].Value));
}
}
Console.WriteLine("\n");
}
static void Main(string[] args) {
// For a string "Book_1234" and pattern "Book_*" I want to extract "1234"
DoMatch("Book_1234");
// For a string "Book_1234_ABC" and pattern "Book_*_*" I should be able to extract 1234 and ABC
DoMatch("Book_1234_ABC");
}
}
}
I get this console output:
Input: Book_1234
Group: 0 = Book_1234
Group: 1 = 1234
Group: 2 =
Input: Book_1234_ABC
Group: 0 = Book_1234_ABC
Group: 1 = 1234
Group: 2 = ABC

Regex to find and replace a year in a string

I have a string in my c#:
The.Big.Bang.Theory.(2013).S07E05.Release.mp4
I need to find an occurrence of (2013) and replace the whole thing, including the brackets, with ___ (three underscores). So the output would be:
The.Big.Bang.Theory.___.S07E05.Release.mp4
Is there a regex that can do this? Or is there a better method?
I then do some processing on the new string - but later, need to report that '(2013)' was removed .. so I need to store the value that is replaced.
Tried with your string. It works
string pattern = @"\(\d{4}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var m = Regex.Replace(search, pattern, "___");
Console.WriteLine(m);
This will find any 4-digit number enclosed in parentheses.
If the year can change, I think a regex is the best approach.
This code, on the other hand, tells you whether there is a match for your pattern and prints the matched value:
var k = Regex.Matches(search, pattern);
if(k.Count > 0)
Console.WriteLine(k[0].Value);
Many of these answers forgot the original question in that you wanted to know what you are replacing.
string pattern = @"\((19|20)\d{2}\)";
string search = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
string replaced = Regex.Match(search, pattern).Captures[0].ToString();
string output = Regex.Replace(search, pattern, "___");
Console.WriteLine("found: {0} output: {1}",replaced,output);
gives you the output
found: (2013) output: The.Big.Bang.Theory.___.S07E05.Release.mp4
Here is an explanation of my pattern too.
\( -- match the (
(19|20) -- match the numbers 19 or 20. I assume this is a date for TV shows or movies from 1900 to now.
\d{2} -- match 2 more digits
\) -- match )
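If the string could contain more than one year, the search and the replacement can be combined in a single pass. The sketch below is one possible variation, not the answerer's code: it uses the Regex.Replace overload that takes a MatchEvaluator, and the class YearReplacer and method ReplaceYears are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearReplacer
{
    // Hypothetical helper: replaces every "(19xx)" or "(20xx)" with "___"
    // and records each replaced value in the supplied list.
    public static string ReplaceYears(string input, List<string> removed)
    {
        return Regex.Replace(input, @"\((19|20)\d{2}\)", m =>
        {
            removed.Add(m.Value); // remember what is being replaced
            return "___";
        });
    }

    public static void Main()
    {
        var removed = new List<string>();
        string output = ReplaceYears("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", removed);
        Console.WriteLine("found: {0} output: {1}", string.Join(", ", removed), output);
    }
}
```

The evaluator runs once per match, so the list ends up with one entry for each value that was replaced, and stays empty when nothing matched.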
Here is a working snippet from a console application, note the regex \(\d{4}\):
var r = new System.Text.RegularExpressions.Regex(@"\(\d{4}\)");
var s = r.Replace("The.Big.Bang.Theory.(2013).S07E05.Release.mp4", "___");
Console.WriteLine(s);
and the output from the console application:
The.Big.Bang.Theory.___.S07E05.Release.mp4
and you can reference this Rubular for proof.
Below is a modified solution taking into consideration your additional requirement:
var m = r.Match("The.Big.Bang.Theory.(2013).S07E05.Release.mp4");
if (m.Success)
{
var s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4".Replace(m.Value, "___");
var valueReplaced = m.Value;
}
Try this:
string s = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var info = Regex.Split(
Regex.Matches(s, @"\(.*?\)")
.Cast<Match>().First().ToString(), @"[\s,]+");
s = s.Replace(info[0], "___");
Result
The.Big.Bang.Theory.___.S07E05.Release.mp4
Try this:
string str = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var matches = Regex.Matches(str, @"\([0-9]{4}\)");
List<string> removed = new List<string>();
for (int i = 0; i < matches.Count; i++)
{
// remember each value before it is replaced
removed.Add(matches[i].Value);
}
str = Regex.Replace(str, @"\([0-9]{4}\)", "___");
Console.WriteLine("Removed Strings are:");
foreach (string s in removed)
{
Console.WriteLine(s);
}
output:
Removed Strings are:
(2013)
You don't need a regex for a simple replace (you can use one, but it's not needed):
var name = "The.Big.Bang.Theory.(2013).S07E05.Release.mp4";
var replacedName = name.Replace("(2013)", "___");

C# pattern creation for receiving an integer from a string

I have some strings like the ones below:
hu212 text = 1
reference = 1
racial construction = 1
2007 = 1
20th century history = 2
and I want to take only the integer AFTER the '='. How can I do that?
I am trying this:
Regex exp = new Regex(@"[a-zA-Z]*[0-9]*[=][0-9]+",RegexOptions.IgnoreCase);
try
{
MatchCollection MatchList = exp.Matches(line);
Match FirstMatch = MatchList[0];
Console.WriteLine(FirstMatch.Value);
}catch(ArgumentOutOfRangeException ex)
{
System.Console.WriteLine("ERROR");
}
but it is not working...
I tried some others but I get results like "20th" or "hu212"...
What exactly does Matches do? Does it give me the rest of the string that doesn't match the regex?
Instead of Regex you could also do:
int match = int.Parse(line.Substring(line.IndexOf('=') + 1).Trim());
You need to allow whitespace (\s) between the = and the digits:
Regex pattern = new Regex(@"=\s*([0-9]+)$");
Here's a more complete example:
Regex pattern = new Regex(@"=\s*([0-9]+)$");
Match match = pattern.Match(input);
if (match.Success)
{
int value = int.Parse(match.Groups[1].Value);
// Use the value
}
See it working online: ideone
What about
string str = "hu212 text = 1";
string strSplit = str.Split('=')[1].Trim();
String StringToParse = "hu212 text = 1";
String[] splitString = StringToParse.Split(' ');
Int32 outNum;
Int32.TryParse ( splitString[splitString.Length-1], out outNum );
Regex pattern = new Regex(@"=\s?(\d+)");
This works with or without a space after the =. The number is in group 1.
hu212 text =1
reference = 1
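Putting the regex answers together, a defensive version might look like the sketch below. It is an illustration only: the class LineValueParser and the method TryGetValue are invented names, and the pattern assumes the integer is the last thing on the line. Using int.TryParse instead of int.Parse avoids an exception when the digits do not fit into an int.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineValueParser
{
    // Hypothetical helper: extracts the integer that follows '=' at the end of a line.
    public static bool TryGetValue(string line, out int value)
    {
        value = 0;
        // '=' then optional whitespace, the digits, optional trailing whitespace, end of line.
        Match m = Regex.Match(line, @"=\s*(\d+)\s*$");
        return m.Success && int.TryParse(m.Groups[1].Value, out value);
    }

    public static void Main()
    {
        string[] lines =
        {
            "hu212 text = 1",
            "reference = 1",
            "racial construction = 1",
            "2007 = 1",
            "20th century history = 2",
            "hu212 text =1"
        };
        foreach (string line in lines)
        {
            if (TryGetValue(line, out int value))
            {
                Console.WriteLine("{0} -> {1}", line, value);
            }
        }
    }
}
```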

Using regex to capture a numeric value within a string in C#

I have a string of characters which has 0 or more occurrences of ABC = dddd within it. The dddd stands for an integer value, not necessarily four digits.
What I'd like to do is capture the integer values that occur within this pattern. I know how to perform matches with regexes but I'm new to capturing. It's not necessary to capture all the ABC integer values in one call—it's fine to loop over the string.
If this is too involved I'll just write a tiny parser, but I'd like to use regex if it's reasonably elegant. Expertise greatly appreciated.
First we need to start with a regex that matches the pattern we are looking for. This will match the example you have given (assuming ABC is alphanumeric): \w+\s*=\s*\d+
Next we need to define what we want to capture in a match by defining capture groups. .NET includes support for named capture groups, which I absolutely adore. We specify a group with (?<name for capture>expression), turning our regex into: (?<key>\w+)\s*=\s*(?<value>\d+). This gives us two named groups, key and value.
Using this, we can iterate over all matches in your text:
Regex pattern = new Regex(@"(?<key>\w+)\s*=\s*(?<value>\d+)");
string body = "This is your text here. value = 1234";
foreach (Match match in pattern.Matches(body))
{
Console.WriteLine("Found key {0} with value {1}",
match.Groups["key"].Value,
match.Groups["value"].Value
);
}
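For the specific case in the question, where only the integers following ABC are wanted, the same idea can be narrowed down as in this sketch. The class AbcValueExtractor and the method ExtractValues are hypothetical names, and the literal key ABC is taken from the question's description.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbcValueExtractor
{
    // Hypothetical helper: returns the integer from every "ABC = dddd" occurrence.
    public static List<int> ExtractValues(string text)
    {
        var values = new List<int>();
        foreach (Match m in Regex.Matches(text, @"ABC\s*=\s*(?<value>\d+)"))
        {
            // TryParse skips digit runs that do not fit into an int instead of throwing.
            if (int.TryParse(m.Groups["value"].Value, out int parsed))
            {
                values.Add(parsed);
            }
        }
        return values;
    }

    public static void Main()
    {
        string text = "foo ABC = 12 bar ABC=345 baz";
        foreach (int v in ExtractValues(text))
        {
            Console.WriteLine(v);
        }
    }
}
```

An input with no occurrences simply yields an empty list, which covers the "0 or more" requirement.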
You can use something like this:
MatchCollection allMatchResults = null;
try {
// This matches a literal '=' and then any number of digits following
Regex regexObj = new Regex(@"=(\d+)");
allMatchResults = regexObj.Matches(subjectString);
if (allMatchResults.Count > 0) {
// Access individual matches using allMatchResults[i]
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Based on your comment, perhaps this is more what you're after:
try {
Regex regexObj = new Regex(@"=(\d+)");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
for (int i = 1; i < matchResults.Groups.Count; i++) {
Group groupObj = matchResults.Groups[i];
if (groupObj.Success) {
// matched text: groupObj.Value
// match start: groupObj.Index
// match length: groupObj.Length
}
}
matchResults = matchResults.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
