Regex for obtaining numeric values within a string in C#

Regex for obtaining numeric values within a string in C# - c#

I have the following example strings:
TAR:100
TAR:100|LED:50
TAR:30|LED:30|ASO:40
I need a regex that obtains the numeric values after the colon, which are always in the range 0 to 100 inclusive.
The result after the regex is applied to any of the above strings should be:
for TAR:100 the result should be 100
for TAR:100|LED:50 the result should be the array [100,50]
for TAR:30|LED:30|ASO:40 the result should be the array [30,30,40]
The word before the colon can have any length and both upper and lowercase.
I have tried with the following but it doesn't yield the result I need:
String text = "TAR:100|LED:50";
String pattern = "\\|?([a-zA-Z]{1,}:)";
string[] values= Regex.Split(text, pattern);
The regex should work whether the string is TAR:100 or TAR:100|LED:50 if possible.

You added () which makes the text parts that you want to remove also be returned.
Below is my solution, with a slightly changed regex.
Note that we need to start looping the values at i = 1, which is purely caused by using Split on a string that starts with a split-sequence; it has nothing to do with the Regex itself.
Explanation: if we used a simpler str.Split to split by a separator "#", then "a#b#c" would produce ["a", "b", "c"], whereas "#b#c" would produce ["", "b", "c"]. In general, and by definition: if Split removes N sequences by which the string gets splitted, then the result is N+1 strings. And all the strings that we deal with here are of the form "#b#c", so there is always an empty first result.
Accepting that as a given fact, the results are usable by starting from i = 1:
var pattern = #"\|?[a-zA-Z]+:";
var testCases = new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" };
foreach (var text in testCases)
{
string[] values = Regex.Split(text, pattern);
for (var i = 1; i < values.Length; i++)
Console.WriteLine(values[i]);
Console.WriteLine("------------");
}
Output:
100
------------
100
50
------------
30
30
40
------------
Working DotNetFiddle: https://dotnetfiddle.net/i9kH8n

In .NET you can use the Group.Captures and use the same name for 2 capture groups and match the format of the string.
\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b
Regex demo | C# demo
string[] strings = {
"TAR:100",
"TAR:100|LED:50",
"TAR:30|LED:30|ASO:40"
};
string pattern = #"\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b";
foreach (String str in strings)
{
Match match = Regex.Match(str, pattern);
if (match.Success)
{
string[] result = match.Groups["numbers"].Captures.Select(c => c.Value).ToArray();
Console.WriteLine(String.Join(',', result));
}
}
Output
100
100,50
30,30,40
Another option could be making use of the \G anchor and have the value in capture group 1.
\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)
Regex demo | C# demo
string[] strings = {
"TAR:100",
"TAR:100|LED:50",
"TAR:30|LED:30|ASO:40"
};
string pattern = #"\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)";
foreach (String str in strings)
{
MatchCollection matches = Regex.Matches(str, pattern);
string[] result = matches.Select(m => m.Groups[1].Value).ToArray();
Console.WriteLine(String.Join(',', result));
}
Output
100
100,50
30,30,40

Related

Regex How to Match 2 fields

How would capture both the filenames inside the quotes, and the numbers following as named captures (Regex / C#)?
Files("fileone.txt", 5969784, "file2.txt", 45345333)
Out of every occurrence in the string, the ability to capture "fileone.txt" and the integer following (a loop cycles each pair)
I am trying to use this https://regex101.com/r/MwMzBo/1 but having issues matching without the '[' and ']'.
Required to be able to loop each filename+size as a pair and moving next.
Any help is appreciated!
UPDATE
string file = "Files(\"fileone.txt\", 5969784, \"file2.txt\", 45345333, \"file2.txt\", 45345333)";
var regex = new Regex(#"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");
var match = regex.Match(file);
var names = match.Groups["file"].Captures.Cast<Capture>();
var lengths = match.Groups["number"].Captures.Cast<Capture>();
var filelist = names.Zip(lengths, (f, n) => new { file = f.Value, length = long.Parse(n.Value) }).ToArray();
foreach (var item in filelist)
{
// Only returning 1 pair result, ignoring the rest
}
Reading match.Value to confirm what is being read. Only first pair is being picked up.
while (match.Success)
{
MessageBox.Show(match.Value);
match = match.NextMatch();
}
Now we are getting all results properly. I read, that Regex.Match only returns the first matched result. This explains a lot.

You can use
(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)
See the regex demo
Details:
(?:\G(?!\A)\s*,\s*|\w+\() - end of the previous successful match and a comma enclosed with zero or more whitespaces, or a word and an opening ( char
(?:""(?<file>.*?)""|'(?<file>.*?)') - ", Group "file" capturing any zero or more chars other than a newline char as few as possible and then a ", or a ', Group "file" capturing any zero or more chars other than a newline char as few as possible and then a '
\s*,\s* - a comma enclosed with zero or more whitespaces
(?<number>\d+) - Group "number": one or more digits.

I like doing it in smaller pieces :
string input = "cov('Age', ['5','7','9'])";
string pattern1 = #"\((?'key'[^,]+),\s+\[(?'values'[^\]]+)";
Match match = Regex.Match(input, pattern1);
string key = match.Groups["key"].Value.Trim(new char[] {'\''});
string pattern2 = #"'(?'value'[^']+)'";
string values = match.Groups["values"].Value;
MatchCollection matches = Regex.Matches(values, pattern2);
int[] number = matches.Cast<Match>().Select(x => int.Parse(x.Value.Replace("'",string.Empty))).ToArray();

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.split, because it doesn't really have a delimeter.

You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($#"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if your string between the ISA and ESA have the same pattern that we are looking for, then you will have to find some smart way around it.
To explain the Regex a bit:
(?<=ESA) Look-behind assertion. This portion is not captured but still matched
(?=ISA) Look-ahead assertion. This portion is not captured but still matched
Using these look-around assertions you can find the correct | character for splitting

Simply use the
int x = whateverString.indexOf("?ISA"); // replace ? with the actual character here
and then just use the substring from 0 to that indexOf, indexOf to length.
Edit:
If ? is not known,
can we just use the regex Pattern and Matcher.
Matcher matcher = Patter.compile("ISA.*ESA").match(whateverString);
if(matcher.find()) {
matcher.find();
int x = matcher.start();
}
Here x would give that start index of that match.
Edit: I mistakenly saw it as java one, for C#
string pattern = #"ISA.*ESA";
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.writeLine(m.value);
m = m.NextMatch(); // more matches
}

RegEx will probably be the best for this. See this link
Mask would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, #"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches If you need multiple matches found, and specify different RegexOptions if needed.

It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.

You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
firstPart = "ISA~this is date~ESA|"
secondPart = "ISA~this is more data~ESA|";

Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, #"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}

Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Try it on Regex101
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(#"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}

There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.

3-digit grouping of all numbers in an alphanumeric string

I found it not efficient to iterate through string parts split by space character and extract numeric parts and apply
UInt64.Parse(Regex.Match(numericPart, #"\d+").Value)
and the concatenating them together to form the string with numbers being grouped.
Is there a better, more efficient way to 3-digit grouping of all numbers in an string containing other characters?

I am pretty sure the most efficient way (CPU-wise, with just a single pass over the string) is the basic foreach loop, along these lines
var sb = new StringBuilder()
foreach(char c in inputString)
{
// if c is a digit count
// else reset counter
// if there are three digits insert a "."
}
return sb.ToString()
This will produce 123.456.7
If you want 1.234.567 you'll need an additional buffer for digit-sequences

So you want to replace all longs in a string with the same long but with a number-group-separator of the current culture? .... Yes
string[] words = input.Split();
var newWords = words.Select(w =>
{
long l;
bool isLong = System.Int64.TryParse(w.Trim(), out l);
if(isLong)
return l.ToString("N0");
else
return w;
});
string result = string.Join(" ", newWords);
With the input from your comment:
string input = "hello 134443 in the 33 when 88763 then";
You get the expected result: "hello 134,443 in the 33 when 88,763 then", if your current culture uses comma as number-group-separator.

I will post my regex-based example. I believe regex does not have to be too slow, especially once it is compiled and is declared with static and readonly.
// Declare the regex
private static readonly Regex regex = new Regex(#"(\d)(?=(\d{3})+(?!\d))", RegexOptions.Compiled);
// Then, somewhere inside a method
var replacement = string.Format("$1{0}", System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator); // Get the system digit grouping separator
var strn = "Hello 34234456 where 3334 is it?"; // Just a sample string
// Somewhere (?:inside a loop)?
var res = regex.Replace(strn, replacement);
Output (if , is a system digit grouping separator):
Hello 34,234,456 where 3,334 is it?

Get only wild card value using regular expression

I want to extract only wild card tokens using regular expressions in dotnet (C#).
Like if I use pattern like Book_* (so it match directory wild card), it extract values what match with *.
For Example:
For a string "Book_1234" and pattern "Book_*"
I want to extract "1234"
For a string "Book_1234_ABC" and pattern "Book_*_*"
I should be able to extract 1234 and ABC

This should do it : (DEMO)
string input = "Book_1234_ABC";
MatchCollection matches = Regex.Matches(input, #"_([A-Za-z0-9]*)");
foreach (Match m in matches)
if (m.Success)
Console.WriteLine(m.Groups[1].Value);

The approach to your scenario would be to
Get the List of strings which appears in between the wildcard (*).
Join the lists with regexp divider (|).
replace the regular expression with char which you do not expect in your string (i suppose space should be adequate here)
trim and then split the returned string by char you used in previous step which will return you the list of wildcard characters.
var str = "Book_1234_ABC";
var inputPattern = "Book_*_*";
var patterns = inputPattern.Split('*');
if (patterns.Last().Equals(""))
patterns = patterns.Take(patterns.Length - 1).ToArray();
string expression = string.Join("|", patterns);
var wildCards = Regex.Replace(str, expression, " ").Trim().Split(' ');

I would first convert the '*' wildcard in an equivalent Regex, ie:
* becames \w+
then I use this regex to extract the matches.

When I run this code using your input strings:
using System;
using System.Text.RegularExpressions;
namespace SampleApplication
{
public class Test
{
static Regex reg = new Regex(#"Book_([^_]+)_*(.*)");
static void DoMatch(String value) {
Console.WriteLine("Input: " + value);
foreach (Match item in reg.Matches(value)) {
for (int i = 0; i < item.Groups.Count; ++i) {
Console.WriteLine(String.Format("Group: {0} = {1}", i, item.Groups[i].Value));
}
}
Console.WriteLine("\n");
}
static void Main(string[] args) {
// For a string "Book_1234" and pattern "Book_*" I want to extract "1234"
DoMatch("Book_1234");
// For a string "Book_1234_ABC" and pattern "Book_*_*" I should be able to extract 1234 and ABC
DoMatch("Book_1234_ABC");
}
}
}
I get this console output:
Input: Book_1234
Group: 0 = Book_1234
Group: 1 = 1234
Group: 2 =
Input: Book_1234_ABC
Group: 0 = Book_1234_ABC
Group: 1 = 1234
Group: 2 = ABC

what is the regular expression to remove everything but digit?

I have this data in format
"NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1"
the format is Item1:Item1Qty_Item2:Item2Qty.........ItemN:ItemNQty
I need to separte the the items and their corresponding quantities and form arrays. I did the item part like this..
var allItemsAry = Regex.Replace(myString, "[\\:]+\\d", "").Split('_');
Now allItemsAry is correct like this [NEW ITEM, BELT, JEANS, BELT, SUIT 3 PCS, SHOES]
But I can't figrure out how to get qty, whatever expression I try that 3 from SUIT 3 PCS comes along with that, like these
var allQtyAry = Regex.Replace(dataForPackageConsume, "[^(\\:+\\d)]", "").split(':')
This comes up as :1:3:1:13:1:1 (when replaced). So I can't separate by : to get make it array, as can be seen the forth item is 13, while it should be 1, that 3 is coming from SUIT 3 PCS. I also tried some other variations, but that 3 from SUIT 3 PCS always pops in. How do I just get the quantities of clothes (possible attached with : so I can split them by this and form the array?
UPDATE : If I didn't make it clear before I want the numbers that are exactly preceded by : along with the semicolon.
So, what I want is :1:3:1:1:1:1.

Instead of removing everything except numerals, how about matching only numerals?
For instance:
Regex regex = new Regex(#":\d+");
string result = string.Empty;
foreach (Match match in regex.Matches(input))
result += match.Value;

[^\d:]+|:(?!\d)|(?<!:)\d+
[^\d:]+ will match all non-digit non-:s.
:(?!\d) will match all :s not followed by a digit (negative lookahead).
(?<!:)\d+ will match all digits not preceded by a : (negative lookbehind).
Source
NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1
Regular Expression
[^\d:]+|:(?!\d)|(?<!:)\d+
Results
Match
NEW ITEM
_BELT
_JEANS
_BELT
_SUIT
3
PCS
_SHOES

You want it only numbers like :1:3:1:1:3:1:1 ?
string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var output = Regex.Replace(s, #"[^0-9]+", "");
StringBuilder sb = new StringBuilder();
foreach (var i in output)
{
sb.Append(":" + i);
}
Console.WriteLine(sb); // :1:3:1:1:3:1:1
Here is a DEMO.
Ok, if every char is digit after : then you can use it like;
string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var array = s.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
foreach (var item in array)
{
if (Char.IsDigit(item[0]))
{
sb.Append(":" + item[0]);
}
}
Console.WriteLine(sb); //:1:3:1:1:1:1
DEMO.

This will work with one replace:
var allQtyAry = Regex.Replace(dataForPackageConsume, #"[^_:]+:", "").split('_')
Explanation:
[^_:] means match anything that's not a _ or a :
[^_:]+: means match any sequence of at least one character not matching either _ or :, but ending with a :
Since regular expressions are greedy by default (ie they grab as much as possible), matching will start at the beginning of the string or after each _:
NEW ITEM: 1_BELT: 3_JEANS: 1_BELT: 1_SUIT 3 PCS: 1_SHOES: 1
Removing the matched parts (the italic bold bits above) results in:
1_3_1_1_1_1
Splitting by _ results in:
[1, 3, 1, 1, 1, 1]

Try this regex [^:\d+?].*?(?=:), it should do the trick
string[] list = Regex.Replace(test, #"[^:\d+?].*?(?=:)", string.Empty).Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
The regex matches and replaces with an empty string everything preceding the colon : (exclusive) .*?(?=:). It also excludes :# from the match [^:\d+?] thus you end up with :1:3:1:1:1:1 before the split

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex for obtaining numeric values within a string in C# - c#

Related

Regex How to Match 2 fields

How to split string by another string

3-digit grouping of all numbers in an alphanumeric string

Get only wild card value using regular expression

what is the regular expression to remove everything but digit?

Categories

Resources