Regex to get numbers after a period in a string - c#

I'm trying to find the right regex to extract the numbers after the . in the strings below. For example, the first line should return an array of 1 1 1 1 1, and the second should return 2 1 0 1 2. I can't seem to figure out the correct regex to achieve this. Any help would be appreciated.
line = 0.1, 1.1, 2.1, 3.1, 4.1 // payline 0
line = 0.2, 1.1, 2.0, 3.1, 4.2 // payline 1
So far, I have the code below, but it just returns all the numbers in the string instead. For example, the first line returns 0 1 1 1 2 1 3 1 4 1 0 and the second returns 0 2 1 1 2 0 3 1 4 2 1.
foreach (var line in Paylines)
{
    int[] lines = (from Match m in Regex.Matches(line.ToString(), @"\d+")
                   select int.Parse(m.Value)).ToArray();
    foreach (var x in lines)
    {
        Console.WriteLine(x.ToString());
    }
}

You may use a lookbehind-based regex solution:
@"(?<=\.)\d+"
It matches one or more digits after a dot without including the dot in the match value.
In C#, you may use
var myVals = Regex.Matches(line, @"(?<=\.)\d+", RegexOptions.ECMAScript)
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToList();
The RegexOptions.ECMAScript option is passed so that \d matches only ASCII digits in the [0-9] range, not other Unicode digits.
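A minimal end-to-end sketch of the lookbehind approach, assuming each payline is available as a plain string shaped like the two lines in the question (if the question's Paylines collection holds other objects, call ToString() first, as the original loop does). The helper name DigitsAfterDot is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: the paylines are plain strings shaped like the ones in the question.
var paylines = new List<string>
{
    "0.1, 1.1, 2.1, 3.1, 4.1 // payline 0",
    "0.2, 1.1, 2.0, 3.1, 4.2 // payline 1",
};

// Hypothetical helper: the lookbehind (?<=\.) requires a dot before the digits
// but keeps the dot itself out of the match value.
List<int> DigitsAfterDot(string s) =>
    Regex.Matches(s, @"(?<=\.)\d+", RegexOptions.ECMAScript)
        .Cast<Match>()
        .Select(m => int.Parse(m.Value))
        .ToList();

var results = paylines.Select(p => DigitsAfterDot(p)).ToList();
foreach (var values in results)
{
    Console.WriteLine(string.Join(" ", values)); // "1 1 1 1 1", then "2 1 0 1 2"
}
```

The trailing "// payline N" comments are not picked up, because the digits there are not preceded by a dot.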

Related

Get only the longest match from the groups in regex

I have various strings like '10001110110', '10000', '100001', '00011', '0001', '111000', etc.
I need to find out the longest possible combination of 1s with no or 1 zero in between.
I have got a regex like this - (?=(1+01+))
But it's not returning a group where there is no leading or trailing 1.
I want the regex to consider this case too.
Currently it's returning all groups.
For example, if the input string is '10110111', it returns 3 groups:
{null, 1011}, {null, 110111} and {null, 10111}
I want my regex to return only 1 match with the longest combination. Is it possible to do so?
For the following rule, "I need to find out the longest possible combination of 1s with no or 1 zero in between", you can capture one or more 1s and then optionally match an optional 0 followed by one or more 1s, all inside the lookahead assertion:
(?=(1+(?:0?1+)?))
To get the longest result, you can process the matches, sort them by string length in descending order, and take the first item.
string pattern = @"(?=(1+(?:0?1+)?))";
string input = @"10001110110 10000 100001 00011 0001 111000 101110111011011";
var result = Regex.Matches(input, pattern)
    .Cast<Match>() // needed on .NET Framework, where MatchCollection is not IEnumerable<Match>
    .Select(m => m.Groups[1].Value)
    .OrderByDescending(s => s.Length)
    .FirstOrDefault();
Console.WriteLine(result);
Output
1110111
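The same idea can be wrapped in a helper that works on one string at a time, which is closer to how the question uses it. This is a sketch; LongestRun is a hypothetical name, and it returns an empty string when the input contains no 1 at all:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the capture inside the zero-width lookahead lets the
// engine report an overlapping candidate at every position that starts with 1.
string LongestRun(string s) =>
    Regex.Matches(s, @"(?=(1+(?:0?1+)?))")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .OrderByDescending(v => v.Length) // stable sort: ties keep the leftmost candidate
        .FirstOrDefault() ?? "";

foreach (var s in new[] { "10001110110", "10000", "100001", "00011", "0001", "111000", "10110111" })
{
    Console.WriteLine($"{s} -> {LongestRun(s)}");
}
```

For example, '10110111' gives 110111 and '10001110110' gives 111011.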

Removing whitespace between consecutive numbers

I have a string, from which I want to remove the whitespaces between the numbers:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(\d)\s(\d)", @"$1$2");
the expected/desired result would be:
"Some Words 1234"
but I get the following:
"Some Words 12 34"
What am I doing wrong here?
Further examples:
Input: "Some Words That Should not be replaced 12 9 123 4 12"
Output: "Some Words That Should not be replaced 129123412"
Input: "test 9 8"
Output: "test 98"
Input: "t e s t 9 8"
Output: "t e s t 98"
Input: "Another 12 000"
Output: "Another 12000"
Regex.Replace continues to search after the previous match:
Some Words 1 2 3 4
           ^^^
           first match, replace by "12"
Some Words 12 3 4
             ^
             +-- continue searching here
Some Words 12 3 4
              ^^^
              next match, replace by "34"
You can use a zero-width positive lookahead assertion to avoid that:
string result = Regex.Replace(test, @"(\d)\s(?=\d)", @"$1");
Now the final digit is not part of the match:
Some Words 1 2 3 4
           ^^?
           first match, replace by "1"
Some Words 12 3 4
            ^
            +-- continue searching here
Some Words 12 3 4
            ^^?
            next match, replace by "2"
...
Your regex consumes the digit on the right. In Some Words 1 2 3 4, (\d)\s(\d) matches and captures 1 into Group 1, then matches one whitespace character, and then matches and consumes 2 (that is, adds it to the match value and advances the regex index past it). The regex engine then looks for the next match from the current index, which is already after 1 2. So the regex never matches 2 3; it finds 3 4 instead.
Use non-consuming lookarounds instead:
(?<=\d)\s+(?=\d)
Details
(?<=\d) - a positive lookbehind that matches a location in the string immediately preceded by a digit
\s+ - one or more whitespace characters
(?=\d) - a positive lookahead that matches a location in the string immediately followed by a digit.
In C#:
string test = "Some Words 1 2 3 4";
string result = Regex.Replace(test, @"(?<=\d)\s+(?=\d)", "");
Applied to all of the examples from the question:
var strs = new List<string> {"Some Words 1 2 3 4", "Some Words That Should not be replaced 12 9 123 4 12", "test 9 8", "t e s t 9 8", "Another 12 000" };
foreach (var test in strs)
{
    Console.WriteLine(Regex.Replace(test, @"(?<=\d)\s+(?=\d)", ""));
}
Output:
Some Words 1234
Some Words That Should not be replaced 129123412
test 98
t e s t 98
Another 12000
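To compare the three patterns discussed above side by side, here is a small sketch that runs the question's original pattern, the lookahead-only fix and the lookbehind + lookahead fix on the question's inputs (the local function names are illustrative, not from either answer):

```csharp
using System;
using System.Text.RegularExpressions;

// Inputs and expected outputs taken from the question.
var cases = new (string Input, string Expected)[]
{
    ("Some Words 1 2 3 4", "Some Words 1234"),
    ("Some Words That Should not be replaced 12 9 123 4 12", "Some Words That Should not be replaced 129123412"),
    ("test 9 8", "test 98"),
    ("t e s t 9 8", "t e s t 98"),
    ("Another 12 000", "Another 12000"),
};

// Original pattern: the second (\d) consumes the right-hand digit, so "2 3" is never examined.
string Consuming(string s) => Regex.Replace(s, @"(\d)\s(\d)", "$1$2");
// Lookahead only: the right-hand digit is checked but not consumed.
string Lookahead(string s) => Regex.Replace(s, @"(\d)\s(?=\d)", "$1");
// Lookbehind + lookahead: only the whitespace itself is matched and removed.
string Lookarounds(string s) => Regex.Replace(s, @"(?<=\d)\s+(?=\d)", "");

foreach (var (input, expected) in cases)
{
    Console.WriteLine($"{Consuming(input)} | {Lookahead(input)} | {Lookarounds(input)}");
}
```

One design difference: the lookahead-only version only handles gaps of exactly one whitespace character, while \s+ in the lookaround version also removes longer runs of whitespace between digits.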

RegEx to find numbers sequence in string separated by space with predefined maximum length

Sorry for the confusing title; I'll try to explain this with an example. Currently we have this expression to find a number sequence in a string:
\b((\d[ ]{0,1}){13,19})\b
Now I'd like to modify it so it fulfills these rules:
- The length should be between 13 to 19 characters, excluding the whitespaces
- Each number cluster must have minimum 3 digits
The expression should mark these as matched:
1234567890123
1234 5678 9012 345
Should not match:
123456789012 3
123 12 123 1 23134
The current expression marks all of them as matches.
This is possible using look-around.
The regex can be changed to the following:
\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)
This works by looking ahead to make sure the number is between 13 and 19 digits long. It then matches groups of 3 or more digits. After it has found all such groups, it uses a negative lookahead to make sure there are no digits left. If there are, we've found a group smaller than 3. This works on the examples you've provided.
\b Makes sure that it's the start of a "word".
(?<!\d ) Makes sure there is no digit followed by a space right behind.
(?=(?:\d ?){13,19}(?! ?\d)) Looks ahead to make sure the number is between 13 and 19 digits long
(?:\d ?){13,19} From the original pattern; ?: was added to make the group non-capturing
(?! ?\d) Negative lookahead: if there are still digits left after up to 19 digits, the sequence is too long, so the current match is discarded
(?:\d{3,} ?)+ Match any number of clusters of 3 or more digits (the minimum of 13 and maximum of 19 digits are handled by the first lookahead)
\b(?! ?\d) Looks for the end of a cluster. If there are still digits left after the end of the cluster, there must be a cluster that is too small.
I suggest the following solution also based on lookarounds:
\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b
The main point is that we only match the next digit if it is not a part of a 1- or 2-digit group.
Pattern explanation
\b - starting word boundary
\d(?!\d?\b) - a digit that is not followed by 1 or 0 digits and then a trailing word boundary (that is, the match fails if the first group is a 1- or 2-digit group like 1 or 12)
(?: ?\d(?!(?<= \d)\d?\b)){12,18} - 12 to 18 occurrences of:
' ?' (a space followed by ?) - 1 or 0 spaces
\d(?!(?<= \d)\d?\b) - any single digit, except that when it is preceded by a space (that is, when it starts a new group, which the (?<= \d) lookbehind checks), it must not be followed by just 1 or 0 digits and a word boundary (the \d?\b part); in other words, every new group must have at least 3 digits
\b - a trailing word boundary.
Note that if you want to match these strings only in a non-numeric context (that is, if you do not want to allow any digits on the left or on the right), you might also consider adding (?<!\d *) at the front and (?! *\d) at the end of the pattern.
Note that to match any whitespace, you may replace a literal space with \s in the pattern.
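A quick way to check both lookaround patterns above against the question's examples is a sketch like the following; it calls Regex.IsMatch with the two patterns exactly as given in the answers and prints whether each sample matches:

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from the answers above, unchanged.
var lookaheadPattern = @"\b(?<!\d )(?=(?:\d ?){13,19}(?! ?\d))(?:\d{3,} ?)+\b(?! ?\d)";
var perDigitPattern = @"\b\d(?!\d?\b)(?: ?\d(?!(?<= \d)\d?\b)){12,18}\b";

// The question's examples: true = should match, false = should not.
var samples = new (string Text, bool ShouldMatch)[]
{
    ("1234567890123", true),
    ("1234 5678 9012 345", true),
    ("123456789012 3", false),
    ("123 12 123 1 23134", false),
};

foreach (var (text, shouldMatch) in samples)
{
    bool first = Regex.IsMatch(text, lookaheadPattern);
    bool second = Regex.IsMatch(text, perDigitPattern);
    Console.WriteLine($"'{text}': expected {shouldMatch}, pattern 1 {first}, pattern 2 {second}");
}
```

Adding further samples of your own (for example a 20-digit number) to the array is an easy way to probe the 13 to 19 digit limits.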
If you can use LINQ, a non-regex approach will be much easier to maintain:
var myList = new List<string>
{
    "1234567890123",
    "1234 5678 9012 345",
    "123456789012 3",
    "123 12 123 1 23134"
};
foreach (var input in myList)
{
    var splitted = Regex.Split(input, @"\s+"); // Split on whitespace
    var length = splitted.Sum(x => x.Length); // Compute the total length
    var smallestGroupSize = splitted.Min(x => x.Length); // Compute the length of the smallest chunk
    Console.WriteLine($"Total length: {length}, smallest group size: {smallestGroupSize}");
    if (length < 13 || length > 19 || smallestGroupSize < 3)
    {
        Console.WriteLine($"Input '{input}' is incorrect{Environment.NewLine}");
        continue;
    }
    Console.WriteLine($"Input '{input}' is correct!{Environment.NewLine}");
}
which produces:
Total length: 13, smallest group size: 13
Input '1234567890123' is correct!
Total length: 15, smallest group size: 3
Input '1234 5678 9012 345' is correct!
Total length: 13, smallest group size: 1
Input '123456789012 3' is incorrect
Total length: 14, smallest group size: 1
Input '123 12 123 1 23134' is incorrect

Filtering on full string match but not on substrings

So I've got a long string of numbers and characters, and I'd like to filter out a substring. What I'm struggling with is that I need a full match on a certain value (starting with S), but it must not match as part of another value.
Input:
S10 1+0000000297472+00EURS100 1+0000000297472+00EURS1023P 1+0000000816072+00EUR
The input is exactly like this.
Breakdown of input:
S10 1+0000000297472+00EUR
Every part starts with a tag S and ends with EUR
There are spaces in between because every part has a fixed length
=>
index 0 : tag 'S' with length 1
index 1 : code with length 7
index 8 : numbertype with length 1
index 9 : sign with length 1
index 10 : value with length 13
index 23 : sign with length 1
index 24 : exponent with length 2
index 26 : unit with length 3
For example, I need to match S10, and I only want the substring up to EUR. I don't want it to match S100, S1023P or any other combination; only exactly S10.
Output:
S10 1+0000000297472+00EUR
I'm trying to use a regex to find my match on 'S + code'. I do a full match on my search query, and as soon as anything follows it, I don't want it anymore. But doing it this way also discards the actual match, because after S10 the value follows, which matches ([^\d|^\D])+\w.
foreach (var field in fieldList)
{
    var query = "S" + field.BallanceCode;
    var index = Regex.Match(values, Regex.Escape(query) + @"([^\d|^\D])+\w").Index;
}
For example, when looking for S10:
needs to match:
S10 1+0000000297472+00EUR
may not match:
S10/15 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10000001+0000000546546+00EUR
Update:
Using this code:
var index = Regex.Match(values, Regex.Escape(query) + @"\p{Zs}.*?EUR").Index;
will find S10, S10/15, etc. when searched for. However, looking for S1000000 in the string doesn't work, because there is no whitespace between the code and 1+:
S10000001+0000000546546+00EUR
For example, when looking for S1000000:
needs to match:
S10000001+0000000297472+00EUR
may not match:
S10 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10/15 1+0000000546546+00EUR
You can use a regex that requires a space (or whitespace) to appear right after the field.BallanceCode:
var index = Regex.Match(values, Regex.Escape(query) + (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR").Index;
The regex will match S10, then a single horizontal whitespace character (\p{Zs}), then any 0 or more characters other than a newline (as few as possible, due to *?) up to the first EUR.
The (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") check is necessary to support a 7-character BallanceCode. If the code is 7 characters or longer, the regex does not require a whitespace after it; if it is shorter, it requires a space.
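A minimal sketch of this approach, wrapped in a hypothetical FindRecord helper that returns the matched record instead of its index. The input is an assumption: it rebuilds the question's records using the fixed-width layout from the table above, where the code after S occupies 7 characters and shorter codes are padded with spaces:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed input: codes right-padded with spaces to 7 characters,
// so "S10" is followed by 5 spaces before the number type "1".
var values =
    "S10     1+0000000297472+00EUR" +
    "S100    1+0000000297472+00EUR" +
    "S1023P  1+0000000816072+00EUR" +
    "S10/15  1+0000001748447+00EUR" +
    "S10000001+0000000546546+00EUR";

// Hypothetical helper: code plays the role of field.BallanceCode.
// Codes shorter than 7 characters must be followed by a space separator (\p{Zs});
// a full 7-character code is followed directly by the number type digit.
string? FindRecord(string input, string code)
{
    var pattern = Regex.Escape("S" + code) + (code.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR";
    var match = Regex.Match(input, pattern);
    return match.Success ? match.Value : null;
}

Console.WriteLine(FindRecord(values, "10"));      // S10     1+0000000297472+00EUR
Console.WriteLine(FindRecord(values, "1000000")); // S10000001+0000000546546+00EUR
```

Searching for S10 skips S100, S1023P and S10/15 because none of them has a space right after "S10", and searching for the 7-character S1000000 still works because no space is required after it.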
So you just want the start (S...) and end (...EUR) of each line and skip everything in between?
^([sS]\d+).*?([\d\+]+EUR)$
http://regexr.com/3c1ob

regular expression for 2 string arguments having numeric values with range constraint

I need to validate console input arguments. The user can pass only 2 arguments, separated by a space.
The first argument should be between 1 and 100.
The second argument should be between 1 and 750.
I need a regular expression to validate the input. Please help.
Description
This regex will match 1-100, whitespace, and then 1-750:
^\b([1-9][0-9]?|100)\b\s+\b([1-9][0-9]?|[1-6][0-9]{2}|7[0-4][0-9]|750)\b$
Expanded
^ match the start of the string
\b match the word boundary
( open capture group 1
[1-9] match any single digit not including zero followed by
[0-9]? match any single digit or no digit
| or
100 match the number one hundred
) close the capture group 1
\b\s+\b require a word boundary, one or more whitespace characters, and another word boundary.
( start capture group 2
[1-9] match any single digit not including zero followed by
[0-9]? match any single digit or no digit
| or
[1-6] match any digit 1 through 6 followed by
[0-9]{2} match any two digits
| or
7 match a seven followed by
[0-4] match any digit 0 through 4 followed by
[0-9] match any single digit
| or
750 match the number seven hundred and fifty
) close capture group 2
\b$ require a word boundary and the end of the string.
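Range regexes are easy to get subtly wrong, so one way to gain confidence in the pattern above is to compare it exhaustively with a plain integer check over a window slightly wider than the valid ranges. A sketch, using the pattern exactly as given:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above, unchanged.
var rangeRegex = new Regex(@"^\b([1-9][0-9]?|100)\b\s+\b([1-9][0-9]?|[1-6][0-9]{2}|7[0-4][0-9]|750)\b$");

// Plain integer check used as the reference.
bool InRange(int a, int b) => a >= 1 && a <= 100 && b >= 1 && b <= 750;

int mismatches = 0;
for (int a = 0; a <= 101; a++)
{
    for (int b = 0; b <= 751; b++)
    {
        // Build "a b" the way a user would type the two arguments.
        bool byRegex = rangeRegex.IsMatch($"{a} {b}");
        if (byRegex != InRange(a, b))
        {
            mismatches++;
            Console.WriteLine($"Mismatch for '{a} {b}'");
        }
    }
}
Console.WriteLine($"Mismatches: {mismatches}");
```

The same loop can be pointed at any other candidate pattern to see exactly which inputs it accepts or rejects incorrectly.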
It sounds like you want a pattern like this:
^([1-9]\d?|100)\s+([1-9]\d?|[1-6]\d\d|7[0-4]\d|750)$
However, you are probably better off verifying the inputs via normal integer comparison:
int int1, int2;
if (int.TryParse(param1, out int1) && int.TryParse(param2, out int2))
{
    if (int1 >= 1 && int1 <= 100 && int2 >= 1 && int2 <= 750)
    {
        // Both arguments are valid and in range.
    }
}
As others have said, a regex isn't the best option, but if you really want to use one, this should work:
^(?:100|[1-9]\d?) (?:750|7[0-4]\d|[1-6]\d\d|[1-9]\d?)$
I would recommend not using a regex, and doing something like this instead:
int a = 0, b = 0;
if (args.Length != 2) {
    // not 2 arguments
} else {
    if (!int.TryParse(args[0], out a) || !int.TryParse(args[1], out b)) {
        // not numbers
    } else {
        if (a < 1 || a > 100 || b < 1 || b > 750) {
            // out of ranges
        } else {
            // everything fine
        }
    }
}
and you'll have your numbers right there.
