Getting matching result using Regex - c#

I need to find the matching result, i.e. a string, using Regex. Let me demonstrate the scenario using sample inputs.
string input = "xb-cv_107_20190608_032214_006"; // <-1st case
string input = "yb-ha_107_20190608_032214_006__foobar"; // <-2nd case
string input = "fv_vgf_ka01mq3286__20190426_084135_039"; // <-3rd case
string input = "fv_vgf_ka01mq3286__2090426_084135_039"; // <-4th case
For 1st case input, output required= "xb-cv_107_20190608_032214_006".
For 2nd case input, output required= "yb-ha_107_20190608_032214_006".
For 3rd case input, output required= "fv_vgf_ka01mq3286__20190426_084135_039".
For 4th case input, output required= null since the pattern does not match.
The procedure to get the output is:
Check with a regex whether the string ends with _ followed by 8 digits, _, 6 digits, _ and 3 digits,
or whether it contains _ followed by 8 digits, _, 6 digits, _ and 3 digits, then __, followed by anything.
So far, I have come up with this regex:
string pattern = @".+[_][0-9]{8}[_][0-9]{6}[_][0-9]{3}([_]{2})?";
var result = Regex.Match(input, pattern)?.Groups[0].Value;

You may use
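var m = Regex.Match(input, @"^(.+_[0-9]{8}_[0-9]{6}_[0-9]{3})(?:__|$)");
var result = m.Success ? m.Groups[1].Value : null;
Regex details:
^ - start of string
( - Group 1 start:
.+ - any 1+ chars other than LF, as many as possible
_[0-9]{8}_[0-9]{6}_[0-9]{3} - _, 8 digits, _, 6 digits, _, 3 digits
) - end of Group 1
(?:__|$) - either two underscores or the end of the string.
If there is a match, result holds the value of Group 1.
If there is no match, result is null. Regex.Match itself never returns null: on failure it returns an unsuccessful Match whose Groups[1].Value is an empty string, which is why the Success check is needed.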
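A minimal runnable sketch of the pattern above, applied to the four sample inputs from the question. The helper name `ExtractStamp` is hypothetical; the sketch assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the part of the input that ends in _<8 digits>_<6 digits>_<3 digits>,
// provided that stamp is followed by "__" or by the end of the string; otherwise null.
static string? ExtractStamp(string input)
{
    Match m = Regex.Match(input, @"^(.+_[0-9]{8}_[0-9]{6}_[0-9]{3})(?:__|$)");
    return m.Success ? m.Groups[1].Value : null;
}

string[] samples =
{
    "xb-cv_107_20190608_032214_006",
    "yb-ha_107_20190608_032214_006__foobar",
    "fv_vgf_ka01mq3286__20190426_084135_039",
    "fv_vgf_ka01mq3286__2090426_084135_039",
};

foreach (string s in samples)
{
    // Print "null" explicitly so the 4th case is visible in the output.
    Console.WriteLine($"{s} -> {ExtractStamp(s) ?? "null"}");
}
```

The 4th sample fails because 2090426 has only 7 digits, so the helper returns null instead of an empty string.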

Related

Get only the longest match from the groups in regex

I have various strings like '10001110110', '10000', '100001', '00011', '0001', '111000' etc.
I need to find out the longest possible combination of 1s with no or 1 zero in between.
I have got a regex like this: (?=(1+01+))
But it's not returning a group where there is no leading or trailing one.
I want the regex to consider this case too.
Currently it's returning all groups.
E.g., if the input string is '10110111', it returns 3 groups:
{null, 1011}, {null, 110111} and {null, 10111}
I want my regex to return only 1 match with the longest combination. Is it possible to do so?
For the rule "I need to find out the longest possible combination of 1s with no or 1 zero in between", you can capture one or more 1s and then optionally match a 0 followed again by one or more 1s, all inside the lookahead assertion:
(?=(1+(?:0?1+)?))
To get the longest result, collect the Group 1 values of all matches, sort them by length in descending order, and take the first one.
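using System;
using System.Linq;
using System.Text.RegularExpressions;
string pattern = @"(?=(1+(?:0?1+)?))";
string input = @"10001110110 10000 100001 00011 0001 111000 101110111011011";
var result = Regex.Matches(input, pattern)
.Cast<Match>() // needed on .NET Framework, harmless on .NET Core/.NET 5+
.Select(m => m.Groups[1].Value)
.OrderByDescending(s => s.Length)
.FirstOrDefault();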
Console.WriteLine(result);
Output
1110111
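The same pattern can also be applied to each sample string separately, which shows that runs without any 0 (such as 00011 or 111000) are now picked up too. This is a small sketch; the helper name `LongestOnes` is hypothetical, and it returns null when a string contains no 1 at all.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Longest run of 1s containing at most one 0 in the middle; null if there is no 1.
// The lookahead makes every start position a (zero-width) match, so overlapping runs are all seen.
static string? LongestOnes(string s) =>
    Regex.Matches(s, @"(?=(1+(?:0?1+)?))")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .OrderByDescending(v => v.Length)
        .FirstOrDefault();

foreach (var s in new[] { "10110111", "10001110110", "10000", "100001", "00011", "0001", "111000" })
{
    Console.WriteLine($"{s} -> {LongestOnes(s)}");
}
```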

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code while keeping the same expected results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the replacement callback (the match evaluator), check whether capture group 2 succeeded. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capturing group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Because in .NET \d also matches digits from other scripts, \s also matches newlines, and the pattern might start matching in the middle of a word, a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
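To see what the more precise pattern changes, here is a small sketch that runs both patterns with the same evaluator as the answer above. The helper names `Evaluate`, `FixLoose`, `FixStrict` and `Show` are hypothetical; the sample "abc10mg" (a number glued to a preceding word) and "10\n%" (a line break between number and symbol) were chosen only to illustrate the word-boundary and newline differences.

```csharp
using System;
using System.Text.RegularExpressions;

// Evaluator from the answer: keep the captured symbol (group 2) if there is one,
// otherwise put a single space after the number (group 1).
static string Evaluate(Match m) =>
    m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ");

// Original pattern: \d is Unicode-aware and \s also crosses line breaks.
static string FixLoose(string s) =>
    Regex.Replace(s, @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?", Evaluate);

// Stricter pattern: ASCII digits only, the number must start at a word boundary,
// and only horizontal spaces/tabs may separate the number from the symbol.
static string FixStrict(string s) =>
    Regex.Replace(s, @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?", Evaluate);

// Makes line breaks visible when printing.
static string Show(string s) => s.Replace("\n", "\\n");

foreach (var sample in new[] { "10ANYUNIT", "10 : something", "abc10mg", "10\n%" })
{
    Console.WriteLine("{0,-16} loose: {1,-16} strict: {2}",
        Show(sample), Show(FixLoose(sample)), Show(FixStrict(sample)));
}
```

Both patterns agree on the question's samples; they differ only when a digit run is part of a larger word or when the symbol is on the next line.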
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 or 1241.23
Group 3 - ((E|e|%|:)+)
One or more of the special symbols E, e, % and :
Group 1 and Group 3 can be separated by any amount of whitespace.
If it's not working the way you're asking, please provide some samples to test.
For me it's too complex to be handled by just one regex. I suggest splitting it into separate checks. See the code example below: I used four different regexes; the first is described in detail, and the rest can be deduced from that explanation.
using System;
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespace characters \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespace characters again.
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It matches a lower- or uppercase 'e' with an optional (due to the ? operator) minus/plus sign and one or more digits.
// Then match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression, letter.ToString(), 1);
letter++;
}
The problem is that if I have, for example, (5-2)+3, it gets replaced with (ab)+c.
So it treats -2 as a number, but I don't want that.
I am not experienced with Regex, but I think I need something like this:
Check for '-'; if there is one, check whether there is a number or right parenthesis before it. If there is NOT, keep the '-' as part of the number.
After that, check for digits + dot + digits.
My above Regex also matches values like .2 .3 .4, but I don't need that; it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator:
using System;
using System.Text.RegularExpressions;
Regex rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
letter++; // char is incremented
return letter.ToString();
}
);
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)
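The asker's own failing example, (5-2)+3, can be checked with the same regex wrapped in a small helper. This is a sketch; the name `ReplaceNumbers` is hypothetical, and it assumes an expression contains at most 26 numbers (after 'z' the counter simply moves on to '{', '|', and so on).

```csharp
using System;
using System.Text.RegularExpressions;

// Replaces each number with the next letter (a, b, c, ...).
// A leading '-' is kept as a sign only when it is not preceded by ')' or a digit,
// i.e. when it cannot be a binary minus.
static string ReplaceNumbers(string expression)
{
    var rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
    char letter = 'a';
    // Post-increment: the first match gets 'a', the next one 'b', and so on.
    return rx.Replace(expression, m => (letter++).ToString());
}

Console.WriteLine(ReplaceNumbers("(5-2)+3"));                             // (a-b)+c
Console.WriteLine(ReplaceNumbers("((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)")); // ((a+b)*(c+d))-((e*f)-g)
```

Starting from 'a' with a post-increment avoids the (char)96 trick while producing the same letters.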

Regex - Trying to write match validation for Swedish Social Number

So I want both formats, xxxxxx-xxxx AND xxxxxxxx-xxxx, to be accepted. I've managed to fix the first section before the dash, but the last four digits are troublesome. The regex does require at least 4 digits, but I also want it to return false if there are more than 4. How do I do it?
This is how it looks so far:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And this is the results:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs False
m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs True, and this is what I don't want
The \d{6,8} pattern matches 6, 7 or 8 digits, so it already accepts a 7-digit prefix, which is not a valid format. Besides, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, so it will also accept strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - an optional - or whitespace char
\d{4} - 4 digits
$ - end of string.
To make \d only match ASCII digits, pass the RegexOptions.ECMAScript option. Example:
var res = Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
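A quick runnable check of the suggested pattern against the asker's inputs, plus a 7-digit prefix, a pipe separator and a separator-less number. The helper name `IsPersonnummerFormat` is hypothetical, and the sketch only validates the shape of the string (it does not check the date or the control digit).

```csharp
using System;
using System.Text.RegularExpressions;

// 6 or 8 digits, an optional '-' or whitespace separator, then exactly 4 digits.
// RegexOptions.ECMAScript restricts \d to ASCII 0-9.
static bool IsPersonnummerFormat(string s) =>
    Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);

foreach (var sample in new[]
{
    "840204-2344", "19840204-2344", "19840204-23",
    "19840204-2323423423", "1984020-2344", "19840204|2323", "8402042344",
})
{
    Console.WriteLine($"{sample}: {IsPersonnummerFormat(sample)}");
}
```

Here the trailing $ is what rejects 19840204-2323423423, and splitting \d{6,8} into \d{6}(?:\d{2})? is what rejects the 7-digit prefix.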
You are forgetting the $ at the end:
var regex = new Regex(@"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you can also use \b to test for word boundaries:
var regex = new Regex(@"\b(\d{6}|\d{8})-\d{4}\b");
Edit: I corrected the RegEx to fix the problems mentioned in the comments. The commenters are right, of course. In my earlier post I just wanted to explain why the RegEx matched the longer string.

Regular expression match all numbers after the last dash?

I'm trying to find the last run of digits after the last dash in a string, so:
test-123-2-456 would return 456
123-test would return ""
123-test-456 would return 456
123-test-456sdfsdf would return 456
123-test-asd456 would return 456
The expression @"[^-]*$" does not match the numbers, though, and I have tried using [\d] but to no avail.
Sure, the simplest solution would be something like this:
(\d+)[^-]*$
This will match one or more digits, captured in group 1, followed by zero or more of any character other than a hyphen, followed by the end of the string. In other words, it will match any sequence of digits as long as there are no hyphens between that sequence and the end of the string. You then just have to extract group 1 from the match. For example:
using System;
using System.Text.RegularExpressions;
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach(var str in inputs)
{
var m = Regex.Match(str, @"(\d+)[^-]*$");
Console.WriteLine("{0} --> {1}", str, m.Groups[1].Value);
}
Produces:
test-123-2-456 --> 456
123-test -->
123-test-456 --> 456
123-test-456sdfsdf --> 456
123-test-asd456 --> 456
Alternatively, you could use a negative lookahead like this:
\d+(?!.*-)
This will match one or more digit characters as long as no hyphen appears anywhere after them. Only the digits will be included in the match.
Note that these two options behave differently if there are two or more sets of numbers after the last -, e.g. foo-123bar456. In this case it's not entirely clear what you want to happen, but the first pattern will simply match everything from the first sequence of digits to the end (123bar456), with group 1 containing only the first sequence of digits (123). If you'd like it to capture only the last sequence of digits instead, put \d inside the character class (i.e. (\d+)[^\d-]*$). The second pattern produces a separate match for each sequence of digits (in this example, 123 and 456), but the Regex.Match method will only give you the first one.
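The difference described in that note can be checked directly on foo-123bar456. This is a small sketch; the variable names are illustrative only, and Regex.Matches is used for the lookahead pattern so that both matches are visible.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string sample = "foo-123bar456";

// First pattern: group 1 holds the FIRST digit run after the last '-'.
string first = Regex.Match(sample, @"(\d+)[^-]*$").Groups[1].Value;

// First pattern with \d added to the class: group 1 holds the LAST digit run.
string last = Regex.Match(sample, @"(\d+)[^\d-]*$").Groups[1].Value;

// Lookahead pattern: one match per digit run that has no hyphen after it.
string[] runs = Regex.Matches(sample, @"\d+(?!.*-)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(first);                   // 123
Console.WriteLine(last);                    // 456
Console.WriteLine(string.Join(", ", runs)); // 123, 456
```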
I suggest applying two regexes, using the result of the first one as the input for the second one.
The first regex is:
-[0-9]+[^-]+$ // Take the last piece of your string, led by a minus (-)
// followed by digits ([0-9]+)
// and some ugly rest that doesn't contain another minus ([^-]+$)
The second regex is:
-[0-9]+ // Separate the relevant digits from the ugly rest
// You know that there can only be one minus + digits part in it
Tested here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
The last group from this regex can get the last number for you:
[^-A-z][0-9]+[^A-z]
If you are working with groups, you can match with a capturing group and read the last group to get the last number:
using System;
using System.Text.RegularExpressions;
var inputs = new[] {
"test-123-2-456",
"123-test",
"123-test-456",
"123-test-456sdfsdf",
"123-test-asd456"
};
foreach (var str in inputs)
{
// Digits after the last '-', optionally preceded by non-digit, non-hyphen characters.
var m = Regex.Match(str, @"-[^-0-9]*([0-9]+)[^-]*$");
if (m.Success) // Inputs with no digits after the last '-' (e.g. 123-test) print nothing.
Console.WriteLine("{0} --> {1}", str, m.Groups[m.Groups.Count - 1].Value);
}
