How to parse R123[i] pattern inside string in variable using regex? - c#

I have an expression like this:
R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]
I want to grab all the variables here with a regex, so that I get R4013[i], R4014[i], R40[i] and so on.
I already have a regex pattern like this, but it does not work:
[RMB].+\[.+\]

You could try the following:
@"[RMB][^\[]*\[[^\]]*\]"
[RMB] Picks up a single character from the given list.
[^\[]* A negated character class that matches any character except the [ symbol, zero or more times.
\[ Matches a literal [ symbol.
[^\]]* Matches any character except ], zero or more times.
\] Matches the literal ] symbol.
Code:
using System;
using System.Text.RegularExpressions;
string input = @"R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
Regex rgx = new Regex(@"[RMB][^\[]*\[[^\]]*\]");
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups[0].Value); // Group 0 is the whole match
}

R\d+\[[^\]]*\]
This simple regex should do it; see the demo:
https://regex101.com/r/wU7sQ0/3
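A minimal C# sketch of using this simpler pattern, assuming you want the matched variables collected into a list. The helper name ExtractVariables is hypothetical and not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every R<digits>[...] token in the expression.
static List<string> ExtractVariables(string expression)
{
    // 'R', one or more digits, then a bracketed index that contains no ']'.
    var pattern = new Regex(@"R\d+\[[^\]]*\]");
    var result = new List<string>();
    foreach (Match m in pattern.Matches(expression))
    {
        result.Add(m.Value);
    }
    return result;
}

string input = "R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
List<string> variables = ExtractVariables(input);
Console.WriteLine(string.Join(", ", variables));
// R4013[i], R4014[i], R40[i], R403[i+1], R404[i+1], R405[i+1], R231[2]
```

Unlike [RMB], this pattern only starts at an R followed by digits, so stray M or B characters elsewhere in the expression cannot start a match.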
Or you can simply make your own regex non-greedy. Your regex [RMB].+\[.+\] has a greedy .+ after [RMB], so it captures everything up to the last [] instead of each variable separately: .+ does not stop at the first [ as you want. So use the lazy .+? instead:
[RMB].+?\[.+?\]
https://regex101.com/r/wU7sQ0/4
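using System;
using System.Text.RegularExpressions;
string strRegex = @"[RMB].+?\[.+?\]";
// RegexOptions.Multiline only changes how ^ and $ behave; this pattern uses neither.
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
    // Matches() only returns successful matches, so this check is always true.
    if (myMatch.Success)
    {
        // Add your code here
        Console.WriteLine(myMatch.Value);
    }
}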
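To see the greedy-versus-lazy difference described above, here is a small sketch that runs both patterns on the same input and compares the matches. The console labels are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";

// Greedy: .+ runs to the end of the string and backtracks only far enough
// to find the LAST '[' ... ']' pair, so one match spans the whole input.
MatchCollection greedy = Regex.Matches(input, @"[RMB].+\[.+\]");

// Lazy: .+? stops at the FIRST '[' and at the first ']' after it.
MatchCollection lazy = Regex.Matches(input, @"[RMB].+?\[.+?\]");

Console.WriteLine($"greedy: {greedy.Count} match(es), first = '{greedy[0].Value}'");
Console.WriteLine($"lazy:   {lazy.Count} match(es)");
foreach (Match m in lazy)
{
    Console.WriteLine(m.Value);
}
```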


How to write REGEX to get the particular string in C# ASP.NET?

I need to get three strings from the string below, and I need a solution in C# and ASP.NET:
"componentStatusId==2|3,screeningOwnerId>0"
I need to get '2', '3' and '0' using a regular expression in C#.
If all you want is the numbers from a string, then you could use the regex in this code:
string re = "(?:\\b(\\d+)\\b[^\\d]*)+";
Regex regex = new Regex(re);
string input = "componentStatusId==2|3,screeningOwnerId>0";
MatchCollection matches = regex.Matches(input);
for (int ii = 0; ii < matches.Count; ii++)
{
Console.WriteLine("Match[{0}] // of 0..{1}:", ii, matches.Count - 1);
DisplayMatchResults(matches[ii]);
}
Function DisplayMatchResults is taken from this Stack Overflow answer.
The Console output from the above is:
Match[0] // of 0..0:
Match has 1 captures
Group 0 has 1 captures '2|3,screeningOwnerId>0'
Capture 0 '2|3,screeningOwnerId>0'
Group 1 has 3 captures '0'
Capture 0 '2'
Capture 1 '3'
Capture 2 '0'
match.Groups[0].Value == "2|3,screeningOwnerId>0"
match.Groups[1].Value == "0"
match.Groups[0].Captures[0].Value == "2|3,screeningOwnerId>0"
match.Groups[1].Captures[0].Value == "2"
match.Groups[1].Captures[1].Value == "3"
match.Groups[1].Captures[2].Value == "0"
Hence the numbers can be seen in match.Groups[1].Captures[...].
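DisplayMatchResults comes from another answer and is not reproduced here. If you only need the numbers, a minimal sketch that reads match.Groups[1].Captures directly, without that helper, could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "componentStatusId==2|3,screeningOwnerId>0";
// Same pattern as above, written as a verbatim string.
Match match = Regex.Match(input, @"(?:\b(\d+)\b[^\d]*)+");

// Group 1 sits inside a repeated group, so each repetition leaves one Capture behind.
// Groups[1].Value alone would only return the last one ("0").
var numbers = new List<string>();
foreach (Capture c in match.Groups[1].Captures)
{
    numbers.Add(c.Value);
}
Console.WriteLine(string.Join(", ", numbers)); // 2, 3, 0
```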
Another possibility is to use Regex.Split with a "non-digits" pattern. The results from the code below need post-processing to remove empty strings, because Regex.Split has no equivalent of the StringSplitOptions.RemoveEmptyEntries option of string.Split.
string input = "componentStatusId==2|3,screeningOwnerId>0";
string[] numbers = Regex.Split(input, "[^\\d]+");
for (int ii = 0; ii < numbers.Length; ii++)
{
Console.WriteLine("{0}: '{1}'", ii, numbers[ii]);
}
The output from this is:
0: ''
1: '2'
2: '3'
3: '0'
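The post-processing step can be a LINQ Where filter. A minimal sketch, where the filter is just one option and not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "componentStatusId==2|3,screeningOwnerId>0";
// Split on runs of non-digits, then drop the empty strings that appear
// when the input starts (or ends) with a separator.
string[] numbers = Regex.Split(input, @"[^\d]+")
    .Where(s => s.Length > 0)
    .ToArray();
Console.WriteLine(string.Join(", ", numbers)); // 2, 3, 0
```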
Use the following regex and capture your values from groups 1, 2 and 3:
componentStatusId==(\d+)\|(\d+),screeningOwnerId>(\d+)
To generalize componentStatusId and screeningOwnerId to any identifier, you can use \w+ in their place:
\w+==(\d+)\|(\d+),\w+>(\d+)
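A minimal sketch of reading the three groups in C#, assuming the generalized \w+ pattern and that you want the values as strings:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "componentStatusId==2|3,screeningOwnerId>0";
// Generalized pattern: any identifier before '==' and '>', three numeric capture groups.
Match m = Regex.Match(input, @"\w+==(\d+)\|(\d+),\w+>(\d+)");
// Groups 1, 2 and 3 hold the three numbers; fall back to an empty array if nothing matched.
string[] values = m.Success
    ? new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value }
    : Array.Empty<string>();
Console.WriteLine(string.Join(", ", values));
```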

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have regex code written in C# that adds a space between a number and a unit, with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but my regex knowledge is not top-notch. Could someone help me reduce this code while keeping the same results?
Thank you!
For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of Regex.Replace, check whether capture group 2 exists. If it does, use it in the replacement; otherwise, add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non-capture group to match this part as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
using System;
using System.Linq;
using System.Text.RegularExpressions;
string[] strings = new string[]
{
    "10ANYUNIT",
    "10:something",
    "10 : something",
    "10 %",
    "40 e-5",
    "40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
    Regex.Replace(
        s, pattern, m =>
            // Keep the number; append the symbol from group 2 if present, else a space.
            m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
    )
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
In .NET, \d can also match digits from other scripts, \s can also match a newline, and the start of the pattern might be a partial match. For these reasons, a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
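A sketch of plugging the stricter pattern into the same replacement callback. The local function name AddUnitSpace is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: stricter pattern (ASCII digits, word boundary at the start,
// only spaces/tabs before the symbol) with the same evaluator as above.
string AddUnitSpace(string s) =>
    Regex.Replace(
        s,
        @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?",
        m => m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));

foreach (string s in new[] { "10ANYUNIT", "10:something", "10 : something", "10 %", "2.5mg" })
{
    Console.WriteLine($"'{AddUnitSpace(s)}'");
}
```

As with the first pattern, the digits after an exponent sign are matched as a number of their own. For example, 40 e-5 becomes "40e-5 " with a trailing space, so you may want to call Trim() on the result.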
I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number, such as 123 or 1241.23
Group 3 - ((E|e|%|:)+) (referenced as $3 in the replacement)
One or more of the special symbols E, e, % and :
Group 1 and group 3 can be separated by any number of whitespace characters.
If it does not work as you expect, please provide some samples to test.
To me this is too complex to handle with a single regex, so I suggest splitting it into separate checks. In the code example below I use four different regexes; the first is described in detail, and the rest follow from that explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespaces \s*
// Then match one or more digits and store it in first capturing group (\d+)
// Then match one or more whitespaces again.
// Then match part with exponent ([eE][-+]?\d+) and store it in second capturing group.
// It will match lower or uppercase 'e' with an optional (due to the ? operator) minus/plus sign and one or more digits.
// Then match zero or more white spaces.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Regex for Backus-Naur Form

I'm trying to make a regex to match a string like:
i<A> | n<B> | <C>
It needs to return the values:
("i", "A")
("n", "B")
("", "C")
Currently I'm using the following regex:
^([A-Za-z0-9]*)\<(.*?)\>
but it only matches the first pair ("i", "A").
I can't find a way to fix it.
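The ^ asserts the position at the start of a line, so the pattern only matches at the beginning of each line. If you remove it, it should work.
Note that * already allows an empty prefix; the example below uses ? instead, which also allows it but limits each part to a single character: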
string pattern = @"([A-Za-z0-9]?)<(.?)>";
string input = @"i<A> | n<B> | <C>";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
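For comparison, here is a sketch that keeps the question's original * quantifiers and only drops the ^. This assumes a prefix or name may be longer than one character, which the single-character ? version above would not fully capture. It returns the pairs as tuples:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "i<A> | n<B> | <C>";
// The question's pattern minus the ^ anchor, so Matches() can find every pair.
var pairs = new List<(string Prefix, string Name)>();
foreach (Match m in Regex.Matches(input, @"([A-Za-z0-9]*)\<(.*?)\>"))
{
    pairs.Add((m.Groups[1].Value, m.Groups[2].Value));
}
foreach (var (prefix, name) in pairs)
{
    Console.WriteLine($"(\"{prefix}\", \"{name}\")");
}
```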
If the | is part of the string and should be matched, you can use the Captures property with two capture groups that share the same name.
^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$
The pattern matches:
^ Start of string
(?<first>[A-Za-z0-9]*) Named group first: match zero or more chars from the listed ranges
<(?<second>[^<>]*)> Match < then start named group second and match any char except < and > and match >
(?: Non capture group
\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)> Match a | between whitespace chars and the same pattern for both named groups
)+ Close group and repeat 1+ times
$ End of string
Then you could, for example, create tuples from the captures to build the pairs.
string str = "i<A> | n<B> | <C>";
MatchCollection matches = Regex.Matches(str, @"^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$");
foreach (Match match in matches)
{
match.Groups["first"].Captures
.Select(c => c.Value)
.Zip(match.Groups["second"].Captures.Select(c => c.Value), (x, y) => Tuple.Create(x, y))
.ToList()
.ForEach(t => Console.WriteLine("first: {0}, second: {1}", t.Item1, t.Item2));
}
Output
first: i, second: A
first: n, second: B
first: , second: C

Create a dictionary from a string by separating the string using a word

Str = 7 X 4 # 70Hz LUVYG
I want to split the string after "Hz" to get:
Key: 7 X 4 # 70Hz
Value: LUVYG
With the regular expression
^ - start of string
(.*) - zero or more characters (Groups[1])
\s - whitespace
(\S*) - zero or more non-whitespace characters (Groups[2])
$ - end of string
it is possible to split a string into 2 groups, using the last whitespace in the string as the delimiter.
Using this regular expression, you can use LINQ to process a collection of strings into a dictionary:
var unprocessed = new[] { "7 X 4 # 70Hz LUVYG" };
var dict =
unprocessed
.Select(w => Regex.Match(w, @"^(.*)\s(\S*)$"))
.Where(m => m.Success)
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
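The question literally asks to split at "Hz" rather than at the last whitespace. As an alternative (a swapped-in technique, not the approach of the answer above), a lookbehind can split on the whitespace right after "Hz". This sketch assumes "Hz" appears once and is followed by whitespace:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string str = "7 X 4 # 70Hz LUVYG";
// Split on whitespace that directly follows "Hz" (lookbehind), into at most 2 parts.
string[] parts = new Regex(@"(?<=Hz)\s+").Split(str, 2);

var dict = new Dictionary<string, string>();
if (parts.Length == 2)
{
    dict[parts[0]] = parts[1];
}
foreach (var kv in dict)
{
    Console.WriteLine($"Key: {kv.Key}");
    Console.WriteLine($"Value: {kv.Value}");
}
```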

Regex - Trying to write match validation for Swedish Social Number

I want both formats, xxxxxx-xxxx and xxxxxxxx-xxxx, to be accepted. I've managed to handle the part before the dash, but the last four digits are troublesome: the regex requires at least 4 characters, but I also want it to return false if there are more than 4. How do I do that?
This is how it looks so far:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
And these are the results:
var regex = new Regex(@"^\d{6,8}[-|(\s)]{0,1}\d{4}");
Match m = regex.Match("840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-2344");
Console.WriteLine(m.Success); // Outputs True
m = regex.Match("19840204-23");
Console.WriteLine(m.Success); // Outputs False
m = regex.Match("19840204-2323423423");
Console.WriteLine(m.Success); // Outputs True, and this is what I don't want
The \d{6,8} pattern matches 6, 7 or 8 digits, so it also accepts a 7-digit prefix, which already makes your pattern wrong. Besides, [-|(\s)]{0,1} matches 1 or 0 -, (, ), | or whitespace chars, so it will also match strings like 19840204|2323, 19840204(2323 and 19840204)2323.
You may use
^\d{6}(?:\d{2})?[-\s]?\d{4}$
Details
^ - start of string
\d{6} - 6 digits
(?:\d{2})? - optional 2 digits
[-\s]? - an optional - or whitespace char
\d{4} - 4 digits
$ - end of string.
To make \d match only ASCII digits, pass the RegexOptions.ECMAScript option. Example:
var res = Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);
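A quick check of this pattern against the inputs from the question, plus a 7-digit prefix that the original \d{6,8} would accept. IsPersonnummer is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; RegexOptions.ECMAScript restricts \d to [0-9].
bool IsPersonnummer(string s) =>
    Regex.IsMatch(s, @"^\d{6}(?:\d{2})?[-\s]?\d{4}$", RegexOptions.ECMAScript);

foreach (string s in new[] { "840204-2344", "19840204-2344", "19840204-23", "19840204-2323423423", "1984020-2344" })
{
    Console.WriteLine($"{s}: {IsPersonnummer(s)}");
}
```

The $ anchor is what rejects 19840204-2323423423: after exactly four digits, the pattern must reach the end of the string.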
You are forgetting the $ at the end:
var regex = new Regex(@"^(\d{6}|\d{8})-\d{4}$");
If you want to match the social security number anywhere in a string, you can also use \b to test for word boundaries:
var regex = new Regex(@"\b(\d{6}|\d{8})-\d{4}\b");
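A short sketch of the \b variant, assuming the numbers are embedded in free text. The sample sentence is made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "Applicants 840204-2344 and 19840204-2344, but not 19840204-2323423423.";
var found = new List<string>();
// \b on both sides stops a match from starting or ending in the middle of a longer digit run.
foreach (Match m in Regex.Matches(text, @"\b(\d{6}|\d{8})-\d{4}\b"))
{
    found.Add(m.Value);
}
Console.WriteLine(string.Join(" | ", found));
```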
Edit: I corrected the regex to fix the problems mentioned in the comments. The commenters are right, of course. In my earlier post I just wanted to explain why the regex matched the longer string.
