Regex for Backus-Naur Form

Regex for Backus-Naur Form - c#

I'm trying to make a regex to match a string like:
i<A> | n<B> | <C>
It needs to return the values:
("i", "A")
("n", "B")
("", "C")
Currently I'm using the following regex:
^([A-Za-z0-9]*)\<(.*?)\>
but it only matches the first pair ("i", "A").
I can't find a way to fix it.

The ^ asserts the position at the start of the string (or of each line with RegexOptions.Multiline), so only the first pair can match. If you remove it, Regex.Matches returns every pair; the * on the first group already allows the empty value in <C>. See the example below:
string pattern = @"([A-Za-z0-9]*)<(.*?)>";
string input = @"i<A> | n<B> | <C>";
foreach (Match m in Regex.Matches(input, pattern))
{
    // Groups[1] is the (possibly empty) prefix, Groups[2] is the text inside <...>
    Console.WriteLine("('{0}', '{1}') found at index {2}.", m.Groups[1].Value, m.Groups[2].Value, m.Index);
}

If the | is part of the string and should be matched, you can make use of the Captures property with capture groups that reuse the same two names.
^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$
The pattern matches:
^ Start of string
(?<first>[A-Za-z0-9]*) Named group first, optionally match any of the listed ranges
<(?<second>[^<>]*)> Match < then start named group second and match any char except < and > and match >
(?: Non capture group
\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)> Match a | between whitespace chars and the same pattern for both named groups
)+ Close group and repeat 1+ times
$ End of string
Then you could for example create Tuples out of the matches to create the pairs.
string str = "i<A> | n<B> | <C>";
MatchCollection matches = Regex.Matches(str, @"^(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>(?:\s+\|\s+(?<first>[A-Za-z0-9]*)<(?<second>[^<>]*)>)+$");
foreach (Match match in matches)
{
match.Groups["first"].Captures
.Select(c => c.Value)
.Zip(match.Groups["second"].Captures.Select(c => c.Value), (x, y) => Tuple.Create(x, y))
.ToList()
.ForEach(t => Console.WriteLine("first: {0}, second: {1}", t.Item1, t.Item2));
}
Output
first: i, second: A
first: n, second: B
first: , second: C
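
If the | separators do not need to be validated, the pairs can also be collected with an unanchored pattern, one match per pair. The following is only a minimal sketch: the helper name ParsePairs is hypothetical (it does not come from the answers above), and it assumes value tuples (C# 7 or later) and top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns one (Prefix, Name) pair per "<...>" item.
List<(string Prefix, string Name)> ParsePairs(string input)
{
    // No ^ anchor, so Regex.Matches finds every pair, not only the first one.
    return Regex.Matches(input, @"([A-Za-z0-9]*)<([^<>]*)>")
        .Cast<Match>()
        .Select(m => (Prefix: m.Groups[1].Value, Name: m.Groups[2].Value))
        .ToList();
}

foreach (var (prefix, name) in ParsePairs("i<A> | n<B> | <C>"))
{
    Console.WriteLine($"(\"{prefix}\", \"{name}\")");
}
```

Unlike the anchored pattern, this version silently skips anything between the pairs, so it fits best when the input is already known to be well formed.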

Related

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!

For your example data, you might use 2 capture groups where the second group is in an optional part.
In the callback of the replace, check whether capture group 2 participated in the match. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Because in .NET \d also matches digits from other scripts, \s also matches a newline, and the pattern could start matching in the middle of a word, a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
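
The more precise pattern can be wrapped in a small function, sketched below. The name NormalizeDosage is hypothetical and not from the answer. One thing the sketch makes visible: the digits of an exponent (the 5 in 40e-5) are themselves matched as a number with no following unit character, so the callback appends a space at the end of the string. The sketch trims it, which is an assumption about what is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the more precise pattern.
string NormalizeDosage(string s)
{
    string replaced = Regex.Replace(
        s,
        @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?",
        // Keep the special character if group 2 matched, otherwise add a space.
        m => m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "));
    // The exponent digits are matched as a number too, which leaves a
    // trailing space; trim it (assumption: trailing spaces are unwanted).
    return replaced.TrimEnd();
}

foreach (var s in new[] { "10ANYUNIT", "10 : something", "10 %", "40 e-5" })
{
    Console.WriteLine($"'{NormalizeDosage(s)}'");
}
```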

I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 or 1241.23
Group 3 - ((E|e|%|:)+)
One or more of the special symbols E, e, % and :
Groups 1 and 3 can be separated by any amount of whitespace.
If this does not work the way you are asking, please provide some samples to test.

For me it's too complex to be handled by just one regex. I suggest splitting it into separate checks. See the code example below: I used four different regexes; the first is described in detail, and the rest can be deduced from that explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespace characters \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespace characters \s+
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It matches a lowercase or uppercase 'e' with an optional (due to the ? operator) minus/plus sign and one or more digits.
// Then match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Create a dictionary from a string by separating the string using a word

Str = 7 X 4 # 70Hz LUVYG
I want to split the string with Hz
Key: 7 X 4 # 70Hz
Value: LUVYG

With the regular expression
^ - start of string
(.*) - zero or more characters (Groups[1])
\s - whitespace
(\S*) - zero or more non-whitespace characters (Groups[2])
$ - end of string
it is possible to split a string into 2 groups, using the last whitespace in the string as the delimiter.
Using this regular expression, you can use LINQ to process a collection of strings into a dictionary:
var unprocessed = new[] { "7 X 4 # 70Hz LUVYG" };
var dict =
unprocessed
.Select(w => Regex.Match(w, @"^(.*)\s(\S*)$"))
.Where(m => m.Success)
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
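
One caveat with ToDictionary is that it throws an ArgumentException when two input strings produce the same key. If repeated keys are possible, a plain loop with indexer assignment is a possible alternative. This is a rough sketch only: the helper name ToKeyValue is hypothetical, and letting the last value win is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits each line at its last whitespace character.
Dictionary<string, string> ToKeyValue(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, string>();
    foreach (var line in lines)
    {
        var m = Regex.Match(line, @"^(.*)\s(\S*)$");
        if (m.Success)
        {
            // Indexer assignment overwrites a repeated key instead of throwing.
            dict[m.Groups[1].Value] = m.Groups[2].Value;
        }
    }
    return dict;
}

foreach (var kv in ToKeyValue(new[] { "7 X 4 # 70Hz LUVYG" }))
{
    Console.WriteLine($"Key: {kv.Key}, Value: {kv.Value}");
}
```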

Regex to extract one (or more) product numbers at the start of a filename separated by an _

We have filenames that contain product numbers at the start, and based on these we apply processing when adding the files to the system.
I need a regex that matches the following
70707_70708_70709_display1.jpg
70707_Front010.jpg
and NOT these
626-this files is tagged.jpg
1000x1000_webbanner2.jpg
2000 years ago_files.jpg
626gamingassets_styleguide.jpg
70707_Front010_0001_1.jpg
I have a regex that almost does what I want, except for the cases marked "fail" below
\d{3,}(?=_)
70707_70708_70709_display1.jpg - success 3 matches {70707,70708,70709}
70707_Front010.jpg - success 1 match {70707 }
626-this files is tagged.jpg - success 0 matches
1000x1000_webbanner2.jpg - fail 1 match {1000}
2000 years ago_files.jpg - success 0 matches
626gamingassets_styleguide.jpg - success 0 matches
70707_Front010_0001_1.jpg - fail 2 matches{70707,0001}
I have a regex test to illustrate this at regex101.
The regex should only look for sets of underscore separated numbers at the beginning.

You may try a non-regex solution:
var results = s.Split('_').TakeWhile(x => x.All(char.IsDigit) && x.Length >= 3).ToList();
if (results.Count > 0)
Console.WriteLine("Results: {0}", string.Join(", ", results));
else
Console.WriteLine("No match: '{0}'", s);
Here, the string is split with _ and then only the first items that are all digits and of length 3+ are returned.
You may use the following regex based solution:
^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_
Pattern details
^ - start of a string
(?<v>\d{3,}) - Group v: 3 or more digits
(?:_(?<v>\d{3,}))* - 0+ occurrences of
_ - an underscore
(?<v>\d{3,}) - Group v: 3 or more digits
_ - a _.
C# demo:
var lst = new List<string> {"70707_70708_70709_display1.jpg",
"70707_Front010.jpg",
"626-this files is tagged.jpg",
"1000x1000_webbanner2.jpg",
"2000 years ago_files.jpg",
"626gamingassets_styleguide.jpg" };
foreach (var s in lst)
{
var mcoll = Regex.Matches(s, @"^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_")
.Cast<Match>()
.SelectMany(m => m.Groups["v"].Captures.Cast<Capture>().Select(c => c.Value))
.ToList();
if (mcoll.Count > 0)
Console.WriteLine("Results: {0}", string.Join(", ", mcoll));
else
Console.WriteLine("No match: '{0}'", s);
}
Output:
Results: 70707, 70708, 70709
Results: 70707
No match: '626-this files is tagged.jpg'
No match: '1000x1000_webbanner2.jpg'
No match: '2000 years ago_files.jpg'
No match: '626gamingassets_styleguide.jpg'
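
The demo list above leaves out the last file name from the question, 70707_Front010_0001_1.jpg. The sketch below wraps the same pattern in a function so that case can be checked as well. The helper name LeadingProductNumbers is hypothetical; since the match is anchored at the start, a single Regex.Match is enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the underscore-separated numbers (3+ digits)
// that open the file name, or an empty list when there are none.
List<string> LeadingProductNumbers(string fileName)
{
    var m = Regex.Match(fileName, @"^(?<v>\d{3,})(?:_(?<v>\d{3,}))*_");
    return m.Success
        // Every repetition of group v is kept in its Captures collection.
        ? m.Groups["v"].Captures.Cast<Capture>().Select(c => c.Value).ToList()
        : new List<string>();
}

foreach (var name in new[] { "70707_70708_70709_display1.jpg", "70707_Front010_0001_1.jpg" })
{
    Console.WriteLine($"{name}: {string.Join(", ", LeadingProductNumbers(name))}");
}
```

For 70707_Front010_0001_1.jpg only 70707 is returned, because the run of numbers stops at Front010 and the later 0001 is never reached.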

If the number is always at the beginning of the line, this will work (note that it returns only the first number):
^\d{3,}(?=_)

How to parse R123[i] pattern inside string in variable using regex?

I have an expression like this
R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]
I want to grab all the variables here with a regex, so that I get R4013[i], R4014[i], R40[i] and so on.
I already have a regex pattern like this, but it does not work:
[RMB].+\[.+\]

You could try the below,
@"[RMB][^\[]*\[[^\]]*\]"
[RMB] Picks up a single character from the given list.
[^\[]* negated character class which matches any character but not [ symbol, zero or more times.
\[ Matches a literal [ symbol.
[^\]]* Matches any character but not of ], zero or more times.
\] Matches the literal ] symbol.
Code:
String input = @"R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
Regex rgx = new Regex(@"[RMB][^\[]*\[[^\]]*\]");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);

R\d+\[[^\]]*\]
This simple regex should do it, see demo:
https://regex101.com/r/wU7sQ0/3
Or you can simply make your own regex non-greedy. Your regex uses .+, which is greedy: it does not stop at the first [ and so captures everything up to the last [] instead of each variable. Use .+? instead:
[RMB].+?\[.+?\]
https://regex101.com/r/wU7sQ0/4
Your regex [RMB].+\[.+\] captures everything up to the last [] because of the greedy .+ after [RMB].
string strRegex = @"[RMB].+?\[.+?\]";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
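
The two answers can be combined into a slightly stricter pattern: the character class [RMB] from the first, the \d+ from the second, plus a leading \b so that an uppercase R, M or B inside another word is not taken as the start of a variable. The sketch below is an illustration under those assumptions, not something either answer proposed; the helper name ExtractVariables is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every variable of the form R123[...], M123[...] or B123[...].
List<string> ExtractVariables(string expression)
{
    // \b        the letter must start a word
    // [RMB]\d+  one of R, M, B followed by one or more digits
    // \[[^\]]*\] an index in square brackets, stopping at the first ]
    return Regex.Matches(expression, @"\b[RMB]\d+\[[^\]]*\]")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
}

string sampleExpression = "R4013[i] == 3 and R4014[i] == 2 AND R40[i] == 1 and R403[i+1] == 5 and R404[i+1] == 2 AND R405[i+1] == 1 R231[2]";
Console.WriteLine(string.Join(",", ExtractVariables(sampleExpression)));
```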

Regex string for phrase

I am trying to extract a ticket number with the phrase "Ticket ID: (20 digit number)"
This phrase can be written as:
"Ticket ID: (20 digit number)"
"TicketID:(20 digit number)" - Spaces do not matter
Here is the regex string I am using that fails to work. I am trying to understand what I am doing wrong here. This regex should be looking for any phrase, regardless of spaces, with the word Ticket followed by ID: followed by a 20 digit number of any kind.
Regex newExpression = new Regex(@"\bTicket\b.*\bID:\b.*\d{20}",
RegexOptions.IgnoreCase
| RegexOptions.Singleline
| RegexOptions.IgnorePatternWhitespace);

With this pattern you obtain directly the number since the lookbehind (?<=..) is just a check and is not in the match result:
Regex newExpression = new Regex(@"(?<=\bTicket\s*ID\s*:\s*)\d{20}",
RegexOptions.IgnoreCase);

A word boundary doesn’t occur between a : and a following space, so \bID:\b fails there. Just use \s* to ignore spaces:
Regex newExpression = new Regex(@"Ticket\s*ID:\s*(\d{20})");
Now you can use newExpression.Match(someString).Groups[1].Value.
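
The question does not make clear whether the parentheses in "(20 digit number)" are part of the actual text, so the sketch below treats them as optional. That, the case-insensitive matching, and the (?!\d) check that rejects longer digit runs are all assumptions, and the helper name TryGetTicketId is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the 20-digit number after "Ticket ID:".
bool TryGetTicketId(string text, out string id)
{
    var m = Regex.Match(
        text,
        // \s* tolerates any spacing; \(? and \)? allow optional parentheses;
        // (?!\d) makes sure the number is not longer than 20 digits.
        @"Ticket\s*ID\s*:\s*\(?(\d{20})(?!\d)\)?",
        RegexOptions.IgnoreCase);
    id = m.Success ? m.Groups[1].Value : "";
    return m.Success;
}

foreach (var s in new[] { "Ticket ID: 12345678901234567890", "TicketID:(12345678901234567890)" })
{
    if (TryGetTicketId(s, out var ticket))
    {
        Console.WriteLine($"{s} - {ticket}");
    }
}
```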

Just skip everything before the :, like
@".*\s*\d{20}"

Well, you can try this code. It captures the 20 digit number as named capture group in Regex:
var newExpression = new Regex(//@"\bTicket\b.*\bID:\b.*\d{20}",
@"Ticket\s*ID\:\s*(?<num>\(\d{20}\))",
RegexOptions.IgnoreCase
| RegexOptions.Singleline
| RegexOptions.IgnorePatternWhitespace);
var items=new List<string>();
var r=new Random();
var sb=new StringBuilder();
var i=0;
var formats=new []{"TicketID:({0})", "Ticket ID:({0})", "Ticket ID: ({0})",
"Ticket ID: ({0})"};
for(;i<5;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,4)],sb));
sb.Remove(0,20);
}
for(;i<10;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,2)],sb));
sb.Remove(0,20);
}
for(;i<15;i++){
for(var j=0;j<20;j++){
sb.Append(r.Next(0,9));
}
items.Add(string.Format(formats[r.Next(0,2)],sb));
sb.Remove(0,20);
}
foreach(var s in items){
var m = newExpression.Match(s);
if(m.Success && m.Groups!=null && m.Groups.Count>0){
string.Format("{0} - {1}",s,m.Groups["num"].Value).Dump();
}
}
NOTE: This was run in LINQPad, which provides the Dump() extension method.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
