What would be the regex for this value? C#

I'm trying to get the regex for this value:
<5fond[3.4,550],[5.4,6.4,7.4, 8.4, 32.4],[ 9.4, 239.8662]
The numbers (minus the second one which appears to just be an integer) can be any decimal value.
I have tried the following but it doesn't seem to work.
private static readonly Regex RegexExp = new Regex(@"<5fond\[[0-9]*\.[0-9]+,[0-9]*\.[0-9]+],\[[0-9]*\.[0-9]+,[0-9]*\.[0-9]+,[0-9]*\.[0-9]+,[0-9]*\.[0-9]+\],\[[0-9]*\.[0-9]+,[0-9]*\.[0-9]+\]", RegexOptions.IgnorePatternWhitespace);
Any idea what I might be doing wrong?

You can use
<5fond\[\s*\d*\.?\d+(?:,\s*\d*\.?\d+)*\s*](?:,\s*\[\s*\d*\.?\d+(?:,\s*\d*\.?\d+)*\s*])*
See the regex demo. Details:
<5fond\[ - a <5fond[ string
\s* - zero or more whitespaces
\d*\.?\d+ - an int/float number pattern
(?:,\s*\d*\.?\d+)* - zero or more sequences of a comma, zero or more whitespaces, an int/float number
\s* - zero or more whitespaces
] - a ] char
(?:,\s*\[\s*\d*\.?\d+(?:,\s*\d*\.?\d+)*\s*])* - zero or more occurrences of
, - a comma
\s*\[\s* - a [ char enclosed with zero or more whitespaces
\d*\.?\d+(?:,\s*\d*\.?\d+)* - an int/float and then zero or more occurrences of a comma, zero or more whitespaces, an int/float number
\s* - zero or more whitespaces
] - a ] char.
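A minimal C# sketch of the pattern above, assembled from named pieces so the repetition is easier to follow. The class name FondPattern, the helper IsValid, and the ^/$ anchors are assumptions added for illustration; they are not part of the original answer. For context, [0-9]*\.[0-9]+ in the question requires a dot, so it cannot match the integer 550, and RegexOptions.IgnorePatternWhitespace only affects whitespace inside the pattern, not whitespace in the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FondPattern
{
    // An int/float number, e.g. 550, 3.4 or .5
    private const string Number = @"\d*\.?\d+";

    // One bracketed, comma-separated list of numbers with optional whitespace
    private static readonly string List =
        $@"\[\s*{Number}(?:,\s*{Number})*\s*]";

    // Hypothetical whole-string validator; the anchors are an added assumption
    private static readonly Regex Rx =
        new Regex($@"^<5fond{List}(?:,\s*{List})*$");

    public static bool IsValid(string value) => Rx.IsMatch(value);
}
```

With this sketch, FondPattern.IsValid("<5fond[3.4,550],[5.4,6.4,7.4, 8.4, 32.4],[ 9.4, 239.8662]") should return true, while an input with a trailing comma inside a list should be rejected.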

Related

Comma is breaking Grouping

I want a regular expression that matches anything as a parameter in the string concat(1st,2nd) and extracts three matching groups, as below:
Group1: concat
Group2: 1st
Group3: 2nd.
I have tried this: ^\s*(concat)\(\s*(.*?)\s*\,\s*(.*)\)\s*$, and it worked fine until I had a parameter containing a comma, as below:
concat(regex(3,4),regex(3,4)). It seems the comma is breaking it. How can I ignore the parameter content and take each parameter as a separate group?
You may use
^\s*(concat)\(\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\s*,\s*((?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+))\)\s*$
See the regex demo.
Details
^ - start of string
\s* - 0+ whitespaces
(concat) - Group 1: concat word
\( - a ( char
\s* - 0+ whitespaces
({arg}) - Group 2: arg pattern:
\w* - 0+ word chars
\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\) - (, then any amount of nested parentheses or chars other than ( and ) and then )
|\w+ - or just 1+ word chars
\s*,\s* - a comma enclosed with 0+ whitespaces
({arg}) - Group 3: arg pattern
\) - a ) char
\s* - 0+ whitespaces
$ - end of string.
See C# demo:
var arg = @"(?>\w*\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)|\w+)";
var pattern = $@"^\s*(concat)\(\s*({arg})\s*,\s*({arg})\)\s*$";
var match = Regex.Match("concat(regex(3,4),regex(3,4))", pattern);
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    Console.WriteLine(match.Groups[2].Value);
    Console.WriteLine(match.Groups[3].Value);
}
// => concat regex(3,4) regex(3,4)

Replace floating numbers in math equation with letter variables

I want to replace all the floating numbers from a mathematical expression with letters using regular expressions. This is what I've tried:
Regex rx = new Regex("[-]?([0-9]*[.])?[0-9]+");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = 'a';
while (rx.IsMatch(expression))
{
expression = rx.Replace(expression , letter.ToString(), 1);
letter++;
}
The problem is that if I have for example (5-2)+3 it will replace it to: (ab)+c
So it gets the -2 as a number but I don't want that.
I am not experienced with Regex but I think I need something like this:
Check for '-', if there is a one, check if there is a number or right parenthesis before it. If there is NOT then save the '-'.
After that check for digits + dot + digits
My above Regex also works with values like: .2 .3 .4 but I don't need that, it should be explicit: 0.2 0.3 0.4
Following the suggested logic, you may consider
(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?
See the regex demo.
Regex details
(?:(?<![)0-9])-)? - an optional non-capturing group matching 1 or 0 occurrences of
(?<![)0-9]) - a place in string that is not immediately preceded with a ) or digit
- - a minus
[0-9]+ - 1+ digits
(?:\.[0-9]+)? - an optional non-capturing group matching 1 or 0 occurrences of a . followed with 1+ digits.
In code, it is better to use a match evaluator (see the C# demo online):
Regex rx = new Regex(@"(?:(?<![)0-9])-)?[0-9]+(?:\.[0-9]+)?");
string expression = "((-30+5.2)*(2+7))-((-3.1*2.5)-9.12)";
char letter = (char)96; // char before a in ASCII table
string result = rx.Replace(expression, m =>
{
    letter++; // char is incremented
    return letter.ToString();
});
Console.WriteLine(result); // => ((a+b)*(c+d))-((e*f)-g)

RegEx split string into words by space and containing chars

How can one perform this split with the Regex.Split(input, pattern) method?
This is a [normal string ] made up of # different types # of characters
Array of strings output:
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
Also it should keep the leading spaces, so I want to preserve everything. A string contains 20 chars, array of strings should total 20 chars across all elements.
What I have tried:
Regex.Split(text, @"(?<=[ ]|# #)")
Regex.Split(text, @"(?<=[ ])(?<=# #")
I suggest matching, i.e. extracting words, not splitting:
string source = @"This is a [normal string ] made up of # different types # of characters";
// Three possibilities:
// - plain word [A-Za-z]+
// - # ... # quotation
// - [ ... ] quotation
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";
var words = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .ToArray();
Console.WriteLine(string.Join(Environment.NewLine, words
    .Select((w, i) => $"{i + 1}. {w}")));
Outcome:
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
You may use
var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
See the regex demo
The (\[[^][]*]|#[^#]*#) part is a capturing group whose value is output to the resulting list along with the split items.
Pattern details
(\[[^][]*]|#[^#]*#) - Group 1: either of the two patterns:
\[[^][]*] - [, followed with 0+ chars other than [ and ] and then ]
#[^#]*# - a #, then 0+ chars other than # and then #
| - or
\s+ - 1+ whitespaces
C# demo:
var s = "This is a [normal string ] made up of # different types # of characters";
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));
Result:
This
is
a
[normal string ]
made
up
of
# different types #
of
characters
It would be easier to use a matching approach; however, it can be done using negative lookaheads:
[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)
matches a space not followed by
any character sequence except [ or ] followed by ]
# followed by an even number of #
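A short sketch of how the lookahead pattern above could be passed to Regex.Split. The class and method names are hypothetical, and RegexOptions.ExplicitCapture is an added assumption: it turns the unnamed group inside the lookahead into a non-capturing one so that Regex.Split has no group text to add to its result. The pattern has only been checked here against the sample sentence from the question; note that \#{2} matches two adjacent # characters, so inputs with more than one # ... # pair may need a different pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Splits on a space that is neither inside [...] nor inside # ... #
    private static readonly Regex Splitter = new Regex(
        @"[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)",
        RegexOptions.ExplicitCapture);

    // Hypothetical helper: returns the pieces between the matched spaces
    public static string[] Split(string input) => Splitter.Split(input);
}
```

Calling LookaheadSplit.Split on the sample sentence should yield the ten items listed in the question. The separating spaces themselves are consumed by the split, so this variant does not preserve them.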

Regex with balancing groups

I need to write a regex that captures the generic arguments (which can themselves be generic) of a type name in a special notation like this:
System.Action[Int32,Dictionary[Int32,Int32],Int32]
Let's assume the type name is [\w.]+ and a parameter is [\w.,\[\]]+
so I need to grab only Int32, Dictionary[Int32,Int32] and Int32
Basically, I need to take something only when the balancing group stack is empty, but I don't really understand how.
UPD
The answer below helped me solve the problem fast (but without proper validation and with depth limitation = 1), but I've managed to do it with group balancing:
^[\w.]+ #Type name
\[(?<delim>) #Opening bracket and first delimiter
[\w.]+ #Minimal content
(
[\w.]+
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))* #Cutting param if balanced before comma and placing delimiter
((?<open>\[))* #Counting [
((?<-open>\]))* #Counting ]
)*
(?(open)|(?<param-delim>))\] #Cutting last param if balanced
(?(open)(?!) #Checking balance
)$
Demo
UPD2 (Last optimization)
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
(?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
(?:(?<open>\[)[\w.]+)?
(?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
I suggest capturing those values using
\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*
See the regex demo.
Details:
\w+(?:\.\w+)* - match 1+ word chars followed with zero or more sequences of a . and 1+ word chars
\[ - a literal [
(?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
,? - an optional comma
(?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
\w+ - one or more word chars (perhaps, you would like [\w.]+)
(?:\[[^][]*])? - 1 or 0 (change ? to * to match 0 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].
A C# demo below:
var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = @"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
var result = Regex.Matches(line, pattern)
    .Cast<Match>()
    .SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
        .Select(t => t.Value))
    .ToList();
foreach (var s in result) // DEMO
    Console.WriteLine(s);
UPDATE: To account for unknown depth [...] substrings, use
\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*
See the regex demo
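As a rough illustration of the unknown-depth pattern, the sketch below reads every capture of the res group from a single match. The class name GenericArgs and the method Extract are hypothetical names introduced for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GenericArgs
{
    // The "o" group is pushed on each nested [ and popped on each nested ],
    // and (?(o)(?!)) fails the bracketed part if any [ is left unclosed.
    private static readonly Regex Rx = new Regex(
        @"\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*");

    // Returns the top-level generic arguments, or an empty list if nothing matched
    public static List<string> Extract(string typeName)
    {
        Match m = Rx.Match(typeName);
        return m.Groups["res"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToList();
    }
}
```

For System.Action[Int32,Dictionary[Int32,List[Int32]],Int32] this should return Int32, Dictionary[Int32,List[Int32]] and Int32, keeping the nested argument in one piece.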

Regular expression [0-99]p/sec

Part of my application needs to find occurrences of "1p/sec", "22p/sec" or "22p/ sec" (that is, [00-99]p/sec and also [00-99]p/ sec) in a string.
So far I am able to get only the first occurrence (i.e. when it is a single digit, like the one in the string below). I should be able to get any number of occurrences.
Could someone please provide guidance?
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"(\d)[p/sec]",
    RegexOptions.IgnorePatternWhitespace);
// input.IndexOf("p/sec");
// Here we check the Match instance.
if (match.Success)
{
    // Finally, we get the Group value and display it.
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
You need to quantify your \d in the regex, for example by adding a + quantifier, which will then cause \d+ to match at least one but possibly more digits. To restrict to a specific number of digits, you can use the {n,m} quantifier, e.g. \d{1,2} which will then match either one or two digits.
Note also that [p/sec] as you use it in the regex is a character class, matching a single character from the set { c, e, p, s, / }, which is probably not what you want because you'd want to match the p/sec literally.
A more robust option would probably be the following:
(\d+)\s*p\s*/\s*sec
which matches p/sec literally and also allows for whitespace between the number and the unit as well as around the /.
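To collect every occurrence rather than only the first, the whitespace-tolerant pattern can be combined with Regex.Matches. This is a small sketch; the class name RateFinder and the method FindRates are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RateFinder
{
    // Group 1 holds the digits in front of "p/sec"; whitespace is allowed
    // between the number and p, and on either side of the slash.
    private static readonly Regex Rx = new Regex(@"(\d+)\s*p\s*/\s*sec");

    // Returns the numeric part of every rate found in the input
    public static List<string> FindRates(string input) =>
        Rx.Matches(input)
          .Cast<Match>()
          .Select(m => m.Groups[1].Value)
          .ToList();
}
```

For the input string from the question this should return 1 and 11; the 30 in "30 Days" is skipped because no p/sec follows it.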
Use
Match match = Regex.Match(input, @"(\d{1,2})p/sec" ...
instead.
\d matches a single digit. If you append {1,2} to it, you match one or two digits instead. \d* would match zero or more and \d+ would match one or more. \d{1,10} would match 1-10 digits.
If you need to know whether it was surrounded by brackets, escape the brackets so they are matched literally (an unescaped [...] is a character class):
Match match = Regex.Match(input, @"((\[\d{1,2}\])|(\d{1,2}))p/sec"
...
bool hasBrackets = match.Groups[1].Value[0] == '[';
How about this regex:
^\[?\d\d?\]?p/\s*sec$
Explanation:
The regular expression:
^\[?\d\d?\]?p/\s*sec$
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \[?                      '[' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \d                       digits (0-9)
----------------------------------------------------------------------
  \d?                      digits (0-9) (optional (matching the most
                           amount possible))
----------------------------------------------------------------------
  \]?                      ']' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  p                        'p'
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  sec                      'sec'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
If I understand you correctly, you want to find all occurrences. Use a MatchCollection:
string input = "US Canada calling # 1p/sec (Base Tariff - 11p/sec). Validity : 30 Days.";
// p/sec is matched literally here; [p/sec] would be a character class matching a single char.
Regex regex = new Regex(@"\d{1,2}p/sec");
MatchCollection matchCollection = regex.Matches(input);
// Print every match found in the input.
foreach (Match match in matchCollection)
{
    Console.WriteLine(match.Groups[0].Value);
}
You can do this:
string input = "US Canada calling # 1p/ sec (Base Tariff - 11p/sec). 91p/ sec , 123p/ sec Validity : 30 Days.";
MatchCollection matches = Regex.Matches(input, @"\b(\d{1,2})p/\s?sec");
foreach (Match m in matches) {
    string key = m.Groups[1].Value;
    Console.WriteLine(key);
}
Output:
1
11
91
\b is a word boundary, to anchor the match to the start of a "word" (notice that it will not match the "123p/ sec" in the test string!)
\d{1,2} Will match one or two digits. See Quantifiers
p/\s?sec matches a literal "p/sec" with an optional whitespace before "sec"
Regex Expression:
((?<price>(\d+))p/sec)|(\[(?<price>(\d+[-]\d+))\]p/sec)
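In C# you need to loop over the captures of the named group to read each one:
if (match.Success)
{
    for (int i = 0; i < match.Groups["price"].Captures.Count; i++)
    {
        // Index into Captures; Groups["price"].Value would return only the last capture.
        string key = match.Groups["price"].Captures[i].Value;
        Console.WriteLine(key);
    }
}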
