Using regex to split equations with variables in C#

I've been struggling with this for quite a while (not being a regex ninja), searching Stack Overflow and going through trial and error. I think I'm close, but there are still a few hiccups that I need help sorting out.
The requirement is that a given equation, which includes variables, exponents, etc., is split by the regex pattern into variables, constants, values, operators, and so on. What I have so far:
Regex re = new Regex(@"(\,|\(|\)|(-?\d*\.?\d+e[+-]?\d+)|\+|\-|\*|\^)");
var tokens = re.Split(equation);
So an equation such as
2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)
should parse to
[2.75423E-19 ,*, (, var1,-,5, ), ^,(,1.17,),*....,3.56,)]
However, the exponent portion is getting split as well, which I think is due to this part of the regex: |\+|\-.
Other renditions I've tried are:
Regex re1 = new Regex(@"([\,\+\-\*\(\)\^\/\ ])"); and
Regex re = new Regex(@"(-?\d*\.?\d+e[+-]?\d+)|([\,\+\-\*\(\)\^\/\ ])");
which both have their flaws. Any help would be appreciated.

For the equations like the one posted in the original question, you can use
[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+
See regex demo
The regex matches:
[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? - a float number
| - or...
[-^+*/()] - any of the arithmetic operators or parentheses present in the equation posted
| - or...
\w+ - 1 or more word characters (letters, digits or underscore).
For more complex tokenization, consider using NCalc suggested by Lucas Trzesniewski's comment.
C# sample code:
var line = "2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)";
var matches = Regex.Matches(line, @"[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+");
foreach (Match m in matches)
    Console.WriteLine(m.Value);
And updated code for you to show that Regex.Split is not necessary here:
var result = Regex.Matches(line, @"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(p => p.Value)
    .ToList();
Also, to match formatted numbers, you can use \d+(?:[,.]\d+)* rather than [0-9]*\.?[0-9]+ or \d+(,\d+)*.
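A side note on the pattern in the question, as a minimal sketch: its number alternative uses a lowercase e and no RegexOptions.IgnoreCase, so it cannot match 2.75423E-19 as one unit, and the minus sign is then picked up by the \- alternative. The snippet below checks this with Regex.IsMatch and Regex.Match. It is written as C# top-level statements (C# 9 or later), and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string number = "2.75423E-19";

// Number alternative taken from the pattern in the question (lowercase 'e').
string exponentForm = @"-?\d*\.?\d+e[+-]?\d+";

// Case-sensitive by default: the uppercase 'E' in the input does not match 'e'.
bool matchesCaseSensitive = Regex.IsMatch(number, exponentForm);

// With IgnoreCase the whole literal is matched as a single number.
Match ignoreCaseMatch = Regex.Match(number, exponentForm, RegexOptions.IgnoreCase);

Console.WriteLine(matchesCaseSensitive);     // False
Console.WriteLine(ignoreCaseMatch.Value);    // 2.75423E-19
```

This is why the patterns in the answer use [eE] or RegexOptions.IgnoreCase for the exponent marker.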

So I think I've got a solution, thanks to @stribizhev, whose answer led me to this regex:
Regex re = new Regex(@"(\d+(?:,\d+)*(?:\.\d+)?(?:[eE][-+]?[0-9]+)?|[-^+/()]|\w+)");
tokenList = re.Split(InfixExpression).Select(t => t.Trim()).Where(t => t != "").ToList();
Splitting with this pattern gives me the desired array.

Related

RegEx for matching special chars no spaces or newlines

I have a string and want to use regex to match all the chars, but no spaces.
I tried to replace all the spaces with nothing, using:
Regex.Replace(seller, @"[A-z](.+)", m => m.Groups[1].Value);
//rating
var betyg = Regex.Replace(seller, @"[A-z](.+)", m => m.Groups[1].Value);
I expect the output of
"Iris-presenter | 5"
but, the output is
"Iris-presenter"
as seen in this demo.
The string is:
<spaces>Iris-presenter
<spaces>|
<spaces>5
Great question! I'm not quite sure if this is what you are looking for. However, this expression matches your input string:
^((?!\s|\n).)*
Edit
Based on revo's advice, the expression can be much simplified, because
^((?!\s|\n).)* is equal to ^((?!\s).)* and both are equal to ^\S*.
I used (\s(.*?)) for it to work. This removes all spaces and new lines seen here
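If the goal is the single-line output from the question, one possible approach is to split on runs of whitespace (\s+) and rejoin the pieces. This is only a sketch: the sample string below is an assumption about what the input looks like (three lines padded with spaces), and the variable names are made up for illustration. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed shape of the input: three whitespace-padded lines.
string seller = "      Iris-presenter\n      |\n      5";

// Trim first so that Split does not produce an empty leading element,
// then split on any run of spaces, tabs, or newlines.
string[] parts = Regex.Split(seller.Trim(), @"\s+");

// Rejoin with single spaces to get the output expected in the question.
string joined = string.Join(" ", parts);

// Alternatively, remove every whitespace character outright.
string stripped = Regex.Replace(seller, @"\s+", "");

Console.WriteLine(joined);     // Iris-presenter | 5
Console.WriteLine(stripped);   // Iris-presenter|5
```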

Regex to match format of (x), (x)

Can somebody help me write a regex for a format like (x), (x),
where x can be any single-digit number? I am able to match a format like (x) as follows:
Regex rgx = new Regex(@"^\(([^)]+)\)$", RegexOptions.IgnoreCase);
If you don't need to capture the non-digit characters, then the only pattern actually required is \d for a digit.
Each match of \d will be an individual digit found as the parser works across the string.
For example:
var values = Regex.Matches("(1) (2)", @"\d")
    .OfType<Match>()
    .Select(mt => mt.ToString())
    .ToArray();
Console.WriteLine("Numbers found: {0}", string.Join(", ", values));
// Writes out->
// Numbers found: 1, 2
Errata
The example you gave has RegexOptions.IgnoreCase. This actually does slow down pattern matching, because the parser has to convert each character to a neutral upper- or lower-case counterpart before comparing it with the target. Culture is taken into account as well, so culture-specific casing rules have to be applied too.
Since you are dealing with numbers, using that option makes no sense.
If you don't believe me, look at the answer by Jeff Atwood (Stack Overflow's co-founder) to "Is regex case insensitivity slower?"
Are you looking for something like this?
\(([0-9])\),\s?\([0-9]\)
Also, when trying to write Regexps, I would recommend using Regex101.com.
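To validate the whole string rather than just find digits in it, the pattern above can be anchored with ^ and $. The following is a minimal sketch under the assumption that exactly two single-digit groups are expected, separated by a comma and at most one whitespace character. The variable names are illustrative, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored pattern: "(d), (d)" with one digit per group and an optional space.
var pairPattern = new Regex(@"^\((\d)\),\s?\((\d)\)$");

Match pair = pairPattern.Match("(1), (2)");
bool rejectsTwoDigits = !pairPattern.IsMatch("(12), (3)");

if (pair.Success)
{
    // Groups[1] and Groups[2] hold the two captured digits.
    Console.WriteLine("{0} and {1}", pair.Groups[1].Value, pair.Groups[2].Value);   // 1 and 2
}
Console.WriteLine(rejectsTwoDigits);   // True
```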

How to get each tablerow in regex from a string (C#)?

Basically, I've got a string like this:
<table><tr class="even"><td></td><td></td></tr><tr class="odd"><td></td><td></td></tr></table>
I want to get each tablerow using Regex.Matches (c#)
What would my regex pattern be?
I've tried the following but I think it's probably pretty wrong:
"<tr"(.*)"</tr>"
[<][t][r](.*)[<][/][t][r][>]"
But even
[t][r].*[t][r]
Gives 0 matches.
Any idea?
Thanks a lot!
\<tr[\s\S]*?\/tr\>
*? : lazy quantifier; matches as few characters as possible, so the match stops at the first /tr>.
[\s\S] : any character, including newlines.
I use RAD Software Regular Expression Designer to create and test REGEX expressions. It is a great tool!
Try this:
var s = @"<table><tr class=""even""><td></td><td></td></tr><tr class=""odd""><td></td><td></td></tr></table>";
Regex.Matches(s, @"(<tr .*?</tr>)", RegexOptions.Singleline);
var str = "<table><tr class=\"even\"><td></td><td></td></tr><tr class=\"odd\"><td></td><td></td></tr></table>";
string[] trs = Regex.Matches(str, @"<tr[^>]*>(?<content>.*?)</tr>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(t => t.Groups["content"].Value)
    .ToArray();
The "trs" array is your result, containing <td></td><td></td> two times.
Edit: This regex is not resilient to a missing </tr> (which would display fine in any browser). This could be solved using a negative lookahead.
Consider the following Regex...
(?<=\<tr.*?\>).*?(?=\</tr\>)
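As a rough sketch of the negative-lookahead idea mentioned in the answers above: the row content can be matched one character at a time, with each character allowed only if it does not start another <tr or a </tr>. The sample markup (with a deliberately missing </tr>) and the variable names are assumptions for illustration. Regex-based HTML handling remains fragile for real-world pages, so treat this as a demonstration of the technique rather than a robust parser. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input where the first row is missing its closing </tr>.
string html = "<table><tr class=\"even\"><td>a</td><tr class=\"odd\"><td>b</td></tr></table>";

// (?!</?tr\b) rejects any position where an opening or closing tr tag starts,
// so each row's content ends at the next <tr ...> or </tr>, whichever comes first.
string rowPattern = @"<tr\b[^>]*>(?<content>(?:(?!</?tr\b).)*)";

string[] rows = Regex.Matches(html, rowPattern, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups["content"].Value)
    .ToArray();

foreach (string row in rows)
    Console.WriteLine(row);
// <td>a</td>
// <td>b</td>
```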

Regular expression to parse [A][B] into A and B

I am trying to separate the following string into a separate lines with regular expression
[property1=text1][property2=text2]
and the desired result should be
property1=text1
property2=text2
here is my code
string[] attrs = Regex.Split(attr_str, @"\[(.+)\]");
The result is incorrect; I am probably doing something wrong.
UPDATE: After applying the suggested answers, it now shows spaces and empty strings.
.+ is a greedy match, so it grabs as much as possible.
Use either
\[([^]]+)\]
or
\[(.+?)\]
In the first case, matching ] is not allowed, so "as much as possible" becomes shorter. The second uses a non-greedy match.
Your dot is grabbing the brackets as well. You need to exclude them:
\[([^]]+)\]
The [^]] matches any character except a closing bracket.
Try adding the 'lazy' specifier:
Regex.Split(attr_str, @"\[(.+?)\]");
Try:
var s = "[property1=text1][property2=text2]";
var matches = Regex.Matches(s, @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value);
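The empty strings mentioned in the question's update come from how Regex.Split works: it returns the text between matches (which is empty here, because the bracketed groups are adjacent and sit at the start and end of the input) along with the captured groups. The sketch below shows the raw result and one way to filter it; the variable names are illustrative, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string attrStr = "[property1=text1][property2=text2]";

// Split keeps the captured group text and also the (empty) text around each match.
string[] raw = Regex.Split(attrStr, @"\[(.+?)\]");

// Dropping the empty entries leaves only the captured key=value pairs.
string[] cleaned = raw.Where(part => part.Length > 0).ToArray();

Console.WriteLine(raw.Length);                   // 5
Console.WriteLine(string.Join("|", cleaned));    // property1=text1|property2=text2
```

Using Regex.Matches, as in the answer above, avoids the empty entries altogether because it returns only what was matched.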

Why is my .NET regex not working correctly?

I have a text file which is in the format:
key1:val1,
key2:val2,
key3:val3
and I am trying to parse the key/value pairs out with a regex. Here is the regex code I am using with the same example:
string input = @"key1:val1,
key2:val2,
key3:val3";
var r = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
foreach (Match m in r.Matches(input))
{
    Console.WriteLine(m.Groups["name"].Value);
    Console.WriteLine(m.Groups["value"].Value);
}
When I loop through r.Matches, sometimes certain key/value pairs don't appear, and it seems to be the ones with the comma at the end of the line - but I should be taking that into account with the ,?. What am I missing here?
This might be a good situation for String.Split rather than a regex:
foreach (string pair in input.Split(new Char[] {','}))
{
    // Trim removes the line break left over after each comma.
    string[] items = pair.Trim().Split(new Char[] {':'});
    Console.WriteLine(items[0]);
    Console.WriteLine(items[1]);
}
The problem is that your regular expression is not matching the line ending of the first two lines. In .NET, $ with RegexOptions.Multiline matches only just before \n, so with Windows line endings (\r\n) a \r sits between the comma and the position where $ can match.
Try changing it to
@"^(?<name>\w+):(?<value>\w+),?(\n|\r|\r\n)?$"
and it should work.
By the way, I love regular expressions, but given the problem you are trying to solve, go for the string.Split solution. It will be much easier to read.
EDIT: After reading your comment, where you say that this is a simplified version of your problem, maybe you could simplify the expression by adding some "tolerance" for spaces and newlines at the end of the match with
@"^(?<name>\w+):(?<value>\w+),?\s*$"
Also, when you play with regular expressions, test them with a tool like Expresso, it saves a lot of time.
Get rid of the RegexOptions.Multiline option.
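To make the line-ending explanation above concrete, here is a small sketch that assumes the input uses Windows line endings (\r\n), which is what a verbatim string typed in a Windows editor typically contains. It compares the pattern from the question with a variant that allows an optional \r before $. The variable names are illustrative, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed input: the same three pairs, separated by \r\n.
string input = "key1:val1,\r\nkey2:val2,\r\nkey3:val3";

// Pattern from the question: $ cannot match before the \r that precedes each \n.
var strict = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline);

// Variant that tolerates an optional carriage return before the end of the line.
var tolerant = new Regex(@"^(?<name>\w+):(?<value>\w+),?\r?$", RegexOptions.Multiline);

int strictCount = strict.Matches(input).Count;
string[] names = tolerant.Matches(input)
    .Cast<Match>()
    .Select(m => m.Groups["name"].Value)
    .ToArray();

Console.WriteLine(strictCount);                // 1
Console.WriteLine(string.Join(",", names));    // key1,key2,key3
```

Only the last pair matches the original pattern because it is the only line that ends without a \r.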
