How to get each table row from a string with regex (C#)?

Basically, I've got a string like this:
<table><tr class="even"><td></td><td></td></tr><tr class="odd"><td></td><td></td></tr></table>
I want to get each table row using Regex.Matches (C#).
What would my regex pattern be?
I've tried the following but I think it's probably pretty wrong:
"<tr"(.*)"</tr>"
[<][t][r](.*)[<][/][t][r][>]"
But even
[t][r].*[t][r]
Gives 0 matches.
Any idea?
Thanks a lot!

\<tr[\s\S]*?\/tr\>
*? : lazy quantifier; matches as few characters as possible, so it stops at the first closing tag.
[\s\S] : any character, including newlines.
I use RAD Software Regular Expression Designer to create and test REGEX expressions. It is a great tool!
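As a minimal sketch of how the lazy pattern above behaves with Regex.Matches, here it is applied to the sample string from the question. It is written as C# 9 top-level statements, and the variable names are only illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
var html = "<table><tr class=\"even\"><td></td><td></td></tr><tr class=\"odd\"><td></td><td></td></tr></table>";

// [\s\S] matches any character, including newlines;
// *? is lazy, so each match ends at the first "</tr>" it reaches.
var rows = Regex.Matches(html, @"<tr[\s\S]*?</tr>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

foreach (var row in rows)
    Console.WriteLine(row);
```

With a greedy `*` instead of `*?`, the two rows would come back as a single match.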

Try this:
var s = @"<table><tr class=""even""><td></td><td></td></tr><tr class=""odd""><td></td><td></td></tr></table>";
Regex.Matches(s, @"(<tr .*?</tr>)", RegexOptions.Singleline);

var str = "<table><tr class=\"even\"><td></td><td></td></tr><tr class=\"odd\"><td></td><td></td></tr></table>";
string[] trs = Regex.Matches(str, @"<tr[^>]*>(?<content>.*?)</tr>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(t => t.Groups["content"].Value)
    .ToArray();
The "trs" array is your result, containing <td></td><td></td> two times.
Edit: this regex is not resilient to a missing </tr> (which would still display fine in any browser). This could be solved using a negative lookahead.
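One possible sketch of the negative-lookahead idea, assuming a row's content should end at the next opening or closing tr tag, whichever comes first. The pattern and the sample input with a missing </tr> are illustrative and not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the first row is missing its closing </tr>.
var html = "<table><tr class=\"even\"><td>a</td><tr class=\"odd\"><td>b</td></tr></table>";

// (?:(?!</?tr\b).)* consumes one character at a time, but only while
// neither "<tr" nor "</tr" starts at the current position.
var contents = Regex.Matches(html, @"<tr[^>]*>(?<content>(?:(?!</?tr\b).)*)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups["content"].Value)
    .ToList();

foreach (var content in contents)
    Console.WriteLine(content);
```

For anything beyond simple, well-known markup, an HTML parser is likely to be more robust than a pattern like this.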

Consider the following Regex...
(?<=\<tr.*?\>).*?(?=\</tr\>)

Related

using regex to split equations with variables C#

I've been struggling with this for quite a while (not being a regex ninja), searching Stack Overflow and going through trial and error. I think I'm close, but there are still a few hiccups that I need help sorting out.
The requirement is that a given equation, which includes variables, exponents, etc., is split by the regex pattern into variables, constants, values, and so on. What I have so far:
Regex re = new Regex(@"(\,|\(|\)|(-?\d*\.?\d+e[+-]?\d+)|\+|\-|\*|\^)");
var tokens = re.Split(equation);
So an equation such as
2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)
should parse to
[2.75423E-19 ,*, (, var1,-,5, ), ^,(,1.17,),*....,3.56,)]
However, the exponent portion is getting split as well, which I think is due to this part of the regex: |\+|\-.
Other renditions I've tried are:
Regex re1 = new Regex(@"([\,\+\-\*\(\)\^\/\ ])");
and
Regex re = new Regex(@"(-?\d*\.?\d+e[+-]?\d+)|([\,\+\-\*\(\)\^\/\ ])");
which both have their flaws. Any help would be appreciated.
For the equations like the one posted in the original question, you can use
[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+
See regex demo
The regex matches:
[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? - a float number
| - or...
[-^+*/()] - any of the arithmetic operators and parentheses present in the equation posted
| - or...
\w+ - 1 or more word characters (letters, digits or underscore).
For more complex tokenization, consider using NCalc suggested by Lucas Trzesniewski's comment.
C# sample code:
var line = "2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)";
var matches = Regex.Matches(line, @"[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+");
foreach (Match m in matches)
    Console.WriteLine(m.Value);
And updated code for you to show that Regex.Split is not necessary here:
var result = Regex.Matches(line, @"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(p => p.Value)
    .ToList();
Also, to match formatted numbers, you can use \d+(?:[,.]\d+)* rather than [0-9]*\.?[0-9]+ or \d+(,\d+)*.
So I think I've got a solution: @stribizhev's answer led me to this regex:
Regex re = new Regex(@"(\d+(,\d+)*(?:.\d+)?(?:[eE][-+]?[0-9]+)?|[-^+/()]|\w+)");
tokenList = re.Split(InfixExpression).Select(t => t.Trim()).Where(t => t != "").ToList();
Splitting on it gives me the desired array.

Regex in c# to find files in log

I'm trying to find some filenames written to a log file that end in 'K.TIF'.
I'm trying to find:
20130629VGM180ZZ001001K.TIF
20130629VGM180ZZ001002K.TIF
etc.
As I'm terrible at regexes, I tried this:
Regex.Match(line, @"([A-Z0-9]+){23}\.TIF", RegexOptions.IgnoreCase);
Regex.Match(line, @"(?<=\\)(.>)(?=K\.TIF){23}", RegexOptions.IgnoreCase);
The first one is terrible, doesn't perform and gives bad results.
The second one actually gives all the TIF that end on Z.TIF if I change K\ to Z. However, it does not find any K.TIF's with the current regex.
This seems to work for me:
^.*\\(\w*K.TIF)$
It searches for the last slash and then captures the word characters followed by K.TIF. Example: http://www.regex101.com/r/nH6gV4
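A small sketch of that approach in C#, assuming the log line ends with a full Windows path. The sample line is hypothetical, and the dot is escaped here so that it matches only a literal period:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical log line; the real log format is not shown in the question.
var line = @"2013-06-29 10:15:02 Saved C:\scans\20130629VGM180ZZ001001K.TIF";

// ^.*\\ runs greedily up to the last backslash; group 1 then captures
// the word characters that end in "K.TIF" at the end of the line.
var match = Regex.Match(line, @"^.*\\(\w*K\.TIF)$");
var fileName = match.Success ? match.Groups[1].Value : null;

Console.WriteLine(fileName ?? "(no match)");
```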
This should work:
@"\w+K\.TIF$"
The first regular expression is very close to the answer, but it has an extra '+'. I think you can try the following code.
Regex.Match(line, @"([A-Z0-9]){22}K\.TIF", RegexOptions.IgnoreCase);
This regex will get what you want:
\\([A-Z0-9]{22}K\.TIF)$
You shouldn't use IgnoreCase as you specifically made the regex to match just caps.
The extracted value is the match itself, so use:
string MatchedFileName = Regex.Match(line, @"[A-Z0-9]{22}K\.TIF$").Value;
(Updated, thanks Tyler for pointing out I hadn't read the OP's question properly)
(Updated again as it didn't need the backslash at the start or the capture group)
Use this regex:
var res = Regex.Match(line, @"(?im)^.+k\.tif$");

Regular expression to parse [A][B] into A and B

I am trying to separate the following string into separate lines with a regular expression
[property1=text1][property2=text2]
and the desired result should be
property1=text1
property2=text2
here is my code
string[] attrs = Regex.Split(attr_str, @"\[(.+)\]");
The result is incorrect; I am probably doing something wrong.
UPDATE: after applying the suggested answers, it now returns spaces and empty strings.
.+ is a greedy match, so it grabs as much as possible.
Use either
\[([^]]+)\]
or
\[(.+?)\]
In the first case, matching ] is not allowed, so "as much as possible" becomes shorter. The second uses a non-greedy match.
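A short sketch comparing the three patterns on the string from the question (the variable names are illustrative). The negated class is written as `[^\]]` here, which is the same class as `[^]]` with the bracket escaped for readability:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "[property1=text1][property2=text2]";

// Greedy: .+ runs to the last ']', so everything lands in one match.
var greedy = Regex.Matches(input, @"\[(.+)\]")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

// Negated class: ']' cannot be consumed, so each match stops at the first ']'.
var negated = Regex.Matches(input, @"\[([^\]]+)\]")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

// Lazy: .+? expands only as far as needed to reach the next ']'.
var lazy = Regex.Matches(input, @"\[(.+?)\]")
    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();

Console.WriteLine(string.Join(" | ", greedy));
Console.WriteLine(string.Join(" | ", negated));
Console.WriteLine(string.Join(" | ", lazy));
```

Using Regex.Matches rather than Regex.Split also avoids the empty strings that Split returns around the separators.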
Your dot is grabbing the brackets as well. You need to exclude brackets:
\[([^]]+)\]
The [^]] matches any character except a closing bracket.
Try adding the 'lazy' specifier:
Regex.Split(attr_str, @"\[(.+?)\]");
Try:
var s = "[property1=text1][property2=text2]";
var matches = Regex.Matches(s, @"\[(.+?)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value);

Why is my .NET regex not working correctly?

I have a text file which is in the format:
key1:val1,
key2:val2,
key3:val3
and I am trying to parse the key/value pairs out with a regex. Here is the regex code I am using with the same example:
string input = @"key1:val1,
key2:val2,
key3:val3";
var r = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
foreach (Match m in r.Matches(input))
{
    Console.WriteLine(m.Groups["name"].Value);
    Console.WriteLine(m.Groups["value"].Value);
}
When I loop through r.Matches, sometimes certain key/value pairs don't appear, and it seems to be the ones with the comma at the end of the line - but I should be taking that into account with the ,?. What am I missing here?
This might be a good situation for String.Split rather than a regex:
foreach (string pair in input.Split(','))
{
    // Trim removes the line break left at the start of each pair after the first.
    string[] items = pair.Trim().Split(':');
    Console.WriteLine(items[0]);
    Console.WriteLine(items[1]);
}
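The problem is that your regular expression does not account for the line break at the end of the first two lines. With Windows line endings (\r\n), $ in Multiline mode matches only before the \n, so the \r left between the comma and $ makes the match fail.
Try changing it to
@"^(?<name>\w+):(?<value>\w+),?(\n|\r|\r\n)?$"
and it should work.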
By the way, I love regular expressions, but given the problem you are trying to solve, go for the string.Split solution. It will be much easier to read...
EDIT: after reading your comment, where you say that this is a simplified version of your problem, then maybe you could simplify the expression by adding some "tolerance" for spaces / newline at the end of the match with
@"^(?<name>\w+):(?<value>\w+),?\s*$"
Also, when you play with regular expressions, test them with a tool like Expresso, it saves a lot of time.
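To illustrate the difference, here is a minimal sketch that assumes the file uses Windows line endings (\r\n). That is an assumption about the asker's input, written out explicitly in the string below:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed input: the question's three lines, joined with \r\n.
var input = "key1:val1,\r\nkey2:val2,\r\nkey3:val3";

// Original pattern: in Multiline mode $ matches before \n or at the end
// of the input, so the \r after each comma blocks the first two lines.
var strict = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline);

// Tolerant pattern: \s* absorbs the \r before $.
var tolerant = new Regex(@"^(?<name>\w+):(?<value>\w+),?\s*$", RegexOptions.Multiline);

var strictNames = strict.Matches(input).Cast<Match>()
    .Select(m => m.Groups["name"].Value).ToList();
var tolerantNames = tolerant.Matches(input).Cast<Match>()
    .Select(m => m.Groups["name"].Value).ToList();

Console.WriteLine("strict:   " + string.Join(", ", strictNames));
Console.WriteLine("tolerant: " + string.Join(", ", tolerantNames));
```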
Get rid of the RegexOptions.Multiline option.

Extract string between braces using RegEx, ie {{content}}

I am given a string that has place holders in the format of {{some_text}}. I would like to extract this into a collection using C# and believe RegEx is the best way to do this. RegEx is a little over my head but it seems powerful enough to work in this case. Here is my example:
<a title="{{element='title'}}" href="{{url}}">
<img border="0" alt="{{element='title'}}" src="{{element='photo' property='src' maxwidth='135'}}" width="135" height="135" /></a>
<span>{{element='h1'}}</span>
<span><strong>{{element='price'}}<br /></strong></span>
I would like to end up with something like this:
collection[0] = "element='title'";
collection[1] = "url";
collection[2] = "element='photo' property='src' maxwidth='135'";
collection[3] = "element='h1'";
collection[4] = "element='price'";
Notice that there are no duplicates either, but I do not want to complicate things if it is difficult to do.
I saw this post that does something similar but within brackets:
How to extract the contents of square brackets in a string of text in c# using Regex
My problem here is that I have double braces instead of just one character. How can I do this?
Taking exactly from the question you linked:
ICollection<string> matches =
    Regex.Matches(s.Replace(Environment.NewLine, ""), @"\{\{([^}]*)\}\}")
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToList();
foreach (string match in matches)
    Console.WriteLine(match);
I've changed the [ and ] to {{ and }} (escaped). This should make the collection you need. Be sure to read the first answer to the other question for the regex breakdown. It's important to understand it if you use it.
RegEx is more than powerful enough for what you need.
Try this regular expression:
\{\{.*?\}\}
That will match expressions between double braces, lazily.
Edit: that will give you the strings, including the double braces. You can parse them manually, but if the regex engine supports lookahead and lookbehind, you can get what's inside directly with something like:
(?<=\{\{).*?(?=\}\})
You will need to get rid of the duplicates after you have the matches.
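One way to drop the duplicates, sketched with LINQ's Distinct on a shortened version of the question's markup. A capture group is used here instead of the lookarounds, and the variable names are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened sample based on the question; {{element='title'}} appears twice.
var template = "<a title=\"{{element='title'}}\" href=\"{{url}}\"><img alt=\"{{element='title'}}\" /></a>";

// Group 1 holds the text between the double braces; Distinct removes repeats.
var placeholders = Regex.Matches(template, @"\{\{(.*?)\}\}")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .Distinct()
    .ToList();

foreach (var placeholder in placeholders)
    Console.WriteLine(placeholder);
```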
\{\{(.*?)}}
Result 1
element='title'
Result 2
url
Result 3
element='title'
Result 4
element='photo' property='src' maxwidth='135'
Result 5
element='h1'
Result 6
element='price'
