How to exclude regex matches inside nested parentheses - C#

I have a text like this:
UseProp1?(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
I'm trying to use regex in C# to extract the nested if/else segments.
To find '?' I've used:
Pattern 1: \?\s*(?![^()]*\))
and to find ':' I've used:
Pattern 2: \:\s*(?![^()]*\))
This works fine when there is one level of parentheses but not when nesting them.
I've used this online tool to simplify testing: http://regexstorm.net/tester (paste Pattern 1 and the input from above into it).
As you can see, it highlights two matches, but I only want the first. You'll also notice that the ? inside the first pair of parentheses is correctly ignored, but the one inside the next, nested group is not.
I expect the match list to be:
1) UseProp1
2) (Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
What I'm getting now is:
1) UseProp1
2) (Prop1?Prop1:Test):(UseProp2
3) Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
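Why Pattern 1 behaves this way: the lookahead (?![^()]*\)) only looks as far as the nearest parenthesis. After the ? in UseProp2?Prop2:(, the nearest parenthesis is an opening one, so the lookahead does not reject that ?, even though it is nested. If a plain loop is acceptable instead of a regex, one minimal sketch is to scan the string while counting parenthesis depth and split only at depth 0. This swaps in a non-regex technique; TopLevelSplit and SplitTopLevel are hypothetical names, and the sketch assumes the parentheses are balanced and never appear inside quoted text.

```csharp
using System;
using System.Collections.Generic;

public static class TopLevelSplit
{
    // Hypothetical helper: splits s on separator, but only where the separator is
    // outside every (...) group. Assumes balanced, unquoted parentheses.
    public static List<string> SplitTopLevel(string s, char separator)
    {
        var parts = new List<string>();
        int depth = 0; // current parenthesis nesting level
        int start = 0; // start index of the part being collected
        for (int i = 0; i < s.Length; i++)
        {
            char ch = s[i];
            if (ch == '(') depth++;
            else if (ch == ')') depth--;
            else if (ch == separator && depth == 0)
            {
                parts.Add(s.Substring(start, i - start));
                start = i + 1;
            }
        }
        parts.Add(s.Substring(start)); // text after the last top-level separator
        return parts;
    }

    public static void Main()
    {
        string input = "UseProp1?(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))";
        List<string> parts = SplitTopLevel(input, '?');
        for (int i = 0; i < parts.Count; i++)
        {
            Console.WriteLine("{0}) {1}", i + 1, parts[i]);
        }
    }
}
```

Splitting the second part again on ':' separates (Prop1?Prop1:Test) from (UseProp2?Prop2:(...)), and each branch can be unwrapped and split the same way to walk down the tree.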

Expanding on @bobble bubble's comment, here's my regex:
It will capture the first layer of ternary expressions. Capture groups: $1 is the conditional, $2 is the true clause, and $3 is the false clause. You will then have to match the regex on each of those to step further down the tree:
((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\?((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\:((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+
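A minimal C# sketch of applying the regex above, assuming the input is the string from the question. Because .NET numbers named groups after unnamed ones, Groups[1] to Groups[3] are the condition, the true branch, and the false branch, and the named group c is only used for depth tracking. The Unwrap step (dropping one pair of outer parentheses before matching again) is an illustrative assumption about how to step down the tree, not part of the original answer; TernaryRegex, Parse, and Unwrap are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TernaryRegex
{
    // The regex from the answer above. Each operand is either one or more balanced
    // (...) groups (depth tracked by the .NET balancing group "c") or plain text.
    public const string Pattern =
        @"((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\?((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+\:((?:\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\))+|\b[^)(?:]+)+";

    // Returns { condition, true branch, false branch }, or an empty array if there is no match.
    public static string[] Parse(string expression)
    {
        Match m = Regex.Match(expression, Pattern);
        if (!m.Success) return Array.Empty<string>();
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    // Hypothetical helper: drops one pair of outer parentheses so a branch can be parsed again.
    // Naive: assumes the branch is a single (...) group, not something like (a)(b).
    public static string Unwrap(string s) =>
        s.Length >= 2 && s[0] == '(' && s[s.Length - 1] == ')' ? s.Substring(1, s.Length - 2) : s;

    public static void Main()
    {
        string input = "UseProp1?(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))";
        string[] top = Parse(input);
        Console.WriteLine(string.Join(" | ", top));
        string[] inner = Parse(Unwrap(top[2])); // one level further down the false branch
        Console.WriteLine(string.Join(" | ", inner));
    }
}
```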
That being said, if you are evaluating math in these expressions as well, it may be more valuable to use a runtime compiler to do all the heavy lifting for you. This answer will help you design in that direction if you so choose.

If I understand correctly, and we only want to capture the two listed formats, we can start with a simple expression using alternation and then refine its parts as needed:
UseProp1|(\(?Prop1\?Prop1(:Test)\)):(\(UseProp2\?Prop2):\((Test\sText):\s+'\{(.+?)}'\s+Test\sReference:\{(.+?)}\)\)
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"UseProp1|(\(?Prop1\?Prop1(:Test)\)):(\(UseProp2\?Prop2):\((Test\sText):\s+'\{(.+?)}'\s+Test\sReference:\{(.+?)}\)\)";
        string input = @"UseProp1
(Prop1?Prop1:Test):(UseProp2?Prop2:(Test Text: '{TextProperty}' Test Reference:{Reference}))
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
RegEx
If this expression isn't what you need and you'd like to modify it, please visit this link at regex101.com.

Related

What would be the regular expression for a domain URL like "www.google.com"?

What would be the regex for a RegularExpressionValidator in ASP.NET for a domain name like "www.google.com"?
Valid Cases:
www.google.com
www.youwebsite.com
Invalid Cases:
http://www.google.com
https://www.google.com
google.com
www.google
Currently I use (?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9], but it fails for invalid cases 3 and 4.
The pattern you tried fails for the third and fourth invalid cases because it only matches one or more labels of [a-z0-9-], each followed by a dot, and then a final label; nothing in it requires a leading www.
If you want to keep your pattern, you should make sure that it starts with www.
^www\.(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$
You might shorten your pattern and make the match a bit broader:
^www\.[a-z0-9-]+(?:\.[a-z0-9-]+)*\.com$
You can always extend the character class if you want to allow matching more characters.
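A quick way to check both patterns from this answer against the valid and invalid cases from the question is a small loop over Regex.IsMatch. The sample list is copied from the question; WwwDomainCheck and its method names are hypothetical. Note that neither pattern is case-insensitive as written, so an input such as WWW.Google.com would be rejected unless RegexOptions.IgnoreCase is added.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WwwDomainCheck
{
    // Both patterns are taken from the answer above.
    public const string StrictPattern =
        @"^www\.(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$";
    public const string BroadPattern = @"^www\.[a-z0-9-]+(?:\.[a-z0-9-]+)*\.com$";

    public static bool MatchesStrict(string s) => Regex.IsMatch(s, StrictPattern);
    public static bool MatchesBroad(string s) => Regex.IsMatch(s, BroadPattern);

    public static void Main()
    {
        string[] samples =
        {
            "www.google.com", "www.youwebsite.com",            // valid cases
            "http://www.google.com", "https://www.google.com",  // invalid cases
            "google.com", "www.google"
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0,-24} strict={1,-5} broad={2}", s, MatchesStrict(s), MatchesBroad(s));
        }
    }
}
```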
Assuming that we have valid URLs as listed, we can start with a simple expression such as:
^www\..+\.com
Then, we can add additional boundaries, if desired. For instance, we could add a character class and an end anchor, such as:
^www\..+\.com$
^www\.[A-Za-z_]+\.com$
If necessary, we would continue adding more constraints and testing:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"^www\.[A-Za-z_]+\.com";
        string input = @"www.google.com
www.youwebsite.com
http://www.google.com
https://www.google.com
google.com
www.google";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
This matches the part you want.
\bwww\.[a-zA-Z0-9]{2,256}\.com\b
But an easier way to handle such a simple pattern is to use StartsWith and EndsWith, and then check what is in between.
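The StartsWith/EndsWith approach might look like the following minimal sketch. IsWwwDotCom is a hypothetical helper, and the rule for the middle part (at least one character, ASCII letters, digits, and hyphens only) is an assumption; adjust it to whatever the validator should actually allow. Ordinal comparison keeps the current culture from affecting the prefix and suffix checks.

```csharp
using System;

public static class WwwDomainNoRegex
{
    // Hypothetical helper: "www." + middle + ".com", where the middle part is assumed
    // to contain only ASCII letters, digits and hyphens.
    public static bool IsWwwDotCom(string s)
    {
        const string prefix = "www.";
        const string suffix = ".com";
        if (s.Length <= prefix.Length + suffix.Length) return false; // middle must be non-empty
        if (!s.StartsWith(prefix, StringComparison.Ordinal) ||
            !s.EndsWith(suffix, StringComparison.Ordinal))
            return false;

        string middle = s.Substring(prefix.Length, s.Length - prefix.Length - suffix.Length);
        foreach (char c in middle)
        {
            bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                      (c >= '0' && c <= '9') || c == '-';
            if (!ok) return false;
        }
        return true;
    }

    public static void Main()
    {
        foreach (string s in new[] { "www.google.com", "google.com", "www.google" })
        {
            Console.WriteLine("{0}: {1}", s, IsWwwDotCom(s));
        }
    }
}
```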

Regex to capture groups of parentheses including inner and outer parentheses

I want to match all parentheses including the inner and outer parentheses.
Input: abc(test)def(rst(another test)uv)xy
Desired Output: (test)
(rst(another test)uv)
(another test)
My following C# code returns only (test) and (rst(another test)uv):
string input = "abc(test)def(rst(another test)uv)xy";
Regex regex = new Regex(@"\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)", RegexOptions.IgnorePatternWhitespace);
foreach (Match c in regex.Matches(input))
{
    Console.WriteLine(c.Value);
}
You are looking for overlapping matches. Thus, just place your regex into a capturing group and put it inside a non-anchored positive lookahead:
Regex regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);
The value you need will be inside match.Groups[1].Value.
See the IDEONE demo:
using System;
using System.Text.RegularExpressions;
using System.IO;
using System.Linq;
public class Test
{
    public static void Main()
    {
        var input = "abc(test)def(rst(another test)uv)xy";
        var regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);
        var results = regex.Matches(input).Cast<Match>()
            .Select(p => p.Groups[1].Value)
            .ToList();
        Console.WriteLine(String.Join(", ", results));
    }
}
Results: (test), (rst(another test)uv), (another test).
Note that unanchored positive lookaheads can find overlapping matches while still capturing, because they do not consume text: the regex engine's index stays at its current position while it tries all the subpatterns inside the lookahead, and the engine automatically advances its index after each match or failure, which makes the matching process "global" (it tests for a match at every position in the input string).
Although lookahead subexpressions do not consume any text, they can still capture into groups.
Thus, when the lookahead reaches a (, it matches a zero-width string and places the value you need into Group 1. The engine then moves on, finds another ( inside the first (...), and captures a substring inside it again.
You could use this one: \((?>[^()]+|\((?<P>)|(?<C-P>)\))*(?(P)(?!))\), but you'll have to dig through captures, groups, and groups' captures to get what you want (see demo).
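A minimal sketch of digging the inner groups out of that pattern's captures, using the input from the question. In .NET, the balancing group (?<C-P>) pops the most recent P capture and stores the text between that capture and the current position in C, so for (rst(another test)uv) the match value is the outer group and C holds another test. NestedParens and AllGroups are hypothetical names, and re-wrapping each C capture in parentheses is an assumption about the desired output format.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedParens
{
    // Pattern from the answer above: P is pushed at each inner '(' and popped at ')';
    // the balancing group (?<C-P>) stores the text between them in C.
    const string Pattern = @"\((?>[^()]+|\((?<P>)|(?<C-P>)\))*(?(P)(?!))\)";

    public static List<string> AllGroups(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value); // outermost group, e.g. (rst(another test)uv)
            foreach (Capture c in m.Groups["C"].Captures)
            {
                results.Add("(" + c.Value + ")"); // each inner group, re-wrapped in parentheses
            }
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", AllGroups("abc(test)def(rst(another test)uv)xy")));
    }
}
```

For deeper nesting such as (a(b(c)d)e), the C captures come out innermost first (c, then b(c)d), so reorder the list if the order matters.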
Edit: This answer is flat-out wrong for .NET regular expressions - see nam's comment below.
Original answer:
Regular expressions match regular languages. Nested parentheses are not a regular language, they require a context-free grammar to match. So the short answer is there is no way to do what you're attempting.
https://stackoverflow.com/a/133684/361631

Regex to extract string between quotes

I'm trying to extract a string between two quotes. I thought I had my regex working, but it's giving me two strings in my GroupCollection, and I can't get it to ignore the first one, which includes the first quote and ID=.
The string that I want to parse is:
Test ID="12345" hello
I want to return 12345 in a group, so that I can manipulate it in code later. I've tried the following regex: http://regexr.com/3bgtl, with this code:
nodeValue = "Test ID=\"12345\" hello";
GroupCollection ids = Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups;
The problem is that the GroupCollection contains two entries:
ID="12345
12345
I just want it to return the second one.
Use a positive lookbehind:
GroupCollection ids = Regex.Match(nodeValue, "(?<=ID=\")[^\"]*").Groups;
You also used a capturing group (the parentheses). Group 0 is always the whole match, and your parentheses add group 1, which is why you get two entries.
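A small runnable sketch of the lookbehind version, using the input from the question. Because the lookbehind checks for ID=" without including it in the match, the match value itself is the ID and the pattern defines no extra groups. LookbehindDemo and ExtractId are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // (?<=ID=") requires ID=" right before the match but does not include it,
    // so the match value is just the ID and no capturing groups are defined.
    public static Match ExtractId(string nodeValue) =>
        Regex.Match(nodeValue, "(?<=ID=\")[^\"]*");

    public static void Main()
    {
        Match m = ExtractId("Test ID=\"12345\" hello");
        Console.WriteLine(m.Value);        // the ID
        Console.WriteLine(m.Groups.Count); // 1: only group 0, the whole match
    }
}
```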
There are a few ways to accomplish this. I like named capture groups for readability.
Regex with named capture group:
"(?<capture>.*?)"
And your code would be:
match.Groups["capture"].Value
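Put together, the named-group approach might look like this minimal sketch. ExtractQuoted is a hypothetical helper that returns the text between the first pair of double quotes, or an empty string when there is none; inside a regular (non-verbatim) C# string, the quote characters in the pattern are written as \".

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedCaptureDemo
{
    // Hypothetical helper: text between the first pair of double quotes, or "" if none.
    public static string ExtractQuoted(string text)
    {
        Match match = Regex.Match(text, "\"(?<capture>.*?)\"");
        return match.Success ? match.Groups["capture"].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(ExtractQuoted("Test ID=\"12345\" hello"));
    }
}
```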
Your code is totally OK and is the most efficient of all the solutions suggested here. Capturing groups are the quickest and least resource-consuming way to match substrings inside larger texts.
All you need to do is access capture group 1, which is defined by the round brackets, like this:
var nodeValue = "Test ID=\"12345\" hello";
GroupCollection ids = Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups;
Console.WriteLine(ids[1].Value);
// or just on one line
// Console.WriteLine(Regex.Match(nodeValue, "ID=\"([^\"]*)").Groups[1].Value);
See IDEONE demo
Please have a look at Grouping Constructs in Regular Expressions:
Grouping constructs delineate the subexpressions of a regular expression and capture the substrings of an input string. You can use grouping constructs to do the following:
Match a subexpression that is repeated in the input string.
Apply a quantifier to a subexpression that has multiple regular expression language elements. For more information about quantifiers, see Quantifiers in Regular Expressions.
Include a subexpression in the string that is returned by the Regex.Replace and Match.Result methods.
Retrieve individual subexpressions from the Match.Groups property and process them separately from the matched text as a whole.
Note that if you do not need overlapping matches, the capturing-group mechanism is the best solution here.

C# regex optional group not working

I am trying to get parts from these strings:
first:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303034342D30313230382D
second:
2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D
Basically, I want to get the following parts from both strings:
first:
2F(.+)011F(.+)2D
second:
2F(.+)011F(.+)0126(.+)2D
I am trying to use this pattern:
Match m = Regex.Match(this.__line,
    @"^2F.*22(.*)011F(.*)(0126.*)?.{2}$",
    RegexOptions.IgnoreCase);
However, when I try:
if (m.Success)
{
if (m.Groups[3].Value != "")
{
Console.WriteLine("good");
}
}
else
{
Console.WriteLine("bad");
}
I get "bad" from the second string because it is not matching the pattern. Am I not using the correct pattern?
The problem is that your pattern is greedy. You should use this pattern instead:
^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$
The second group in your regex matches everything up to the last two characters at the end, because it is greedy and the last group is optional.
To make a quantifier non-greedy (lazy), put a ? after it.
Here is more info about greedy and nongreedy.
Hope this helps.
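A minimal sketch that runs the lazy pattern above against the two strings from the question. With the lazy quantifiers, group 2 stops at the first 0126 when one exists, so group 3 participates for the second string and is unmatched for the first; checking Groups[3].Success is more direct than comparing its value to "". LazyGroups and Parse are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyGroups
{
    public const string First =
        "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303034342D30313230382D";
    public const string Second =
        "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D";

    // The lazy pattern from the answer above, with the same options as the question's code.
    public static Match Parse(string line) =>
        Regex.Match(line, @"^2F.*22(.*?)011F(.*?)(0126.*?)?.{2}$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        foreach (string line in new[] { First, Second })
        {
            Match m = Parse(line);
            // Groups[3].Success tells whether the optional 0126 part was found.
            Console.WriteLine("success={0}, group 3 found={1}, group 2={2}, group 3={3}",
                m.Success, m.Groups[3].Success, m.Groups[2].Value, m.Groups[3].Value);
        }
    }
}
```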
Take out the "^":
2F.*22(.*)011F(.*)(0126.*)?.{2}$
http://regexpal.com/ is my hands down favorite regex tool.
I would like to give you some advice. These are not answers to your question, just some good-practice tips:
The "anything but newline" symbol (.) can perform poorly, so avoid it whenever a more specific class will do. Here, it looks like you could replace it with \S.
For a case-insensitive match, you can use the inline syntax (?i:pattern). This leaves the options argument free for RegexOptions.Compiled, which can give you better performance (RegexOptions values are flags, so RegexOptions.IgnoreCase | RegexOptions.Compiled works too).
For retrieving text, named capture groups are recommended. Use the syntax (?<name>pattern); you can then retrieve the value with regexMatch.Groups["name"].Captures[0].Value.
Whenever you have a group that you do not want to retrieve (it is only there for matching), mark it as a non-capturing group using the syntax (?:pattern).
Lastly, RegexBuddy is a great (yet paid) tool. Highly recommended.
Regards.
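The tips above can be combined in one sketch, taking the lazy pattern from the earlier answer as a starting point: named groups for the three parts, a non-capturing group around the optional 0126 marker, and RegexOptions flags combined with |. TipsDemo and Extract are hypothetical names, and moving 0126 out of the third part is a design choice made here, not something the question specifies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TipsDemo
{
    // Named groups for the parts we keep; (?:...) for the optional 0126 marker,
    // which is only needed for matching. IgnoreCase and Compiled are combined with |.
    private static readonly Regex Parts = new Regex(
        @"^2F.*22(?<first>.*?)011F(?<second>.*?)(?:0126(?<third>.*?))?.{2}$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns { first, second, third }; a group that did not participate yields "".
    // Groups["name"].Value is used because Captures[0].Value would throw when the
    // group has no captures.
    public static string[] Extract(string line)
    {
        Match m = Parts.Match(line);
        if (!m.Success) return Array.Empty<string>();
        return new[] { m.Groups["first"].Value, m.Groups["second"].Value, m.Groups["third"].Value };
    }

    public static void Main()
    {
        string line = "2F4449534301224E4F204445534352495054494F4E20415641494C41424C45011F30303130312D3032323534012630303130312D31303932342D";
        Console.WriteLine(string.Join(" | ", Extract(line)));
    }
}
```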

Want any alphanumeric and underscore, dash and period to pass

In my controller, I currently set a constraint that has the following regex:
@"\w+"
I want to also allow underscore, dash, and period.
The more efficient the better, since this is called a lot.
Any tips?
Try this:
@"[._\w-]+"
(?:\w|[-_.])+
This matches one or more characters, each of which is either a word character or a hyphen, underscore, or period; the alternatives are bundled in a non-capturing group.
I guess we don't really need to add _ to our expression here, since it is already part of the \w construct, which covers uppercase and lowercase letters, digits, and the underscore ([A-Za-z0-9_] for ASCII input; in .NET, \w also matches Unicode letters and digits), so
[\w.-]+
would work just fine.
We can also add start and end anchors, if you want to:
^[\w.-]+$
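If the whole value must consist of these characters, the anchors are what enforce it when Regex.IsMatch is called directly; without them, any allowed run anywhere in the string is enough. A minimal sketch comparing the two forms; AnchorDemo and its method names are hypothetical. Two .NET details worth knowing: \w is Unicode-aware, so letters such as é also pass, and $ also matches just before a final newline, so \A...\z is the stricter choice if that matters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: true if ANY run of word characters, '.', or '-' occurs somewhere.
    public static bool ContainsAllowed(string s) => Regex.IsMatch(s, @"[\w.-]+");

    // Anchored: true only if the whole string consists of those characters
    // (apart from a possible trailing "\n", which $ tolerates).
    public static bool IsAllAllowed(string s) => Regex.IsMatch(s, @"^[\w.-]+$");

    public static void Main()
    {
        foreach (string s in new[] { "abc_DEF-1.2", "abc def", "a/b", "caf\u00e9" })
        {
            Console.WriteLine("{0,-12} contains={1,-5} whole={2}", s, ContainsAllowed(s), IsAllAllowed(s));
        }
    }
}
```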
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"[\w.-]+";
        string input = @"abcABC__.
ABCabc_.-
-_-_-abc.
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it matches against some sample inputs.
Does the following help?
@"[\w_-.]+"
P.S. I use Rad Software Regular Expression Designer to design complex Regexes.
@"[\w_-.]+"
I'm no regex guru; it was just a guess, so verify that it works...
I'd use @"[\w\-._]+" as the regex, since the dash might otherwise be interpreted as a range delimiter. It's harmless right after \w, but if you later add, say, #, it's safer to have the dash already escaped.
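There are a few suggestions on this page that contain _-., and I believe that will be interpreted as a "word character" followed by a range from "_" to ".". Since _ comes after . in ASCII, that range is reversed, and .NET rejects such a pattern with an ArgumentException.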
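A quick way to check the range concern is to try constructing each candidate pattern: .NET reports an invalid pattern, such as one with the reversed range _-., by throwing an ArgumentException from the Regex constructor. DashInClass and Compiles are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashInClass
{
    // Returns true if .NET accepts the pattern. Invalid patterns make the Regex
    // constructor throw an ArgumentException (a subclass of it in newer .NET versions).
    public static bool Compiles(string pattern)
    {
        try
        {
            _ = new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Compiles(@"[\w_-.]+"));  // "_-." is read as the reversed range '_' to '.'
        Console.WriteLine(Compiles(@"[\w.-]+"));   // a '-' right before ']' is a literal
        Console.WriteLine(Compiles(@"[\w\-._]+")); // an escaped '-' is a literal
    }
}
```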
Pattern to include all: [_.-\w]+
You can suffix the _ \. and - with ? to make any of the characters optional (none or more) e.g. for the underscore:
[_?\.-\w]+
but see Skurmedel's pattern to make all optional.
