Using RegEx to balance match parenthesis - c#

I am trying to create a .NET RegEx expression that will properly balance out my parenthesis. I have the following RegEx expression:
func([a-zA-Z_][a-zA-Z0-9_]*)\(.*\)
The string I am trying to match is this:
"test -> funcPow((3),2) * (9+1)"
What should happen is Regex should match everything from funcPow until the second closing parenthesis. It should stop after the second closing parenthesis. Instead, it is matching all the way to the very last closing parenthesis. RegEx is returning this:
"funcPow((3),2) * (9+1)"
It should return this:
"funcPow((3),2)"
Any help on this would be appreciated.

Regular Expressions can definitely do balanced parentheses matching. It can be tricky, and requires a couple of the more advanced Regex features, but it's not too hard.
Example:
var r = new Regex(@"
func([a-zA-Z_][a-zA-Z0-9_]*) # The func name
\( # First '('
(?:
[^()] # Match all non-braces
|
(?<open> \( ) # Match '(', and capture into 'open'
|
(?<-open> \) ) # Match ')', and delete the 'open' capture
)+
(?(open)(?!)) # Fails if 'open' stack isn't empty!
\) # Last ')'
", RegexOptions.IgnorePatternWhitespace);
Balanced matching groups have a couple of features, but for this example, we're only using the capture deleting feature. The line (?<-open> \) ) will match a ) and delete the previous "open" capture.
The trickiest line is (?(open)(?!)), so let me explain it. (?(open) is a conditional expression that only matches if there is an "open" capture. (?!) is a negative expression that always fails. Therefore, (?(open)(?!)) says "if there is an open capture, then fail".
Microsoft's documentation was pretty helpful too.
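As a quick check, here is a minimal sketch that runs the pattern above against the string from the question. The variable names (balanced, matched, name) are only illustrative, and it assumes a C# version that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the answer, written as a verbatim string (@"...").
var balanced = new Regex(@"
    func([a-zA-Z_][a-zA-Z0-9_]*)  # function name after the func prefix
    \(                            # first opening parenthesis
    (?:
        [^()]                     # any character that is not a parenthesis
      |
        (?<open> \( )             # opening parenthesis: push a capture onto open
      |
        (?<-open> \) )            # closing parenthesis: pop a capture from open
    )+
    (?(open)(?!))                 # fail if open still holds a capture
    \)                            # last closing parenthesis
", RegexOptions.IgnorePatternWhitespace);

Match result = balanced.Match("test -> funcPow((3),2) * (9+1)");
string matched = result.Value;        // the balanced call
string name = result.Groups[1].Value; // the part of the name after func

Console.WriteLine(matched); // funcPow((3),2)
Console.WriteLine(name);    // Pow
```

The loop stops at the second closing parenthesis because popping from an empty "open" stack fails, so the final \) is the one that closes the call.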

Using balanced groups, it is:
Regex rx = new Regex(@"func([a-zA-Z_][a-zA-Z0-9_]*)\(((?<BR>\()|(?<-BR>\))|[^()]*)+\)");
var match = rx.Match("funcPow((3),2) * (9+1)");
var str = match.Value; // funcPow((3),2)
(?<BR>\()|(?<-BR>\)) is a balancing group (the name BR stands for "brackets"). Writing the parentheses as character classes, (?<BR>[(])|(?<-BR>[)]), is perhaps clearer, because the literal ( and ) become more "evident".
If you really hate yourself (and the world/your fellow co-programmers) enough to use these things, I suggest using the RegexOptions.IgnorePatternWhitespace and "sprinkling" white space everywhere :-)

Regular expressions, in the formal sense, only work on regular languages. This means that a regular expression can find things of the sort "any combination of a's and b's" (ab, babbabaaa, etc.), but it can't find "n a's, one b, n a's" (a^n b a^n). A regular expression can't guarantee that the number of a's in the first run equals the number in the second.
Because of this, they aren't able to match equal numbers of opening and closing parentheses. It would be easy enough to write a function that traverses the string one character at a time. Keep two counters, one for opening parentheses and one for closing. Increment the counters as you traverse the string; if opening_paren_count != closing_paren_count at the end, return false.
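The counting approach described above could be sketched roughly as follows. HasBalancedParens is a hypothetical helper name, and the early exit on a stray closing parenthesis is an addition that the description does not spell out.

```csharp
using System;

// Hypothetical helper: checks parenthesis balance with two counters.
static bool HasBalancedParens(string text)
{
    int openCount = 0;
    int closeCount = 0;
    foreach (char c in text)
    {
        if (c == '(') openCount++;
        else if (c == ')') closeCount++;

        // Assumption beyond the description: a ')' that has no earlier
        // unmatched '(' can never be balanced, so stop right away.
        if (closeCount > openCount) return false;
    }
    return openCount == closeCount;
}

Console.WriteLine(HasBalancedParens("funcPow((3),2)")); // True
Console.WriteLine(HasBalancedParens("funcPow((3),2"));  // False
```

This only validates a whole string; finding where a balanced call ends inside a longer string would need the scan to start at the call's opening parenthesis and stop when the two counters meet.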

func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)
You can use that, but if you're working with .NET, there may be better alternatives.
This part you already know:
func[a-zA-Z0-9_]*\( --weird part-- \)
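The --weird part-- means: allow either a single character, or (|) a whole parenthesized section, to repeat as many times as needed (the trailing *). The only catch is that you can't use . for "any character" or .* for the section's contents; you have to use [^()] so that the parentheses themselves are excluded. Note that this only handles one level of nesting.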
(([^()])|(\([^()]*\)))*
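A small sketch of how this pattern behaves, assuming it is used as-is with System.Text.RegularExpressions. The second input, funcDeep(((1))), is a made-up example to show that nesting deeper than one level is not matched.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer: allows one level of nested parentheses.
var oneLevel = new Regex(@"func[a-zA-Z0-9_]*\((([^()])|(\([^()]*\)))*\)");

// One nested pair inside the call: matched up to the balancing ')'.
string shallow = oneLevel.Match("test -> funcPow((3),2) * (9+1)").Value;

// Two nested pairs inside the call: the pattern cannot follow them.
bool deepMatches = oneLevel.IsMatch("funcDeep(((1)))");

Console.WriteLine(shallow);     // funcPow((3),2)
Console.WriteLine(deepMatches); // False
```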

Related

C# Regex substring should be at start and end but not in the middle

Let's consider ${ the opening tag and }$ the closing tag. The opening tag should only occur at the start and the closing tag only at the end. The chars {, }, $ are allowed in between as long as they do not form one of the tags, so ${Macro{Inner}}$ is allowed.
This is what I tried: \$\{[^((\$\{)|(\}\$))]+\}\$
If the curly braces don't have to be balanced, you might use
(?<!\S)\${[^{}]*(?>(?:(?<!\$){|}(?!\$))[^{}]*)*}\$(?!\S)
The pattern matches:
(?<!\S) Assert a whitespace boundary to the left
\${ match ${
[^{}]* Optionally repeat matching any char other than { and }
(?> Atomic group
(?: Non capture group
(?<!\$){ Match { asserting not $ to the left
| Or
}(?!\$) Match } asserting not $ to the right
) Close non capture group
[^{}]* Optionally repeat matching any char other than { and }
)* Close the atomic group and optionally repeat
}\$ Match }$
(?!\S) Assert a whitespace boundary to the right
If the curly braces should be balanced, you could use:
(?<!\S)\${(?>(?<!\$){(?<c>)|[^{}]+|}(?!\$)(?<-c>))*(?(c)(?!))}\$(?!\S)
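To illustrate the balanced variant, here is a minimal sketch that tries it on the tag from the question and on a made-up unbalanced input. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Balanced-brace pattern from the answer; group c acts as the stack.
var tag = new Regex(
    @"(?<!\S)\${(?>(?<!\$){(?<c>)|[^{}]+|}(?!\$)(?<-c>))*(?(c)(?!))}\$(?!\S)");

// Inner braces are balanced, so the whole tag matches.
bool balancedTag = tag.IsMatch("${Macro{Inner}}$");

// The inner { is never closed, so (?(c)(?!)) forces a failure.
bool unbalancedTag = tag.IsMatch("${Macro{Inner}$");

Console.WriteLine(balancedTag);   // True
Console.WriteLine(unbalancedTag); // False
```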
Don't need regex for this
s.StartsWith("${") && s.EndsWith("}$") && new[]{"${", "}$"}.All(x => s.IndexOf(x, 2, s.Length-4) == -1)
Why do I advocate not using a regex?
go for the simple solution, not the perfect one;
this code is more readable and self-documenting;
it isn't a regex so complicated that you have to ask on SO to make it work;
you, or the developer who replaces you, have a more reasonable chance of maintaining it than a regex of the required complexity.
You have tried the pattern [^((\$\{)|(\}\$))]+ to prevent ${ or }$ being matched, but that is a misunderstanding of how character classes work.
[^((\$\{)|(\}\$))] means match a single character that is not a (, $, {, ), |, or }.
The following working regex is an example of how to use a negative lookahead to avoid ${ or }$ being matched:
\$\{(?:(?!\$\{|\}\$).)*\}\$
If you want to match across newlines use RegexOptions.Singleline.
(Although I have done so, it is not necessary to escape the { and } in the regex above, because the regex engine can determine from the surrounding context that they should be interpreted as literal characters.)
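A short sketch of the negative-lookahead pattern in use. The input string with two tags is made up for illustration, and Cast<Match>() is used so that the snippet does not depend on a particular .NET version.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each . is only consumed if it does not start a ${ or }$ tag.
var tempered = new Regex(@"\$\{(?:(?!\$\{|\}\$).)*\}\$");

string[] tags = tempered.Matches("${Macro{Inner}}$ and ${b}$")
    .Cast<Match>()
    .Select(t => t.Value)
    .ToArray();

foreach (string t in tags) Console.WriteLine(t);
// ${Macro{Inner}}$
// ${b}$
```

Because the lookahead is checked before every character, a match can never run across the end of one tag into the next.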

Regular Expression get text between braces including other braces

I have a "main"-string like:
((Gripper|Open==true OR RIT|Turning==false) AND Robot|PosX >=3 OR (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)))
I want to get three sub strings in the best case:
1: (Gripper|Open==true OR RIT|Turning==false)
2: Robot|PosX >=3
3: (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false))
But only two (the one in braces [1,3]) would be fine too, since they can be replaced in the main-string, getting the 3rd[2] as a result.
Ideally with the help of regex.
All the sub strings go into a class as children so I can apply the regex for each child and get their sub strings as well.
1: Test|Close==false
2: (Gripper|Open==false AND RIT|Turning==false)
for child number three (where the first result, without the braces, would again be optional).
I tried something similar to Regular expression to extract text between braces and putting positions of the matches onto a stack, but not with the expected results.
The best regex I found so far is
([^()]+(?:[^()]+)+) or
([^()]+(?:)+)
(seriously, regex is powerful, but I have no idea what the above statements really do) which gives me
1. Gripper|Open == true OR RIT|Turning==false
2. AND Robot|PosX >=3 OR
3. Test|Close==false OR
4. Gripper|Open==false AND RIT|Turning==false
But still, 3+4 should be in only one group as
Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)
Does anyone know how to achieve this?
It seems like you are looking for balanced parentheses, where each match starts with two words divided by a pipe, followed by a comparison operator.
In C# you might match either the balanced parenthesis or match a pattern that does not contain them using an alternation.
(?:\(\w+\|\w+\s*[<>!=]{1,2}[^()]*(?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)|\w+\|\w+\s*[<>!=]{1,2}\S+)
(?: Non capture group
\(\w+\|\w+\s* Match ( then 2 words divided by a pipe and 0+ whitespace chars
[<>!=]{1,2}[^()]* Match any of the operators and match any char except ()
(?> Atomic group
[^()]+ Match 1+ times any char except ()
| Or
(?<o>)\( Add to stack
| Or
(?<-o>)\) Remove from stack
)* Close atomic group and repeat 0+ times
(?(o)(?!)|)\) Conditional: if group o still holds a capture, fail; otherwise match nothing, then match the closing )
| Or
\w+\|\w+\s*[<>!=]{1,2}\S+ Match 2 words divided by a pipe and match operators
) Close non capture group
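As a rough check, the pattern can be run against the "main" string from the question. This is only a sketch; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from this answer: a balanced group, or a bare "word|word op value".
var groupPattern = new Regex(
    @"(?:\(\w+\|\w+\s*[<>!=]{1,2}[^()]*(?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)|\w+\|\w+\s*[<>!=]{1,2}\S+)");

string mainString =
    "((Gripper|Open==true OR RIT|Turning==false) AND Robot|PosX >=3 OR " +
    "(Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false)))";

string[] parts = groupPattern.Matches(mainString)
    .Cast<Match>()
    .Select(p => p.Value)
    .ToArray();

foreach (string part in parts) Console.WriteLine(part);
// (Gripper|Open==true OR RIT|Turning==false)
// Robot|PosX >=3
// (Test|Close==false OR (Gripper|Open==false AND RIT|Turning==false))
```

To get the children of the third part, one option is to strip its outer parentheses and apply the same pattern to what is left.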
You may try with that:
(?<=\))(?!\()[^()]+|\((?!\()[^)]+\)
Explanation:
(?<=\))(?!\()[^()]+ OR \((?!\()[^)]+\)
The first part before 'OR' basically matches AND Robot|PosX >=3 OR
(?<=\)) positive lookbehind: match at the current position only if the
previous character is )
(?!\() negative lookahead: match at the current position only if the next
character is not (
[^()]+ matches anything that is Neither ( nor ).
The last part after OR matches anything that starts with ( and ends with ) while ignoring any opening braces inside it.

How do I find a match which has already been captured by another match?

How can I replace all occurrences of matches in a string if some parts have already been captured:
E.g. Given the pattern "AB|BC" and the target "ABC" we match "AB" but not "BC"
I've been trying to understand the various regex grouping options (Grouping Constructs in Regular Expressions) without success. I'm probably barking up the wrong tree. :-(
var test = Regex.Replace("(AB)(BC)(AC)(ABC)", @"AB|BC", string.Empty);
In the example, test evaluates to "()()(AC)(C)", but what I actually want is "()()(AC)()"
Without taking care of the parentheses, you could use an alternation with an optional character, using the question mark.
Match AB with an optional C or Match an optional A followed by BC. In the replacement use an empty string.
ABC?|A?BC
Including the parenthesis you might use a capturing group or lookarounds to assert what is on the left and on the right are opening and closing parenthesis.
(?<=\()(?:ABC?|A?BC)(?=\))
Explanation
(?<=\() Assert what is on the left is (
(?: Non capturing group
ABC? Match AB with optional C
| Or
A?BC Match optional A and BC
) Close non capturing group
(?=\)) Assert what is on the right is )
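Both patterns can be tried on the string from the question with a sketch like the following. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "(AB)(BC)(AC)(ABC)";

// Alternation with optional characters: ABC is consumed as a whole.
string plain = Regex.Replace(input, @"ABC?|A?BC", string.Empty);

// Same alternation, but only directly between ( and ).
string guarded = Regex.Replace(input, @"(?<=\()(?:ABC?|A?BC)(?=\))", string.Empty);

Console.WriteLine(plain);   // ()()(AC)()
Console.WriteLine(guarded); // ()()(AC)()
```

The two results agree on this input; they would differ on text such as xABCx, where only the first pattern removes anything.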
For the overlapping character to be consumed, it has to be part of the match.
Therefore, one side of the alternation has to optionally include the other side's
last or first literal (it doesn't have to be both).
AB|BC ~ ABC?|BC = A?BC|AB

regex for javascript regular expressions

I need to parse some JavaScript code in C# and find the regular expressions in that.
When the regular expressions are created using RegExp, I am able to find. (Since the expression is enclosed in quotes.) When it comes to inline definition, something like:
var x = /as\/df/;
I am having difficulty matching the pattern. I need to start at a /, then take all characters until the next / is found, but ignore \/.
I may not rely on the end of statement (;) because of Automatic Semicolon Insertion, or because the regex may be part of another statement, something like:
foo(/xxx/); //assume function takes regex param
If I am right, a line break is not allowed within an inline regex in JavaScript, which saves my day. However, the following is allowed:
var a=/regex1def/;var b=/regex2def/;
foo(/xxx/,/yyy/)
I need a regular expression, something like /.*/, that captures the right data.
You cannot reliably parse programming languages with regular expressions only. Especially Javascript, because its grammar is quite ambiguous. Consider:
a = a /b/ 1
foo = /*bar*/ + 1
a /= 5 //.*/hi
This code is valid Javascript, but none of /.../'s here are regular expressions.
In case you know what you're doing ;), an expression for matching escaped strings is "delimiter, (something escaped or not delimiter), delimiter":
delim ( \\. | [^delim] ) * delim
where delim is / in your case.
After several trials with RegexHero, this seems to work: /.*?[^\\]/. But I am not sure whether I am missing a corner case.
How about this:
Regex regexObj = new Regex(@"/(?:\\/|[^/])*/");
Explanation:
/ # Match /
(?: # Non-capturing group:
\\ # Either match \
/ # followed by /
| # or
[^/] # match any character except /
)* # Repeat any number of times
/ # Match /
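A minimal sketch of this pattern applied to the snippets from the question. As noted in another answer, it cannot tell a regex literal from a division or a comment, so treat it as a heuristic; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// /, then any run of escaped slashes or non-slash characters, then /.
var jsRegexLiteral = new Regex(@"/(?:\\/|[^/])*/");

// The escaped \/ does not end the literal.
string single = jsRegexLiteral.Match(@"var x = /as\/df/;").Value;

// Two literals on one line are found separately.
string[] several = jsRegexLiteral.Matches("foo(/xxx/,/yyy/)")
    .Cast<Match>()
    .Select(lit => lit.Value)
    .ToArray();

Console.WriteLine(single);                     // /as\/df/
Console.WriteLine(string.Join(" ", several));  // /xxx/ /yyy/
```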
I think that this may help you
var patt=/pattern/modifiers;
•pattern specifies the pattern of an expression
•modifiers specify if a search should be global, case-sensitive, etc.
