Say you have a string:
string s = "GameObject.Find(\"obj\").GetComponent(\"comp\").GetMethod(\"method\").Get...";
The string can have any number of GetX() methods appended to it.
And you need to separate each method without the "." separator. Although, GameObject.Find can keep the (dot).
Here is my code so far :
Match match = Regex.Match(s, "(.+?\\(\".+?\"\\))(?:\\.??)*");
This produces only one group. What is the correct solution to this problem?
Edit :
Updated with non-capturing group.
First I'd recommend using verbatim string literals for writing regular expressions in C#. This cuts down the number of backslashes you need to write.
#"(.+?\("".+?""\)\.??)*"
To get all the captures, inspect Match.Captures.
See it working online: ideone
Related
I am writing a regular expression in Visual Studio 2013 using C#
I have the following scenario:
Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");
But my problem is that I don't want to capture groups. The reason is that with capture groups match.Value contains %%More text%% and my idea is the get on match.Value directly the string: More text
The string to get will be always between the second and the third group of %%
Another approach is that the string will be always between the fourth and fifth %
I tried:
Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");
But with no luck.
I want to use match.Value because all my regex are in a database table.
Is there a way to "transform" that regex to one not using capturing groups and the in match.value the desired string?
If you are sure you have no %s inside double %%s, you can just use lookarounds like this:
(?<=^%%[^%]*%%)[^%]+(?=%%)
^^^^^^^^^^^^^^ ^^^^^
If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):
(?<=^%[^%]*%)[^%]+(?=%)
See regex demo
And in case it is between the 4th and the 5th:
(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^
For single-% delimited strings (see demo):
(?<=^%(?:[^%]*%){3})[^%]+(?=%)
See another demo
Both the regexps contain a variable-width lookbehind and the same lookahead to restrict the context the 1 or more characters other than % appears in.
The (?<=^%%[^%]*%%) makes sure the is %%[something_other_then_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) matches %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.
In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):
(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)
Which is matching the same stuff that can be matched with (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*? based solution should work faster. For very long input texts, use the unrolled version.
Can someone please explain how to get this regex working? I'm trying to take this string:
"Test0/1"
and turn it into:
"Test0\/1"
I'm using this, but it is not working:
var test = Regex.Replace("Test0/1", #"/", #"\/");
It keeps giving me
"Test0\\/1"
Then I want to take the results of the string and put it into a Regex statement like so:
var match = new Regex(test).Match(myString);
So the string 'test' has to be a valid regex statement.
Basically what I'm trying to do is take a list of interfaces off a device, create a regex statement out of them and then use that regex to compare results for other things in my code. Because of the way interfaces are formatted "FastEthernet0/1" for example, it is causing my regex to fail because you have to escape all forward slashes. I have to build this regex on the fly though because every device will have a different set of interfaces.
This is a function of Visual Studio automatically escaping the \ on your behalf. Look at the following question: What's the use/meaning of the # character in variable names in C#?. Removing the # symbol from #"\" turns the string into "\\".
I am trying to use regular expressions to parse a method in the following format from a text:
mvAddSell[value, type1, reference(Moving, 60)]
so using the regular expressions, I am doing the following
tokensizedStrs = Regex.Split(target, "([A-Za-z ]+[\\[ ][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[\\( ][A-Za-z0-9 ]+[, ].+[\\) ][\\] ])");
It is working, but the problem is that it always gives me an empty array at the beginning if the string started with a method in the given format and the same happens if it comes at the end. Also if two methods appeared in the string, it catches only the first one! why is that ?
I think what is causing the parser not to catch two methods is the existance of ".+" in my patern, what I wanted to do is that I want to tell it that there will be a number of a date in that location, so I tell it that there will be a sequence of any chars, is that wrong ?
it woooorked with ,e =D ... I replaced ".+" by ".+?" which meant as few as possible of any number of chars ;)
Your goal is quite unclear to me. What do you want as result? If you split on that method pattern, you will get the part before your pattern and the part after your pattern in an array, but not the method itself.
Answer to your question
To answer your concrete question: your .+ is greedy, that means it will match anything till the last )] (in the same line, . does not match newline characters by default).
You can change this behaviour by adding a ? after the quantifier to make it lazy, then it matches only till the first )].
tokensizedStrs = Regex.Split(target, "([A-Za-z ]+[\\[ ][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[ ,][A-Za-z0-9 ]+[\\( ][A-Za-z0-9 ]+[, ].+?[\\) ][\\] ])");
Problems in your regex
There are several other problems in your regex.
I think you misunderstood character classes, when you write e.g. [\\[ ]. this construct will match either a [ or a space. If you want to allow optional space after the [ (would be logical to me), do it this way: \\[\\s*
Use a verbatim string (with a leading #) to define your regex to avoid excessive escaping.
tokensizedStrs = Regex.Split(target, #"([A-Za-z ]+\[\s*[A-Za-z0-9 ]+\s*,\s*[A-Za-z0-9 ]+\s*,\s*[A-Za-z0-9 ]+\(\s*[A-Za-z0-9 ]+\s*,\s*.+?\)s*\]\s*)");
You can simplify your regex, by avoiding repeating parts
tokensizedStrs = Regex.Split(target, #"([A-Za-z ]+\[\s*[A-Za-z0-9 ]+(?:\s*,\s*[A-Za-z0-9 ]+){2}\(\s*[A-Za-z0-9 ]+\s*,\s*.+?\)s*\]\s*)");
This is an non capturing group (?:\s*,\s*[A-Za-z0-9 ]+){2} repeated two times.
I'm currently facing a (little) blocking issue. I'd like to replace a substring by one another using regular expression. But here is the trick : I suck at regex.
Regex.Replace(contenu, "Request.ServerVariables("*"))",
"ServerVariables('test')");
Basically I'd like to replace whatever is between the " by "test". I tried ".{*}" as a pattern but it doesn't work.
Could you give me some tips, I'd appreciate it!
There are several issues you need to take care of.
You are using special characters in your regex (., parens, quotes) -- you need to escape these with a slash. And you need to escape the slashes with another slash as well because we 're in a C# string literal, unless you prefix the string with # in which case the escaping rules are different.
The expression to match "any number of whatever characters" is .*. In this case, you would want to match any number of non-quote characters, which is [^"]*.
In contrast to (1) above, the replacement string is not a regular expression so you don't want any slashes there.
You need to store the return value of the replace somewhere.
The end result is
var result = Regex.Replace(contenu,
#"Request\.ServerVariables\(""[^""]*""\)",
"Request.ServerVariables('test')");
Based purely on my knowledge of regex (and not how they are done in C#), the pattern you want is probably:
"[^"]*"
ie - match a " then match everything that's not a " then match another "
You may need to escape the double-quotes to make your regex-parser actually match on them... that's what I don't know about C#
Try to avoid where you can the '.*' in regex, you can usually find what you want to get by avoiding other characters, for example [^"]+ not quoted, or ([^)]+) not in parenthesis. So you may just want "([^"]+)" which should give you the whole thing in [0], then in [1] you'll find 'test'.
You could also just replace '"' with '' I think.
Taryn Easts regex includes the *. You should remove it, if it is just a placeholder for any value:
"[^"]"
BTW: You can test this regex with this cool editor: http://rubular.com/r/1MMtJNF3kM
Given $displayHeight = "800";, replace whatever number is at 800 with int value y_res.
resultString = Regex.Replace(
im_cfg_contents,
#"\$displayHeight[\s]*=[\s]*""(.*)"";",
Convert.ToString(y_res));
In Python I'd use re.sub and it would work. In .NET it replaces the whole line, not the matched group.
What is a quick fix?
Building on a couple of the answers already posted. The Zero-width assertion allows you to do a regular expression match without placing those characters in the match. By placing the first part of the string in a group we've separated it from the digits that you want to be replaced. Then by using a zero-width lookbehind assertion in that group we allow the regular expression to proceed as normal but omit the characters in that group in the match. Similarly, we've placed the last part of the string in a group, and used a zero-width lookahead assertion. Grouping Constructs on MSDN shows the groups as well as the assertions.
resultString = Regex.Replace(
im_cfg_contents,
#"(?<=\$displayHeight[\s]*=[\s]*"")(.*)(?="";)",
Convert.ToString(y_res));
Another approach would be to use the following code. The modification to the regular expression is just placing the first part in a group and the last part in a group. Then in the replace string, we add back in the first and third groups. Not quite as nice as the first approach, but not quite as bad as writing out the $displayHeight part. Substitutions on MSDN shows how the $ characters work.
resultString = Regex.Replace(
im_cfg_contents,
#"(\$displayHeight[\s]*=[\s]*"")(.*)("";)",
"${1}" + Convert.ToString(y_res) + "${3}");
Try this:
resultString = Regex.Replace(
im_cfg_contents,
#"\$displayHeight[\s]*=[\s]*""(.*)"";",
#"\$displayHeight = """ + Convert.ToString(y_res) + #""";");
It replaces the whole string because you've matched the whole string - nothing about this statement tells C# to replace just the matched group, it will find and store that matched group sure, but it's still matching the whole string overall.
You can either change your replacer to:
#"\$displayHeight = """ + Convert.ToString(y_res) + #""";"
..or you can change your pattern to just match the digits, i.e.:
#"[0-9]+"
..or you could see if C# regex supports lookarounds (I'm not sure if it does offhand) and change your match accordingly.
You could also try this, though I think it is a little slower than my other method:
resultString = Regex.Replace(
im_cfg_contents,
"(?<=\\$displayHeight[\\s]*=[\\s]*\").*(?=\";)",
Convert.ToString(y_res));
Check this pattern out
(?<=(\$displayHeight\s*=\s*"))\d+(?=";)
A word about "lookaround".