Regex Extract String - c#

I have a string:
string s = "GameObject.Find(\"MyObj\").GetComponent(\"MyComponent\")";
I want to extract "GameObject.Find(\"MyObj\")" where MyObj can include any number or type of characters except newline.
This is my code:
Match match = Regex.Match(s, "GameObject.Find(\".+\")");
I know I'm doing something wrong, but I'm not sure where to go from here. How can we make this expression work as intended?

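Match match = Regex.Match(s, @"GameObject\.Find\("".+?""\)");
Use a non-greedy (lazy) quantifier, and escape the dot and the parentheses so they are matched literally instead of acting as a wildcard and a capture group. Beware that this only matches from the opening parenthesis plus quotation mark up to the first quotation mark followed by a closing parenthesis.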
So for,
string s = "GameObject.Find(\"seckin(\\\"hand\\\").thumb()\").GetComponent(\"MyComponent\")";
it will match "GameObject.Find(\"seckin(\\\"hand\\\")"
Classic regular expressions cannot match arbitrarily nested parentheses (.NET's balancing groups can, but at the cost of a much more complex pattern), so this is usually the best practical compromise.
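As a minimal sketch of the escaped, lazy pattern applied to the string from the question: the class name FindCallExtractor and the method Extract are hypothetical names chosen for illustration, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class FindCallExtractor
{
    // Verbatim string: "" is one literal quote; \. \( \) match the
    // dot and the parentheses literally; .+? is the lazy quantifier.
    private const string Pattern = @"GameObject\.Find\("".+?""\)";

    // Returns the matched text, or an empty string when nothing matches.
    public static string Extract(string source)
    {
        return Regex.Match(source, Pattern).Value;
    }

    public static void Main()
    {
        string s = "GameObject.Find(\"MyObj\").GetComponent(\"MyComponent\")";
        Console.WriteLine(Extract(s)); // GameObject.Find("MyObj")
    }
}
```

An unsuccessful Match has an empty Value, which is why Extract can return it without checking Success first.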

Maybe you should try :
Match match = Regex.Match(s, @"GameObject\.Find\("".+?""\)");

Related

Regex working in Regexr but not C#, why?

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach(var match in matchCollection)
{
// Output the match
Console.WriteLine("Collection Found : {0}", match);
}
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression. This is a look-behind that only matches values preceded by s:ds="{ :
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
Notice that this approach does not match the s:id element.
Another Consideration
Currently you are using \w to match "word" characters within your expression. While this might work for your uses, it matches all digits (\d), letters (a-zA-Z) and underscores (_). It's unlikely that you need all of these, so you may want to narrow your character sets to just what you expect, such as [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you are only expecting GUID values (i.e. hex).
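The look-behind and the narrower hex character class can be combined. The following is only a sketch under those assumptions: DsGuidExtractor and Extract are hypothetical names, and the trailing (?=\}) look-ahead is an addition not present in the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class DsGuidExtractor
{
    // (?<=s:ds="\{)  look-behind: value must follow s:ds="{
    // [0-9A-Fa-f]    hex digits only, instead of [\d\w]
    // (?=\})         look-ahead (an addition): value must be closed by }
    private const string Pattern =
        @"(?<=s:ds=""\{)[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}(?=\})";

    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
        {
            values.Add(match.Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" "
                     + "s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" "
                     + "s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";

        // Prints the two s:ds GUIDs; the s:id GUID is skipped.
        foreach (string value in Extract(input))
        {
            Console.WriteLine(value);
        }
    }
}
```

Because the prefix and suffix sit inside lookarounds, each match is the bare GUID and no capture group is needed.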
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""

Regex to parse Active Directory string fails

I have this block of code in C# code-behind:
string input = "CN=L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany,DC=com";
string pattern = @"CN\=(.+)\,";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
When I run this, match.Groups[1].Value is equal to
L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany
I need it to be equal to
L_WDJACK127_WDC_SSIS_USER_CH
Can someone please fix my Regex?
Basic Greedy/Lazy quantifier problem:
string pattern = @"CN\=(.+?)\,";
This resource should help as to why: http://www.regular-expressions.info/repeat.html
Basically, .+ is greedy: it matches as many characters as possible (at least one), so the match runs all the way to the last comma. Adding a ? after it (.+?) makes the quantifier lazy: the regex engine matches as few characters as possible (still at least one), so the match stops at the first comma.
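To make the difference concrete, here is a small sketch that runs the greedy and the lazy pattern against the input from the question. The class CommonNameDemo and the method FirstGroup are hypothetical names, and the third pattern (a negated character class) is an alternative not mentioned in the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class CommonNameDemo
{
    // Returns the text captured by group 1, or "" when there is no match.
    public static string FirstGroup(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string input = "CN=L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany,DC=com";

        // Greedy: runs up to the LAST comma.
        Console.WriteLine(FirstGroup(input, @"CN=(.+),"));
        // L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany

        // Lazy: stops at the FIRST comma.
        Console.WriteLine(FirstGroup(input, @"CN=(.+?),"));
        // L_WDJACK127_WDC_SSIS_USER_CH

        // Alternative (an addition): match anything that is not a comma.
        Console.WriteLine(FirstGroup(input, @"CN=([^,]+)"));
        // L_WDJACK127_WDC_SSIS_USER_CH
    }
}
```

The negated class [^,]+ gives the same result as the lazy quantifier here and avoids backtracking, since it can never run past a comma.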

Regex to match a word beginning with a period and ending with an underscore?

I'm quite the Regex novice, but I have a series of strings similar to this "[$myVar.myVar_STATE]" I need to replace the 2nd myVar that begins with a period and ends with an underscore. I need it to match it exactly, as sometimes I'll have "[$myVar.myVar_moreMyVar_STATE]" and in that case I wouldn't want to replace anything.
I've tried things like "\b.myVar_\b", "\.\bmyVar_\b" and several more, all to no luck.
How about this:
\[\$myVar\.([^_]+)_STATE\]
Matches:
[$myVar.myVar_STATE] // matches and captures 'myVar'
[$myVar.myVar_moreMyVar_STATE] // no match
Working regex example:
http://regex101.com/r/yM9jQ3
Or if _STATE was variable, you could use this: (as long as the text in the STATE part does not have underscores in it.)
\[\$myVar\.([^_]+)_[^_]+\]
Working regex example:
http://regex101.com/r/kW8oE1
Edit: Conforming to OP's comments below, This should be what he's going for:
(\[\$myVar\.)([^_]+)(_[^_]+\])
Regex replace example:
http://regex101.com/r/pU6yL8
C#
var pattern = @"(\[\$myVar\.)([^_]+)(_[^_]+\])";
var replaced = Regex.Replace(input, pattern, "$1" + newVar + "$3");
What about something like:
.*\.(myVar_).*
This looks for anything, then a literal . (escaped, since an unescaped . matches any character) and "myVar_", followed by anything.
It matches:
"[$myVar.myVar_STATE]"
And only the first myVar_ here:
"[$myVar.myVar_moremyVar_STATE]"
This should do it:
\[\$myVar\.(.*?)_STATE\]
You can use this little trick to pick out the groups, and build the replacement at the end, like so:
var replacement = "something";
var input = @"[$myVar.myVar_STATE]";
var pattern = @"(\[\$myVar\.)(.*?)_(.*?)]";
// $1 is the "[$myVar." prefix, $2 the old variable name, $3 the state suffix
var replaced = Regex.Replace(input, pattern, "$1" + replacement + "_$3]");
C# already has built-in methods for this check:
string text = ".asda_";
Response.Write((text.StartsWith(".") && text.EndsWith("_")));
Is Regex really required?
string input = "[$myVar.myVar_STATE]";
string oldVar = "myVar";
string newVar = "myNewVar";
string result = input.Replace("." + oldVar + "_STATE]", "." + newVar + "_STATE]");
In case "STATE" is a variable part, then we'll need to use Regex. The easiest way is to use this Regex pattern which matches a position between a prefix and a suffix. Prefix and suffix are used for searching but are not included in the resulting match:
(?<=prefix)find(?=suffix)
result =
Regex.Replace(input, @"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])", newVar);
Explanation:
The prefix part is \., which stands for ".".
The find part is the escaped old variable to be replaced. Regex escaping makes sure that characters with a special meaning in Regex are escaped.
The suffix part is _[A-Z]+], an underscore followed by at least one uppercase letter followed by "]". Note: the closing ] does not need to be escaped here, whereas an opening bracket [ would have to be escaped as \[. We cannot use \w for the STATE part because \w includes underscores. You might have to adapt the [A-Z] part to match all possible states exactly (e.g. if a state can contain digits, use [A-Z0-9]).
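The prefix/find/suffix approach can be wrapped into a small helper. This is a sketch only: VariableRenamer and Rename are hypothetical names, and passing the replacement through a MatchEvaluator is an addition, so that a $ inside newVar is not read as a group reference.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class VariableRenamer
{
    public static string Rename(string input, string oldVar, string newVar)
    {
        // (?<=\.)        prefix: must be preceded by a literal dot
        // Regex.Escape   find: the old name, with metacharacters escaped
        // (?=_[A-Z]+\])  suffix: underscore, uppercase letters, then ]
        string pattern = @"(?<=\.)" + Regex.Escape(oldVar) + @"(?=_[A-Z]+\])";

        // The lambda returns newVar verbatim (no $-substitution).
        return Regex.Replace(input, pattern, match => newVar);
    }

    public static void Main()
    {
        Console.WriteLine(Rename("[$myVar.myVar_STATE]", "myVar", "myNewVar"));
        // [$myVar.myNewVar_STATE]

        Console.WriteLine(Rename("[$myVar.myVar_moreMyVar_STATE]", "myVar", "myNewVar"));
        // [$myVar.myVar_moreMyVar_STATE]  (unchanged)
    }
}
```

The second input stays unchanged because the text after ".myVar" is "_moreMyVar_STATE]", which does not satisfy the uppercase-only suffix.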

Regex goes wrong - which characters need escaping?

I want to extract an RTMP link from a website and has so far managed to find the line where it's located:
string line = GetLine(innerHTML, "turbo:");
// The string line now contains something like this:
// turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',
Match match = Regex.Match(line, @"turbo: '(rtmp://[*]+);0',$",
RegexOptions.IgnoreCase);
string key;
if (match.Success)
key = match.Groups[1].Value;
There aren't any matches. What I would like to extract from this line:
turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',
is this piece:
rtmp://fcs21-1.somewebsite.com/reflect/2996910732
What am I missing in the Regex?
Your character class - [*] matches just a *, with quantifier +, it matches 1 or more *, nothing else. Clearly it won't match your string.
I guess you meant to use .* instead, which matches 0 or more occurrences of any character but \n.
Try changing your regex to:
"turbo: '(rtmp://.*);0',$"
Or, even better, given your text and what you want to extract, you can simply use:
"turbo: '([^;]*);0',$"
[*] matches only a literal *. To match any character, use . instead.
(Actually, . does not match a newline. If a newline may appear, prefer something to the effect of (.|\n); note that the backslash will need to be escaped in a regular string literal.)
Try this:
Match match = Regex.Match(line, @"^turbo: '(rtmp://[^;]+);0',$", RegexOptions.IgnoreCase);
This will take into account the start of the string with the ^ symbol, and the matching selection will match anything that isn't a ; all the way up to an actual ;.
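Put together, the extraction could look like the sketch below. TurboLinkParser and ExtractLink are hypothetical names, and the Trim() call is an assumption: it guards against leading or trailing whitespace (such as a stray \r) that would otherwise defeat the ^ and $ anchors.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class TurboLinkParser
{
    // [^;]+ matches everything up to, but not including, the first semicolon.
    private const string Pattern = @"^turbo: '(rtmp://[^;]+);0',$";

    // Returns the RTMP link, or an empty string when the line does not match.
    public static string ExtractLink(string line)
    {
        // Trim() is an assumption about the input, see the note above.
        Match match = Regex.Match(line.Trim(), Pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string line = "turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',";
        Console.WriteLine(ExtractLink(line));
        // rtmp://fcs21-1.somewebsite.com/reflect/2996910732
    }
}
```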

Regex.Match() won't match a substring

This is something simple but I cannot figure it out. I want to find a substring with this regex. It will match "M4N 3M5" on its own, but doesn't match the text below:
const string text = "asdf M4N 3M5 adsf";
Regex regex = new Regex(@"^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$", RegexOptions.None);
Match match = regex.Match(text);
string value = match.Value;
Try removing ^ and $:
Regex regex = new Regex(@"[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}", RegexOptions.None);
^ : The match must start at the beginning of the string or line.
$ : The match must occur at the end of the string or before \n at the end of the line or string.
If you want to match only in word boundaries you can use \b as suggested by Mike Strobel:
Regex regex = new Regex(@"\b[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}\b", RegexOptions.None);
I know this question has been answered, but I noticed two things in your pattern that I want to highlight:
There is no need to specify a single occurrence of a token.
For example (notice the missing {1}):
\d{1} = \d
[A-Z]{1} = [A-Z]
Also, I would not recommend typing a literal space in your pattern; use \s instead (keep in mind that \s matches any whitespace character, not only a space). If the literal space is deleted by mistake, the error is hard to spot and working code will stop working.
Personally, I would recommend \b for this case, since it is the best fit here.
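The effect of the anchors versus word boundaries can be checked with a short sketch. PostalCodeFinder, IsExactly and FindIn are hypothetical names, and the redundant {1} quantifiers are dropped as suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class PostalCodeFinder
{
    // ^...$ : the WHOLE input must be a postal code.
    private static readonly Regex Anchored =
        new Regex(@"^[ABCEGHJKLMNPRSTVXY]\d[A-Z] *\d[A-Z]\d$");

    // \b...\b : the postal code may appear anywhere, as a whole word.
    private static readonly Regex Bounded =
        new Regex(@"\b[ABCEGHJKLMNPRSTVXY]\d[A-Z] *\d[A-Z]\d\b");

    public static bool IsExactly(string text)
    {
        return Anchored.IsMatch(text);
    }

    // Returns the first postal code found, or "" when there is none.
    public static string FindIn(string text)
    {
        return Bounded.Match(text).Value;
    }

    public static void Main()
    {
        Console.WriteLine(IsExactly("M4N 3M5"));           // True
        Console.WriteLine(IsExactly("asdf M4N 3M5 adsf")); // False
        Console.WriteLine(FindIn("asdf M4N 3M5 adsf"));    // M4N 3M5
    }
}
```

Keeping both forms is a design choice: the anchored pattern suits validating a single field, while the word-boundary pattern suits searching inside longer text.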
