Regex to parse Active Directory string fails

I have this block of code in C# code-behind:
string input = "CN=L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany,DC=com";
string pattern = @"CN\=(.+)\,";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[1].Value);
}
When I run this, match.Groups[1].Value is equal to
L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany
I need it to be equal to
L_WDJACK127_WDC_SSIS_USER_CH
Can someone please fix my Regex?

Basic Greedy/Lazy quantifier problem:
string pattern = @"CN\=(.+?)\,";
This resource explains why: http://www.regular-expressions.info/repeat.html
Basically, .+ matches as many characters as possible (at least one), so the engine only gives characters back until it reaches the last comma. Adding a ? after it (.+?) makes the quantifier lazy: it matches as few characters as possible (still at least one), so the match ends at the first comma.
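A minimal sketch of the difference, written as C# 9+ top-level statements against the question's input. The third pattern, the negated character class [^,]+, is not part of the answer above; it is a commonly used alternative shown only for comparison. The \= and \, escapes from the original pattern are dropped because = and , are not special characters here (escaping them is harmless).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "CN=L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany,DC=com";

// Greedy: .+ first consumes the rest of the string, then backtracks
// only as far as the LAST comma, because that is enough for "," to match.
string greedy = Regex.Match(input, @"CN=(.+),").Groups[1].Value;

// Lazy: .+? grows one character at a time and stops at the FIRST comma.
string lazy = Regex.Match(input, @"CN=(.+?),").Groups[1].Value;

// Alternative: [^,]+ can never include a comma, so it stops there by construction.
string negated = Regex.Match(input, @"CN=([^,]+)").Groups[1].Value;

Console.WriteLine(greedy);  // L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany
Console.WriteLine(lazy);    // L_WDJACK127_WDC_SSIS_USER_CH
Console.WriteLine(negated); // L_WDJACK127_WDC_SSIS_USER_CH
```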

Related

Regex working in Regexr but not C#, why?

From the input string below, I want to extract the values inside {} for the s:ds field. The pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ for C# and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(inputString, regex);
if (matchCollection.Count > 1)
{
    Log.Info("Collection Found.", this);
}

If you only want to match the values...
If you only want to match the values within your curly braces, you should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) as your expression:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach (var match in matchCollection)
{
    // Output the match
    Console.WriteLine("Collection Found : {0}", match);
}
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression. This is a look-behind that only matches values preceded by s:ds="{ (inside a verbatim @"..." string, "" stands for a single quote):
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
With this approach, the s:id value is not matched.
Another Consideration
Currently you are using \w to match "word" characters within your expression, and while this might work for your uses, it matches all digits \d, letters a-zA-Z (plus other Unicode letters) and underscores _, which also makes the \d in [\d\w] redundant. It's unlikely that you need all of these, so you may want to revise your character sets to use just what you expect, like [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you only expect GUID values (i.e. hex).
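The two suggestions above (the look-behind and a narrower character set) can be combined. This is a sketch in C# 9+ top-level statements, assuming every value is a GUID written in braces; the input is the question's string, and the look-behind is simplified to (?<=s:ds=""\{) without the extra capture group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";

// (?<=s:ds="\{) : only accept positions directly preceded by s:ds="{
// [0-9A-Fa-f]   : hex digits only, instead of the broader \w
string pattern = @"(?<=s:ds=""\{)[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}";

string[] guids = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string guid in guids)
{
    Console.WriteLine(guid);
}
// 46C01EB7-6D43-4E2A-9267-608DE8AFA311
// 37BA4BA0-581C-40DC-A542-FFD9E99BC345
```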

Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
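A quick way to check the over-escaping explanation, sketched as C# 9+ top-level statements. In a normal C# string literal, \" produces only a quote character, so the runtime input contains no backslashes, while the \\ copied from Regexr requires one. The input is shortened to a single s:ds entry for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

// At runtime this string is: s:ds="{46C01EB7-6D43-4E2A-9267-608DE8AFA311}"
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\"";

// Question's pattern: \\ requires a literal backslash before each quote.
string overEscaped = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";

// Answer's pattern: \" is just an escaped quote, which matches ".
string fixedPattern = @"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\""";

bool before = Regex.IsMatch(input, overEscaped);
Match after = Regex.Match(input, fixedPattern);

Console.WriteLine(before);                // False
Console.WriteLine(after.Groups[1].Value); // {46C01EB7-6D43-4E2A-9267-608DE8AFA311}
```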

Lookbehind with equal sign

I want to match
===Something===
but not
====Something====
I've come up with the following regular expression
Regex.Match("====Something====", @"^\s*===\s*(?<!=====\s*)(?<Title>.*?)\s*===\s*$").Groups["Title"]
but it returns
=Something=
Please help: what's the issue with the lookbehind pattern?

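Match the full token! The anchors are all-important. Spelled out for the computer, the expression below says: search for text that starts with three = signs, then has one or more word characters, then ends with three = signs, with nothing before or after.
Hence if four equals signs are at the start, it won't match. (Note that in .NET, < and > are ordinary literal characters, not word anchors, so a pattern such as <={3}(\w+)={3}> would only match text like <===Something===>; ^ and $ anchor the match to the start and end of the string instead.)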
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main(string[] args)
    {
        // searches for the first matching instance.
        string textToSearchThrough = "===Something===";
        string textToSearchThrough2 = "====Something====";
        // add \s* next to the = runs below if you want to allow spaces
        string regexExpression = @"^={3}(\w+)={3}$";
        Regex r = new Regex(regexExpression);
        // pass textToSearchThrough2 instead to check the four-= case (prints False)
        Match m = r.Match(textToSearchThrough);
        Console.WriteLine(m.Success.ToString()); // True
        Console.ReadLine();
    }
}

One more possible solution:
(?<!=)===(?!=)(?<Title>.*?)(?<!=)===(?!=)
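A short check of this lookaround pattern, as C# 9+ top-level statements. Each === must be neither preceded nor followed by another =, so a run of four cannot supply an opening or closing delimiter.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<!=) before and (?!=) after each === ensure the run is exactly three = long.
const string Pattern = @"(?<!=)===(?!=)(?<Title>.*?)(?<!=)===(?!=)";

Match three = Regex.Match("===Something===", Pattern);
Match four = Regex.Match("====Something====", Pattern);

Console.WriteLine(three.Groups["Title"].Value); // Something
Console.WriteLine(four.Success);                // False
```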

Your regex works incorrectly because .*? can also match =. It looks for ===, then accepts anything (including other = signs), and then looks for a closing ===. So it also matches === inside a longer run such as =========, which is not what you are looking for. If you change . (any character) to \w (a word character), it should work and match only ===Something===, even without the lookbehind. It is also better to use \w+ instead of \w*, so that ====== with no word in between is not matched (if you don't want that):
^\s*===\s*(?<Title>\w+?)\s*===\s*$
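The difference between the question's .*? and the \w+? version can be seen directly; a sketch in C# 9+ top-level statements using the question's two inputs.

```csharp
using System;
using System.Text.RegularExpressions;

string original = @"^\s*===\s*(?<!=====\s*)(?<Title>.*?)\s*===\s*$";
string wordOnly = @"^\s*===\s*(?<Title>\w+?)\s*===\s*$";

// .*? may consume '=' characters, so the extra '=' end up inside Title.
Match m1 = Regex.Match("====Something====", original);
// \w cannot match '=', so the fourth '=' makes the whole match fail.
Match m2 = Regex.Match("====Something====", wordOnly);
Match m3 = Regex.Match("===Something===", wordOnly);

Console.WriteLine(m1.Groups["Title"].Value); // =Something=
Console.WriteLine(m2.Success);               // False
Console.WriteLine(m3.Groups["Title"].Value); // Something
```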

Regex goes wrong - which characters need escaping?

I want to extract an RTMP link from a website and have so far managed to find the line where it's located:
string line = GetLine(innerHTML, "turbo:");
// The string line now contains something like this:
// turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',
Match match = Regex.Match(line, @"turbo: '(rtmp://[*]+);0',$",
    RegexOptions.IgnoreCase);
string key;
if (match.Success)
    key = match.Groups[1].Value;
There aren't any matches. What I would like to extract from this line:
turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',
is this piece:
rtmp://fcs21-1.somewebsite.com/reflect/2996910732
What am I missing in the Regex?

Your character class [*] matches just a literal *; with the quantifier +, it matches one or more * characters and nothing else. Clearly it won't match your string.
I guess you meant to use .* instead, which matches 0 or more occurrences of any character but \n.
Try changing your regex to:
"turbo: '(rtmp://.*);0',$"
or even better, given your text, and what you want to extract, you can simply use:
"turbo: '([^;]*);0',$"
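A sketch comparing the two suggested patterns, assuming the line looks exactly like the sample in the question (no leading whitespace, nothing after the trailing comma). Written as C# 9+ top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "turbo: 'rtmp://fcs21-1.somewebsite.com/reflect/2996910732;0',";

// .* (any character except \n) replaces [*], which only matched a literal '*'.
Match anyChars = Regex.Match(line, "turbo: '(rtmp://.*);0',$", RegexOptions.IgnoreCase);
// [^;]* cannot run past a ';', so the capture ends right before ";0',".
Match notSemicolon = Regex.Match(line, "turbo: '([^;]*);0',$", RegexOptions.IgnoreCase);

Console.WriteLine(anyChars.Groups[1].Value);     // rtmp://fcs21-1.somewebsite.com/reflect/2996910732
Console.WriteLine(notSemicolon.Groups[1].Value); // rtmp://fcs21-1.somewebsite.com/reflect/2996910732
```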

[*] matches only *. To match any character, use . instead.
(Actually, . does not match a newline by default. If a newline may appear, use something to the effect of (.|\n); note that in a normal C# string literal the backslash needs to be escaped as \\n, though not in a verbatim @"..." string.)
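To illustrate the newline caveat, here is a sketch in C# 9+ top-level statements with a hypothetical two-line input. RegexOptions.Singleline, which makes . match \n as well, is shown as an alternative to (.|\n); it is not mentioned in the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "start\nend"; // hypothetical input containing a newline

// By default . does not match '\n', so this fails.
bool dotOnly = Regex.IsMatch(text, "start.end");
// (.|\n) matches either; in a normal string literal the backslash is doubled.
bool alternation = Regex.IsMatch(text, "start(.|\\n)end");
// RegexOptions.Singleline changes . so that it also matches '\n'.
bool singleline = Regex.IsMatch(text, "start.end", RegexOptions.Singleline);

Console.WriteLine(dotOnly);     // False
Console.WriteLine(alternation); // True
Console.WriteLine(singleline);  // True
```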

Try this:
Match match = Regex.Match(line, @"^turbo: '(rtmp://[^;]+);0',$", RegexOptions.IgnoreCase);
The ^ anchors the match at the start of the string, and [^;]+ matches anything that isn't a ; all the way up to the actual ;.

Regex.Match() won't match a substring

This is something simple, but I cannot figure it out. I want to find a substring with this regex. It matches "M4N 3M5" on its own, but doesn't match the text below:
const string text = "asdf M4N 3M5 adsf";
Regex regex = new Regex(@"^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$", RegexOptions.None);
Match match = regex.Match(text);
string value = match.Value;

Try removing ^ and $:
Regex regex = new Regex(@"[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}", RegexOptions.None);
^ : The match must start at the beginning of the string or line.
$ : The match must occur at the end of the string or before \n at the end of the line or string.
If you want to match only at word boundaries, you can use \b as suggested by Mike Strobel:
Regex regex = new Regex(@"\b[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}\b", RegexOptions.None);
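A sketch of the three variants on the question's text (C# 9+ top-level statements). The extra input "xM4N 3M5" is a hypothetical case added to show where \b differs from having no anchors at all.

```csharp
using System;
using System.Text.RegularExpressions;

const string text = "asdf M4N 3M5 adsf";
const string core = @"[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}";

// ^...$ forces the postal code to be the entire string, so this fails.
Match anchored = Regex.Match(text, "^" + core + "$");
// Without anchors the pattern may match anywhere in the text.
Match unanchored = Regex.Match(text, core);
// \b...\b still finds it inside the text...
Match bounded = Regex.Match(text, @"\b" + core + @"\b");
// ...but rejects it when it is glued to another word character.
bool gluedUnanchored = Regex.IsMatch("xM4N 3M5", core);
bool gluedBounded = Regex.IsMatch("xM4N 3M5", @"\b" + core + @"\b");

Console.WriteLine(anchored.Success); // False
Console.WriteLine(unanchored.Value); // M4N 3M5
Console.WriteLine(bounded.Value);    // M4N 3M5
Console.WriteLine(gluedUnanchored);  // True
Console.WriteLine(gluedBounded);     // False
```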

I know this question has been answered, but I noticed two things in your pattern that I want to highlight:
There is no need to specify {1} for a token that occurs once.
For example: (Notice the missing {1})
\d{1} = \d
[A-Z]{1} = [A-Z]
Also, I wouldn't recommend putting a literal space in your pattern; use \s instead, because if the space is accidentally deleted you might not notice the mistake, and the running code will stop working. (Note that \s also matches tabs and line breaks, not only spaces.)
Personally, for this case I would recommend \b, since it is the best fit here.
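Applying both suggestions gives a shorter pattern. This is a sketch in C# 9+ top-level statements, assuming a tab or other whitespace between the two halves should also be accepted, since \s allows it.

```csharp
using System;
using System.Text.RegularExpressions;

// Same postal-code pattern without the redundant {1}, using \s* instead of " *",
// and \b word boundaries instead of ^ and $.
var regex = new Regex(@"\b[ABCEGHJKLMNPRSTVXY]\d[A-Z]\s*\d[A-Z]\d\b");

string value = regex.Match("asdf M4N 3M5 adsf").Value;
// \s also matches a tab, not just ' '.
string tabbed = regex.Match("asdf M4N\t3M5 adsf").Value;

Console.WriteLine(value);                // M4N 3M5
Console.WriteLine(tabbed == "M4N\t3M5"); // True
```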

Regex Extract String

I have a string:
string s = "GameObject.Find(\"MyObj\").GetComponent(\"MyComponent\")";
I want to extract "GameObject.Find(\"MyObj\")" where MyObj can include any number or type of characters except newline.
This is my code:
Match match = Regex.Match(s, "GameObject.Find(\".+\")");
I know I'm doing something wrong, but I'm not sure where to go from here. How can we make this expression work as intended?

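Match match = Regex.Match(s, @"GameObject\.Find\("".+?""\)");
You should do a non-greedy search. You also need to escape the parentheses (and, strictly, the dot): unescaped ( and ) create a capture group instead of matching literal parentheses, which is why the original pattern never matched. Beware that the lazy version only matches from the opening parenthesis and quotation mark up to the first quotation mark followed by a closing parenthesis.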
So for,
string s = "GameObject.Find(\"seckin(\\\"hand\\\").thumb()\").GetComponent(\"MyComponent\")";
it will match "GameObject.Find(\"seckin(\\\"hand\\\")"
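But a regular expression cannot, in general, match arbitrarily nested parentheses (.NET's balancing groups can, at the cost of a much more complex pattern), so this is the best simple, if sub-optimal, solution.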
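A sketch of the escaped lazy pattern on both strings from this answer (C# 9+ top-level statements). The second result shows the early stop described above.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "GameObject.Find(\"MyObj\").GetComponent(\"MyComponent\")";
string nested = "GameObject.Find(\"seckin(\\\"hand\\\").thumb()\").GetComponent(\"MyComponent\")";

// In a verbatim string, "" is one quote; \. \( \) make the dot and parentheses literal.
const string pattern = @"GameObject\.Find\("".+?""\)";

string simple = Regex.Match(s, pattern).Value;
string stopsEarly = Regex.Match(nested, pattern).Value;

Console.WriteLine(simple);     // GameObject.Find("MyObj")
Console.WriteLine(stopsEarly); // GameObject.Find("seckin(\"hand\")
```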
