Regular expressions in C# for extracting parts


I have this text:
" </SYM field/NN name=/IN ""/"" object/NN ""/"" >/SYM Categories/NNS :/: Cars/NNS ,/, About/RB Model/NNP :/: "
I would like to extract values such as
Categories/NNS :/: Cars/NNS ,/, About/RB
where the pattern is
WORD + /NNS + :/: ANYTHING until you reach the same pattern
I tried:
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
and the answer I got back was:
Categories
instead of
Categories/NNS :/: Cars/NNS ,/, About/RB
What am I doing wrong?

You need to enclose the parts of the regex that you want returned in parentheses.
To get what you're looking for, replace your regex with the following (not tested, and I don't know the specifics of C# regexes, but it should be OK):
"((?:[A-Za-z0-9\-]+)/NNS :/: (?:[A-Za-z0-9\-/s]+))"
The outer parentheses mean that you get the entire string as the result.
An opening parenthesis followed by ?: marks a non-capturing group, meaning you don't want that part returned separately.
Without the ?:, the result would contain your entire string, then the string matching the first sub-regex, then the string matching the second sub-regex.

Why don't you use match.Value? Everything you put in parenthesis represents a group, but it looks like you want the whole thing.
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
    RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Value;
    Console.WriteLine(key);
}
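
One caveat on the snippets above: the character class in the question's pattern contains no whitespace or comma, so even match.Value would stop right after Cars/NNS. The following is a minimal sketch, not a definitive solution, that assumes "until you reach the same pattern" means "until the next WORD/TAG :/: key or the end of the input". The class and method names (TaggedSegmentDemo, ExtractFirstSegment) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TaggedSegmentDemo
{
    // Hypothetical helper: returns the first "WORD/NNS :/: ..." segment,
    // stopping before the next "WORD/TAG :/:" key (assumed to be the delimiter)
    // or at the end of the input. Returns an empty string when nothing matches.
    public static string ExtractFirstSegment(string input)
    {
        Match match = Regex.Match(
            input,
            // .*? is lazy, so it grows only until the lookahead succeeds.
            @"[A-Za-z0-9\-]+/NNS :/: .*?(?=\s+[A-Za-z0-9\-]+/[A-Z]+ :/:|$)");
        return match.Success ? match.Value : string.Empty;
    }
}
```

Calling TaggedSegmentDemo.ExtractFirstSegment on the text from the question should return Categories/NNS :/: Cars/NNS ,/, About/RB, because the lookahead first succeeds just before Model/NNP :/:.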

Related

Regex to find next word (which contains special character) after given word

I am having trouble writing a regex to get the desired output from a string.
I have a string like string simpleInput = @"Website address www.yahoo[mail].com AND Following is the";
I want to specify the word "address" and get the next word after it, i.e. "www.yahoo[mail].com".
I have written the following piece of code.
string pattern = @"address (?<after>\w+)";
MatchCollection matches = Regex.Matches(simpleInput, pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);
string nextWord = string.Empty;
foreach (Match match in matches)
{
nextWord = match.Groups["after"].ToString();
}
Console.WriteLine("Word is: " + nextWord );
This gives me the output:
Word is: www
whereas I expect the output to be www.yahoo[mail].com
Can anyone please help?
I tried \D+, but that matches up to the end of the string, so additional text like "AND Following is the" also ends up in the result,
whereas I just want the single word "www.yahoo[mail].com".

\w+ doesn't match . or some of the other characters in the string you want to match. Try \S+ instead, which matches non-whitespace characters:
string pattern = @"address (\S+)";
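
A minimal sketch of the \S+ suggestion wrapped in a reusable method. The names NextWordDemo and WordAfter are hypothetical, and the use of Regex.Escape and \s+ (instead of a single space) is an assumption about what a general version would want, not something the question asked for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NextWordDemo
{
    // Hypothetical helper: returns the run of non-whitespace characters
    // that follows the given keyword, or an empty string if there is none.
    public static string WordAfter(string input, string keyword)
    {
        // Regex.Escape keeps any metacharacters in the keyword literal.
        string pattern = Regex.Escape(keyword) + @"\s+(?<after>\S+)";
        Match match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups["after"].Value : string.Empty;
    }
}
```

With the sample input from the question, NextWordDemo.WordAfter(simpleInput, "address") should return www.yahoo[mail].com, since \S+ stops only at the space before AND.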

Improve RegEx search

Using DirectoryServices.AccountManagement, I'm getting a user's DistinguishedName, which looks like this:
CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu
I need to get the first OU value from this.
I found a similar solution: C# Extracting a name from a string
With some tweaks I created this code:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
Match m = Regex.Match(input, @"OU=([a-zA-Z\\]+)\,.*$");
Console.WriteLine(m.Groups[1].Value);
This code returns STORE as expected, but if I change Groups[1] to Groups[0] I get almost the same result as the input string:
OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu
How can I change this regex so that it returns only the values of OU, so that in this example I get an array of 2 matches? If I had more OUs in my string, the array would be longer.
EDIT:
I've converted my code (using @dasblinkenlight's suggestions) into a function:
private static List<string> GetOUs()
{
    var input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
    var mm = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
    return (from Match m in mm select m.Groups[1].Value).ToList();
}
Is that correct?

Your regular expression is (almost) fine; you are just using the wrong API.
Remove the part of the regex that matches up to the ending anchor $, replace the call to Match with a call to Matches, and read the matches in a loop, like this:
var input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
var mm = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
foreach (Match m in mm)
{
    Console.WriteLine(m.Groups[1].Value);
}

Your existing regex:
@"OU=([a-zA-Z\\]+)\,.*$"
matches OU=, then some letters and backslashes ([a-zA-Z\\]+), then a comma, then any characters (.*) to the end of the line ($).
Thus a single match will always cover the entire line after the first OU section.
Modify your regex by removing the ,.*$ at the end, and it will match each OU group:
@"OU=([a-zA-Z\\]+)"
Also note that the parentheses form a capturing group. They are useful if you also want to capture just the value part by itself, but if you are not using that, they are not necessary, and you can just have this:
@"OU=[a-zA-Z\\]+"

It's because you are mixing up matches and groups:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
MatchCollection mc = Regex.Matches(input, @"OU=([a-zA-Z\\]+),");
foreach (Match m in mc)
{
    Console.WriteLine(m.Result("$1"));
}

Groups[0] returns the full match.
Groups[1] returns the first captured group in the match [i.e. everything in the first pair of parentheses '(' ')' ]
So if you wanted to get exactly those 2 occurrences of OU, you could do this:
Match m = Regex.Match(input, @"OU=([a-zA-Z\\]+)\,OU=([a-zA-Z\\]+)\,.*$");
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
Groups[0] returns the full match (which you don't want).
Groups[1] returns the first captured group in the match [i.e. everything in the first pair of parentheses '(' ')' ]
Groups[2] returns the second captured group in the match [i.e. everything in the second pair of parentheses '(' ')' ]
Giving:
STORE
COMPANY
But I'm assuming you don't want to spell out the regex this explicitly for each pattern you are interested in.
If you want to get multiple matches, you need to call Regex.Matches, which returns a MatchCollection.
MatchCollection ms = Regex.Matches(...);
This still won't work with your current regex though, because everything from STORE to the end of the line will be in the first match. If you only want to get the pattern "1-or-more-letters" after an "OU=",
you only need:
@"OU=([a-zA-Z\\]+)"
So your code would be:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
MatchCollection ms = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
foreach (Match m in ms)
{
    Console.WriteLine(m.Groups[1].Value); // get the string in the first "(" ")"
}
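
The character class [a-zA-Z\\]+ used throughout this thread stops at digits, spaces and hyphens, so an OU such as "Store 24" would be cut short. As a hedged variation, the sketch below captures everything up to the next comma instead. It assumes OU values never contain escaped commas (\,), which distinguished names do allow, and the names OuDemo and GetOUs(string) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OuDemo
{
    // Hypothetical helper: returns every OU value in a distinguished name, in order.
    // Assumption: values contain no escaped commas.
    public static List<string> GetOUs(string distinguishedName)
    {
        // (?:^|,) makes sure "OU=" starts a component rather than
        // appearing in the middle of another value.
        return Regex.Matches(distinguishedName, @"(?:^|,)OU=([^,]+)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For the distinguished name in the question this should yield STORE and COMPANY; taking the first element of the list gives the first OU the question originally asked for.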

Pattern Matching c#

Let's say I have a text file containing the line below. I want to take both values within the quotation marks by matching between (" and "), so that I retrieve ABC and DEF and put them in a list of strings or something similar. What's the best way of doing this? It's so annoying.
If EXAMPLEA("ABC") AND EXAMPLEB("DEF")

Assuming the value between the double quotes cannot contain escaped double quotes, it might work like this:
var text = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
Regex pattern = new Regex("\"[^\"]*\"");
foreach (Match match in pattern.Matches(text))
{
Console.WriteLine(match.Value.Trim('"'));
}
But this is only one of the many ways you could do it and maybe not the smartest way out there. Try something yourself!
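
If the no-escaped-quotes assumption does not hold, one possible variation is to let the pattern accept backslash-escaped characters inside the quotes. This is a sketch under the assumption that a backslash is the escape character in the file; the names QuotedValueDemo and ExtractQuoted are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValueDemo
{
    // Regex (without C# escaping):  "((?:[^"\\]|\\.)*)"
    //   [^"\\]  any character that is neither a quote nor a backslash
    //   \\.     a backslash followed by any single character (an escape)
    private static readonly Regex Quoted = new Regex(@"""((?:[^""\\]|\\.)*)""");

    // Hypothetical helper: returns the text inside each pair of double quotes,
    // leaving escape sequences such as \" untouched.
    public static List<string> ExtractQuoted(string text)
    {
        var values = new List<string>();
        foreach (Match match in Quoted.Matches(text))
        {
            values.Add(match.Groups[1].Value);
        }
        return values;
    }
}
```

On the original line this should still return ABC and DEF; a value written as "A\"BC" in the file would come back as A\"BC with the escape sequence intact.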

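Best way...
// The lookarounds include the parentheses; with bare quotes, the text between two quoted values would also match.
List<string> matches = Regex.Matches(File.ReadAllText(yourPath), @"(?<=\("")[^""]*(?=""\))")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();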

This pattern should do the trick:
\"([^"]*)\"
string str = "If EXAMPLEA(\"ABC\") AND EXAMPLEB(\"DEF\")";
MatchCollection matched = Regex.Matches(str, @"\""([^\""]*)\""");
foreach (Match match in matched)
{
Console.WriteLine(match.Groups[1].Value);
}
Note that the quotation marks are doubled in the actual code in order to escape them. And the code refers to group [1] to get just the part inside the parentheses.

IEnumerable<string> matches =
    from Match match
    in Regex.Matches(File.ReadAllText(filepath), @"\""([^\""]*)\""")
    select match.Groups[1].Value;
Others have already posted some answers, but mine takes into account that you just want ABC and DEF in your example, without quotation marks, and saves them in an IEnumerable<string>.

Regular expression to get url collection from string

I have a string. An example is given below.
[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1
How can I extract the values of the path properties from the above string? There may be many properties other than the path properties.
The result I am expecting is
url1
url2
url3
url4
I think a regular expression is the best way to do this. Any ideas regarding the regular expression needed? How about using the string.Split method? Which one is more efficient?
Thanks in advance

Well, this regex works in your particular example:
path\d?=(.+?)\\r\\n
What isn't immediately obvious is if \r\n in your strings are literally the characters \r\n, or a carriage return + new line. The regex above matches those characters literally. If your text is actually this:
[playlist]
path1=url1
path2=url2
path=url3
path4=url4
count=1
Then this regex will work:
path\d?=(.+?)\n
And a quick example of how to use that in C#:
var str = @"[playlist]\r\npath1=url1\r\npath2=url2\r\npath=url3\r\npath4=url4\r\ncount=1";
var matches = Regex.Matches(str, @"path\d?=(.+?)\\r\\n");
foreach (Match match in matches)
{
var path = match.Groups[1].Value;
Console.WriteLine(path);
}
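
For the other case discussed above, where the string holds real carriage returns and line feeds, a line-anchored pattern with RegexOptions.Multiline is one option. The sketch below is an illustration under that assumption; PlaylistDemo and ExtractPaths are hypothetical names, and \d* (rather than \d?) is a choice made here to allow multi-digit indexes such as path12.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaylistDemo
{
    // Hypothetical helper: returns the value of every "path", "path1", "path2", ...
    // line. Works with both \r\n and \n line endings.
    public static List<string> ExtractPaths(string playlist)
    {
        // With Multiline, ^ and $ match at line boundaries. In .NET, $ matches
        // just before \n, so the optional \r is consumed outside the capture group.
        return Regex.Matches(playlist, @"^path\d*=(.+?)\r?$", RegexOptions.Multiline)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

Anchoring at the start of the line also keeps keys that merely contain "path" (for example a hypothetical basepath=...) out of the result, and a final path line without a trailing newline is still matched because $ also matches at the end of the input.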

Regex to check a string

I'm trying to check a string and then extract all the variables which start with #. I can't find the appropriate regular expression to check the string. The string may start with # or ", and if it starts with ", it should have a matching closing ".
Example 1:
"ip : "+#value1+"."+#value2+"."+#value3+"."+#value4
Example 2:
#nameParameter "#yahoo.com"
Thanks

It would probably be easiest to first split the string on each quoted string, then check the unquoted parts for #'s. For example, quoted strings could be matched with /"[^"]*"/. Calling Regex.Split on your string with that expression returns an array of the non-quoted parts, in which you can then use the expression /#\w+/ to find any #'s.
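
The split-then-scan idea can be sketched as follows. This is an illustration rather than a tested solution: it assumes quoted strings never contain escaped quotes, and the names VariableDemo and FindVariables are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableDemo
{
    // Hypothetical helper: returns every #name that appears outside double quotes.
    public static List<string> FindVariables(string expression)
    {
        var variables = new List<string>();
        // Step 1: splitting on quoted strings leaves only the unquoted parts.
        foreach (string part in Regex.Split(expression, @"""[^""]*"""))
        {
            // Step 2: collect the #variables from each unquoted part.
            foreach (Match match in Regex.Matches(part, @"#\w+"))
            {
                variables.Add(match.Value);
            }
        }
        return variables;
    }
}
```

For Example 1 from the question this should return #value1 through #value4, and for Example 2 only #nameParameter, since #yahoo.com sits inside a quoted string and is removed by the split.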

Try this:
string text = "#nameParameter \"#yahoo.com\"";
Regex variables = new Regex(@"(?<!"")#\w+", RegexOptions.Compiled);
foreach (Match match in variables.Matches(text))
{
Console.WriteLine(match.Value);
}

To check the strings you have provided in your post:
(^("[^"\r\n]"\s+#[\w.]+\s*+?)+)|(((^#[\w.]+)|("#[\w.]+"))\s*)+
