C# Regex: Get group names? - c#

myRegex.GetGroupNames()
Seems to return the numbered groups as well... how do I get only the named ones?
A solution using the actual Match object would be fine as well.

Does using the RegexOptions.ExplicitCapture option when creating the regex do what you want? For example:
Regex theRegex = new Regex(@"\b(?<word>\w+)\b", RegexOptions.ExplicitCapture);
From MSDN:
Specifies that the only valid captures
are explicitly named or numbered
groups of the form (?<name>...). This
allows unnamed parentheses to act as
noncapturing groups without the
syntactic clumsiness of the expression
(?:...).
So you won't need to worry about whether users of your API may or may not use non-capturing groups if this option is set.
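As a minimal sketch of how this option combines with GetGroupNames(): the helper below (a hypothetical name, not part of the answer above) compiles a pattern with ExplicitCapture and drops "0", which always stands for the whole match. Explicitly numbered groups such as (?<42>...) would still appear in the result.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ExplicitCaptureSketch
{
    // Hypothetical helper: group names of a pattern compiled with ExplicitCapture,
    // minus "0", which always denotes the whole match.
    public static string[] ExplicitGroupNames(string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.ExplicitCapture);
        return regex.GetGroupNames().Where(name => name != "0").ToArray();
    }
}
```

For a pattern like (\d+)-(?<word>\w+), the unnamed (\d+) no longer captures, so only word should remain.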

See the other comments/answers about using (?:) and/or sticking with "one style". Here is my best approach that tries to directly solve the question:
var named = regex.GetGroupNames().Where(x => !Regex.IsMatch(x, "^\\d+$"));
However, this will fail for regular expressions like (?<42>...).
Happy coding.

public string[] GetGroupNames(Regex re)
{
var groupNames = re.GetGroupNames();
var groupNumbers = re.GetGroupNumbers();
Contract.Assert(groupNames.Length == groupNumbers.Length);
return Enumerable.Range(0, groupNames.Length)
.Where(i => groupNames[i] != groupNumbers[i].ToString())
.Select(i => groupNames[i])
.ToArray();
}
Actually, this will still fail when the group name and number happen to be the same :\ But it will succeed even when the group name is a number, as long as the number is not the same as its index.
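Since the question also allows a solution based on the Match object, here is a rough sketch of that variant. It assumes Group.Name is available (it exists in newer .NET versions, but may be missing on older frameworks), and the helper name is made up for illustration. Like the filter above, it treats any purely numeric name as a numbered group.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // Hypothetical helper: keeps only groups whose name is not purely numeric.
    // Assumes Group.Name exists on the target framework.
    public static string[] NamedGroups(Match match)
    {
        return match.Groups
            .Cast<Group>()
            .Where(group => !int.TryParse(group.Name, out _))
            .Select(group => group.Name)
            .ToArray();
    }
}
```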

Related

c# Regex named groups access non existent group name [duplicate]

In my regex the pattern is something like this:
@"Something\(\d+, ""(.+)""(, .{1,5}, \d+, (?<somename>\d+)?\),"
So I would like to know if <somename> exists. If it was a normal capture group, I could just check if the capture groups are greater than the number of groups without that/those capture group(s), but I don't have the option here.
Could anyone help me find a way round this? I don't need it to be efficient, it's just for a one-time program that's used for sorting, so I don't mind if it takes a bit to run. It's not going to be for public code.
According to the documentation:
If groupname is not the name of a capturing group in the collection,
or if groupname is the name of a capturing group that has not been
matched in the input string, the method returns a Group object whose
Group.Success property is false and whose Group.Value property is
String.Empty.
var regex = new Regex(@"Something\(\d+, ""(.+)""(, .{1,5}, \d+, (?<somename>\d+)?\),");
var match = regex.Match(input);
var group = match.Groups["somename"];
bool exists = group.Success;
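Group.Success is false both when the name is unknown and when the group simply did not take part in the match. If the two cases need to be told apart, something like the following sketch might help. The class and method names are hypothetical, and the pattern used in the usage note is a simplified stand-in, not the one from the question.

```csharp
using System.Text.RegularExpressions;

public static class GroupPresenceSketch
{
    // True when the pattern itself declares a group with this name
    // (Regex.GroupNumberFromName returns -1 for unknown names).
    public static bool IsDeclared(Regex regex, string groupName)
    {
        return regex.GroupNumberFromName(groupName) != -1;
    }

    // True when the group took part in this particular match.
    public static bool Participated(Match match, string groupName)
    {
        return match.Groups[groupName].Success;
    }
}
```

With a stand-in pattern such as id=(?<id>\d+)(;opt=(?<opt>\d+))? matched against "id=7", the group opt is declared but does not participate.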

C# LINQ Contains method with two clauses

What is the best-practice way to check for two values with the LINQ Contains method?
Title is a string.
This is my if statement:
if (oWeb.ParentWeb.Title.Contains("Ricky") || oWeb.ParentWeb.Title.Contains("John"))
I need a solution like this:
if (oWeb.ParentWeb.Title.Contains("Ricky", "John"))
Since Title is string this actually has nothing to do with LINQ as the used Contains method is an instance method of string.
Assuming you have more strings to check, you can do something like that:
var strings = new[] {"Ricky", "John"};
if (strings.Any(oWeb.ParentWeb.Title.Contains))
// do something
But roryap's answer using a regex seems preferable (as long as the number of strings to check is not too big).
I don't think LINQ is the best option for that. Why not use regular expressions?
Regex.IsMatch(oWeb.ParentWeb.Title, "ricky|john", RegexOptions.IgnoreCase);
Contains takes only one parameter, so you cannot do it this way.
You can make an array of items, and check containment with it:
var titles = new[] {"Ricky", "John"};
if (titles.Any(t => oWeb.ParentWeb.Title.Contains(t))) {
...
}
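To get close to the call shape asked for in the question, the Any-based check can be wrapped in an extension method. This is only a sketch: ContainsAny is a made-up name, not a framework method, and the explicit StringComparison parameter is an assumption added so that case handling is a visible choice.

```csharp
using System;
using System.Linq;

public static class StringContainsExtensions
{
    // Hypothetical extension: true if text contains at least one of the candidates.
    public static bool ContainsAny(this string text, StringComparison comparison, params string[] candidates)
    {
        return candidates.Any(candidate => text.IndexOf(candidate, comparison) >= 0);
    }
}
```

The if statement would then read: if (oWeb.ParentWeb.Title.ContainsAny(StringComparison.Ordinal, "Ricky", "John")).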

How to use css selector with "not" in c#?

I want to find elements by CSS selector, but I want to skip some of them. How can I do it?
driver.FindElements(By.CssSelector("a[href]"))
However, I need to exclude links whose href contains logoff or client/contract.
You probably don't need LINQ for this. You can use the :not() pseudo-class, one for each value you want to exclude (note the *= in each negation; I'm assuming substring matches here):
driver.FindElements(By.CssSelector("a[href]:not([href*='logoff']):not([href*='client'])"))
An exception is if you have a dynamic list of values that you want to exclude: you can either build your selector programmatically using string.Join() or a loop and then pass it to By.CssSelector(), or you can follow Richard Schneider's answer using LINQ to keep things clean.
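Building the selector programmatically could look roughly like the sketch below. The class and method names are hypothetical, and it assumes the excluded values contain no single quotes or other characters that would need escaping inside a CSS attribute selector.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SelectorSketch
{
    // Hypothetical helper: appends one :not([href*='...']) per excluded substring.
    // Assumes the values need no escaping inside the quoted attribute value.
    public static string LinksExcluding(IEnumerable<string> excludedSubstrings)
    {
        var negations = excludedSubstrings.Select(value => ":not([href*='" + value + "'])");
        return "a[href]" + string.Concat(negations);
    }
}
```

The result can be passed straight to By.CssSelector(), for example By.CssSelector(SelectorSketch.LinksExcluding(new[] { "logoff", "client" })).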
After using CSS to get the elements then use LINQ to filter on the href.
using System.Linq;
string[] blackList = new string[] { "logoff", "client", ... };
string[] urls = driver
.FindElements(By.CssSelector("a[href]"))
.Select(e => e.GetAttribute("href"))
.Where(url => !blackList.Any(b => url.Contains(b)))
.ToArray();
Checking the URL is case-sensitive, because that is what W3C states. You may want to change this.

RegEx: Correct usage of lookbehind assertion and group definitions

I have the following string:
i:0#.w|domain\x123456
I know about the possibility of grouping search terms by using <mysearchterm> and retrieving them via RegEx.Match(myRegEx).Result("${mysearchterm}");.
I also know from MSDN that I can use lookbehind assertions like (?<= subexpression). Could someone help me get the following (including the possibility of retrieving them via groups as shown before):
domain ("domain")
user account ("x123456")
I don't need anything from before the pipe character (nor the pipe character itself) - so basically I am interested in domain\x123456.
As others have noted, this can be done without regex, or without lookbehinds. That being said, I can think of reasons you might want them: to write a RegexValidator instead of having to roll up a CustomValidator, for example. In ASP.NET, CustomValidators can be a little longer to write, and sometimes a RegexValidator does the job just fine.
As far as lookbehinds, the main reason you'd want one for something like this is if the target string could contain irrelevant copies of the |domain\x123456 pattern:
foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'
If you only wanted to grab domain\x888888 and domain\x123456 out of that, a lookbehind could be useful. Or maybe you just want to learn about lookbehinds. Anyway, since we only have one sample input, I can only guess at the rules; so perhaps something like this:
@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)"
Lookarounds are one of the most subtle and misunderstood features of regex, IMHO. I've gotten a lot of use out of them in preventing false positives, or in limiting the length of matches when I'm not trying to match the entire string (for example, if I want only the 3-digit numbers in blah 1234 123 1234567 123 foo, I can use (?<!\d)\d{3}(?!\d)). Here's a good reference if you want to learn more about named groups and lookarounds.
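To make both lookaround patterns concrete, here is a small sketch that runs them. The class and method names are invented for illustration; the patterns are the ones quoted in this answer, written as C# verbatim strings.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Lookbehind: only accept domain\user pairs directly preceded by a claim prefix like "i:0#.w|".
    private static readonly Regex Account =
        new Regex(@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)");

    // Negative lookbehind and lookahead: exactly three digits, not part of a longer number.
    private static readonly Regex ThreeDigits = new Regex(@"(?<!\d)\d{3}(?!\d)");

    public static string[] Users(string input)
    {
        return Account.Matches(input).Cast<Match>().Select(m => m.Groups["user"].Value).ToArray();
    }

    public static string[] ThreeDigitNumbers(string input)
    {
        return ThreeDigits.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```

On the sample line above, Users should return only x888888 and x123456, skipping the copies that lack the i:0#.w| prefix.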
You can just use the regex @"\|([^\\]+)\\(.+)".
The domain and user will be in groups 1 and 2, respectively.
You don't need regular expressions for that.
var myString = @"i:0#.w|domain\x123456";
var relevantParts = myString.Split('|')[1].Split('\\');
var domain = relevantParts[0];
var user = relevantParts[1];
Explanation: String.Split(separator) returns an array of substrings separated by separator.
If you insist on using regular expressions, this is how you do it with named groups and Match.Result, based on SLaks's answer (+1, by the way):
var myString = @"i:0#.w|domain\x123456";
var r = new Regex(@"\|(?<domain>[^\\]+)\\(?<user>.+)");
var match = r.Matches(myString)[0]; // get first match
var domain = match.Result("${domain}");
var user = match.Result("${user}");
Personally, however, I would prefer the following syntax, if you are just extracting the values:
var domain = match.Groups["domain"].Value;
var user = match.Groups["user"].Value;
And you really don't need lookbehind assertions here.

File sequence > Find Name Pattern

I'm trying to figure out a solid way to solve multiple types of file sequences.
Consider these sequences
file_0000.jpg
file_0001.jpg
file_0002.jpg etc
&
new1File001.jpg
new1File002.jpg
new1File003.jpg
So it needs to find out where the first decimal of the sequence code starts.
FileInfo[] files = new DirectoryInfo(@"\\fileserver\").GetFiles("*.*", SearchOption.AllDirectories);
var grouped = files.OrderBy(f => f.Name).GroupBy(f => f.Name.Substring(0, f.Name.LastIndexOf("_")));
Obviously this only finds file sequences where the sequence number is separated by "_". I want to group by the position of the first digit of the last digit sequence instead. My regex skills are not good, and even then I don't know how to use a regex in the lambda expression.
The main question is, how can I find out where the number string starts for the above mentioned cases.
Any pointers would be great!
Thanks,
-Johan
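Yes, regex to the rescue:
// Group 1 captures everything before the last run of digits in the name
// (assumes the extension itself contains no digits).
var r = new Regex(@"^(.*?)\d+\D*$");
var grouped =
files.
OrderBy(f => f.Name).
GroupBy(f => r.Match(f.Name).Groups[1].Value);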
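For the main question (where the number string starts), a sketch along these lines may be useful. The names are hypothetical, and it assumes bare file names such as FileInfo.Name rather than full paths. The extension is stripped first so that digits in something like .mp4 are not mistaken for the sequence number.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class SequenceSketch
{
    // A run of digits followed only by non-digits up to the end, i.e. the last digit run.
    private static readonly Regex LastDigitRun = new Regex(@"\d+(?=\D*$)");

    // Index of the first digit of the last digit run, or -1 if the name has no digits.
    public static int SequenceStart(string fileName)
    {
        var stem = Path.GetFileNameWithoutExtension(fileName);
        var match = LastDigitRun.Match(stem);
        return match.Success ? match.Index : -1;
    }

    // Everything before the sequence number; the whole name if there is no number.
    public static string SequencePrefix(string fileName)
    {
        var start = SequenceStart(fileName);
        return start < 0 ? fileName : fileName.Substring(0, start);
    }
}
```

SequencePrefix could then serve as the grouping key, for example GroupBy(f => SequenceSketch.SequencePrefix(f.Name)).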
