c# Regex named groups access non existent group name [duplicate]

In my regex the pattern is something like this:
@"Something\(\d+, ""(.+)""(, .{1,5}, \d+, (?<somename>\d+)?)\),"
So I would like to know whether <somename> exists. If it were a normal capture group, I could just check whether the number of captured groups is greater than it would be without that group (or groups), but that option isn't available here.
Could anyone help me find a way around this? I don't need it to be efficient; it's just a one-off program used for sorting, so I don't mind if it takes a while to run. It won't be public code.

According to the documentation for the GroupCollection indexer (GroupCollection.Item[String]):
If groupname is not the name of a capturing group in the collection, or if groupname is the name of a capturing group that has not been matched in the input string, the method returns a Group object whose Group.Success property is false and whose Group.Value property is String.Empty.
var regex = new Regex(@"Something\(\d+, ""(.+)""(, .{1,5}, \d+, (?<somename>\d+)?)\),");
var match = regex.Match(input);
// Does not throw: an unknown or unmatched group name returns a Group whose Success is false
var group = match.Groups["somename"];
bool exists = group.Success;
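A minimal, self-contained sketch of that check follows. It uses a hypothetical, simplified pattern (not the one from the question) in which the named group sits inside an optional group, and shows that both an unmatched group and a name that doesn't exist in the pattern come back with Success set to false instead of throwing.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical simplified pattern: the ", <number>" part (and so "somename") is optional
        var regex = new Regex(@"Something\((?<id>\d+)(, (?<somename>\d+))?\)");

        Match withGroup = regex.Match("Something(1, 42)");
        Match withoutGroup = regex.Match("Something(1)");

        Console.WriteLine(withGroup.Groups["somename"].Success);    // True
        Console.WriteLine(withGroup.Groups["somename"].Value);      // 42
        Console.WriteLine(withoutGroup.Groups["somename"].Success); // False

        // A name that is not in the pattern at all also yields an unsuccessful Group
        Console.WriteLine(withoutGroup.Groups["nosuchname"].Success); // False
    }
}
```

So a single `Groups["name"].Success` check covers both cases; there is no need to count groups.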

Related

How specific are LINQ .Contains IEnumerable results?

Assume I have a Users table with several users whose Username values are:
Ortund
Richard
Happy McHappyFace
Flapjack
Harvey
Tabitha
Asha
If I query users with .Contains() in my LINQ query, based on the user input "Happy":
var users = db.Users.Where(x => x.Username.Contains("Happy")).ToList();
Is the result going to contain every user record with "Ha" in the name (for example Richard, Happy McHappyFace, Harvey, Tabitha, Asha), or just the "Happy McHappyFace" user?

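Why would it return any match for Ha? You're specifically asking for users whose name contains Happy (in that capitalization, even, at least when the comparison runs in memory; a database may compare case-insensitively depending on its collation), so you're only going to get one result here.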

The documentation for String.Contains says:
Returns a value indicating whether a specified substring occurs within this string.
So the entire search string (Happy) must occur in the input string (Happy McHappyFace), either as the whole string or as part of it. It will only match on Happy, not on H or Ha.
It will translate to where username like '%Happy%', which will match Happy McHappyFace only.
There is a slight catch, though it doesn't apply here: LINQ also has a Contains method, Enumerable.Contains, that could match the call, because string implements IEnumerable<char>. Since that one is an extension method, the instance method String.Contains is chosen first.
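To see the substring semantics in isolation, here is a sketch that runs the same filter with LINQ to Objects against an in-memory list standing in for the Users table. That is an assumption: against a real database the query is translated to SQL, and case sensitivity then depends on the column's collation. In memory, String.Contains is an ordinal, case-sensitive comparison, so "Ha" does not match the lowercase "ha" in Richard, Tabitha or Asha.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // In-memory stand-in for the Users table (LINQ to Objects, not a database query)
        var usernames = new List<string>
        {
            "Ortund", "Richard", "Happy McHappyFace", "Flapjack", "Harvey", "Tabitha", "Asha"
        };

        // String.Contains(string) is an ordinal, case-sensitive substring test
        List<string> happy = usernames.Where(x => x.Contains("Happy")).ToList();
        List<string> ha = usernames.Where(x => x.Contains("Ha")).ToList();

        Console.WriteLine(string.Join(", ", happy)); // Happy McHappyFace
        Console.WriteLine(string.Join(", ", ha));    // Happy McHappyFace, Harvey
    }
}
```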

how to use c# regex to fetch substring between recognized patterns?

My C# code stores some text.
I want to fetch some words that don't follow a known pattern but appear among words that do. I don't want to fetch the words that match the patterns.
For example:
My company! 02-45895438 more details: myDomain.mysite.com
Can I fetch them with something like this?
<vendorName?>\\s*\\d{2}-d{6}\\s*more details: <site?>
vendorName = "My company!" or "My company! "
site = "myDomain.mysite.com"
Is there any way to do so with regex?

From your description, it seems like you want to extract "myDomain.mysite.com" from the string "My company! 02-45895438 more details: myDomain.mysite.com". If that's the case, you can use a regex similar to this one to get the string you want:
(?<=My company! 02-45895438 more details: ).*
That should give you the substring that follows the preceding text, while omitting that text from the match.
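A sketch of the lookbehind approach in C#, using the sample input from the question. Note that a lazy .*? at the very end of a pattern is satisfied by an empty string, so the sketch uses a greedy .* to take the rest of the line, and prints the lazy result for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string text = "My company! 02-45895438 more details: myDomain.mysite.com";

        // Greedy .* takes everything after the text required by the lookbehind
        Match greedy = Regex.Match(text, @"(?<=My company! 02-45895438 more details: ).*");
        Console.WriteLine(greedy.Value); // myDomain.mysite.com

        // Lazy .*? with nothing after it succeeds on an empty string
        Match lazy = Regex.Match(text, @"(?<=My company! 02-45895438 more details: ).*?");
        Console.WriteLine(lazy.Success + " " + lazy.Value.Length); // True 0
    }
}
```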

You can do this by using parentheses. For example, this will give you the contents of a bold tag:
<b>([^>]+)</b>
You can then use Regex.Match to get a Match object and read the groups via Match.Groups. Each capturing group corresponds to a set of parentheses, and Groups[0] is always the entire match, so in this case the tag's content is in Groups[1].
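A minimal sketch of that, with a made-up input string: Groups[0] holds the whole match and the first set of parentheses is Groups[1].

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Made-up input containing one bold tag
        Match match = Regex.Match("Some <b>bold text</b> here", "<b>([^>]+)</b>");
        if (match.Success)
        {
            Console.WriteLine(match.Groups[0].Value); // <b>bold text</b>
            Console.WriteLine(match.Groups[1].Value); // bold text
        }
    }
}
```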

This is the syntax I was looking for:
(?<TheServer>\w*)
as in:
string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";
See:
http://en.csharp-online.net/CSharp_Regular_Expression_Recipes%E2%80%94Extracting_Groups_from_a_MatchCollection
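Applied to the question's example, named groups can capture the unknown parts directly. This is a sketch that guesses at the rules from the single sample: vendorName is everything before the phone number, and site is the first run of non-whitespace after "more details:". The question's pseudo-pattern used d{6}, but the sample has eight digits after the dash, so this version uses \d+ instead.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string text = "My company! 02-45895438 more details: myDomain.mysite.com";

        // vendorName: shortest run of characters before "<2 digits>-<digits>"
        // site: first run of non-whitespace after "more details:"
        var regex = new Regex(@"(?<vendorName>.*?)\s*\d{2}-\d+\s*more details:\s*(?<site>\S+)");
        Match match = regex.Match(text);

        if (match.Success)
        {
            Console.WriteLine(match.Groups["vendorName"].Value); // My company!
            Console.WriteLine(match.Groups["site"].Value);       // myDomain.mysite.com
        }
    }
}
```

Because the lazy .*? stops as soon as the rest of the pattern can match, the trailing space before the number is consumed by \s* and is not part of vendorName.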

Extracting Groups and Sub-groups in RegEx

This question is, in a way, a continuation of my previously answered question: Getting "Unterminated [] set." Error in C#
I'm using regular expression in C# to extract URLs:
Regex find = new Regex(@"(?<First>[,""]url=)(?<Url>[^\\]+)(?<Last>\\u00)");
Where the text contains URLs in the format:
,url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026
I'm getting the entire URL in the 'Url' group, but I'd also like to have the itag value in a separate "iTag" group. I know this can be done using sub-groups, and I've been trying, but I can't figure out exactly how.

You already have named groups defined in the Regex. The syntax (?<First>...) names everything within those parentheses "First".
When you match using Regex, use the Groups property to access the GroupCollection and extract a group's value by name:
var first = regex.Match(line).Groups["First"].Value;
The following pattern adds an extra group for iTag nested inside Url, so Url still holds the full URL. Move the iTag group outside the Url parentheses if you want to change this.
(?<First>[,""]url=)(?<Url>[^\?]*\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)
Here's the code.
using System;
using System.Text.RegularExpressions;

// Regular (non-verbatim) string: "\\u0026" reaches the regex engine as \u0026, which matches '&'
Regex regex = new Regex("(?<First>[,\"]url=)(?<Url>[^\\?]*\\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)");
// In C# source, "\u0026" is the escape for '&'
string input = ",url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026";
foreach (Match match in regex.Matches(input))
{
    Console.WriteLine("1. " + match);
    Console.WriteLine(" 1. " + match.Groups["First"]);
    Console.WriteLine(" 2. " + match.Groups["Url"]);
    Console.WriteLine(" 3. " + match.Groups["iTag"]);
    Console.WriteLine(" 4. " + match.Groups["Last"]);
}
Results:
1. ,url=http://domain.com?itag=25&
 1. ,url=
 2. http://domain.com?itag=25
 3. 25
 4. &
1. ,url=http://hello.com?itag=11&
 1. ,url=
 2. http://hello.com?itag=11
 3. 11
 4. &

RegEx: Correct usage of lookbehind assertion and group definitions

I have the following string:
i:0#.w|domain\x123456
I know that I can name a group with (?<mysearchterm>...) and retrieve it via RegEx.Match(myRegEx).Result("${mysearchterm}").
I also know from MSDN that I can use lookbehind assertions like (?<= subexpression). Could someone help me get the following parts (ideally in a way that lets me retrieve them via groups, as shown above):
domain ("domain")
user account ("x123456")
I don't need anything from before the pipe character (nor the pipe character itself) - so basically I am interested in domain\x123456.

As others have noted, this can be done without regex, or with regex but without lookbehinds. That being said, I can think of reasons you might want them: to use a RegularExpressionValidator instead of having to roll your own CustomValidator, for example. In ASP.NET, CustomValidators can take a little longer to write, and sometimes a RegularExpressionValidator does the job just fine.
As far as lookbehinds, the main reason you'd want one for something like this is if the target string could contain irrelevant copies of the |domain\x123456 pattern:
foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'
If you only wanted to grab domain\x888888 and domain\x123456 out of that, a lookbehind could be useful. Or maybe you just want to learn about lookbehinds. Anyway, since we only have one sample input, I can only guess at the rules; so perhaps something like this:
@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)"
Lookarounds are one of the most subtle and misunderstood features of regex, IMHO. I've gotten a lot of use out of them in preventing false positives, or in limiting the length of matches when I'm not trying to match the entire string (for example, if I want only the 3-digit numbers in blah 1234 123 1234567 123 foo, I can use (?<!\d)\d{3}(?!\d)). Here's a good reference if you want to learn more about named groups and lookarounds.
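Here is a sketch that runs both patterns from this answer against the sample strings it uses: the lookbehind pattern picks out only the accounts preceded by a claims-style prefix such as "i:0#.w|", and the negative lookbehind/lookahead pair keeps only standalone 3-digit numbers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample text from the answer, with one irrelevant account and three candidates
        string text = @"foo#bar|domain\x999999 says: 'i:0#.w|domain\x888888i:0#.w|domain\x123456|domain\x000000'";

        // The lookbehind requires a prefix like "i:0#.w|" right before the domain,
        // but that prefix is not part of the match
        var claims = new Regex(@"(?<=[a-z]:\d#\.[a-z]\|)(?<domain>[^\\]*)\\(?<user>x\d+)");
        foreach (Match m in claims.Matches(text))
        {
            Console.WriteLine(m.Groups["domain"].Value + " / " + m.Groups["user"].Value);
        }
        // domain / x888888
        // domain / x123456

        // Exactly three digits, not preceded or followed by another digit
        var threeDigits = new Regex(@"(?<!\d)\d{3}(?!\d)");
        string[] found = threeDigits.Matches("blah 1234 123 1234567 123 foo")
            .Cast<Match>()
            .Select(x => x.Value)
            .ToArray();
        Console.WriteLine(string.Join(", ", found)); // 123, 123
    }
}
```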

You can just use the regex @"\|([^\\]+)\\(.+)".
The domain and user will be in groups 1 and 2, respectively.

You don't need regular expressions for that.
var myString = @"i:0#.w|domain\x123456";
var relevantParts = myString.Split('|')[1].Split('\\');
var domain = relevantParts[0];
var user = relevantParts[1];
Explanation: String.Split(separator) returns an array of substrings separated by separator.
If you insist on using regular expressions, this is how you do it with named groups and Match.Result, based on SLaks' answer (+1, by the way):
var myString = @"i:0#.w|domain\x123456";
var r = new Regex(@"\|(?<domain>[^\\]+)\\(?<user>.+)");
var match = r.Matches(myString)[0]; // get first match
var domain = match.Result("${domain}");
var user = match.Result("${user}");
Personally, however, I would prefer the following syntax, if you are just extracting the values:
var domain = match.Groups["domain"].Value;
var user = match.Groups["user"].Value;
And you really don't need lookbehind assertions here.
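For comparison, a self-contained sketch of both approaches on the sample string; the regex version reads the named groups through Match.Groups[...].Value and checks Match.Success before using them.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string myString = @"i:0#.w|domain\x123456";

        // Approach 1: plain string splitting, no regex
        string[] relevantParts = myString.Split('|')[1].Split('\\');
        Console.WriteLine(relevantParts[0] + " " + relevantParts[1]); // domain x123456

        // Approach 2: named groups read through Match.Groups
        Match match = Regex.Match(myString, @"\|(?<domain>[^\\]+)\\(?<user>.+)");
        if (match.Success)
        {
            Console.WriteLine(match.Groups["domain"].Value + " " + match.Groups["user"].Value); // domain x123456
        }
    }
}
```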

C# Regex: Get group names?

myRegex.GetGroupNames()
Seems to return the numbered groups as well... how do I get only the named ones?
A solution using the actual Match object would be fine as well.

Does using the RegexOptions.ExplicitCapture option when creating the regex do what you want? For example:
Regex theRegex = new Regex(@"\b(?<word>\w+)\b", RegexOptions.ExplicitCapture);
From MSDN:
Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:...).
So you won't need to worry about whether users of your API may or may not use non-capturing groups if this option is set.
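A small sketch of the effect on GetGroupNames(), using a made-up pattern with one unnamed and one named group. One thing to keep in mind: even with ExplicitCapture, group "0" (the whole match) is still listed, so it may need to be filtered out separately.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Made-up pattern: one unnamed group and one named group
        const string pattern = @"(\d+)-(?<word>\w+)";

        var plain = new Regex(pattern);
        var explicitOnly = new Regex(pattern, RegexOptions.ExplicitCapture);

        Console.WriteLine(string.Join(", ", plain.GetGroupNames()));        // 0, 1, word

        // With ExplicitCapture the unnamed parentheses no longer capture,
        // but group "0" (the whole match) is still reported
        Console.WriteLine(string.Join(", ", explicitOnly.GetGroupNames())); // 0, word
    }
}
```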

See the other comments/answers about using (?:) and/or sticking with "one style". Here is my best approach that tries to directly solve the question:
var named = regex.GetGroupNames().Where(x => !Regex.IsMatch(x, "^\\d+$"));
However, this will fail for regular expressions like (?<42>...).
Happy coding.

public string[] GetGroupNames(Regex re)
{
    var groupNames = re.GetGroupNames();
    var groupNumbers = re.GetGroupNumbers();
    Contract.Assert(groupNames.Length == groupNumbers.Length);
    return Enumerable.Range(0, groupNames.Length)
        .Where(i => groupNames[i] != groupNumbers[i].ToString())
        .Select(i => groupNames[i])
        .ToArray();
}
Actually, this will still fail when the group name and number happen to be the same :\ But it will succeed even when the group name is a number, as long as the number is not the same as its index.

Develop Reference
