I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.
var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
...
For the input user == "Josh Smith [jsmith]":
matches.Count == 1
matches[0].Value == "[jsmith]"
... which I understand. But then:
matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?
Looking at this question, my understanding is that the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? So why would "jsmith" match?
Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?
match.Groups[0] is always the same as match.Value, which is the entire match.
match.Groups[1] is the first capturing group in your regular expression.
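Applied to the input from the question, this can be checked with a minimal sketch. The class and method names (GroupBasics, ExtractBracketed) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupBasics
{
    // Hypothetical helper: returns the text captured by the first
    // parenthesized group, or an empty string when nothing matches.
    public static string ExtractBracketed(string input)
    {
        Match match = Regex.Match(input, @"\[(.*?)\]");
        return match.Success ? match.Groups[1].Value : "";
    }

    public static void Main()
    {
        Match match = Regex.Match("Josh Smith [jsmith]", @"\[(.*?)\]");
        Console.WriteLine(match.Value);           // [jsmith]
        Console.WriteLine(match.Groups[0].Value); // [jsmith] (group 0 is the whole match)
        Console.WriteLine(match.Groups[1].Value); // jsmith   (what the parentheses captured)
        Console.WriteLine(match.Groups.Count);    // 2
    }
}
```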
Consider this example:
var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);
In this case,
match.Value is "[john] John Johnson"
match.Groups[0] is always the same as match.Value, "[john] John Johnson".
match.Groups[1] is the group of captures from the (.*?).
match.Groups[2] is the group of captures from the (.*).
match.Groups[1].Captures is yet another dimension.
Consider another example:
var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);
Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!
match.Groups[0] is always the same as match.Value, "[john][johnny]".
match.Groups[1] is the group of captures from the (\[.*?\])+. Its Value is the last capture, "[johnny]", not the whole match.
match.Groups[1].Captures has one entry per repetition of the group.
match.Groups[1].Captures[0] is [john]
match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value
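Here is a short sketch for listing every capture of the repeated group. The names CaptureDemo and BracketedNames are hypothetical; the pattern and input are the ones from the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: collects every capture made by group 1,
    // one per repetition of the quantified group.
    public static List<string> BracketedNames(string input)
    {
        Match match = Regex.Match(input, @"(\[.*?\])+");
        var names = new List<string>();
        foreach (Capture capture in match.Groups[1].Captures)
        {
            names.Add(capture.Value);
        }
        return names;
    }

    public static void Main()
    {
        Match match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
        Console.WriteLine(match.Value);                    // [john][johnny]
        Console.WriteLine(match.Groups[1].Value);          // [johnny] (last capture only)
        Console.WriteLine(match.Groups[1].Captures.Count); // 2
        Console.WriteLine(string.Join(", ", BracketedNames("[john][johnny]"))); // [john], [johnny]
    }
}
```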
The ( ) acts as a capture group. So the matches collection holds all of the matches that C# finds in your string, and each match's Groups collection holds the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).
Groups[0] is the entire match (not the entire input string).
Groups[1] is your group captured by parentheses (.*?). You can configure Regex to capture Explicit groups only (there is an option for that when you create a regex), or use (?:.*?) to create a non-capturing group.
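To see the effect of both options, a small sketch can compare the group counts. NonCapturingDemo and GroupCount are illustrative names only; RegexOptions.ExplicitCapture is the option referred to above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Hypothetical helper: number of groups (including group 0) reported
    // for a match of the given pattern against the question's input.
    public static int GroupCount(string pattern, RegexOptions options)
    {
        return Regex.Match("Josh Smith [jsmith]", pattern, options).Groups.Count;
    }

    public static void Main()
    {
        // Plain parentheses capture: group 0 plus group 1.
        Console.WriteLine(GroupCount(@"\[(.*?)\]", RegexOptions.None));            // 2
        // (?: ... ) groups without capturing: only group 0 remains.
        Console.WriteLine(GroupCount(@"\[(?:.*?)\]", RegexOptions.None));          // 1
        // ExplicitCapture turns unnamed parentheses into non-capturing groups.
        Console.WriteLine(GroupCount(@"\[(.*?)\]", RegexOptions.ExplicitCapture)); // 1
    }
}
```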
The parentheses identify a group as well, so the first group is the entire match and the second group is the content found between the square brackets.
How? The answer is here
(.*?)
That is a subgroup of @"\[(.*?)\]".
I am trying to make a regular expression that will tell me if a string has {0#} where zero can be repeated. Once I confirm that a string has this I am then trying to set it to a variable so I can count the number of 0s and replace the # with another number. I have /([{0]})([#}])/g which works on detection but not on pulling it out to another variable.
Edit:
Thanks to all, the answer was
Regex regex = new Regex(@"\{(0+)(#)\}");
Match match = regex.Match(text);
if (match.Success)
{
int zeros = Regex.Matches(match.Value, "0").Count;
}
Use this:
\{(0+)(#)\}
character {
then one or more occurrences of 0
a # sign
character }
You are super close. The problem you are having is because your capture group - the ( ) needs to be just around the zeroes. You also don't strictly need the other capture group unless you are doing something with it. You can rewrite your regex like this:
{(0+)#}
{ - match '{'
(0+) - match and capture one or more '0'
# - match '#'
} - match '}'
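Putting the pattern to work, here is a sketch that counts the zeros and swaps the # for a number. The helper names (ZeroPlaceholder, CountZeros, ReplaceHash) are hypothetical, and keeping the braces and zeros in the output is an assumption about the desired result, since the question does not say what the final string should look like.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroPlaceholder
{
    // Escaped braces are used here; the unescaped form shown above
    // behaves the same way in .NET for this pattern.
    private static readonly Regex Placeholder = new Regex(@"\{(0+)#\}");

    // Number of zeros in the first {0...0#} placeholder, or 0 if none is found.
    public static int CountZeros(string text)
    {
        Match match = Placeholder.Match(text);
        return match.Success ? match.Groups[1].Length : 0;
    }

    // Replaces the # in every placeholder with the given number,
    // keeping the braces and the zeros (an assumed output format).
    public static string ReplaceHash(string text, int number)
    {
        return Placeholder.Replace(text, m => "{" + m.Groups[1].Value + number + "}");
    }

    public static void Main()
    {
        Console.WriteLine(CountZeros("id {000#}"));     // 3
        Console.WriteLine(ReplaceHash("id {000#}", 7)); // id {0007}
    }
}
```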
I have a string that looks like the following:
<#399969178745962506> hello to <#!104729417217032192>
I have a dictionary containing both that looks like following:
{"399969178745962506", "One"},
{"104729417217032192", "Two"}
My goal is to replace <#399969178745962506> with the value for that numeric key, which in this case would be One.
Regex.Replace(arg.Content, "(?<=<)(.*?)(?=>)", m => userDic.ContainsKey(m.Value) ? userDic[m.Value] : m.Value);
My current regex is (?<=<)(.*?)(?=>), which matches everything between < and >, so in this case it yields both #399969178745962506 and #!104729417217032192.
I can't just ignore the # sign, because the ! sign is not there every time. So it could be optimal to only get numbers with something like \d+
I need to figure out how to only get the numbers between < and > but I can't for the life of me figure out how.
Very grateful for any help!
In C#, you may use two approaches: a lookaround-based one (possible because lookbehind patterns can be variable-width in .NET) and a capturing group approach.
Lookaround based approach
The pattern that will easily help you get the digits in the right context is
(?<=<#!?)\d+(?=>)
The (?<=<#!?) is a positive lookbehind that requires <# or <#! immediately to the left of the current location, and (?=>) is a positive lookahead that requires a > char immediately to the right of the current location.
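A minimal sketch of the lookaround pattern in use follows. LookaroundDemo and ExtractIds are illustrative names. Because the lookarounds consume nothing, each match.Value is just the digits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the digit runs that sit between
    // "<#" (optionally followed by "!") and ">".
    public static List<string> ExtractIds(string input)
    {
        var ids = new List<string>();
        foreach (Match match in Regex.Matches(input, @"(?<=<#!?)\d+(?=>)"))
        {
            ids.Add(match.Value);
        }
        return ids;
    }

    public static void Main()
    {
        var ids = ExtractIds("<#399969178745962506> hello to <#!104729417217032192>");
        Console.WriteLine(string.Join(", ", ids)); // 399969178745962506, 104729417217032192
    }
}
```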
Capturing approach
You may use the following pattern that will capture the digits inside the expected <...> substrings:
<#!?(\d+)>
Details
<# - a literal <# substring
!? - an optional exclamation sign
(\d+) - capturing group 1 that matches one or more digits
> - a literal > sign.
Note that the values you need can be accessed via match.Groups[1].Value, as shown in the snippet below.
Usage:
var userDic = new Dictionary<string, string> {
{"399969178745962506", "One"},
{"104729417217032192", "Two"}
};
var p = @"<#!?(\d+)>";
var s = "<#399969178745962506> hello to <#!104729417217032192>";
Console.WriteLine(
Regex.Replace(s, p, m => userDic.ContainsKey(m.Groups[1].Value) ?
userDic[m.Groups[1].Value] : m.Value
)
); // => One hello to Two
// Or, if you need to keep <#, <#! and >
Console.WriteLine(
Regex.Replace(s, @"(<#!?)(\d+)>", m => userDic.ContainsKey(m.Groups[2].Value) ?
$"{m.Groups[1].Value}{userDic[m.Groups[2].Value]}>" : m.Value
)
); // => <#One> hello to <#!Two>
To extract just the numbers from your given format, use this regex pattern:
(?<=<#|<#!)(\d+)(?=>)
See it work in action: https://regexr.com/3j6ia
You can use non-capturing groups to exclude parts of the needed pattern to be inside the group:
(?<=<)(?:#?!?)(.*?)(?=>)
Alternatively, you could name the inner group and use the named group to get it:
(?<=<)(?:#?!?)(?<yourgroupname>.*?)(?=>)
Access it via m.Groups["yourgroupname"].Value. For more, see e.g. How do I access named capturing groups in a .NET Regex?
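Here is a sketch of the named-group idea applied to the replacement task. It assumes a simplified pattern, <#!?(?<id>\d+)>, that consumes the whole <...> token rather than using lookarounds; the group name id and the names NamedGroupDemo and ReplaceMentions are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: replaces each <#digits> or <#!digits> token with
    // the dictionary value for those digits; unknown ids are left untouched.
    public static string ReplaceMentions(string text, IDictionary<string, string> names)
    {
        return Regex.Replace(text, @"<#!?(?<id>\d+)>", m =>
            names.TryGetValue(m.Groups["id"].Value, out var name) ? name : m.Value);
    }

    public static void Main()
    {
        var userDic = new Dictionary<string, string>
        {
            { "399969178745962506", "One" },
            { "104729417217032192", "Two" }
        };
        string input = "<#399969178745962506> hello to <#!104729417217032192>";
        Console.WriteLine(ReplaceMentions(input, userDic)); // One hello to Two
    }
}
```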
Regex: (?:<#!?(\d+)>)
Details:
(?:) Non-capturing group
<# matches the characters <# literally
!? matches an optional ! (between zero and one times)
(\d+) 1st capturing group; \d+ matches one or more digits
> matches the character > literally
string text = "<#399969178745962506> hello to <#!104729417217032192>";
Dictionary<string, string> list = new Dictionary<string, string>() { { "399969178745962506", "One" }, { "104729417217032192", "Two" } };
text = Regex.Replace(text, @"(?:<#!?(\d+)>)", m => list.ContainsKey(m.Groups[1].Value) ? list[m.Groups[1].Value] : m.Value);
Console.WriteLine(text); // One hello to Two
Console.ReadLine();
I'm reading from a file, and need to find a string that is encapsulated by two identical non-ASCII values/control separators, in this case 'RS'
How would I go about doing this? Would I need some form of regex?
RS stands for Record Separator, and it has a value of 30 (or 0x1E in hexadecimal). You can use this regular expression:
\x1E([\w\s]*?)\x1E
That matches the RS, then any run of letters, digits, underscores or whitespace, and then the RS again. The ? makes the quantifier match as few characters as possible, in case there are more RS characters afterwards.
If you prefer not to match numbers, you could use [a-zA-Z\s] instead of [\w\s].
Example:
string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
MatchCollection matches = Regex.Matches(fileContents, @"\x1E([\w\s]*?)\x1E");
if (matches.Count == 0)
return; // Not found, display an error message and exit.
foreach (Match match in matches)
{
if (match.Groups.Count > 1)
Console.WriteLine(match.Groups[1].Value);
}
As you can see, you get a collection of Match objects, and each match.Value holds the whole matched string, including the separators. match.Groups holds all matched groups: the first is again the whole matched string (that's the default), followed by each of your groups (those in parentheses). In this case your regex has only one, so you just need the second entry in that list.
Using regex you can do something like this:
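// Escape the delimiters so regex metacharacters in them are treated literally.
// The lazy (.*?) stops at the nearest closing delimiter.
string pattern = string.Format("{0}(.*?){1}", Regex.Escape(firstString), Regex.Escape(secondString));
var matches = Regex.Matches(myString, pattern);
foreach (Match match in matches)
{
    // Groups[1] holds only the text between the delimiters;
    // match.Value would also include firstString and secondString.
    string inner = match.Groups[1].Value;
    // Do stuff with inner.
}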
Regex.Matches finds every substring that matches the pattern built above; use Regex.Match if you only need the first one.
Remember to escape any regex special characters in the delimiters (Regex.Escape does this).
You can use Regex.Matches, I'm using X as the separator in this example:
var fileContents = "Xsomething1X Xsomething2X Xsomething3X";
var results = Regex.Matches(fileContents, @"(X).*?(\1)");
Then you can loop over results to do anything you want with the matches.
The \1 in the regex means "reference the first group". I've put X between () so it becomes group 1, then I use \1 to say that the match in this place must be exactly the same as group 1.
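The backreference pays off once the opening delimiter can vary. The sketch below assumes, purely for illustration, that either X or | may act as the separator; it also adds a second group to pull out the enclosed text. BackreferenceDemo and Enclosed are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Hypothetical helper: group 1 captures whichever delimiter opens the
    // span, \1 requires the same delimiter to close it, and group 2 is
    // the text in between.
    public static List<string> Enclosed(string input)
    {
        var contents = new List<string>();
        foreach (Match match in Regex.Matches(input, @"([X|])(.*?)\1"))
        {
            contents.Add(match.Groups[2].Value);
        }
        return contents;
    }

    public static void Main()
    {
        // "Xa|bX" closes on X even though a | appears inside it, and
        // "|cXd|" closes on | even though an X appears inside it.
        Console.WriteLine(string.Join(", ", Enclosed("Xa|bX |cXd|"))); // a|b, cXd
    }
}
```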
You don't need a regular expression for that.
Read the contents of the file (File.ReadAllText).
Split on the separator character (String.Split).
If you know there's only one occurrence of your string, take the second array element (result[1]). Otherwise, take every other entry (result.Where((x, i) => i % 2 == 1)).
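The split-based steps can be sketched as follows. SplitDemo and Delimited are illustrative names, and the file read is left out so the example works on an in-memory string. The sketch assumes the separators come in balanced pairs; with an odd number of separators the last odd-indexed piece would be unterminated text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: splits on the separator and keeps every
    // odd-indexed piece, i.e. the text that sat between a pair of separators.
    public static List<string> Delimited(string contents, char separator)
    {
        return contents.Split(separator)
                       .Where((part, index) => index % 2 == 1)
                       .ToList();
    }

    public static void Main()
    {
        string contents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
        foreach (string item in Delimited(contents, '\u001E'))
        {
            Console.WriteLine(item); // "your string", then "another text"
        }
    }
}
```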
I've tested my regex in a regex tester and the statement itself appears that it should be working. However, instead of matching 4 objects as it should, it only matches 1 (the entire string), and I'm not sure why it's even doing that...
rgx = new Regex(@"^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$");
matches = rgx.Matches("0.0.0.95");
at this point if I do:
foreach (Match m in matches)
{
Console.WriteLine(m.Value);
}
it will just show "0.0.0.95" when it should be matching 0, 0, 0, and 95 and not the entire string. What am I doing wrong here?
ANSWER - The single match of the entire string contained the group matches I was looking for, accessed in this manner:
r.r1 = Convert.ToInt32(m.Groups[1].Value);
r.r2 = Convert.ToInt32(m.Groups[2].Value);
r.r3 = Convert.ToInt32(m.Groups[3].Value);
r.r4 = Convert.ToInt32(m.Groups[4].Value);
In this case you don't get multiple matches - there is only one match in there, but it has four capturing groups:
^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$
// ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
// Group 1 Group 2 Group 3 Group 4
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Group 0
There is a special group number zero that includes the entire match.
So you need to modify your program like this:
Console.WriteLine("One:'{0}' Two:'{1}' Three:'{2}' Four:'{3}'"
, m.Groups[1].Value
, m.Groups[2].Value
, m.Groups[3].Value
, m.Groups[4].Value
);
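As a compact variant, the four groups can be read in a loop. This is a sketch with hypothetical names (DottedQuad, ParseParts); it assumes each part fits in an int, since int.Parse would throw on a longer run of digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DottedQuad
{
    // Hypothetical helper: returns the four numeric parts, or an empty
    // array when the input does not match the pattern.
    public static int[] ParseParts(string input)
    {
        Match m = Regex.Match(input, @"^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$");
        if (!m.Success)
        {
            return new int[0];
        }
        var parts = new int[4];
        // Group 0 is the whole match, so the captured parts are groups 1 to 4.
        for (int i = 1; i <= 4; i++)
        {
            parts[i - 1] = int.Parse(m.Groups[i].Value);
        }
        return parts;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ParseParts("0.0.0.95"))); // 0, 0, 0, 95
    }
}
```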
Capturing a repeated group always returns the last element, but that is not very helpful. For example:
var regex = new Regex("^(?<somea>a)+$");
var match = regex.Match("aaa");
var last = match.Groups["somea"].Value; // returns "a"
I would like to have a collection of the matched elements instead of only the last one.
Is that possible?
CaptureCollection
You can use CaptureCollection which represents the set of captures made by a single capturing group.
If a quantifier is not applied to a capturing group, the CaptureCollection includes a single Capture object that represents the same captured substring as the Group object.
If a quantifier is applied to a capturing group, the CaptureCollection includes one Capture object for each captured substring, and the Group object provides information only about the last captured substring.
So you can do this
var regex = new Regex("^(?<somea>a)+$");
var match = regex.Match("aaa");
List<string> aCaptures=match.Groups["somea"]
.Captures.Cast<Capture>()
.Select(x=>x.Value)
.ToList<string>();
//aCaptures would now contain a list of a
Take a look in the Captures collection:
match.Groups["somea"].Captures
You can also try something like this :
var regex = new Regex("^(?<somea>a)+$");
var matches = regex.Matches("aaa");
foreach (Match match in matches)
{
    var last = match.Groups["somea"].Value; // "a", the last capture
}
This is just a sample but it should give a good start.
I did not check the validity of your regular expression though
Apply the quantifier + to the thing you want to match, not to the group. If you quantify the group, it captures once per repetition and the group keeps only the last capture.
So (a)+ on aaa creates one group whose value is overwritten by each new repetition (ending up as a), whereas (a+) creates one group containing aaa.
So you know what to do with your problem: just move the + inside the capturing group.
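The difference between the two placements can be checked with a small sketch. QuantifierPlacement and GroupOne are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierPlacement
{
    // Hypothetical helper: value of group 1 for the given pattern on "aaa".
    public static string GroupOne(string pattern)
    {
        return Regex.Match("aaa", pattern).Groups[1].Value;
    }

    public static void Main()
    {
        Match outside = Regex.Match("aaa", "^(a)+$");
        Match inside = Regex.Match("aaa", "^(a+)$");

        Console.WriteLine(outside.Groups[1].Value);          // a   (last repetition only)
        Console.WriteLine(outside.Groups[1].Captures.Count); // 3   (every repetition is still in Captures)
        Console.WriteLine(inside.Groups[1].Value);           // aaa
        Console.WriteLine(inside.Groups[1].Captures.Count);  // 1
    }
}
```

Moving the + inside gives one string; if each repetition is needed separately, the Captures collection of the quantified group is the place to look.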