Regex to match multiple number groups between two characters

I have a string that looks like the following:
<#399969178745962506> hello to <#!104729417217032192>
I have a dictionary containing both IDs that looks like the following:
{"399969178745962506", "One"},
{"104729417217032192", "Two"}
My goal here is to replace <#399969178745962506> with the value stored under that numeric key, which in this case would be One
Regex.Replace(arg.Content, "(?<=<)(.*?)(?=>)", m => userDic.ContainsKey(m.Value) ? userDic[m.Value] : m.Value);
My current regex is as follows: (?<=<)(.*?)(?=>), which matches everything in between < and >, and in this case leaves both #399969178745962506 and #!104729417217032192
I can't just ignore the # sign, because the ! sign is not there every time. So it could be optimal to only get numbers with something like \d+
I need to figure out how to only get the numbers between < and > but I can't for the life of me figure out how.
Very grateful for any help!

In C#, you may use 2 approaches: a lookaround-based one (since lookbehind patterns can be variable-width in .NET) and a capturing group approach.
Lookaround based approach
The pattern that will easily help you get the digits in the right context is
(?<=<#!?)\d+(?=>)
See the regex demo
The (?<=<#!?) is a positive lookbehind that requires <# or <#! immediately to the left of the current location, and (?=>) is a positive lookahead that requires a > char immediately to the right of the current location.
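A minimal sketch of the lookaround pattern in use, assuming the dictionary from the question. The class and method names (LookaroundDemo, ReplaceIds) are made up for illustration. Because lookarounds do not consume text, only the digits are replaced and the surrounding <#, <#! and > stay in the result:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: swaps each ID for its dictionary value, if one exists.
    // m.Value holds only the digits, since the lookarounds match without consuming.
    public static string ReplaceIds(string input, IDictionary<string, string> names)
    {
        return Regex.Replace(input, @"(?<=<#!?)\d+(?=>)",
            m => names.TryGetValue(m.Value, out var name) ? name : m.Value);
    }

    public static void Main()
    {
        var userDic = new Dictionary<string, string>
        {
            { "399969178745962506", "One" },
            { "104729417217032192", "Two" }
        };
        var s = "<#399969178745962506> hello to <#!104729417217032192>";
        Console.WriteLine(ReplaceIds(s, userDic)); // <#One> hello to <#!Two>
    }
}
```

If the angle brackets should disappear as well, the capturing approach is probably the better fit, since there the whole <#...> token is part of the match.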
Capturing approach
You may use the following pattern that will capture the digits inside the expected <...> substrings:
<#!?(\d+)>
Details
<# - a literal <# substring
!? - an optional exclamation sign
(\d+) - capturing group 1 that matches one or more digits
> - a literal > sign.
Note that the values you need can be accessed via match.Groups[1].Value as shown in the snippet below.
Usage:
var userDic = new Dictionary<string, string> {
{"399969178745962506", "One"},
{"104729417217032192", "Two"}
};
var p = @"<#!?(\d+)>";
var s = "<#399969178745962506> hello to <#!104729417217032192>";
Console.WriteLine(
Regex.Replace(s, p, m => userDic.ContainsKey(m.Groups[1].Value) ?
userDic[m.Groups[1].Value] : m.Value
)
); // => One hello to Two
// Or, if you need to keep <#, <#! and >
Console.WriteLine(
Regex.Replace(s, @"(<#!?)(\d+)>", m => userDic.ContainsKey(m.Groups[2].Value) ?
$"{m.Groups[1].Value}{userDic[m.Groups[2].Value]}>" : m.Value
)
); // => <#One> hello to <#!Two>
See the C# demo.

To extract just the numbers from your given format, use this regex pattern:
(?<=<#|<#!)(\d+)(?=>)
See it work in action: https://regexr.com/3j6ia

You can use a non-capturing group to match the leading # and ! without including them in the captured group:
(?<=<)(?:#?!?)(.*?)(?=>)
Alternatively, you could name the inner group and use the named group to get it:
(?<=<)(?:#?!?)(?<yourgroupname>.*?)(?=>)
Access it via m.Groups["yourgroupname"].Value. For more, see e.g. How do I access named capturing groups in a .NET Regex?
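A short sketch of the named-group variant. Two things here are assumptions rather than part of the answer: the group is called id, and .*? is narrowed to \d+ so that only digits are captured. The class and method names are made up as well:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: collects the digits captured by the named group "id".
    // The optional # and ! are consumed by the non-capturing group, so they are
    // part of m.Value but not part of m.Groups["id"].Value.
    public static List<string> ExtractIds(string input)
    {
        var ids = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<=<)(?:#?!?)(?<id>\d+)(?=>)"))
        {
            ids.Add(m.Groups["id"].Value);
        }
        return ids;
    }

    public static void Main()
    {
        var s = "<#399969178745962506> hello to <#!104729417217032192>";
        foreach (var id in ExtractIds(s))
        {
            Console.WriteLine(id);
        }
        // 399969178745962506
        // 104729417217032192
    }
}
```

Note that with this pattern m.Value still starts with # or #!, so inside a Regex.Replace callback the dictionary lookup should use the named group, not the whole match.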

Regex: (?:<#!?(\d+)>)
Details:
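(?:...) Non-capturing group (optional here, since it wraps the whole pattern)
<# matches the characters <# literally
!? matches ! between zero and one times
(\d+) 1st Capturing Group: \d+ matches one or more digits (each equal to [0-9])
> matches the character > literally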
Regex demo
string text = "<#399969178745962506> hello to <#!104729417217032192>";
Dictionary<string, string> list = new Dictionary<string, string>() { { "399969178745962506", "One" }, { "104729417217032192", "Two" } };
text = Regex.Replace(text, @"(?:<#!?(\d+)>)", m => list.ContainsKey(m.Groups[1].Value) ? list[m.Groups[1].Value] : m.Value);
Console.WriteLine(text); // One hello to Two
Console.ReadLine();

Related

c# Regex of value after certain words

I have a regex question. I have a string that looks like:
Slot:0 Module:No module in slot
What I need is a regex that will get the values after Slot and Module. Slot will always be a number, but I have a problem with Module (it can be several words separated by spaces). I tried:
var pattern = "(?<=:)[a-zA-Z0-9]+";
foreach (string config in backplaneConfig)
{
List<string> values = Regex.Matches(config, pattern).Cast<Match>().Select(x => x.Value).ToList();
modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.First()), ModuleType = values.Last() });
}
So the slot part works, but the module part works only if it is a single word with no spaces. In my example it gives me only "No". Is there a way to do that?

You may use a regex to capture the necessary details in the input string:
var pattern = @"Slot:(\d+)\s*Module:(.+)";
foreach (string config in backplaneConfig)
{
var values = Regex.Match(config, pattern);
if (values.Success)
{
modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.Groups[1].Value), ModuleType = values.Groups[2].Value });
}
}
See the regex demo. Group 1 is the ModuleSlot and Group 2 is the ModuleType.
Details
Slot: - literal text
(\d+) - Capturing group 1: one or more digits
\s* - 0+ whitespaces
Module: - literal text
(.+) - Capturing group 2: the rest of the string to the end.

The simplest way would be to add a space to the character class in your pattern:
var pattern = "(?<=:)[a-zA-Z0-9 ]+";
But the best solution would probably be the answer from @Wiktor Stribiżew

Another option is to match either 1+ digits followed by a word boundary or match a repeating pattern using your character class but starting with [a-zA-Z]
(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)
(?<=:) Assert a : on the left
(?: Non capturing group
\d+\b Match 1+ digits followed by a word boundary
| Or
[a-zA-Z][a-zA-Z0-9]* Start a match with a-zA-Z
(?: [a-zA-Z0-9]+)* Optionally repeat a space and what is listed in the character class
) Close non-capturing group
Regex demo
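To see the alternation pattern at work, here is a small sketch that runs it against the sample line from the question. The class and method names are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColonValuesDemo
{
    // The pattern from the answer: after a colon, match either a number
    // or space-separated alphanumeric words that start with a letter.
    private const string Pattern =
        @"(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)";

    // Hypothetical helper: returns every value that follows a colon.
    public static List<string> ValuesAfterColon(string config)
    {
        return Regex.Matches(config, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        var values = ValuesAfterColon("Slot:0 Module:No module in slot");
        Console.WriteLine(values[0]); // 0
        Console.WriteLine(values[1]); // No module in slot
    }
}
```

The first alternative stops at the space after 0, so the slot number does not swallow the Module label, while the second alternative keeps going across single spaces.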

Please replace your pattern with this:
// regular exp.
(\d+)\s*(.+)

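You don't need to use regex for such simple parsing. Try the following:
var str = "Slot:0 Module:No module in slot";
// Split on both labels, drop the empty leading entry, and trim the pieces.
var parts = str.Split(new string[] { "Slot:", "Module:" }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.ToArray(); // parts[0] is "0", parts[1] is "No module in slot"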

Match properties using regex

I have a string that represents a set of properties, for example:
AB=0, TX="123", TEST=LDAP, USR=" ", PROPS="DN=VB, XN=P"
I need to extract these properties as:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P
To resolve this problem I tried to use a regex, but without success.
public IEnumerable<string> SplitStr(string input)
{
Regex reg= new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))", RegexOptions.Compiled);
foreach (Match match in reg.Matches(input))
{
yield return match.Value.Trim(',');
}
}
I can't find a regex that produces the expected output. With the above regex the output is:
AB=0
123
TEST=LDAP
DN=VB, XN=P
Can anyone help me?

You may use
public static IEnumerable<string> SplitStr(string input)
{
var matches = Regex.Matches(input, @"(\w+=)(?:""([^""]*)""|(\S+)\b)");
foreach (Match match in matches)
{
yield return string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim();
}
}
The regex details:
(\w+=) - Group 1: one or more word chars and a = char
(?:""([^""]*)""|(\S+)\b) - a non-capturing group matching either of the two alternatives:
"([^"]*)" - a ", then 0 or more chars other than " and then a "
| - or
(\S+)\b - any 1+ chars other than whitespace, as many as possible, up to the word boundary position.
See the regex demo.
The string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim() code omits the Group 0 (whole match) value from the groups, takes Group 1, 2 and 3 and concats them into a single string, and trims it afterwards.
C# test:
var s = "AB=0, TX=\"123\", TEST=LDAP, USR=\" \", PROPS=\"DN=VB, XN=P\"";
Console.WriteLine(string.Join("\n", SplitStr(s)));
Output:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P

Another way could be to use 2 capturing groups where the first group captures the first part including the equals sign and the second group captures the value after the equals sign.
Then you can concatenate the groups and use Trim to remove the double quotes. If you also want to remove the whitespaces after that, you could use Trim again.
([^=\s,]+=)("[^"]+"|[^,\s]+)
That will match
( First capturing group
[^=\s,]+= Match 1+ chars other than an equals sign, comma or whitespace char, then match = (if the property name can contain a comma, you could instead use a character class that lists what you allow, for example [\w,]+)
) Close group
( Second capturing group
"[^"]+" Match from opening till closing double quote
| Or
[^,\s]+ Match 1+ times not a comma or whitespace char
)
Regex demo | C# demo
Your code might look like:
public IEnumerable<string> SplitStr(string input)
{
foreach (Match m in Regex.Matches(input, @"([^=\s,]+=)(""[^""]+""|[^,\s]+)"))
{
yield return string.Concat(m.Groups[1].Value, m.Groups[2].Value.Trim('"'));
}
}

Regex first digits occurrence

My task is to extract the first digits in the following string:
GLB=VSCA|34|speed|1|
My pattern is the following:
(?x:VSCA(\|){1}(\d.))
Basically I need to extract "34", the first occurrence of digits after "VSCA". With my pattern I obtain a group, but would it be possible to get only the number? This is my C# snippet:
string regex = @"(?x:VSCA(\|){1}(\d.))";
Regex rx = new Regex(regex);
string s = "GLB=VSCA|34|speed|1|";
if (rx.Match(s).Success)
{
var test = rx.Match(s).Groups[1].ToString();
}

You could match 34 (the first digits after VSCA) using a positive lookbehind (?<=VSCA\D*) to assert that what is on the left side is VSCA followed by zero or more non-digit chars \D*, and then match one or more digits \d+:
(?<=VSCA\D*)\d+
If you need the pipe to come right after VSCA, then you could include that in the lookbehind:
(?<=VSCA\|)\d+
Demo
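A minimal sketch of the first lookbehind pattern in C#. The variable-width \D* inside the lookbehind relies on .NET's support for non-fixed-width lookbehinds. The class and method names are made up for illustration, and returning an empty string on no match is just one possible convention:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstDigitsDemo
{
    // Hypothetical helper: returns the first run of digits after "VSCA",
    // or an empty string when nothing matches.
    public static string FirstNumberAfterVsca(string input)
    {
        var m = Regex.Match(input, @"(?<=VSCA\D*)\d+");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstNumberAfterVsca("GLB=VSCA|34|speed|1|")); // 34
    }
}
```

Here m.Value is already the number on its own, so no group indexing is needed. The trailing 1 is not matched, because the text between VSCA and that 1 contains digits and \D* cannot span them.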

This regex pattern: (?<=VSCA\|)\d+?(?=\|) will match only the number. (If your number can be negative / have decimal places you may want to use (?<=VSCA\|).+?(?=\|) instead)

You don't need Regex for this, you can simply split on the '|' character:
string s = "GLB=VSCA|34|speed|1|";
string[] parts = s.Split('|');
if(parts.Length >= 2)
{
Console.WriteLine(parts[1]); //prints 34
}
The benefit here is that you can access all parts of the original string based on the index:
[0] - "GLB=VSCA"
[1] - "34"
[2] - "speed"
[3] - "1"
Fiddle here

While the other answers work really well, if you really must use a regular expression, or are interested in knowing how to get to that straight away you can use a named group for the number. Consider the following code:
string regex = @"(?x:VSCA(\|){1}(?<number>\d.?))";
Regex rx = new Regex(regex);
string s = "GLB:VSCA|34|speed|1|";
var match = rx.Match(s);
if(match.Success) Console.WriteLine(match.Groups["number"]);

How about (?<=VSCA\|)[0-9]+?
Try it out here

Extract a string surrounded by two known values c# regex [duplicate]

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.
var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
...
For the input user == "Josh Smith [jsmith]":
matches.Count == 1
matches[0].Value == "[jsmith]"
... which I understand. But then:
matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?
Looking at this question, from what I understand the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? So why would "jsmith" match?
Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?

match.Groups[0] is always the same as match.Value, which is the entire match.
match.Groups[1] is the first capturing group in your regular expression.
Consider this example:
var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);
In this case,
match.Value is "[john] John Johnson"
match.Groups[0] is always the same as match.Value, "[john] John Johnson".
match.Groups[1] is the group of captures from the (.*?).
match.Groups[2] is the group of captures from the (.*).
match.Groups[1].Captures is yet another dimension.
Consider another example:
var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);
Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!
match.Groups[0] is always the same as match.Value, "[john][johnny]".
match.Groups[1] is the group of captures from the (\[.*?\])+. Because the group repeats, its Value is only the last capture, "[johnny]".
match.Groups[1].Captures[0] is "[john]"
match.Groups[1].Captures[1] is "[johnny]", the same as match.Groups[1].Value
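The difference between a group's Value and its Captures can be checked with a short sketch like the one below. The class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Hypothetical helper: returns every capture made by the repeated group 1.
    public static List<string> BracketedNames(string input)
    {
        var match = Regex.Match(input, @"(\[.*?\])+");
        return match.Groups[1].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToList();
    }

    public static void Main()
    {
        var match = Regex.Match("[john][johnny]", @"(\[.*?\])+");
        Console.WriteLine(match.Value);           // [john][johnny]
        Console.WriteLine(match.Groups[1].Value); // [johnny] (the last capture)

        foreach (var name in BracketedNames("[john][johnny]"))
        {
            Console.WriteLine(name);
        }
        // [john]
        // [johnny]
    }
}
```

In other words, a repeated group keeps one entry in Captures per iteration, while its Value reports only the final iteration.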

The ( ) acts as a capture group. So the matches array has all of the matches that C# finds in your string, and the sub-array has the values of the capture groups inside each of those matches. If you don't want that extra level of capture, just remove the ( ).

Groups[0] is the entire matched text (not the entire input string).
Groups[1] is your group captured by parentheses (.*?). You can configure Regex to capture Explicit groups only (there is an option for that when you create a regex), or use (?:.*?) to create a non-capturing group.

The parentheses identify a group as well, so Groups[0] is the entire match and Groups[1] is the contents of what was found between the square brackets.

How? The answer is here
(.*?)
That is a subgroup of @"\[(.*?)\]".

C# Regex - starts with pattern1 not contain pattern2

For the following input strings:
a1.aaa[SUBSCRIBED]
a1.bbb
a1.ccc
b1.ddd
d1.ddd[SUBSCRIBED]
I want to get the output:
bbb
ccc
which means: all the words that come after "a1." And not contain the substring "[SUBSCRIBED]"

all the words comes after "a1." And not contains the substring
"[SUBSCRIBED]"
Why regex? Following is crystal clear:
var result = strings
.Where(s => s.StartsWith("a1.") && !s.Contains("[SUBSCRIBED]"))
.Select(s => s.Substring(3));

Tim's answer makes sense. However, if you insist on a regex, I would venture that it would look like this:
^a1\.(.*)(?<!\[SUBSCRIBED\])$
with ^a1 meaning starts with a1
\.(.*) matching a literal dot and then capturing any number of characters
and the negative lookbehind (?<!\[SUBSCRIBED\])$ would refuse text ending with [SUBSCRIBED]
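A minimal sketch applying that lookbehind pattern to the sample lines from the question. The class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubscribedFilterDemo
{
    private static readonly Regex Rx = new Regex(@"^a1\.(.*)(?<!\[SUBSCRIBED\])$");

    // Hypothetical helper: keeps lines that start with "a1." and do not END
    // with "[SUBSCRIBED]", returning the text after the "a1." prefix.
    public static List<string> Unsubscribed(IEnumerable<string> lines)
    {
        return lines
            .Select(line => Rx.Match(line))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        var input = new[]
        {
            "a1.aaa[SUBSCRIBED]", "a1.bbb", "a1.ccc", "b1.ddd", "d1.ddd[SUBSCRIBED]"
        };
        Console.WriteLine(string.Join(", ", Unsubscribed(input))); // bbb, ccc
    }
}
```

One caveat: the lookbehind is checked only at the end of the line, so a line such as a1.x[SUBSCRIBED]y would still be accepted. If the marker can appear in the middle, a negative lookahead placed right after a1\. is the safer choice.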

You may use
^a1\.(?!.*\[SUBSCRIBED])(.*)
See the regex demo.
Details
^ - start of string
a1\. - a literal a1. substring
(?!.*\[SUBSCRIBED]) - a negative lookahead that fails the match if a [SUBSCRIBED] substring is present after any 0+ chars (other than newline if the RegexOptions.Singleline option is not used)
(.*) - Group 1: the rest of the line up to the end (if you use RegexOptions.Singleline option, . will match newlines as well).
C# code:
var result = string.Empty;
var m = Regex.Match(s, @"^a1\.(?!.*\[SUBSCRIBED])(.*)");
if (m.Success)
{
result = m.Groups[1].Value;
}
