Match properties using regex


I have a string that represents a set of properties, for example:
AB=0, TX="123", TEST=LDAP, USR=" ", PROPS="DN=VB, XN=P"
I need to extract these properties as:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P
To resolve this problem I tried to use a regex, but without success.
public IEnumerable<string> SplitStr(string input)
{
Regex reg= new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))", RegexOptions.Compiled);
foreach (Match match in reg.Matches(input))
{
yield return match.Value.Trim(',');
}
}
I can't find a regex that produces the expected output. With the regex above, the output is:
AB=0
123
TEST=LDAP
DN=VB, XN=P
Can anyone help me?

You may use
public static IEnumerable<string> SplitStr(string input)
{
    var matches = Regex.Matches(input, @"(\w+=)(?:""([^""]*)""|(\S+)\b)");
    foreach (Match match in matches)
    {
        // Skip Group 0 (the whole match) and join Groups 1-3, then trim.
        yield return string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim();
    }
}
The regex details:
(\w+=) - Group 1: one or more word chars and a = char
(?:""([^""]*)""|(\S+)\b) - a non-capturing group matching either of the two alternatives:
"([^"]*)" - a ", then 0 or more chars other than " and then a "
| - or
(\S+)\b - any 1+ chars other than whitespace, as many as possible, up to the word boundary position.
See the regex demo.
The string.Concat(match.Groups.Cast<Group>().Skip(1).Select(x => x.Value)).Trim() code omits the Group 0 (whole match) value from the groups, takes Group 1, 2 and 3 and concats them into a single string, and trims it afterwards.
C# test:
var s = "AB=0, TX=\"123\", TEST=LDAP, USR=\" \", PROPS=\"DN=VB, XN=P\"";
Console.WriteLine(string.Join("\n", SplitStr(s)));
Output:
AB=0
TX=123
TEST=LDAP
USR=
PROPS=DN=VB, XN=P
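If the pairs are needed as key/value lookups rather than as joined strings, the same pattern can fill a dictionary. The following is only a sketch: the PropertyParser class and its Parse method are hypothetical names, and the = sign is moved outside Group 1 so that the group holds the bare property name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PropertyParser
{
    // Group 1: property name; Group 2: quoted value; Group 3: bare value.
    private static readonly Regex Pair =
        new Regex(@"(\w+)=(?:""([^""]*)""|(\S+)\b)", RegexOptions.Compiled);

    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            // Only one of Groups 2 and 3 participates in any given match.
            string value = m.Groups[2].Success ? m.Groups[2].Value : m.Groups[3].Value;
            // A later duplicate name overwrites an earlier one (an assumption of this sketch).
            result[m.Groups[1].Value] = value.Trim();
        }
        return result;
    }
}
```

Calling PropertyParser.Parse on the test string above should give five entries, with USR mapped to an empty string because its quoted value contains only a space.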

Another way could be to use 2 capturing groups where the first group captures the first part including the equals sign and the second group captures the value after the equals sign.
Then you can concatenate the groups and use Trim to remove the double quotes. If you also want to remove the whitespaces after that, you could use Trim again.
([^=\s,]+=)("[^"]+"|[^,\s]+)
That will match
( First capturing group
[^=\s,]+= Match 1+ times not an equals sign, comma or whitespace char, then match = (If the property name can contain a comma, you could instead use a character class that lists what you allow to match, for example [\w,]+)
) Close group
( Second capturing group
"[^"]+" Match from opening till closing double quote
| Or
[^,\s]+ Match 1+ times not a comma or whitespace char
)
Your code might look like:
public IEnumerable<string> SplitStr(string input)
{
    foreach (Match m in Regex.Matches(input, @"([^=\s,]+=)(""[^""]+""|[^,\s]+)"))
    {
        // Group 1 is the name with the = sign; Group 2 is the value, with its quotes stripped here.
        yield return string.Concat(m.Groups[1].Value, m.Groups[2].Value.Trim('"'));
    }
}

Related

c# Regex of value after certain words

I have a question about regex. I have a string that looks like:
Slot:0 Module:No module in slot
What I need is a regex that will get the values after slot and module. Slot will always be a number, but I have a problem with module (it can be a word with spaces). I tried:
var pattern = "(?<=:)[a-zA-Z0-9]+";
foreach (string config in backplaneConfig)
{
List<string> values = Regex.Matches(config, pattern).Cast<Match>().Select(x => x.Value).ToList();
modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.First()), ModuleType = values.Last() });
}
So the slot part works, but the module part works only if it is a single word with no spaces; in my example it gives me only "No". Is there a way to do that?

You may use a regex to capture the necessary details in the input string:
var pattern = @"Slot:(\d+)\s*Module:(.+)";
foreach (string config in backplaneConfig)
{
    var values = Regex.Match(config, pattern);
    if (values.Success)
    {
        modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.Groups[1].Value), ModuleType = values.Groups[2].Value });
    }
}
See the regex demo. Group 1 is the ModuleSlot and Group 2 is the ModuleType.
Details
Slot: - literal text
(\d+) - Capturing group 1: one or more digits
\s* - 0+ whitespaces
Module: - literal text
(.+) - Capturing group 2: the rest of the string to the end.
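The same pattern can also be written with named groups, which keeps the calling code readable if the pattern changes later. This is a minimal sketch, not part of the original answer: SlotParser and TryParse are hypothetical names, and int.TryParse is swapped in for Convert.ToInt32 so that an out-of-range slot number is reported as a failed parse instead of throwing.

```csharp
using System.Text.RegularExpressions;

public static class SlotParser
{
    // Named groups "slot" and "module" replace the numbered Groups 1 and 2.
    private static readonly Regex Line =
        new Regex(@"Slot:(?<slot>\d+)\s*Module:(?<module>.+)");

    public static bool TryParse(string config, out int slot, out string module)
    {
        slot = 0;
        module = string.Empty;

        Match m = Line.Match(config);
        if (!m.Success)
            return false;

        // \d+ guarantees digits, but the number could still overflow an int.
        if (!int.TryParse(m.Groups["slot"].Value, out slot))
            return false;

        // Trim removes a trailing '\r' or spaces that .+ may have captured.
        module = m.Groups["module"].Value.Trim();
        return true;
    }
}
```

For the sample line "Slot:0 Module:No module in slot" this should return true with slot 0 and module "No module in slot".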

The simplest way would be to add a space to the character class in your pattern:
var pattern = "(?<=:)[a-zA-Z0-9 ]+";
But the best solution would probably be the answer from @Wiktor Stribiżew.

Another option is to match either 1+ digits followed by a word boundary or match a repeating pattern using your character class but starting with [a-zA-Z]
(?<=:)(?:\d+\b|[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*)
(?<=:) Assert a : on the left
(?: Non capturing group
\d+\b Match 1+ digits followed by a word boundary
| Or
[a-zA-Z][a-zA-Z0-9]* Start a match with a-zA-Z
(?: [a-zA-Z0-9]+)* Optionally repeat a space and what is listed in the character class
) Close non-capturing group

Please replace your pattern with this:
// regular exp.
(\d+)\s*(.+)

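You don't need to use regex for such simple parsing. Try the following:
var str = "Slot:0 Module:No module in slot";
// parts[0] is the slot value, parts[1] is the module value (needs System.Linq).
var parts = str.Split(new string[] { "Slot:", "Module:" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Trim())
    .ToArray();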

Regex to match multiple number groups between two characters

I have a string that looks like the following:
<#399969178745962506> hello to <#!104729417217032192>
I have a dictionary containing both IDs that looks like the following:
{"399969178745962506", "One"},
{"104729417217032192", "Two"}
My goal here is to replace <#399969178745962506> with the value stored under that number key, which in this case would be One.
Regex.Replace(arg.Content, "(?<=<)(.*?)(?=>)", m => userDic.ContainsKey(m.Value) ? userDic[m.Value] : m.Value);
My current regex is (?<=<)(.*?)(?=>), which matches everything between < and >. In this case that leaves both #399969178745962506 and #!104729417217032192.
I can't just skip the # sign, because the ! sign is not there every time. So it would be best to get only the numbers, with something like \d+.
I need to figure out how to get only the numbers between < and >, but I can't for the life of me figure out how.
Very grateful for any help!

In C#, you may use two approaches: a lookaround-based one (possible because lookbehind patterns in .NET can be variable width) and a capturing group approach.
Lookaround based approach
The pattern that will easily help you get the digits in the right context is
(?<=<#!?)\d+(?=>)
See the regex demo
The (?<=<#!?) is a positive lookbehind that requires <# or <#! immediately to the left of the current location, and (?=>) is a positive lookahead that requires a > char immediately to the right of the current location.
Capturing approach
You may use the following pattern that will capture the digits inside the expected <...> substrings:
<#!?(\d+)>
Details
<# - a literal <# substring
!? - an optional exclamation sign
(\d+) - capturing group 1 that matches one or more digits
> - a literal > sign.
Note that the values you need can be accessed via match.Groups[1].Value, as shown in the snippet below.
Usage:
var userDic = new Dictionary<string, string> {
    {"399969178745962506", "One"},
    {"104729417217032192", "Two"}
};
var p = @"<#!?(\d+)>";
var s = "<#399969178745962506> hello to <#!104729417217032192>";
Console.WriteLine(
    Regex.Replace(s, p, m => userDic.ContainsKey(m.Groups[1].Value) ?
        userDic[m.Groups[1].Value] : m.Value
    )
); // => One hello to Two
// Or, if you need to keep <#, <#! and >
Console.WriteLine(
    Regex.Replace(s, @"(<#!?)(\d+)>", m => userDic.ContainsKey(m.Groups[2].Value) ?
        $"{m.Groups[1].Value}{userDic[m.Groups[2].Value]}>" : m.Value
    )
); // => <#One> hello to <#!Two>
See the C# demo.
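The ContainsKey check followed by the indexer looks each key up twice. A possible variation, sketched here with hypothetical names (MentionReplacer, Replace), uses TryGetValue inside the match evaluator so each ID is looked up once. It assumes the same <#...> and <#!...> formats as above and leaves unknown IDs untouched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MentionReplacer
{
    // Group 1 captures the digits between <# (or <#!) and >.
    private static readonly Regex Mention = new Regex(@"<#!?(\d+)>");

    public static string Replace(string text, IReadOnlyDictionary<string, string> names)
    {
        // The evaluator runs once per match; IDs missing from the
        // dictionary are returned unchanged via m.Value.
        return Mention.Replace(text, m =>
            names.TryGetValue(m.Groups[1].Value, out var name) ? name : m.Value);
    }
}
```

With the userDic dictionary from the snippet above, MentionReplacer.Replace(s, userDic) should again give "One hello to Two".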

To extract just the numbers from your given format, use this regex pattern:
(?<=<#|<#!)(\d+)(?=>)
See it work in action: https://regexr.com/3j6ia

You can use non-capturing groups to exclude parts of the needed pattern to be inside the group:
(?<=<)(?:#?!?)(.*?)(?=>)
Alternatively, you could name the inner group and use the named group to get it:
(?<=<)(?:#?!?)(?<yourgroupname>.*?)(?=>)
Access it via m.Groups["yourgroupname"].Value. For more, see e.g. How do I access named capturing groups in a .NET Regex?

Regex: (?:<#!?(\d+)>)
Details:
(?:) Non-capturing group
<# matches the characters <# literally
!? matches the character ! between zero and one times
(\d+) 1st Capturing Group \d+ matches a digit (equal to [0-9])
string text = "<#399969178745962506> hello to <#!104729417217032192>";
Dictionary<string, string> list = new Dictionary<string, string>() { { "399969178745962506", "One" }, { "104729417217032192", "Two" } };
text = Regex.Replace(text, @"(?:<#!?(\d+)>)", m => list.ContainsKey(m.Groups[1].Value) ? list[m.Groups[1].Value] : m.Value);
Console.WriteLine(text); // One hello to Two
Console.ReadLine();

Find String Between To Identical Control Separators?

I'm reading from a file and need to find a string that is enclosed by two identical non-ascii values/control separators, in this case 'RS'.
How would I go about doing this? Would I need some form of regex?

RS stands for Record Separator, and it has a value of 30 (or 0x1E in hexadecimal). You can use this regular expression:
\x1E([\w\s]*?)\x1E
That matches the RS, then any word characters (letters, digits, underscore) or whitespace, and then the RS again. The ? makes the quantifier lazy so that it matches as few characters as possible, in case there are more RS characters afterwards.
If you prefer not to match numbers, you could use [a-zA-Z\s] instead of [\w\s].
Example:
string fileContents = "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end.";
MatchCollection matches = Regex.Matches(fileContents, @"\x1E([\w\s]*?)\x1E");
if (matches.Count == 0)
    return; // Not found, display an error message and exit.
foreach (Match match in matches)
{
    // Groups[0] is the whole match; Groups[1] is the text between the separators.
    if (match.Groups.Count > 1)
        Console.WriteLine(match.Groups[1].Value);
}
As you can see, you get a collection of Match objects, and each match.Value holds the whole matched string including the separators. match.Groups holds all matched groups: the first one is again the whole matched string (that is the default), followed by each of your groups (those between parentheses). In this case there is only one group in the regex, so you just need the second entry in that list.

Using regex you can do something like this:
string pattern = string.Format("{0}(.*){1}",firstString,secondString);
var matches = Regex.Matches(myString, pattern);
foreach (Match match in matches)
{
foreach (Capture capture in match.Captures)
{
//Do stuff, with the current you should remove firstString and secondString from the capture.Value
}
}
This builds the pattern first and then uses Regex.Matches to find the strings that match it.
Remember to escape any regex special characters in firstString and secondString, for example with Regex.Escape.
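The escaping step can be sketched as follows. Delimited and Between are hypothetical names, and two details are assumptions that differ from the snippet above: the quantifier is made lazy (.*?) so that each pair of delimiters yields its own match, and a capturing group is used so the delimiters do not have to be removed afterwards.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Delimited
{
    public static List<string> Between(string text, string first, string second)
    {
        // Regex.Escape neutralizes characters such as ( ) . * in the delimiters.
        string pattern = Regex.Escape(first) + "(.*?)" + Regex.Escape(second);

        var found = new List<string>();
        // Singleline lets . match newlines, so a value may span several lines.
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Singleline))
            found.Add(m.Groups[1].Value);
        return found;
    }
}
```

For example, Delimited.Between("(a) x (b)", "(", ")") should return the two strings a and b, even though both delimiters are regex metacharacters.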

You can use Regex.Matches, I'm using X as the separator in this example:
var fileContents = "Xsomething1X Xsomething2X Xsomething3X";
var results = Regex.Matches(fileContents, #"(X).*?(\1)");
Then you can loop over results to do anything you want with the matches.
The \1 in the regex means "reference the first group". I've put X between () so it becomes group 1, then I use \1 to say that the text at this position must be exactly the same as what group 1 matched.

You don't need a regular expression for that.
Read the contents of the file (File.ReadAllText).
Split on the separator character (String.Split).
If you know there's only one occurrence of your string, take the second array element (result[1]). Otherwise, take every other entry (result.Where((x, i) => i % 2 == 1)).
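A minimal sketch of the split-based approach, assuming the separator is the RS character (0x1E) mentioned in the question. RecordSeparator and Between are hypothetical names. Unlike the plain "every other entry" rule, this version skips a trailing piece that has an opening separator but no closing one.

```csharp
using System.Collections.Generic;

public static class RecordSeparator
{
    // ASCII Record Separator, value 30 (0x1E).
    public const char RS = '\u001E';

    public static List<string> Between(string text)
    {
        string[] parts = text.Split(RS);
        var found = new List<string>();

        // Odd indexes sit between an opening and a closing separator.
        // Stopping before the last part drops text after an unpaired separator.
        for (int i = 1; i < parts.Length - 1; i += 2)
            found.Add(parts[i]);

        return found;
    }
}
```

The text would come from File.ReadAllText in the scenario described; for the string "Something \u001Eyour string\u001E more things \u001Eanother text\u001E end." the method should return "your string" and "another text".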

Improve RegEx search

Using DirectoryServices.AccountManagement I'm getting a user's DistinguishedName, which looks like this:
CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu
I need to get the first OU value from this.
I found similar solution: C# Extracting a name from a string
And using some tweaks I created this code:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
Match m = Regex.Match(input, @"OU=([a-zA-Z\\]+)\,.*$");
Console.WriteLine(m.Groups[1].Value);
This code returns STORE as expected, but if I change Groups[1] to Groups[0] I get almost same result as input string:
OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu
How can I change this regex so that it returns only the values of OU, so that in this example I get an array of 2 matches? If I had more OUs in my string, the array would be longer.
EDIT:
I've converted my code (using @dasblinkenlight's suggestions) into a function:
private static List<string> GetOUs()
{
    var input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
    var mm = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
    return (from Match m in mm select m.Groups[1].Value).ToList();
}
Is that correct?

Your regular expression is fine (almost), you are just using a wrong API.
Remove the parts of the regexp that match up to the ending anchor $, and change the call of Match for a call of Matches, and get the matches in a loop, like this:
var input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
var mm = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
foreach (Match m in mm)
{
    Console.WriteLine(m.Groups[1].Value);
}

Your existing regex:
@"OU=([a-zA-Z\\]+)\,.*$"
Matches OU=, then some letters and backslashes ([a-zA-Z\\]+), then a comma, then any characters (.*) to the end of the line ($).
Thus a single match will always match the entire line after the first OU section.
Modify your regex by removing the \,.*$ at the end, and it will match each OU group:
@"OU=([a-zA-Z\\]+)"
Also note that the parentheses are a capturing group. They are useful if you also want to capture just the value part by itself, but if you are not using that, they are not necessary, and you can just have this:
@"OU=[a-zA-Z\\]+"

It's because you are mixing up matches and groups:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
MatchCollection mc = Regex.Matches(input, @"OU=([a-zA-Z\\]+),");
foreach (Match m in mc)
{
    Console.WriteLine(m.Result("$1"));
}

Groups[0] returns the full match.
Groups[1] returns the first pattern in the match (i.e. everything in the first pair of parentheses).
So if you wanted to get exactly those 2 occurrences of OU, you could do this:
Match m = Regex.Match(input, @"OU=([a-zA-Z\\]+)\,OU=([a-zA-Z\\]+)\,.*$");
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
Groups[0] returns the full match (which you don't want).
Groups[1] returns the first pattern in the match (i.e. everything in the first pair of parentheses).
Groups[2] returns the second pattern in the match (i.e. everything in the second pair of parentheses).
Giving:
STORE
COMPANY
But I'm assuming you don't want to be so explicit with your Regex for each Pattern you are interested in.
If you want to get multiple matches, then you need to call Regex.Matches, which returns a MatchCollection.
MatchCollection ms = Regex.Matches(...);
This still won't work with your current regex though, because everything from STORE to the end of the line will be in the first match. If you only want to get the pattern "1-or-more-letters" after an "OU=",
you only need:
@"OU=([a-zA-Z\\]+)"
So your code would be:
string input = @"CN=Adam West,OU=STORE,OU=COMPANY,DC=mycompany,DC=group,DC=eu";
MatchCollection ms = Regex.Matches(input, @"OU=([a-zA-Z\\]+)");
foreach (Match m in ms)
{
    Console.WriteLine(m.Groups[1].Value); // get the string in the first "(" ")"
}
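The character class [a-zA-Z\\] only accepts letters and backslashes, so an OU such as "Store 2" would be cut short. One possible variation, shown here as a sketch with the hypothetical names DistinguishedName and GetOUs, takes the input as a parameter and captures everything up to the next comma. It assumes OU values contain no escaped commas (\,), which this simple pattern does not handle.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DistinguishedName
{
    public static List<string> GetOUs(string dn)
    {
        var ous = new List<string>();

        // (?:^|,) anchors OU= to the start of a component, so a value that
        // merely contains the text "OU=" is not picked up.
        foreach (Match m in Regex.Matches(dn, @"(?:^|,)OU=([^,]+)"))
            ous.Add(m.Groups[1].Value);

        return ous;
    }
}
```

For the DistinguishedName from the question this should return STORE and COMPANY, in that order.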

Regular expression for performing task being done by string functions

The code below performs the following steps, which I intend to integrate into a larger application.
Split the large input string by the dot (.) character wherever it occurs.
Store the resulting substrings in the array result[].
In the foreach loop, check each substring for an occurrence of the keyword.
If a match occurs, print up to 300 characters of the original input string, starting from the position of the matched substring.
string[] result = input.Split('.');
foreach (string str in result)
{
    //Console.WriteLine(str);
    Match m = Regex.Match(str, keyword);
    if (m.Success)
    {
        int start = input.IndexOf(str);
        if ((input.Length - start) < 300)
        {
            Console.WriteLine(input.Substring(start, input.Length - start));
            break;
        }
        else
        {
            Console.WriteLine(input.Substring(start, 300));
            break;
        }
    }
}
The input is in fact a large amount of text, and I think this should be done with a regular expression. Being a novice, I am not able to put everything together using regular expressions.
Match keyword: Match m = Regex.Match(str, keyword);
300 characters starting from the dot (.), i.e. starting from the matched sentence, print 300 characters: "^.\w{0,300}"
What I intend to do is:
Search for the keyword in the input text.
As soon as a match is found, start from the sentence containing the keyword and print up to 300 characters from the input string.
How should I proceed? Please help.

If I got it right, all you need to do is find your keyword and capture all that follows until you find the first dot or reach the maximum number of characters:
@"keyword([^\.]{0,300})"
See sample demo here.
C# code:
var regex = new Regex(@"keyword([^\.]{0,300})");
foreach (Match match in regex.Matches(input))
{
    var result = match.Groups[1].Value;
    // work with the result
}

Try this regex:
(?<=\.?)([\w\s]{0,300}keyword.*?)(?=\.)
Explanation:
(?= subexpression) Zero-width positive lookahead assertion.
(?<= subexpression) Zero-width positive lookbehind assertion.
*? Matches the previous element zero or more times, but as few times as possible.
And a simple code sample (here the keyword is "print"):
foreach (Match match in Regex.Matches(input,
    @"(?<=\.?)([\w\s]{0,300}print.*?)(?=\.)"))
{
    Console.WriteLine(match.Groups[1].Value);
}
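For comparison, the behavior described in the question (start at the sentence containing the keyword, print up to 300 characters) can also be sketched without a regex. Excerpt and FromKeyword are hypothetical names. The sketch assumes the keyword is plain text rather than a pattern, treats every dot as a sentence boundary as the original code does, and returns an empty string when the keyword is absent.

```csharp
using System;

public static class Excerpt
{
    public static string FromKeyword(string input, string keyword, int maxLength = 300)
    {
        int hit = input.IndexOf(keyword, StringComparison.Ordinal);
        if (hit < 0)
            return string.Empty;

        // The sentence starts right after the previous dot, or at index 0
        // when there is no earlier dot (LastIndexOf returns -1 in that case).
        int start = hit == 0 ? 0 : input.LastIndexOf('.', hit - 1) + 1;

        // Never read past the end of the input.
        int length = Math.Min(maxLength, input.Length - start);
        return input.Substring(start, length);
    }
}
```

Because the text is scanned only once with IndexOf and LastIndexOf, this avoids splitting a large input into an array of sentences first.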
