C# regex to extract key value - c#

Is there an easy and elegant way to extract key-value pairs from a string in the format below?
"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'"
My attempt resulted in this, but I'm not too happy with it:
var regex = new Regex(@"\'\s", RegexOptions.None);
var someString = @"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'" + " ";
var splitArray = regex.Split(someString);
IDictionary<string, string> keyValuePairs = new Dictionary<string, string>();
foreach (var split in splitArray)
{
regex = new Regex(@"\=\'", RegexOptions.None);
var keyValuArray = regex.Split(split);
if (keyValuArray.Length > 1)
{
keyValuePairs.Add(keyValuArray[0], keyValuArray[1]);
}
}

You should be able to do it without a split, using a MatchCollection instead:
var rx = new Regex("([^=\\s]+)='([^']*)'");
var str = "key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'";
foreach (Match m in rx.Matches(str)) {
Console.WriteLine("{0} {1}", m.Groups[1], m.Groups[2]);
}
The heart of this solution is this regular expression: ([^=\s]+)='([^']*)' (the backslash is doubled in the C# string literal). It defines the structure of a key-value pair: a sequence of characters that are neither whitespace nor an equal sign forms the key, followed by an equal sign and then the value enclosed in single quotes. The loop goes through the matches in sequence and reads each key and value from the capture groups Groups[1] and Groups[2], respectively.

Another way to do it:
var someString = @"key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'" + " ";
Dictionary<string, string> dic = Regex.Matches(someString, @"(?<key>\w+)='(?<value>[^']*)'")
.OfType<Match>()
.ToDictionary(m => m.Groups["key"].Value, m => m.Groups["value"].Value);
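One caveat with ToDictionary: it throws an ArgumentException if the same key appears twice in the input. If repeated keys are possible, a plain loop that assigns through the indexer keeps the last value instead. The following is a minimal sketch under that assumption; the class and method names (KeyValueParser, Parse) are made up for illustration, and the pattern is the one from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // Same pattern as in the answer above: key='value'
    private static readonly Regex Pair = new Regex(@"(?<key>\w+)='(?<value>[^']*)'");

    // Hypothetical helper: returns the pairs found in the input,
    // with a later duplicate key overwriting the earlier one.
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            // The indexer replaces an existing entry instead of throwing.
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

Calling KeyValueParser.Parse(someString) on the sample string should give the same six entries as the ToDictionary version.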

You can do it like this
var str = "key1='value1' key2='value 2' key3='value3' key4='value4' key5='5555' key6='xxx666'";
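var keyValuePairs = new Dictionary<string, string>();
var arr = Regex.Split(str, "(?<=')\\s(?=\\w)"); // split on the whitespace between pairs to get key='value'
foreach (var s in arr) {
var nArr = s.Split(new[] { '=' }, 2); // split on the first = to get key and value
keyValuePairs.Add(nArr[0], nArr[1].Trim('\'')); // drop the quotes around the value
}
(?<=')\s(?=\w) matches a whitespace character that comes after a ' and before the start of the next key.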

Related

C# Replace regex matched pattern using dictionary

I am trying to replace a pattern in my string where only the words between the tags should be replaced. The word that needs to be replaced resides in a dictionary as key and value pair.
Currently this is what I am trying:
string input = "<a>hello</a> <b>hello world</b> <c>I like apple</c>";
string pattern = (@"(?<=>)(.)?[^<>]*(?=</)");
Regex match = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = match.Matches(input);
var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
dictionary1.Add("hello", "Hi");
dictionary1.Add("world", "people");
dictionary1.Add("apple", "fruit");
string output = "";
output = match.Replace(input, replace => { return dictionary1.ContainsKey(replace.Value) ? dictionary1[replace.Value] : replace.Value; });
Console.WriteLine(output);
Console.ReadLine();
Using this, it does replace but only the first 'hello' and not the second one. I want to replace every occurrence of 'hello' between the tags.
Any help will be much appreciated.
The problem is that the matches are:
hello
hello world
I like apple
so e.g. hello world is not in your dictionary.
Based on your code, this could be a solution:
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
dictionary1.Add("hello", "Hi");
dictionary1.Add("world", "people");
dictionary1.Add("apple", "fruit");
string input = "<a>hello</a> <b>hello world</b> <c>I like apple</c>";
string pattern = ("(?<=>)(.)?[^<>]list|" + GetKeyList(dictionary1) + "(?=</)");
Regex match = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = match.Matches(input);
string output = "";
output = match.Replace(input, replace => {
Console.WriteLine(" - " + replace.Value);
return dictionary1.ContainsKey(replace.Value) ? dictionary1[replace.Value] : replace.Value;
});
Console.WriteLine(output);
}
private static string GetKeyList(Dictionary<string, string> list)
{
return string.Join("|", new List<string>(list.Keys).ToArray());
}
}
Fiddle: https://dotnetfiddle.net/zNkEDv
If someone wants to dig into this and tell me why I need a "list|" in the list (without it, the first item is ignored), I'd appreciate it.
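A likely explanation for the "list|" puzzle: in a regular expression, | has the lowest precedence, so the concatenated pattern is read as the separate alternatives (?<=>)(.)?[^<>]list, hello, world and apple(?=</). Without the dummy "list|", the first key ends up glued to [^<>], which requires at least one extra character before it, so a tag containing only that key never matches. It also means the lookarounds no longer restrict the other keys to text inside tags. One way around this is to match the whole text between the tags first and then replace individual words inside it. The sketch below assumes that approach; TagWordReplacer and ReplaceWords are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagWordReplacer
{
    // Text that sits between ">" and "</", i.e. the content of an element.
    private static readonly Regex TagText = new Regex(@"(?<=>)[^<>]+(?=</)");

    // A single word inside that text.
    private static readonly Regex Word = new Regex(@"\w+");

    // Hypothetical helper: replaces every word inside a tag that has an entry in map.
    public static string ReplaceWords(string input, IDictionary<string, string> map)
    {
        return TagText.Replace(input, text =>
            Word.Replace(text.Value, word =>
            {
                string replacement;
                // Words without an entry are returned unchanged.
                return map.TryGetValue(word.Value, out replacement) ? replacement : word.Value;
            }));
    }
}
```

With the dictionary from the question (created with StringComparer.OrdinalIgnoreCase), the lookup is case-insensitive, and text outside the tags is left alone.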
This is another way of doing it - I parse the string into XML and then select elements containing the keys in your dictionary and then replace each element's value.
However, you have to have a valid XML document - your example lacks a root node.
var xDocument = XDocument.Parse("<root><a>hello</a> <b>hello world</b> <c>I like apple</c></root>");
var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { { "hello", "Hi" }, { "world", "people" }, { "apple", "fruit" } };
string pattern = @"\w+";
Regex match = new Regex(pattern, RegexOptions.IgnoreCase);
var xElements = xDocument.Root.Descendants()
.Where(x => dictionary1.Keys.Any(s => x.Value.Contains(s)));
foreach (var xElement in xElements)
{
var updated = match.Replace(xElement.Value,
replace => {
return dictionary1.ContainsKey(replace.Value)
? dictionary1[replace.Value] : replace.Value; });
xElement.Value = updated;
}
string output = xDocument.ToString(SaveOptions.DisableFormatting);
This pattern of "\w+" matches words, not spaces.
This LINQ selects descendants of the root node where the element value contains any of the keys of your dictionary:
var xElements = xDocument.Root.Descendants().Where(x => dictionary1.Keys.Any(s => x.Value.Contains(s)));
I then iterate through the XElement enumerable collection returned and apply your replacement MatchEvaluator to just the string value, which is a lot easier!
The final output is <root><a>Hi</a><b>Hi people</b><c>I like fruit</c></root>. You could then remove the opening and closing <root> and </root> tags, but I don't know what your complete XML looks like.
This will do what you want (from what you have provided so far):
private static Dictionary<string, string> dict;
static void Main(string[] args)
{
dict =
new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
{ "hello", "Hi" },
{ "world", "people" },
{ "apple", "fruit" }
};
var input = "<a>hello</a> <b>hello world</b> apple <c>I like apple</c> hello";
var pattern = @"<.>([^<>]+)<\/.>";
var output = Regex.Replace(input, pattern, Replacer);
Console.WriteLine(output);
Console.ReadLine();
}
static string Replacer(Match match)
{
var value = match.Value;
foreach (var kvp in dict)
{
if (value.Contains(kvp.Key)) value = value.Replace(kvp.Key, kvp.Value);
}
return value;
}

String data to be converted to Dictionary

I have a String
string data = "[City, Delhi]&[State, DL]&[Country, IN]";
from which I want a dictionary.
The approach I thought was
1. Split on "&"
2. In the resulting array, parse each element
2.1 Replace "[" and "]"
2.2 Insert into the Dictionary
I hate this approach because my string already has "[" and "]" and I should be able to add it directly to Dictionary.
This is a good use case for regular expressions.
var d = Regex.Matches(data, @"\[(?<k>[^,]+), (?<v>[^]]+)\]")
.OfType<Match>()
.ToDictionary(m => m.Groups["k"].Value, m => m.Groups["v"].Value);
The approach you describe is probably as good as it's going to get.
A naive implementation (without error handling) would be:
var pairs = data.Split('&');
var dict = new Dictionary<string, string>();
foreach (var pair in pairs)
{
var parts = pair.Split(',');
dict.Add(
parts[0].TrimStart('['),
parts[1].TrimStart().TrimEnd(']'));
}
Or, using a regular expression to obtain the keys and values:
string data = "[City, Delhi]&[State, DL]&[Country, IN]";
var pairs = data.Split('&');
var dict = new Dictionary<string, string>();
var regex = new System.Text.RegularExpressions.Regex(@"\[(?<key>.*), (?<value>.*)\]");
foreach (var pair in pairs)
{
var match = regex.Match(pair);
// TODO: Error if match.Success == false ?
dict.Add(match.Groups["key"].Value, match.Groups["value"].Value);
}
You can try this using LINQ:
string data = "[City, Delhi]&[State, DL]&[Country, IN]";
string[] arr = data.Replace("[", "").Replace("]", "").Split('&');
// Trim() removes the space that follows the comma in each pair
var dict = arr.ToDictionary(x => x.Split(',')[0].Trim(), x => x.Split(',')[1].Trim());
Try this...
private static void Splitter()
{
string data = "[City, Delhi]&[State, DL]&[Country, IN]";
Dictionary<string,string> dOutput = new Dictionary<string,string>();
string[] sArr = data.Split('&');
var v = from p in sArr
select p.Replace("[", "").Replace("]", "").Split(',');
var v2 = (from p in v
select p).ToDictionary(item => item[0].Trim(), item => item[1].Trim()); // Trim() drops the space after the comma
Console.WriteLine(v2.Count());
}
v2 is a dictionary object...

How to split and take multiple strings from a url in c#?

I have a string looking something like this:
/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag
I want a list of strings containing "Orgrimmar", "Stormwind" and "Undercity". How can I split the string AFTER Query, between & and +, in order, so that I avoid getting a string like "Orgrimmar+l%C3%A4n=01&Stormwind"?
Let us assume that we don't know the names of the strings. :)
Update: I still can't get it to work. I have added a list of counties that I can use for validation, but I still find this case hard. countyList is used to check that the counties/cities in the URL match a pre-existing collection.
var countyQuery = Request.Url.Query;
var counties = this._locationService.GetAllCounties();
List<string> countyList = new List<string>();
List<string> selectedCountiesList = new List<string>();
foreach (var i in counties)
{
countyList.Add(i.Name);
}
Regex r = new Regex(@"&(.+?)\+");
MatchCollection mc = r.Matches(countyQuery);
foreach (Match curMatch in mc)
{
if (countyList.Contains(curMatch.Groups[1].Value))
{
selectedCountiesList.Add(curMatch.Groups[1].Value);
}
}
return selectedCountiesList;
I changed the URL to be /?Gender=&Age=&Query=&county=13&county=08&county=01&Page=1
where 13, 08, 01 and so on are the IDs of the counties.
The final solution was:
var selectedCountyQuery = Request.QueryString
//CountySearch = "county"
[QueryStringParameters.CountySearch];
List<string> countyList = new List<string>();
List<string> selectedCounties = new List<string>();
if (!string.IsNullOrEmpty(selectedCountyQuery))
{
var selectedCountiesArray = selectedCountyQuery.Split(new[]{ ',' });
foreach (var selectedCounty in selectedCountiesArray)
{
selectedCounties.Add(selectedCounty);
}
}
return selectedCounties;
You can get every parameter name and value with the Split() method.
Example:
var URL = "controller/method?var1=&var2=&var3=dsgdf";
var ParameterPart = URL.Split('?')[1];
var ParametersArray = ParameterPart.Split('&');
// ParametersArray: ["var1=", "var2=", "var3=dsgdf"]
foreach (var Parameter in ParametersArray)
{
var ParameterName = Parameter.Split('=')[0];
var ParameterValue = Parameter.Split('=')[1];
}
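Applied to the string from the question, the same split-based idea can be sketched as follows: split on &, take the part before =, and keep only the names that contain a +, cutting them off at that +. This is a rough sketch, not a general query-string parser; QueryNames and NamesBeforePlus are hypothetical names, and the code assumes the names themselves never contain & or =.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryNames
{
    // Hypothetical helper: returns the text before '+' for every
    // parameter name that contains a '+'.
    public static List<string> NamesBeforePlus(string query)
    {
        return query.TrimStart('/', '?')
            .Split('&')                                  // one entry per parameter
            .Select(p => p.Split('=')[0])                // keep the name, drop the value
            .Where(name => name.IndexOf('+') > 0)        // only names like "Orgrimmar+l%C3%A4n"
            .Select(name => name.Substring(0, name.IndexOf('+')))
            .ToList();
    }
}
```

Parameters such as Gender, Age and Query have no + in their names, so they are filtered out without having to know the county names in advance.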
You can use a regex and extract the matches:
Regex r = new Regex(@"&(.+?)\+");
MatchCollection mc = r.Matches(s);
Then you can iterate over the desired strings (in this case WoW cities) like this:
foreach(Match curMatch in mc)
{
Console.WriteLine(curMatch.Groups[1].Value);
}
string[] numbers = { "/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag" };
string sPattern = @"(?<=&Orgrimmar)+";
foreach (string s in numbers) {
if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern)) {
System.Console.WriteLine(" - valid"); }
else { System.Console.WriteLine(" - invalid"); }
}
Output: valid
string[] numbers ={ "/Gender=&Age=&Query=Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag"};
Output: invalid
Further to check two parameters:
string[] numbers ={ "/Gender=&Age=&Query=&Orgrimmar+l%C3%A4n=01&Stormwind+l%C3%A4n=07&Undercity+l%C3%A4n=09&Pag"};
string sPattern = @"(?<=&Orgrimmar)+";
string sPattern2 = @"(?<=&Stormwind)+";
foreach (string s in numbers){
if (System.Text.RegularExpressions.Regex.IsMatch(s, sPattern) && System.Text.RegularExpressions.Regex.IsMatch(s, sPattern2))
...

How to check if a regex groups are equal?

I have a RegEx that checks my string. In my string I have two groups ?<key> and ?<value>. So here is my sample string:
string input = "key=value&key=value1&key=value2";
I use a MatchCollection, and this is the code I use to print my groups to the console:
string input = Console.ReadLine();
string pattern = @"(?<key>\w+)=(?<value>\w+)";
Regex rgx = new Regex(pattern);
MatchCollection matches = rgx.Matches(input);
foreach (Match item in matches)
{
Console.Write("{0}=[{1}]",item.Groups["key"], item.Groups["value"]);
}
I get an output like this: key=[value]key=[value1]key=[value2]
But I want my output to be like this: key=[value, value1, value2]
My question is how to check whether the group "key" is equal to the previous one, so that I can produce the output I want.
You can use a Dictionary<string, List<string>>:
string pattern = @"(?<key>\w+)=(?<value>\w+)";
Regex rgx = new Regex(pattern);
MatchCollection matches = rgx.Matches(input);
Dictionary<string, List<string>> results = new Dictionary<string, List<string>>();
foreach (Match item in matches)
{
if (!results.ContainsKey(item.Groups["key"].Value)) {
results.Add(item.Groups["key"].Value, new List<string>());
}
results[item.Groups["key"].Value].Add(item.Groups["value"].Value);
}
foreach (var r in results) {
Console.Write("{0}=[{1}]", r.Key, string.Join(", ", r.Value));
}
Note the use of string.Join to output the data in the format required.
Use a Dictionary<string,List<string>>
Something like:
var dict = new Dictionary<string,List<string>>();
foreach (Match item in matches)
{
var key = item.Groups["key"].Value; // .Value gives the string; the Group itself cannot be used as a string key
var val = item.Groups["value"].Value;
if (!dict.ContainsKey(key))
{
dict[key] = new List<string>();
}
dict[key].Add(val);
}
You can use Linq GroupBy method:
string input = "key=value&key=value1&key=value2&key1=value3&key1=value4";
string pattern = @"(?<key>\w+)=(?<value>\w+)";
Regex rgx = new Regex(pattern);
MatchCollection matches = rgx.Matches(input);
foreach (var result in matches
.Cast<Match>()
.GroupBy(k => k.Groups["key"].Value, v => v.Groups["value"].Value))
{
Console.WriteLine("{0}=[{1}]", result.Key, String.Join(",", result));
}
Output for the snippet (here I've added another key, key1, with two values to your original input string):
key=[value,value1,value2]
key1=[value3,value4]

How do I get the name of captured groups in a C# Regex?

Is there a way to get the name of a captured group in C#?
string line = "No.123456789 04/09/2009 999";
Regex regex = new Regex(@"(?<number>[\d]{9}) (?<date>[\d]{2}/[\d]{2}/[\d]{4}) (?<code>.*)");
GroupCollection groups = regex.Match(line).Groups;
foreach (Group group in groups)
{
Console.WriteLine("Group: {0}, Value: {1}", ???, group.Value);
}
I want to get this result:
Group: [I don't know what should go here], Value: 123456789 04/09/2009 999
Group: number, Value: 123456789
Group: date, Value: 04/09/2009
Group: code, Value: 999
Use GetGroupNames to get the list of groups in an expression and then iterate over those, using the names as keys into the groups collection.
For example,
GroupCollection groups = regex.Match(line).Groups;
foreach (string groupName in regex.GetGroupNames())
{
Console.WriteLine(
"Group: {0}, Value: {1}",
groupName,
groups[groupName].Value);
}
The cleanest way to do this is by using this extension method:
public static class MyExtensionMethods
{
public static Dictionary<string, string> MatchNamedCaptures(this Regex regex, string input)
{
var namedCaptureDictionary = new Dictionary<string, string>();
GroupCollection groups = regex.Match(input).Groups;
string [] groupNames = regex.GetGroupNames();
foreach (string groupName in groupNames)
if (groups[groupName].Captures.Count > 0)
namedCaptureDictionary.Add(groupName,groups[groupName].Value);
return namedCaptureDictionary;
}
}
Once this extension method is in place, you can get names and values like this:
var regex = new Regex(@"(?<year>[\d]+)\|(?<month>[\d]+)\|(?<day>[\d]+)");
var namedCaptures = regex.MatchNamedCaptures(wikiDate);
string s = "";
foreach (var item in namedCaptures)
{
s += item.Key + ": " + item.Value + "\r\n";
}
s += namedCaptures["year"];
s += namedCaptures["month"];
s += namedCaptures["day"];
Since .NET Framework 4.7, the Group.Name property is available.
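With Group.Name, the loop from the question can read the name straight from each Group, without going back to the Regex object. The sketch below assumes a runtime where that property exists (.NET Framework 4.7 or later, or .NET Core); GroupNameDemo and Describe are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupNameDemo
{
    // Hypothetical helper: formats one line per group of the first match.
    public static List<string> Describe(Regex regex, string input)
    {
        var lines = new List<string>();
        Match match = regex.Match(input);
        foreach (Group group in match.Groups)
        {
            // Unnamed groups report their number as the name, so group 0 is "0".
            lines.Add(string.Format("Group: {0}, Value: {1}", group.Name, group.Value));
        }
        return lines;
    }
}
```

For the line and pattern from the question, the first entry describes group "0" (the whole match), followed by number, date and code.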
You should use GetGroupNames(); and the code will look something like this:
string line = "No.123456789 04/09/2009 999";
Regex regex =
new Regex(@"(?<number>[\d]{9}) (?<date>[\d]{2}/[\d]{2}/[\d]{4}) (?<code>.*)");
GroupCollection groups = regex.Match(line).Groups;
var grpNames = regex.GetGroupNames();
foreach (var grpName in grpNames)
{
Console.WriteLine("Group: {0}, Value: {1}", grpName, groups[grpName].Value);
}
To update the existing extension method answer by @whitneyland with one that can handle multiple matches:
public static List<Dictionary<string, string>> MatchNamedCaptures(this Regex regex, string input)
{
var namedCaptureList = new List<Dictionary<string, string>>();
var match = regex.Match(input);
do
{
Dictionary<string, string> namedCaptureDictionary = new Dictionary<string, string>();
GroupCollection groups = match.Groups;
string[] groupNames = regex.GetGroupNames();
foreach (string groupName in groupNames)
{
if (groups[groupName].Captures.Count > 0)
namedCaptureDictionary.Add(groupName, groups[groupName].Value);
}
namedCaptureList.Add(namedCaptureDictionary);
match = match.NextMatch();
}
while (match!=null && match.Success);
return namedCaptureList;
}
Usage:
Regex pickoutInfo = new Regex(@"(?<key>[^=;,]+)=(?<val>[^;,]+(,\d+)?)", RegexOptions.ExplicitCapture);
var matches = pickoutInfo.MatchNamedCaptures(_context.Database.GetConnectionString());
string server = matches.Single( a => a["key"]=="Server")["val"];
The Regex class is the key to this!
for (int i = 0; i < match.Groups.Count; i++) // Group.Index is a position in the input, not the group number
{
Console.WriteLine("Group: {0}, Value: {1}", regex.GroupNameFromNumber(i), match.Groups[i].Value);
}
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.groupnamefromnumber.aspx
