C# String replace with dictionary - c#

I have a string on which I need to do some replacements. I have a Dictionary<string, string> where I have search-replace pairs defined. I have created following extension methods to perform this operation:
public static string Replace(this string str, Dictionary<string, string> dict)
{
StringBuilder sb = new StringBuilder(str);
return sb.Replace(dict).ToString();
}
public static StringBuild Replace(this StringBuilder sb,
Dictionary<string, string> dict)
{
foreach (KeyValuePair<string, string> replacement in dict)
{
sb.Replace(replacement.Key, replacement.Value);
}
return sb;
}
Is there a better way of doing that?

If the data is tokenized (i.e. "Dear $name$, as of $date$ your balance is $amount$"), then a Regex can be useful:
static readonly Regex re = new Regex(#"\$(\w+)\$", RegexOptions.Compiled);
static void Main() {
string input = #"Dear $name$, as of $date$ your balance is $amount$";
var args = new Dictionary<string, string>(
StringComparer.OrdinalIgnoreCase) {
{"name", "Mr Smith"},
{"date", "05 Aug 2009"},
{"amount", "GBP200"}
};
string output = re.Replace(input, match => args[match.Groups[1].Value]);
}
However, without something like this, I expect that your Replace loop is probably about as much as you can do, without going to extreme lengths. If it isn't tokenized, perhaps profile it; is the Replace actually a problem?

Do this with Linq:
var newstr = dict.Aggregate(str, (current, value) =>
current.Replace(value.Key, value.Value));
dict is your search-replace pairs defined Dictionary object.
str is your string which you need to do some replacements with.

Seems reasonable to me, except for one thing: it's order-sensitive. For instance, take an input string of "$x $y" and a replacement dictionary of:
"$x" => "$y"
"$y" => "foo"
The results of the replacement are either "foo foo" or "$y foo" depending on which replacement is performed first.
You could control the ordering using a List<KeyValuePair<string, string>> instead. The alternative is to walk through the string making sure you don't consume the replacements in further replace operations. That's likely to be a lot harder though.

Here's a lightly re-factored version of #Marc's great answer, to make the functionality available as an extension method to Regex:
static void Main()
{
string input = #"Dear $name$, as of $date$ your balance is $amount$";
var args = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
args.Add("name", "Mr Smith");
args.Add("date", "05 Aug 2009");
args.Add("amount", "GBP200");
Regex re = new Regex(#"\$(\w+)\$", RegexOptions.Compiled);
string output = re.replaceTokens(input, args);
// spot the LinqPad user // output.Dump();
}
public static class ReplaceTokensUsingDictionary
{
public static string replaceTokens(this Regex re, string input, IDictionary<string, string> args)
{
return re.Replace(input, match => args[match.Groups[1].Value]);
}
}

when using Marc Gravell's RegEx solution, first check if a token is available using i.e. ContainsKey, this to prevent KeyNotFoundException errors :
string output = re.Replace(zpl, match => { return args.ContainsKey(match.Groups[1].Value) ? arg[match.Groups[1].Value] : match.Value; });
when using the following slightly modified sample code (1st parameter has different name):
var args = new Dictionary<string, string>(
StringComparer.OrdinalIgnoreCase)
{
{"nameWRONG", "Mr Smith"},
{"date", "05 Aug 2009"},
{"AMOUNT", "GBP200"}
};
this produces the following:
"Dear $name$, as of 05 Aug 2009 your balance is GBP200"

The correct tool for this particular task is Mustache, a simple standard logicless templating language.
There are many implementations. For this example, I'm going to use stubble:
var stubble = new StubbleBuilder().Build();
var dataHash = new Dictionary<string, Object>()
{
{"Foo","My Foo Example"},
{"Bar",5}
};
var output = stubble.Render(
"Hey, watch me replace this: {{Foo}} ... with example text. Also {{bar}} is 5"
, dataHash
);

Here you are:
public static class StringExm
{
public static String ReplaceAll(this String str, KeyValuePair<String, String>[] map)
{
if (String.IsNullOrEmpty(str))
return str;
StringBuilder result = new StringBuilder(str.Length);
StringBuilder word = new StringBuilder(str.Length);
Int32[] indices = new Int32[map.Length];
for (Int32 characterIndex = 0; characterIndex < str.Length; characterIndex++)
{
Char c = str[characterIndex];
word.Append(c);
for (var i = 0; i < map.Length; i++)
{
String old = map[i].Key;
if (word.Length - 1 != indices[i])
continue;
if (old.Length == word.Length && old[word.Length - 1] == c)
{
indices[i] = -old.Length;
continue;
}
if (old.Length > word.Length && old[word.Length - 1] == c)
{
indices[i]++;
continue;
}
indices[i] = 0;
}
Int32 length = 0, index = -1;
Boolean exists = false;
for (int i = 0; i < indices.Length; i++)
{
if (indices[i] > 0)
{
exists = true;
break;
}
if (-indices[i] > length)
{
length = -indices[i];
index = i;
}
}
if (exists)
continue;
if (index >= 0)
{
String value = map[index].Value;
word.Remove(0, length);
result.Append(value);
if (word.Length > 0)
{
characterIndex -= word.Length;
word.Length = 0;
}
}
result.Append(word);
word.Length = 0;
for (int i = 0; i < indices.Length; i++)
indices[i] = 0;
}
if (word.Length > 0)
result.Append(word);
return result.ToString();
}
}

Why not just check whether such key exists?
if exists, then remove the pair, otherwise, skip this step;
add the same key, but now with newly desired value.
// say, you have the following collection
var fields = new Dictionary<string, string>();
fields.Add("key1", "value1");
fields.Add("key2", "value2");
fields.Add("key3", "value3");
// now, you want to add a pair "key2"/"value4"
// or replace current value of "key2" with "value4"
if (fields.ContainsKey("key2"))
{
fields.Remove("key2");
}
fields.Add("key2", "value4");

Related

Extract index numbers from list of string

I have a list of strings where those strings have an index in each string and I need to extract the index from that string and put it in a List<int>.
Here's is a list example:
List<string> values = new List<string>();
values.Add("cohabitantGender");
values.Add("additionalDriver0LastName");
values.Add("additionalDriver0AgeWhenLicensed");
values.Add("vehicle0City");
values.Add("vehicle1City");
values.Add("vehicle2City");
values.Add("vehicle3City");
from this list I need to extract the indexes from the values vehicleXCity.
I have this code right now:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
var selectedMatches = values.Where(v => v.StartsWith(prefix) && v.EndsWith(suffix)).Select(v=> v).ToList();
var indexes = new List<int>();
foreach (var v in selectedMatches) indexes.Add(int.Parse(Regex.Match(v, #"\d+").Value));
return indexes;
}
And I'm using it like this:
List<int> indexes = FormObjectIndexExtractor(values, "vehicle", "City");
But if I have a value like vehicle4AnotherCity the code will work in the wrong way.
Does anyone have some alternative to this code that may help?
Below is an extension helper class incase you have a more robust list and you require multiple exclusion options
public class INumberList : List<string>
{
public List<int> GetNumberList()
{
List<int> numberList = new List<int>();
for (int i = 0; i < this.Count; i++)
{
numberList.Add(GetIntFromString(this[i]));
}
return numberList;
}
public INumberList ExcludeIndex(string prefix, string suffix)
{
for (int i = 0; i < this.Count; i++)
{
if (this[i].StartsWith(prefix) && this[i].EndsWith(suffix))
{
//remove non needed indexes
this.RemoveAt(i);
}
}
return this;
}
public static int GetIntFromString(String input)
{
// Replace everything that is no a digit.
String inputCleaned = Regex.Replace(input, "[^0-9]", "");
int value = 0;
// Tries to parse the int, returns false on failure.
if (int.TryParse(inputCleaned, out value))
{
// The result from parsing can be safely returned.
return value;
}
return 0; // Or any other default value.
}
}
Then use like this:
INumberList values = new INumberList();
values.Add("cohabitantGender");
values.Add("additionalDriver0LastName");
values.Add("additionalDriver0AgeWhenLicensed");
values.Add("vehicle0City");
values.Add("vehicle1City");
values.Add("vehicle2City");
values.Add("vehicle3City");
//Get filtered index list with multiple exclusion option
List<int> indexList = values.ExcludeIndex("cohabitantGender","")
.ExcludeIndex("additionalDriver","AgeWhenLicensed")
.GetNumberList();
//will return [0,0,1,2,3]
Try this:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
var s = "^" + prefix + #"(\d+)\.*?" + suffix + "$";
return values
.Where(v => Regex.Match(v, s).Success)
.Select(v=> int.Parse(Regex.Match(v, s).Groups[1].Value))
.ToList();
}
Here is a solution (using modern C# features) that does not use Regex:
public static List<int> FormObjectIndexExtractor(IEnumerable<string> values, string prefix, string suffix)
{
int? TryParseItem(string val)
{
if (val.Length <= prefix.Length + suffix.Length || !val.StartsWith(prefix) || !val.EndsWith(suffix))
return null;
var subStr = val.Substring(prefix.Length, val.Length - prefix.Length - suffix.Length);
if (int.TryParse(subStr, out var number))
return number;
return null;
}
return values.Select(TryParseItem).Where(v => v.HasValue).Select(v => v.Value).ToList();
}
This version splits all the parts of string.
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
List<int> ret = new List<int>();
Regex r = new Regex("^([a-zA-Z]+)(\\d+)([a-zA-Z]+)$");
foreach (var s in values)
{
var match = r.Match(s);
if (match.Success)
{
if (match.Groups[1].ToString() == prefix && match.Groups[3].ToString() == suffix)
{
ret.Add(int.Parse(match.Groups[2].ToString()));
}
}
}
return ret;
}
or alternatively:
public static List<int> FormObjectIndexExtractor(List<string> values, string prefix, string suffix)
{
List<int> ret = new List<int>();
Regex r = new Regex($"^{prefix}(\d+){suffix}$");
foreach (var s in values)
{
var match = r.Match(s);
if (match.Success)
{
ret.Add(int.Parse(match.Groups[1].ToString()));
}
}
return ret;
}
This is the more generic version.
Regex match:
starts with 'vehicle'
matches a number
ends with 'city'.
Parse and return as List<int>
var indexes = values.Where(a => Regex.IsMatch(a, #"^vehicle\d+City$")).
Select(k => int.Parse(Regex.Match(k, #"\d+").Value)).ToList();

Get a special, signed part of a string [duplicate]

I'm trying to develop a method that will match all strings between two strings:
I've tried this but it returns only the first match:
string ExtractString(string s, string start,string end)
{
// You should check for errors in real-world code, omitted for brevity
int startIndex = s.IndexOf(start) + start.Length;
int endIndex = s.IndexOf(end, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
Let's suppose we have this string
String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2"
I would like a c# function doing the following :
public List<string> ExtractFromString(String Text,String Start, String End)
{
List<string> Matched = new List<string>();
.
.
.
return Matched;
}
// Example of use
ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")
// Will return :
// FIRSTSTRING
// SECONDSTRING
// THIRDSTRING
Thank you for your help !
private static List<string> ExtractFromBody(string body, string start, string end)
{
List<string> matched = new List<string>();
int indexStart = 0;
int indexEnd = 0;
bool exit = false;
while (!exit)
{
indexStart = body.IndexOf(start);
if (indexStart != -1)
{
indexEnd = indexStart + body.Substring(indexStart).IndexOf(end);
matched.Add(body.Substring(indexStart + start.Length, indexEnd - indexStart - start.Length));
body = body.Substring(indexEnd + end.Length);
}
else
{
exit = true;
}
}
return matched;
}
Here is a solution using RegEx. Don't forget to include the following using statement.
using System.Text.RegularExpressions
It will correctly return only text between the start and end strings given.
Will not be returned:
akslakhflkshdflhksdf
Will be returned:
FIRSTSTRING
SECONDSTRING
THIRDSTRING
It uses the regular expression pattern [start string].+?[end string]
The start and end strings are escaped in case they contain regular expression special characters.
private static List<string> ExtractFromString(string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
You could make that into an extension method of String like this:
public static class StringExtensionMethods
{
public static List<string> EverythingBetween(this string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
}
Usage:
string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";
List<string> results = source.EverythingBetween(start, end);
text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);
You can split the string into an array using the start identifier in following code:
String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
String[] arr = str.Split("A1");
Then iterate through your array and remove the last 2 characters of each string (to remove the A2). You'll also need to discard the first array element as it will be empty assuming the string starts with A1.
Code is untested, currently on a mobile
This is a generic solution, and I believe more readable code. Not tested, so beware.
public static IEnumerable<IList<T>> SplitBy<T>(this IEnumerable<T> source,
Func<T, bool> startPredicate,
Func<T, bool> endPredicate,
bool includeDelimiter)
{
var l = new List<T>();
foreach (var s in source)
{
if (startPredicate(s))
{
if (l.Any())
{
l = new List<T>();
}
l.Add(s);
}
else if (l.Any())
{
l.Add(s);
}
if (endPredicate(s))
{
if (includeDelimiter)
yield return l;
else
yield return l.GetRange(1, l.Count - 2);
l = new List<T>();
}
}
}
In your case you can call,
var text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
var splits = text.SplitBy(x => x == "A1", x => x == "A2", false);
This is not the most efficient when you do not want the delimiter to be included (like your case) in result but efficient for opposite cases. To speed up your case one can directly call the GetEnumerator and make use of MoveNext.

String to char to strings

I want to make a program which takes a string you entered and turn it to different string, so if for example I input "Hello World" every char will turn into a string and the console will output something like "Alpha Beta Gamma Gamma Zeta Foxtrot Dona Rama Lana Zema" - making every char a word.
I tried doing it like this :
static string WordMap(string value)
{
char[] buffer = value.ToCharArray();
for (int i = 0; i < buffer.Length; i++)
{
if (letter = "a")
{
letter = ("Alpha");
}
//and so on
buffer[i] = letter;
}
return new string(buffer);
}
but I just can't get it to work.
can anyone give me a tip or point me at the right direction?
What you need is a Dictionary<char,string>
var words = new Dictionary<char, string>();
words.Add('a', "Alpha");
words.Add('b',"Beta");
...
string input = Console.ReadLine();
string[] contents = new string[input.Length];
for (int i = 0; i < input.Length; i++)
{
if (words.ContainsKey(input[i]))
{
contents[i] = words[input[i]];
}
}
string result = string.Join(" ", contents);
Or LINQ way:
var result = string.Join(" ", input.Where(words.ContainsKey).Select(c => words[c]));
First off, the buffer is a char array. Arrays have a fixed size and to expand them you need to create a new one. To overcome this cumbersome work, there is a StringBuilder class that does this automatically.
Secondly, if you keep these 'Alpha', 'Beta', ... strings in if statements you will have a very long piece of code. You can replace this by using a dictionary, or create it from a single string or text file.
To put this into practice:
class MyClass
{
static Dictionary<char, string> _map = new Dictionary<char, string>();
static MyClass()
{
_map.Add('a', "Alpha");
_map.Add('b', "Beta");
// etc
}
static string WordMap(string data)
{
var output = new StringBuilder();
foreach (char c in data)
{
if (_map.ContainsKey(c))
{
output.Append(_map[c]);
output.Append(' ');
}
}
return output.ToString();
}
}
Solution without a dictionary:
static string WordMap(string data)
{
const string words = "Alpha Beta Gamma Delta ...";
string[] wordMap = words.Split(' ');
var output = new StringBuilder();
foreach (char c in data)
{
int index = c - 'a';
if (index >= 0 && index < wordMap.Length)
{
output.Append(wordMap[index]);
output.Append(' ');
}
}
return output.ToString();
}
With LINQ and String.Join it's short and readable. Since you want to have a new word for special chards you need a word-map. I would use a Dictionary<char, string>:
static Dictionary<char, string> wordMap = new Dictionary<char, string>()
{
{'a', "Alpha"}, {'b', "Beta"},{'c', "Gamma"}, {'d', "Delta"}
};
static string WordMap(string value)
{
var strings = value
.Select(c =>
{
string word;
if(!wordMap.TryGetValue(c, out word))
word = c.ToString();
return word;
});
return string.Join("", strings);
}
Test:
string newString = WordMap("abcdeghj"); // AlphaBetaGammaDeltaeghj
Tips:
You don't have to create character array from string, you can easily access single characters in string by indexer:
char some = "123"[2];
When you use "" then you create string not char therefore you should use '' to create character for comparison:
if (some == 'a') Console.WriteLine("character is a, see how you compare chars!!!");
A good solution ...
string[] words = { "Alpha", "Beta", "C_word", "D_Word" }; // ....
string myPhrase = "aBC";
myPhrase.Replace(" ", string.Empty).ToList().ForEach(a =>
{
int asciiCode = (int)a;
/// ToUpper()
int index = asciiCode >= 97 ? asciiCode - 32 : asciiCode;
Console.WriteLine(words[index - 65]); // ascii --> 65-a , 66-b ...
});
One more variation of answer which contains not found option.
static Dictionary<char, string> Mapping =
new Dictionary<char, string>()
{ { 'a', "alpha" }, { 'b', "beta" }, { 'c', "gamma" }, { 'd', "zeta" } };
static void Main(string[] args)
{
string test = "abcx";
Console.WriteLine(string.Join(" ", test.Select(t => GetMapping(t))));
//output alpha beta gamma not found
Console.ReadKey();
}
static string GetMapping(char key)
{
if (Mapping.ContainsKey(key))
return Mapping.First(a => a.Key == key).Value;
else
return "not found";
}
Just declare a second variable where you build up your result.
And I think you have some syntax problems, you need to have "==" in a
condition, otherwise it's an assignment.
static string WordMap(string value)
{
string result = string.Empty;
char[] buffer = value.ToCharArray();
for (int i = 0; i < buffer.Length; i++)
{
if (letter == "a")
{
result += ("Alpha");
}
//and so on
}
return result;
}
But I would only do this that way, if this is "just for fun" code,
as it will not be very fast.
Building up the result like I did is slow, a better way would be
result = string.Concat(result, "(Alpha)");
And an even faster way is using a StringBuilder (s. documentation for that),
which offers you fast and convenient methods to deal with bigger strings.
Only downfall here is, that you need to know a little bit, how big the result
will be in characters, as you need to provide a starting dimension.
And here you should not start with simply 1, or 100. Each time, when the StringBuilder
is full, it creates a new bigger instance and copies the values, so multiple instances
of that will fill your memory, which can cause an out of memory exception,
when dealing with some ten thousands of characters.
But as said, for just for fun code, all of that does not matter...
And of course, you need to be aware, that if you do it like that, your result will
be in one straight line, no breaks. If you want line breaks add "\n" at the end of
the strings. Or add anything elese you need.
Regards,
Markus

Extract all strings between two strings

I'm trying to develop a method that will match all strings between two strings:
I've tried this but it returns only the first match:
string ExtractString(string s, string start,string end)
{
// You should check for errors in real-world code, omitted for brevity
int startIndex = s.IndexOf(start) + start.Length;
int endIndex = s.IndexOf(end, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
Let's suppose we have this string
String Text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2"
I would like a c# function doing the following :
public List<string> ExtractFromString(String Text,String Start, String End)
{
List<string> Matched = new List<string>();
.
.
.
return Matched;
}
// Example of use
ExtractFromString("A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2","A1","A2")
// Will return :
// FIRSTSTRING
// SECONDSTRING
// THIRDSTRING
Thank you for your help !
private static List<string> ExtractFromBody(string body, string start, string end)
{
List<string> matched = new List<string>();
int indexStart = 0;
int indexEnd = 0;
bool exit = false;
while (!exit)
{
indexStart = body.IndexOf(start);
if (indexStart != -1)
{
indexEnd = indexStart + body.Substring(indexStart).IndexOf(end);
matched.Add(body.Substring(indexStart + start.Length, indexEnd - indexStart - start.Length));
body = body.Substring(indexEnd + end.Length);
}
else
{
exit = true;
}
}
return matched;
}
Here is a solution using RegEx. Don't forget to include the following using statement.
using System.Text.RegularExpressions
It will correctly return only text between the start and end strings given.
Will not be returned:
akslakhflkshdflhksdf
Will be returned:
FIRSTSTRING
SECONDSTRING
THIRDSTRING
It uses the regular expression pattern [start string].+?[end string]
The start and end strings are escaped in case they contain regular expression special characters.
private static List<string> ExtractFromString(string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
You could make that into an extension method of String like this:
public static class StringExtensionMethods
{
public static List<string> EverythingBetween(this string source, string start, string end)
{
var results = new List<string>();
string pattern = string.Format(
"{0}({1}){2}",
Regex.Escape(start),
".+?",
Regex.Escape(end));
foreach (Match m in Regex.Matches(source, pattern))
{
results.Add(m.Groups[1].Value);
}
return results;
}
}
Usage:
string source = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
string start = "A1";
string end = "A2";
List<string> results = source.EverythingBetween(start, end);
text.Split(new[] {"A1", "A2"}, StringSplitOptions.RemoveEmptyEntries);
You can split the string into an array using the start identifier in following code:
String str = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
String[] arr = str.Split("A1");
Then iterate through your array and remove the last 2 characters of each string (to remove the A2). You'll also need to discard the first array element as it will be empty assuming the string starts with A1.
Code is untested, currently on a mobile
This is a generic solution, and I believe more readable code. Not tested, so beware.
public static IEnumerable<IList<T>> SplitBy<T>(this IEnumerable<T> source,
Func<T, bool> startPredicate,
Func<T, bool> endPredicate,
bool includeDelimiter)
{
var l = new List<T>();
foreach (var s in source)
{
if (startPredicate(s))
{
if (l.Any())
{
l = new List<T>();
}
l.Add(s);
}
else if (l.Any())
{
l.Add(s);
}
if (endPredicate(s))
{
if (includeDelimiter)
yield return l;
else
yield return l.GetRange(1, l.Count - 2);
l = new List<T>();
}
}
}
In your case you can call,
var text = "A1FIRSTSTRINGA2A1SECONDSTRINGA2akslakhflkshdflhksdfA1THIRDSTRINGA2";
var splits = text.SplitBy(x => x == "A1", x => x == "A2", false);
This is not the most efficient when you do not want the delimiter to be included (like your case) in result but efficient for opposite cases. To speed up your case one can directly call the GetEnumerator and make use of MoveNext.

Need some help on a Regex match / replace pattern

My ultimate goal here is to turn the following string into JSON, but I would settle for something that gets me one step closer by combining the fieldname with each of the values.
Sample Data:
Field1:abc;def;Field2:asd;fgh;
Using Regex.Replace(), I need it to at least look like this:
Field1:abc,Field1:def,Field2:asd,Field2:fgh
Ultimately, this result would be awesome if it can be done via Regex in a single call.
{"Field1":"abc","Field2":"asd"},{"Field1":"def","Field2":"fgh"}
I've tried many different variations of this pattern, but can't seem to get it right:
(?:(\w+):)*?(?:([^:;]+);)
Only one other example I could find that is doing something similar, but just enough differences that I can't quite put my finger on it.
Regex to repeat a capture across a CDL?
EDIT:
Here's my solution. I'm not going to post it as a "Solution" because I want to give credit to one that was posted by others. In the end, I took a piece from each of the posted solutions and came up with this one. Thanks to everyone who posted. I gave credit to the solution that compiled, executed fastest and had the most accurate results.
string hbi = "Field1:aaa;bbb;ccc;ddd;Field2:111;222;333;444;";
Regex re = new Regex(#"(\w+):(?:([^:;]+);)+");
MatchCollection matches = re.Matches(hbi);
SortedDictionary<string, string> dict = new SortedDictionary<string, string>();
for (int x = 0; x < matches.Count; x++)
{
Match match = matches[x];
string property = match.Groups[1].Value;
for (int i = 0; i < match.Groups[2].Captures.Count; i++)
{
string key = i.ToString() + x.ToString();
dict.Add(key, string.Format("\"{0}\":\"{1}\"", property, match.Groups[2].Captures[i].Value));
}
}
Console.WriteLine(string.Join(",", dict.Values));
Now you have two problems
I don't think regular expressions will be the best way to handle this. You should probably start by splitting on semicolons, then loop through the results looking for a value that starts with "Field1:" or "Field2:" and collect the results into a Dictionary.
Treat this as pseudo code because I have not compiled or tested it:
string[] data = input.Split(';');
dictionary<string, string> map = new dictionary<string, string>();
string currentKey = null;
foreach (string value in data)
{
// This part should change depending on how the fields are defined.
// If it's a fixed set you could have an array of fields to search,
// or you might need to use a regular expression.
if (value.IndexOf("Field1:") == 0 || value.IndexOf("Field2:"))
{
string currentKey = value.Substring(0, value.IndexOf(":"));
value = value.Substring(currentKey.Length+1);
}
map[currentKey] = value;
}
// convert map to json
I had an idea that it should be possible to do this in a shorter and more clear way. It ended up not being all that much shorter and you can question if it's more clear. At least it's another way to solve the problem.
var str = "Field1:abc;def;Field2:asd;fgh";
var rows = new List<Dictionary<string, string>>();
int index = 0;
string value;
string fieldname = "";
foreach (var s in str.Split(';'))
{
if (s.Contains(":"))
{
index = 0;
var tmp = s.Split(':');
fieldname = tmp[0];
value = tmp[1];
}
else
{
value = s;
index++;
}
if (rows.Count < (index + 1))
rows.Insert(index, new Dictionary<string, string>());
rows[index][fieldname] = value;
}
var arr = rows.Select(dict =>
String.Join("," , dict.Select(kv =>
String.Format("\"{0}\":\"{1}\"", kv.Key, kv.Value))))
.Select(r => "{" + r + "}");
var json = String.Join(",", arr );
Debug.WriteLine(json);
Outputs:
{"Field1":"abc","Field2":"asd"},{"Field1":"def","Field2":"fgh"}
I would go with RegEx as the simplest and most straightforward way to parse the strings, but I'm sorry, pal, I couldn't come up with a clever-enough replacement string to do this in one shot.
I hacked it out for fun through, and the monstrosity below accomplishes what you need, albeit hideously. :-/
Regex r = new Regex(#"(?<FieldName>\w+:)*(?:(?<Value>(?:[^:;]+);)+)");
var matches = r.Matches("Field1:abc;def;Field2:asd;fgh;moo;"); // Modified to test "uneven" data as well.
var tuples = new[] { new { FieldName = "", Value = "", Index = 0 } }.ToList(); tuples.Clear();
foreach (Match match in matches)
{
var matchGroups = match.Groups;
var fieldName = matchGroups[1].Captures[0].Value;
int index = 0;
foreach (Capture cap in matchGroups[2].Captures)
{
var tuple = new { FieldName = fieldName, Value = cap.Value, Index = index };
tuples.Add(tuple);
index++;
}
}
var maxIndex = tuples.Max(tup => tup.Index);
var jsonItemList = new List<string>();
for (int a = 0; a < maxIndex+1; a++)
{
var jsonBuilder = new StringBuilder();
jsonBuilder.Append("{");
foreach (var tuple in tuples.Where(tup => tup.Index == a))
{
jsonBuilder.Append(string.Format("\"{0}\":\"{1}\",", tuple.FieldName, tuple.Value));
}
jsonBuilder.Remove(jsonBuilder.Length - 1, 1); // trim last comma.
jsonBuilder.Append("}");
jsonItemList.Add(jsonBuilder.ToString());
}
foreach (var item in jsonItemList)
{
// Write your items to your document stream.
}

Categories

Resources