What is the best way to split the name/description of the value into a dictionary <string, string>?
string test = postedByUser='Jason, Bourne' postedById='48775' Text='Some text in here' postedDate='2020-04-21'
So ideally I want:
dictionary key = postedByUser, value = Jason, Bourne
dictionary key = postedById, value = 48775
etc
Code added so far:
string test = @"postedByUser='Jason, Bourne' postedById='48775' Text='Some text in here' postedDate='2020-04-21'";
Dictionary<string, string> dict = new Dictionary<string, string>();
List<string> lst = test.Split('=').ToList();
foreach(string item in lst)
{
// Can't figure out how to edit the original string to remove the item that has
// been split by the '='
}
The following code will solve your problem, but I suggest you add more exception handling to make it robust.
string test = @"postedByUser='Jason, Bourne' postedById='48775' Text='Some text in here' postedDate='2020-04-21'";
Dictionary<string, string> dict = new Dictionary<string, string>();
List<string> keyvalues = test.Split("' ").ToList();
foreach(var keyvalue in keyvalues)
{
var splitKeyValue = keyvalue.Split('=');
// Trim removes the single quotes that the split leaves around the value
dict.Add(splitKeyValue[0], splitKeyValue[1].Trim('\''));
}
EDIT:
For .NET Framework 4.6,
List<string> keyvalues = test.Split(new string[] { "' " }, StringSplitOptions.None).ToList();
Try this one (quite ugly, but it should work):
var dict = test.Split("' ").Select(t => t.Split('=')).ToDictionary(t => t[0], t => t[1].Trim('\''));
Use String.Split(..) to separate out your string for processing.
String.Split documentation
First split the string based on a ' ' space character as a separator e.g.
var splitStrings = test.Split(' ', StringSplitOptions.None);
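Then run through each of the resulting strings and split it on '='. This gives you two strings, where the first is your key and the second is your value. Note that splitting on a space only works when the values themselves contain no spaces, which is not the case for 'Jason, Bourne' or 'Some text in here' in the sample.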
Regular expressions can be used to solve this problem:
string text = @"postedByUser='Jason, Bourne' postedById='48775' Text='Some text in here' postedDate='2020-04-21'";
Dictionary<string, string> result = new Dictionary<string, string>();
foreach (Match m in Regex.Matches(text, @"(\w+)=\'(.+?)\'"))
{
result.Add(m.Groups[1].Value, m.Groups[2].Value);
}
Here is complete sample.
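For reference, here is a minimal self-contained sketch of the regular-expression approach. The class name KeyValueParser and the method name Parse are made up for illustration. It assumes that a value never contains a single quote, and it uses the dictionary indexer so that a repeated key overwrites the earlier value instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // Matches name='value' pairs; the value may contain spaces and commas,
    // but (by assumption) no single quote.
    private static readonly Regex PairPattern = new Regex(@"(\w+)='([^']*)'");

    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in PairPattern.Matches(input))
        {
            // The indexer overwrites a repeated key; Add would throw instead.
            result[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return result;
    }

    public static void Main()
    {
        string test = @"postedByUser='Jason, Bourne' postedById='48775' Text='Some text in here' postedDate='2020-04-21'";
        foreach (var pair in Parse(test))
            Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

Unlike the Split-based versions, this does not care how many spaces separate the pairs, and the quotes never end up in the values.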
I am parsing a template file which will contain certain keys that I need to map values to. Take a line from the file for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
I need to replace the string within the # symbols with values from a dictionary.
So there can be multiple keys from the dictionary. However, not all strings inside the # are in the dictionary so for those, I will have to replace them with empty string.
I can't seem to find a way to do this. And yes, I have looked at this solution:
check if string contains dictionary Key -> remove key and add value
For now what I have is this (where I read from the template file line by line and then write to a different file):
string line = string.Empty;
var dict = new Dictionary<string, string>() {
{ "data.tool_context.TOOL_SOFTWARE_VERSION", "sw0.2.002" },
{"data.context.TOOL_ENTITY", "WSM102" }
};
StringBuilder inputText = new StringBuilder();
StreamWriter writeKlarf = new StreamWriter(klarfOutputNameActual);
using (StreamReader sr = new StreamReader(WSMTemplatePath))
{
while((line = sr.ReadLine()) != null)
{
//Console.WriteLine(line);
if (line.Contains("#"))
{
}
else
{
writeKlarf.WriteLine(line);
}
}
}
writeKlarf.Close();
The idea is that, for each line, the string between a pair of # symbols is replaced with the matching value from the dictionary if that string is a key in the dictionary. How can I do this?
Sample Output Given the line above:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Here, because #WSM# is not in the dictionary, it is replaced with an empty string.
One more thing: this logic only applies to the first quarter of the file. The rest of the file has other data that will need to be entered via different logic, so I am not sure it makes sense to read the whole file into memory just for the header section.
Here's a quick example that I wrote for you, hopefully this is what you're asking for.
This will let you have a <string, string> Dictionary, check for the Key inside of a delimiter, and if the text inside of the delimiter matches the Dictionary key, it will replace the text. It won't edit any of the inputted strings that don't have any matches.
If you want to delete the unmatched value instead of leaving it alone, replace the kvp.Value in the line.Replace() with String.Empty
var dict = new Dictionary<string, string>() {
{ "test", "cool test" }
};
string line = "#test# is now replaced.";
foreach (var kvp in dict)
{
// Takes the text between the first pair of # delimiters (assumes the line contains one)
string split = line.Split('#')[1];
if (split == kvp.Key)
{
line = line.Replace($"#{split}#", kvp.Value);
}
Console.WriteLine(line);
}
Console.ReadLine();
If you had a list of tuples holding the find and replace strings, you could read the file, apply each replacement, and then write the result to a new file:
var frs = new List<(string F, string R)>(){
("#data.tool_context.TOOL_SOFTWARE_VERSION#", "sw0.2.002"),
("#otherfield#", "replacement here")
};
var i = File.ReadAllText("path");
frs.ForEach(fr => i = i.Replace(fr.F,fr.R));
File.WriteAllText("path2", i);
The choice of a list over a dictionary is fairly arbitrary; List has a ForEach method, but it could just as easily be a foreach loop over a dictionary. I included the # delimiters in the find strings because I got the impression that the output is not supposed to contain them.
This version leaves alone any template parameters that aren't available
You can try matching #...# keys with the help of regular expressions:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
...
static string MyReplace(string value, IDictionary<string, string> subs) => Regex
.Replace(value, "#[^#]*#", match => subs.TryGetValue(
match.Value.Substring(1, match.Value.Length - 2), out var item) ? item : "");
Then you can apply it to the file: we read the file's lines, process them with the help of Linq, and write them into another file.
var dict = new Dictionary<string, string>() {
{"data.tool_context.TOOL_SOFTWARE_VERSION", "sw0.2.002" },
{"data.context.TOOL_ENTITY", "WSM102" },
};
File.WriteAllLines(klarfOutputNameActual, File
.ReadLines(WSMTemplatePath)
.Select(line => MyReplace(line, dict)));
Edit: If you want to switch off MyReplace from some line on:
bool doReplace = true;
File.WriteAllLines(klarfOutputNameActual, File
.ReadLines(WSMTemplatePath)
.Select(line => {
//TODO: having line check if we want to keep replacing
if (!doReplace || SomeCondition(line)) {
doReplace = false;
return line;
}
return MyReplace(line, dict);
}));
Here SomeCondition(line) returns true whenever header ends and we should not replace #..# any more.
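The header-only variant can be sketched as a small self-contained program. The names TemplateDemo, ReplaceTokens and ReplaceInHeader are invented for this example, and the "EndOfHeader" marker line stands in for whatever test SomeCondition would really perform. The sketch works on any IEnumerable<string>, so it can be fed from File.ReadLines without loading the whole file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateDemo
{
    // Replaces every #key# token; keys missing from subs become an empty string.
    public static string ReplaceTokens(string value, IDictionary<string, string> subs) =>
        Regex.Replace(value, "#([^#]*)#",
            m => subs.TryGetValue(m.Groups[1].Value, out var item) ? item : "");

    // Applies ReplaceTokens to each line until isHeaderEnd first returns true;
    // that line and all later lines are passed through unchanged.
    public static List<string> ReplaceInHeader(IEnumerable<string> lines,
        IDictionary<string, string> subs, Func<string, bool> isHeaderEnd)
    {
        var output = new List<string>();
        bool inHeader = true;
        foreach (var line in lines)
        {
            if (inHeader && isHeaderEnd(line))
                inHeader = false;
            output.Add(inHeader ? ReplaceTokens(line, subs) : line);
        }
        return output;
    }

    public static void Main()
    {
        var dict = new Dictionary<string, string>
        {
            { "data.context.TOOL_ENTITY", "WSM102" }
        };
        // Hypothetical template lines; "EndOfHeader" is an assumed marker.
        var lines = new[]
        {
            "Tool #data.context.TOOL_ENTITY# #UNKNOWN#",
            "EndOfHeader",
            "Raw #data.context.TOOL_ENTITY#"
        };
        foreach (var line in ReplaceInHeader(lines, dict, l => l == "EndOfHeader"))
            Console.WriteLine(line);
    }
}
```

Capturing the key in a regex group avoids the Substring arithmetic; otherwise the behaviour is meant to match MyReplace.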
I have a list, lstSubs, of type List<KeyValuePair<string, string>>,
which contains these values:
FNAME, "ABC"
LNAME ,"XYZ"
VAR001, "VAR002"
VAR002 , "ActualValueforVAR001"
VAR003, "VAR004"
VAR004 , "VAR005"
VAR005, "ActualValueforVAR003"
I have a String like envelop "Hello [FNAME] [LNAME] you have created a request for [VAR001] which got assigned to [VAR003]"
var regex = new Regex(@"\[(.*?)\]");
var matches = regex.Matches(envelop.ToString());
foreach (Match match in matches)
{
columnValue = linq to get the value from the list based on key;
envelop.Replace(match.Value, columnValue);
}
Here the plain key/value pairs are easy to get via LINQ, but I am having a tough time fetching the values that are nested, that is, where the value of one key is itself the key of another pair.
Is there any way to do this in LINQ, or do I have to go with a loop?
Expected Output : Hello ABC XYZ you have created a request for ActualValueforVAR001 which got assigned to ActualValueforVAR003
Thanks,
PS. The code is not complete. It is part of a larger code base, edited down to keep it concise and focused on the issue.
Edited: some of my text was not visible due to formatting.
The dictionary values are nested because I create them based on certain conditions in which they were configured.
First, let's turn initial List<T> into a Dictionary<K, V>:
List<KeyValuePair<string, string>> list = new List<KeyValuePair<string, string>>() {
new KeyValuePair<string, string>("FNAME", "ABC"),
new KeyValuePair<string, string>("LNAME", "XYZ"),
new KeyValuePair<string, string>("VAR001", "VAR002"),
new KeyValuePair<string, string>("VAR002", "ActualValueforVAR001"),
new KeyValuePair<string, string>("VAR003", "VAR004"),
new KeyValuePair<string, string>("VAR004", "VAR005"),
new KeyValuePair<string, string>("VAR005", "ActualValueforVAR003"),
};
Dictionary<string, string> dict = list.ToDictionary(
pair => pair.Key,
pair => pair.Value,
StringComparer.OrdinalIgnoreCase); // Comment out if should be case sensitive
// Some values can be nested
while (true) {
bool nestedFound = false;
foreach (var pair in dict.ToList()) {
if (dict.TryGetValue(pair.Value, out var newValue)) {
dict[pair.Key] = newValue;
nestedFound = true;
}
}
if (!nestedFound)
break;
}
Then for a given envelop
string envelop =
@"Hello [FNAME] [LNAME] you have created a request for [VAR001] which got assigned to [VAR003]";
you can put a simple Regex.Replace:
string result = Regex
.Replace(envelop,
@"\[[A-Za-z0-9]+\]",
m => dict.TryGetValue(m.Value.Trim('[', ']'), out var value) ? value : "???");
Console.Write(result);
Outcome:
Hello ABC XYZ you have created a request for ActualValueforVAR001 which got assigned to ActualValueforVAR003
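As an alternative to flattening the dictionary up front, each key can be resolved on demand by following the chain until the value is no longer a key. The sketch below is an illustration rather than part of the answer above: ChainResolver and Resolve are made-up names, and the cycle check is an added assumption, since a pair such as A -> B, B -> A would otherwise never terminate.

```csharp
using System;
using System.Collections.Generic;

public static class ChainResolver
{
    // Follows key -> value -> value ... until the value is not itself a key.
    // A key that is not in the map is returned unchanged.
    // Throws if the chain loops back on itself.
    public static string Resolve(IDictionary<string, string> map, string key)
    {
        var seen = new HashSet<string>();
        string current = key;
        while (map.TryGetValue(current, out var next))
        {
            if (!seen.Add(current))
                throw new InvalidOperationException("Cyclic reference at " + current);
            current = next;
        }
        return current;
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>
        {
            { "FNAME", "ABC" },
            { "VAR003", "VAR004" },
            { "VAR004", "VAR005" },
            { "VAR005", "ActualValueforVAR003" }
        };
        Console.WriteLine(Resolve(map, "FNAME"));
        Console.WriteLine(Resolve(map, "VAR003"));
    }
}
```

Inside the Regex.Replace callback, Resolve(dict, key) could then take the place of the plain TryGetValue lookup, at the cost of walking the chain on every match.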
I'm building a simple dictionary from a reg file (export from Windows Regedit). The .reg file contains a key in square brackets, followed by zero or more lines of text, followed by a blank line. This code will create the dictionary that I need:
var a = File.ReadLines("test.reg");
var dict = new Dictionary<String, List<String>>();
foreach (var key in a) {
if (key.StartsWith("[HKEY")) {
var iter = a.GetEnumerator();
var value = new List<String>();
do {
iter.MoveNext();
value.Add(iter.Current);
} while (String.IsNullOrWhiteSpace(iter.Current) == false);
dict.Add(key, value);
}
}
I feel like there is a cleaner (prettier?) way to do this in a single Linq statement (using a group by), but it's unclear to me how to implement the iteration of the value items into a list. I suspect I could do the same GetEnumerator in a let statement but it seems like there should be a way to implement this without resorting to an explicit iterator.
Sample data:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]
@="Microsoft.System.Update.1"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS]
@="WMP11.AssocFile.M2TS"
"Content Type"="video/vnd.dlna.mpeg-tts"
"PerceivedType"="video"
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\OpenWithProgIds]
"WMP11.AssocFile.M2TS"=hex(0):
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx\{BB2E617C-0920-11D1-9A0B-00C04FC2D6C1}]
@="{9DBD2C50-62AD-11D0-B806-00C04FD706EC}"
Update
I'm sorry, I need to be more specific. The files I'm looking at are around 300 MB, so I took the approach I did to keep the memory footprint down. I'd prefer an approach that doesn't require pulling the entire file into memory.
You can always use Regex:
var dict = new Dictionary<String, List<String>>();
var a = File.ReadAllText(@"test.reg");
var results = Regex.Matches(a, "(\\[[^\\]]+\\])([^\\[]+)\r\n\r\n", RegexOptions.Singleline);
foreach (Match item in results)
{
dict.Add(
item.Groups[1].Value,
item.Groups[2].Value.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries).ToList()
);
}
I whipped this out real quick. You might be able to improve the regex pattern.
Instead of using GetEnumerator you can take advantage of TakeWhile and Split methods to break your list into smaller list (each sublist represents one key and its values)
var registryLines = File.ReadLines("test.reg");
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
while (registryLines.Any()) // Any() stops at the first line instead of counting the whole file
{
// Take the key and values into a single list
var keyValues = registryLines.TakeWhile(x => !String.IsNullOrWhiteSpace(x)).ToList();
// Adds a new entry to the dictionary using the first value as key and the rest of the list as value
if (keyValues != null && keyValues.Count > 0)
resultKeys.Add(keyValues[0], keyValues.Skip(1).ToList());
// Jumps to the next registry (+1 to skip the blank line)
registryLines = registryLines.Skip(keyValues.Count + 1);
}
EDIT based on your update:
Update: I'm sorry, I need to be more specific. The files I'm looking at
are around 300 MB, so I took the approach I did to keep the memory
footprint down. I'd prefer an approach that doesn't require pulling
the entire file into memory.
Well, if you can't read the whole file into memory, it makes no sense to me asking for a LINQ solution. Here is a sample of how you can do it reading line by line (still no need for GetEnumerator)
Dictionary<string, List<string>> resultKeys = new Dictionary<string, List<string>>();
using (StreamReader reader = File.OpenText("test.reg"))
{
List<string> keyAndValues = new List<string>();
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
// Adds key and values to a list until it finds a blank line
if (!string.IsNullOrWhiteSpace(line))
keyAndValues.Add(line);
else
{
// Adds a new entry to the dictionary using the first value as key and the rest of the list as value
if (keyAndValues != null && keyAndValues.Count > 0)
resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
// Starts a new Key collection
keyAndValues = new List<string>();
}
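}
// Adds the last entry if the file does not end with a blank line
if (keyAndValues.Count > 0)
resultKeys.Add(keyAndValues[0], keyAndValues.Skip(1).ToList());
}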
I think you can use code like this, if you can afford the memory:
var lines = File.ReadAllText(fileName);
var result =
Regex.Matches(lines, @"\[(?<key>HKEY[^]]+)\]\s+(?<value>[^[]+)")
.OfType<Match>()
.ToDictionary(k => k.Groups["key"].Value, v => v.Groups["value"].Value.Trim('\n', '\r', ' '));
C# Demo
This will take 24.173 seconds for a file with more than 4 million lines - Size:~550MB - by using 1.2 GB memory.
Edit :
The best way is to use File.ReadLines, as it is lazy (File.ReadAllLines loads the whole file into an array first):
var lines = File.ReadLines(fileName);
var keyRegex = new Regex(@"\[(?<key>HKEY[^]]+)\]");
var currentKey = string.Empty;
var currentValue = string.Empty;
var result = new Dictionary<string, string>();
foreach (var line in lines)
{
var match = keyRegex.Match(line);
if (match.Length > 0)
{
if (!string.IsNullOrEmpty(currentKey))
{
result.Add(currentKey, currentValue);
currentValue = string.Empty;
}
currentKey = match.Groups["key"].ToString();
}
else
{
currentValue += line;
}
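}
// Add the last key, which is not followed by another key line
if (!string.IsNullOrEmpty(currentKey))
result.Add(currentKey, currentValue);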
This will take 17093 milliseconds for a file with 795180 lines.
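A single-pass version that keeps the Dictionary<String, List<String>> shape from the question might look like the sketch below. RegSections and Parse are hypothetical names. It assumes that every key line starts with "[HKEY" and that blank lines only act as separators. The input is an IEnumerable<string>, so passing File.ReadLines("test.reg") would stream the file, although the resulting dictionary itself still holds all keys and values in memory.

```csharp
using System;
using System.Collections.Generic;

public static class RegSections
{
    // A line starting with "[HKEY" opens a section; the non-blank lines
    // that follow it are collected as that section's values.
    public static Dictionary<string, List<string>> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, List<string>>();
        List<string> current = null;
        foreach (var line in lines)
        {
            if (line.StartsWith("[HKEY", StringComparison.Ordinal))
            {
                current = new List<string>();
                result[line] = current;
            }
            else if (current != null && !string.IsNullOrWhiteSpace(line))
            {
                current.Add(line);
            }
        }
        return result;
    }

    public static void Main()
    {
        // In-memory stand-in for File.ReadLines("test.reg").
        var sample = new[]
        {
            @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.msu]",
            "@=\"Microsoft.System.Update.1\"",
            "",
            @"[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.MTS\ShellEx]",
            ""
        };
        foreach (var pair in Parse(sample))
            Console.WriteLine($"{pair.Key}: {pair.Value.Count} value line(s)");
    }
}
```

Because a new section starts on the key line rather than on the blank line, keys with no values and a missing trailing blank line are both handled without a special case.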
I am trying to find an efficient way to match the strings in this dictionary based on the rules stated in XML file.
I will try to explain the code from the beginning. There are two csv files.
File1.csv
RefID|Firstname|Lastname|ID|DOB
Ref_1|KEN|CARPENTER|67814|1122
Ref_2|TRAY|ROBINSON|67814|1122
Ref_3|TRAY|ROBINSON|67871|1122
Ref_4|TRAN|ROBINSON|67871|1122
Ref_5|LAWSN|PERDUE|6761|2009
Ref_6|MCKEN|BARNUM|6761|2009
Ref_7|MCKEN|BARNUM|6768|2009
Ref_8|MCKEN|BARNUM|6768|2009
Ref_9|TRAN|ROBINSON|67871|1122
File2.csv
SID|Values
TRAROB|Ref_1,Ref_2,Ref_3,Ref_4,Ref_9
MCKBAR|Ref_5,Ref_6,Ref_7,Ref_8
XML :
<?xml version="1.0" encoding="utf-8" ?>
<FeedInfo>
<Rule>
<RuleInfo>
<RuleName>Rule 1</RuleName>
</RuleInfo>
<Rules>
<item name ="FirstName" NoOfChars ="ALL" number ="0"/>
<item name ="LastName" NoOfChars ="ALL" number ="1"/>
<item name ="ID" NoOfChars ="ALL" number ="2" />
</Rules>
</Rule>
</FeedInfo>
I wrote the following code :
static void Main(string[] args)
{
populate();
rulesReader();
}
public static Dictionary<string,string> createDictionary(string dataPath)
{
//creates a dictionary from a file
StreamReader sr = new StreamReader(dataPath);
Dictionary<string, string> refIdVal = new Dictionary<string, string>();
string line = sr.ReadLine();
while ((line = sr.ReadLine()) != null)
{
string key = line.Split('|')[0];
int i = line.IndexOf('|',0) + 1;
int l = line.Length - i;
string value = line.Substring(i,l);
refIdVal.Add(key, value);
}
sr.Close();
return refIdVal;
}
public static Dictionary<string,string> populate()
{
//populates the dictionary with SID,RefID|values format.
string refIdPath = "File1.csv";
string sidPath = "File2.csv";
Dictionary<string, string> final = new Dictionary<string, string>();
Dictionary<string, string> refIdVal = createDictionary(refIdPath);
Dictionary<string, string> sidVal = createDictionary(sidPath);
foreach (KeyValuePair<string, string> pair in sidVal)
{
string[] refIdTockens = pair.Value.Split(',');
for (int i = 0; i <refIdTockens.Length; i++)
{
final.Add(pair.Key + "," + refIdTockens[i], refIdVal[refIdTockens[i]]);
//Console.WriteLine(pair.Key + "," + refIdTockens[i] + "==" + refIdVal[refIdTockens[i]]+ "==" + i);
}
}
foreach (KeyValuePair<string, string> pair in final)
{
Console.WriteLine(pair.Key + "==" + pair.Value);
}
return final;
}
public static Dictionary<string,string> finalOutput(Dictionary<string,string> inputDictionary)
{
Dictionary<string,string> input = inputDictionary;
foreach (KeyValuePair<string, string> pair in input)
{
}
return null;
}
public static Dictionary<String, List<int>> rulesReader()
{
//reads the rules from xml file and returns a dictionary in <string,list> format.
Dictionary<string, List<int>> rulesAndNumbers = new Dictionary<string, List<int>>();
XDocument xDoc = XDocument.Load("rules.xml");
int rulesCount = xDoc.Descendants("RuleName").Count();
string ruleName = null;
string ruleValue = null;
//List<string> ruleNumbers = new List<string>();
var feedDetails = from feed in xDoc.Descendants("Rule")
select new
{
IndexInfo = feed.Descendants("RuleInfo").Descendants(),
IndexRules = feed.Descendants("Rules").Descendants()
};
foreach (var feed in feedDetails)
{
foreach (XElement xe in feed.IndexInfo) //RuleName
{
List<int> ruleNumbers = new List<int>();
ruleName = xe.Value;
foreach (XElement xe1 in feed.IndexRules)
{
ruleValue = xe1.Attribute("number").Value;
ruleNumbers.Add(Int32.Parse(ruleValue));
Console.WriteLine(ruleName + "==" + ruleValue);
}
rulesAndNumbers.Add(ruleName, ruleNumbers);
//ruleNumbers.Clear();
}
}
return rulesAndNumbers;
}
The code above gives me a dictionary in this format:
SID,REFID == FirstName|LastName|ID|DOB ( KEY == VALUE )
SidRefID Dictionary
TRAROB,Ref_1==KEN|CARPENTER|67814|1122
TRAROB,Ref_2==TRAN|ROBINSON|67814|1122
TRAROB,Ref_3==TRAN|ROBINSON|67871|1122
TRAROB,Ref_4==TRAN|ROBINSON|67871|1122
MCKBAR,Ref_5==LAWSN|PERDUE|6761|2009
MCKBAR,Ref_6==MCKEN|BARNUM|6761|2009
MCKBAR,Ref_7==MCKEN|BARNUM|6768|2009
MCKBAR,Ref_8==MCKEN|BARNUM|6768|2009
TRAROB,Ref_9==TRAN|ROBINSON|67871|1122
and a dictionary like this XML Dictionary
[Rule1|0]
[Rule1|1]
[Rule1|2]
Now, after all this, I am stuck here: I need to match all the values that share the same partial key, i.e. the SID or Key.Split(',')[0], in the final dictionary. Based on the numbers given in the XML, the 0th, 1st and 2nd positions of the array obtained by splitting each value should be concatenated.
I have already created the XML dictionary in <string, List<int>> format. So Ref_1 should be matched against Ref_2, Ref_3 and Ref_4 based on (0,1,2), i.e. the concatenation of FirstName, LastName and ID. For example:
Ref_1, Ref_2, Ref_3 and Ref_4 all have the same SID (SidRefId dictionary)
so I need to match
KENCARPENTER67814 with TRAYROBINSON67814 & TRAYROBINSON67871 & TRAYROBINSON67871 & TRAYROBINSON67871 which will return FALSE for KENCARPENTER67814 because none of the string matches with each other, Similarly the desired output is:
RULE1,TRAROB,Ref_1==KEN|CARPENTER|67814|1122|FALSE
RULE1,TRAROB,Ref_2==TRAN|ROBINSON|67814|1122|FALSE
RULE1,TRAROB,Ref_3==TRAN|ROBINSON|67871|1122|TRUE
RULE1,TRAROB,Ref_4==TRAN|ROBINSON|67871|1122|TRUE
RULE1,MCKBAR,Ref_5==LAWSN|PERDUE|6761|2009|FALSE
RULE1,MCKBAR,Ref_6==MCKEN|BARNUM|6761|2009|FALSE
RULE1,MCKBAR,Ref_7==MCKEN|BARNUM|6768|2009|TRUE
RULE1,MCKBAR,Ref_8==MCKEN|BARNUM|6768|2009|TRUE
RULE1,TRAROB,Ref_9==TRAN|ROBINSON|67871|1122|TRUE
I thought of making a copy of the SidRefId dictionary and matching the two against each other, but that is going to take a lot of time for the large files and multiple XML rules that I am going to deal with.
Can someone tell me an efficient way to do this? Thanks!
To me it looks like you're trying to develop your own engine for record linkage. That is, for finding duplicate records that are not exact duplicates. If I were you I wouldn't try to make my own engine, but instead just use one of the already existing ones.
Wikipedia used to have a list of such engines, but it got deleted and I don't know of any other lists, so I'll just link to the one I made: Duke. There are other engines as well.
If you insist on doing this yourself, one way to do it is what you're doing here: build a key for each record, then group by key. That's fairly primitive, though, so you should aim to do more detailed matching after you've matched by key. Just matching by key will cause many false positives.
A more sophisticated approach is to do what I did: index the data up with a search engine like Lucene, then search for similar records and do detailed comparison on the candidates. Or you could use locality-sensitive hashing. Or metric spaces. Or q-gram based indexes.
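The simple key-then-group approach mentioned above can be sketched as follows. KeyGrouping, BuildKey and FlagMatches are names invented for this example. It assumes the "SID,RefID" key format and the pipe-separated values from the question, and it joins the selected fields with '|' rather than concatenating them directly, so that "AB" + "C" and "A" + "BC" do not collide. The counts are computed in one pass, so no record is compared against every other record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyGrouping
{
    // Builds the match key from the fields at the given positions.
    public static string BuildKey(string record, IEnumerable<int> positions)
    {
        string[] fields = record.Split('|');
        return string.Join("|", positions.Select(p => fields[p]));
    }

    // For each "SID,RefID" entry, reports whether at least one other entry
    // with the same SID has the same match key.
    public static Dictionary<string, bool> FlagMatches(
        IDictionary<string, string> records, IList<int> positions)
    {
        // Count the records per (SID, match key) pair.
        var counts = records
            .GroupBy(r => (Sid: r.Key.Split(',')[0], Key: BuildKey(r.Value, positions)))
            .ToDictionary(g => g.Key, g => g.Count());

        return records.ToDictionary(
            r => r.Key,
            r => counts[(r.Key.Split(',')[0], BuildKey(r.Value, positions))] > 1);
    }

    public static void Main()
    {
        var records = new Dictionary<string, string>
        {
            { "MCKBAR,Ref_5", "LAWSN|PERDUE|6761|2009" },
            { "MCKBAR,Ref_6", "MCKEN|BARNUM|6761|2009" },
            { "MCKBAR,Ref_7", "MCKEN|BARNUM|6768|2009" },
            { "MCKBAR,Ref_8", "MCKEN|BARNUM|6768|2009" }
        };
        foreach (var flag in FlagMatches(records, new[] { 0, 1, 2 }))
            Console.WriteLine($"{flag.Key}=={records[flag.Key]}|{flag.Value.ToString().ToUpperInvariant()}");
    }
}
```

This only finds exact key matches, so it inherits the false positives and misses described above; a fuzzy comparison would still have to run on the candidates within each group.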
I have a class with lots of fields:
public class CrowdedHouse
{
public int value1;
public float value2;
public Guid value3;
public string Value4;
// some more fields below
}
My class must be (de)serialized into a simple Windows text file in the following format:
NAME1=VALUE1
NAME2=VALUE2
What is the most convenient way to do that in .NET? This is a text file, and all the values must first be converted to strings. Let's assume I have already converted all the data to strings.
UPDATE: One option would be to P/Invoke GetPrivateProfileString/WritePrivateProfileString,
but these rely on the required "[Section]" field, which I don't need.
EDIT: If you have already converted each data value to strings, simply use the method below to serialize it after making a Dictionary of these values:
var dict = new Dictionary<string, string>
{
{ "value1", "value1value" },
{ "value2", "value2value" },
// etc
};
or use dict.Add(string key, string value).
To read the data, simply split each line around the = and store the results as a Dictionary<string, string>:
string[] lines = File.ReadAllLines("file.ext");
var dict = lines.Select(l => l.Split('=')).ToDictionary(a => a[0], a => a[1]);
To convert a dictionary to the file, use:
string[] lines = dict.Select(kvp => kvp.Key + "=" + kvp.Value).ToArray();
File.WriteAllLines("file.ext", lines);
Note that your NAMEs and VALUEs cannot contain =.
Writing is easy:
// untested
using (var file = System.IO.File.CreateText("data.txt"))
{
foreach(var item in data)
file.WriteLine("{0}={1}", item.Key, item.Value);
}
And for reading it back:
// untested
using (var file = System.IO.File.OpenText("data.txt"))
{
string line;
while ((line = file.ReadLine()) != null)
{
string[] parts = line.Split('=');
string key = parts[0];
string value = parts[1];
// use it
}
}
But probably the best answer is : Use XML.
A minor improvement on Captain Comic's answer:
To enable = in values: (will split only once)
var dict = lines.Select(l => l.Split(new[]{'='},2)).ToDictionary(a => a[0], a => a[1]);
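If the values have not been converted to strings yet, reflection can do the round trip in one place. The following is only a sketch under a few assumptions: KeyValueSerializer, ToLines and FromLines are made-up names, only public instance fields are handled, every field type is assumed to have a TypeConverter that can convert to and from string (true for int, float, Guid and string), and values must not contain line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Reflection;

public class CrowdedHouse
{
    public int value1;
    public float value2;
    public Guid value3;
    public string Value4;
}

public static class KeyValueSerializer
{
    // Turns every public instance field into a "NAME=VALUE" line.
    public static string[] ToLines(object source) =>
        source.GetType()
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Select(f => f.Name + "=" + TypeDescriptor.GetConverter(f.FieldType)
                .ConvertToInvariantString(f.GetValue(source)))
            .ToArray();

    // Reads "NAME=VALUE" lines back into a new instance of T.
    public static T FromLines<T>(IEnumerable<string> lines) where T : new()
    {
        object target = new T(); // boxed so that struct targets would also work
        foreach (var line in lines)
        {
            string[] parts = line.Split(new[] { '=' }, 2); // split on the first '=' only
            if (parts.Length != 2) continue;               // skip malformed lines
            FieldInfo field = typeof(T).GetField(parts[0]);
            if (field == null) continue;                   // skip unknown names
            field.SetValue(target, TypeDescriptor.GetConverter(field.FieldType)
                .ConvertFromInvariantString(parts[1]));
        }
        return (T)target;
    }

    public static void Main()
    {
        var house = new CrowdedHouse { value1 = 7, value2 = 2.5f, value3 = Guid.NewGuid(), Value4 = "a=b" };
        foreach (var line in ToLines(house))
            Console.WriteLine(line);
    }
}
```

The lines returned by ToLines can be passed straight to File.WriteAllLines, and File.ReadAllLines supplies the input for FromLines. The invariant-culture conversions keep the file format independent of the machine's regional settings.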