Get data from CSV to dictionary in C#

I need to get data from a CSV file into a dictionary, but when I run this code I receive the error "An item with the same key has already been added." How do I do it?
Dictionary<string, string> dic = new Dictionary<string, string>();

public void AddToDic()
{
    string line = "";
    using (StreamReader sr = new StreamReader(@"words.txt"))
    {
        while (sr.Peek() != -1)
        {
            line = line + sr.ReadLine();
            string[] splitted = line.Split(' ');
            dic.Add(splitted[0], splitted[1]); // ERROR: An item with the same key has already been added.
        }
    }
}
//text in words.txt is like: "car auto" newline "water voda" etc...

Since you don't show us the contents of the file you are trying to parse we can only guess. Here are my guesses (followed by a solution):
each line of the file contains two words
the first word should become the key of a dictionary
the file may contain the same key word multiple times
Since a dictionary requires unique keys and the file may contain the same key multiple times there can be multiple values for each key. So a better data structure might be: Dictionary<string, string[]>.
You can use File.ReadLines or File.ReadAllLines to read the lines of a file and then use LINQ to transform that into a dictionary:
Dictionary<string, string[]> result =
    File.ReadLines("words.txt")
        .Select(line => line.Split(' '))
        .GroupBy(arr => arr[0])
        .ToDictionary(gr => gr.Key,
                      gr => gr.Select(s => s[1]).ToArray());
Explanation: After reading a line it is split into a string[]. The result is grouped by the first word, which becomes the key for the dictionary. Each group is an IEnumerable<string[]>, and only the second value from each array is selected into the result.
BTW: If you replace ReadLines with ReadAllLines, the file will be read at once and closed before processing. ReadLines reads the lines one by one and keeps the file open while processing.
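For example, a rough sketch of the difference (Take(2) here only consumes the first two lines):
// ReadAllLines loads the whole file into a string[] and closes it immediately:
string[] all = File.ReadAllLines("words.txt");

// ReadLines is lazy: lines are pulled from the still-open file only as the query consumes them:
var firstTwo = File.ReadLines("words.txt").Take(2).ToList();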

Try checking for the key before adding:
if (!dic.ContainsKey(splitted[0]))
    dic.Add(splitted[0], splitted[1]);

Dictionary keys must be unique.
if (!dic.ContainsKey(splitted[0]))
    dic.Add(splitted[0], splitted[1]); // ERROR: An item with the same key
will stop the error from happening, but that's probably not the behavior you want. Think about how you want to handle duplicate keys: fail loading the file, keep only the first value you see, keep only the latest one you see, or append a counter to the key name if there's a collision.
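For example, a minimal sketch of two of those strategies, reusing the variable names from the question:
// Keep the latest value seen for a key: the indexer overwrites silently.
dic[splitted[0]] = splitted[1];

// Or fail loading the file with a clearer message on a collision.
if (dic.ContainsKey(splitted[0]))
    throw new InvalidDataException("Duplicate key '" + splitted[0] + "' in words.txt");
dic.Add(splitted[0], splitted[1]);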

Related

Check if string contains dictionary keys and replace the matching substring with values from dictionary

I am parsing a template file which will contain certain keys that I need to map values to. Take a line from the file for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
I need to replace the string within the # symbols with values from a dictionary.
So there can be multiple keys from the dictionary in a single line. However, not all strings inside the # are in the dictionary, so those I will have to replace with an empty string.
I can't seem to find a way to do this. And yes, I have looked at this solution:
check if string contains dictionary Key -> remove key and add value
For now what I have is this (where I read from the template file line by line and then write to a different file):
string line = string.Empty;
var dict = new Dictionary<string, string>() {
    { "data.tool_context.TOOL_SOFTWARE_VERSION", "sw0.2.002" },
    { "data.context.TOOL_ENTITY", "WSM102" }
};
StringBuilder inputText = new StringBuilder();
StreamWriter writeKlarf = new StreamWriter(klarfOutputNameActual);
using (StreamReader sr = new StreamReader(WSMTemplatePath))
{
    while ((line = sr.ReadLine()) != null)
    {
        //Console.WriteLine(line);
        if (line.Contains("#"))
        {
        }
        else
        {
            writeKlarf.WriteLine(line);
        }
    }
}
writeKlarf.Close();
The idea is that, for each line, I replace the #string# (including the # characters) with the matching value from the dictionary if the string inside the # is a dictionary key. How can I do this?
Sample output, given the line above:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Here, because #WSM# is not in the dictionary, it is replaced with an empty string.
One more thing: this logic only applies to the first quarter of the file. The rest of the file will have other data that will need to be handled by different logic, so I am not sure it makes sense to read the whole file into memory just for the header section.
Here's a quick example that I wrote for you; hopefully this is what you're asking for.
This will let you have a <string, string> Dictionary, check for the key inside of a delimiter, and, if the text inside of the delimiter matches the dictionary key, replace the text. It won't edit any of the inputted strings that don't have any matches.
If you want to delete the unmatched value instead of leaving it alone, replace the kvp.Value in the line.Replace() with String.Empty.
var dict = new Dictionary<string, string>() {
    { "test", "cool test" }
};
string line = "#test# is now replaced.";
foreach (var kvp in dict)
{
    string split = line.Split('#')[1];
    if (split == kvp.Key)
    {
        line = line.Replace($"#{split}#", kvp.Value);
    }
    Console.WriteLine(line);
}
Console.ReadLine();
If you had a list of tuples that were the find-and-replace pairs, you could read the file, replace each, and then rewrite the file:
var frs = new List<(string F, string R)>() {
    ("#data.tool_context.TOOL_SOFTWARE_VERSION#", "sw0.2.002"),
    ("#otherfield#", "replacement here")
};
var i = File.ReadAllText("path");
frs.ForEach(fr => i = i.Replace(fr.F, fr.R));
File.WriteAllText("path2", i);
The choice to use a list vs a dictionary is fairly arbitrary; List has a ForEach method, but it could just as easily be a foreach loop on a dictionary (see the sketch below). I included the #s in the find string because I got the impression the output is not supposed to contain them.
This version leaves alone any template parameters that aren't available.
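For comparison, a sketch of the same replace loop over a dictionary instead of a list (same hypothetical field names as above):
var replacements = new Dictionary<string, string> {
    { "#data.tool_context.TOOL_SOFTWARE_VERSION#", "sw0.2.002" },
    { "#otherfield#", "replacement here" }
};
var text = File.ReadAllText("path");
foreach (var kvp in replacements)
    text = text.Replace(kvp.Key, kvp.Value);   // leaves unmatched #...# parameters untouched
File.WriteAllText("path2", text);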
You can try matching #...# keys with the help of regular expressions:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
...
static string MyReplace(string value, IDictionary<string, string> subs) => Regex
    .Replace(value, "#[^#]*#", match => subs.TryGetValue(
        match.Value.Substring(1, match.Value.Length - 2), out var item) ? item : "");
Then you can apply it to the file: we read the file's lines, process them with the help of LINQ, and write them to another file.
var dict = new Dictionary<string, string>() {
    { "data.tool_context.TOOL_SOFTWARE_VERSION", "sw0.2.002" },
    { "data.context.TOOL_ENTITY", "WSM102" },
};

File.WriteAllLines(klarfOutputNameActual, File
    .ReadLines(WSMTemplatePath)
    .Select(line => MyReplace(line, dict)));
Edit: If you want to switch off MyReplace from some line on:
bool doReplace = true;

File.WriteAllLines(klarfOutputNameActual, File
    .ReadLines(WSMTemplatePath)
    .Select(line => {
        //TODO: given the line, check whether we want to keep replacing
        if (!doReplace || SomeCondition(line)) {
            doReplace = false;
            return line;
        }
        return MyReplace(line, dict);
    }));
Here SomeCondition(line) returns true when the header ends and we should not replace #..# any more.
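What SomeCondition looks like depends on your template; as a hypothetical example, if the header section ends at a known marker line it could be as simple as:
// Hypothetical: the header is assumed to end at a line starting with "EndOfHeader".
static bool SomeCondition(string line) => line.StartsWith("EndOfHeader");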

How do I create a C# dictionary that will read in keys and values from a .csv file

I have a .csv file with a list of abbreviations and their actual meaning e.g.
Laughing Out Loud, LOL
I need to be able to search for an abbreviation in a text box and replace the abbreviation with the actual words. This is what I have attempted so far to understand dictionaries, but I cannot work out how to read in the values from the file.
Dictionary<string, string> Abbreviations = new Dictionary<string, string>();
Abbreviations.Add("Laughing Out Loud", "lol");

foreach (KeyValuePair<string, string> abbrev in Abbreviations)
{
    txtinput.Text = txtinput.Text + "<<" + abbrev.Key + ">>";
}
You can try this LINQ solution; the GroupBy is there to handle the case where a key appears in the file multiple times.
Dictionary<string, string[]> result =
    File.ReadLines("test.csv")
        .Select(line => line.Split(','))
        .GroupBy(arr => arr[0])
        .ToDictionary(gr => gr.Key,
                      gr => gr.Select(s => s[1]).ToArray());
To check if the abbreviation in the TextBox exists in the Dictionary:
foreach (KeyValuePair<string, string[]> abbrev in result)
{
    // the values may have a leading space from the split, hence the Trim()
    if (abbrev.Value.Any(v => v.Trim() == txtinput.Text.Trim()))
    {
        txtinput.Text = txtinput.Text + "<<" + abbrev.Key + ">>";
    }
}
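To actually replace an abbreviation typed into the TextBox with the full phrase, a minimal sketch (assuming the result dictionary above, where the key is the phrase and the values are the abbreviations) would be:
string text = txtinput.Text;
foreach (KeyValuePair<string, string[]> abbrev in result)
{
    foreach (string shortForm in abbrev.Value)
    {
        // e.g. replace "LOL" with "Laughing Out Loud"; Trim() handles the space after the comma.
        text = text.Replace(shortForm.Trim(), abbrev.Key);
    }
}
txtinput.Text = text;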
You can start by creating a StreamReader for your file, then loop over all the values in the CSV and add them to the dictionary.
static void Main(string[] args)
{
    var csv_reader = new StreamReader(File.OpenRead(@"your_file_path"));
    // declare your dictionary somewhere outside the loop.
    while (!csv_reader.EndOfStream)
    {
        // read the line and split it if you need to with .Split(',')
        var line = csv_reader.ReadLine();
        // Add to the dictionary here
    }
    // Call another method for your search and replace.
    SearchAndReplace(your_input);
}
Then have the implementation of that method search for the input in the dictionary and, if it exists, replace it.
You could use LINQ to put the values of the CSV into your dictionary, if that's easier for you.
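For example, a minimal sketch of both pieces, assuming each line maps phrase to abbreviation as in the question (the method names and signatures here are just placeholders):
static Dictionary<string, string> LoadAbbreviations(string path) =>
    File.ReadLines(path)
        .Select(line => line.Split(','))
        .ToDictionary(parts => parts[1].Trim(),   // e.g. "LOL"
                      parts => parts[0].Trim());  // e.g. "Laughing Out Loud"

static string SearchAndReplace(string input, Dictionary<string, string> abbreviations)
{
    foreach (var pair in abbreviations)
        input = input.Replace(pair.Key, pair.Value);
    return input;
}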
I'm going to assume that your input file may have commas in the actual text, and not just separating the two fields.
Now, if that were the case, then the standard CSV file format would format the file like this:
Laughing Out Loud,LOL
"I Came, I Saw, I Conquered",ICISIC
However, from your example you have a space before the "LOL", so it doesn't appear that you're using standard CSV.
So I'll work on this input:
Laughing Out Loud, LOL
"I Came, I Saw, I Conquered",ICISIC
"to, too, or two", 2
because,B/C
For this input, this code produces a dictionary:
var dictionary =
(
    from line in File.ReadAllLines("FILE.CSV")
    let lastComma = line.LastIndexOf(',')
    let abbreviation = line.Substring(lastComma + 1).Trim()
    let actualRaw = line.Substring(0, lastComma).Trim()
    let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
        ? actualRaw.Substring(1, actualRaw.Length - 2)
        : actualRaw
    select new { abbreviation, actual }
).ToDictionary(x => x.abbreviation, x => x.actual);
You can go one better than this though. It's quite possible to create a "super function" that will do all of the replaces in one go for you.
Try this:
var translate =
(
    from line in File.ReadAllLines("FILE.CSV")
    let lastComma = line.LastIndexOf(',')
    let abbreviation = line.Substring(lastComma + 1).Trim()
    let actualRaw = line.Substring(0, lastComma).Trim()
    let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
        ? actualRaw.Substring(1, actualRaw.Length - 2)
        : actualRaw
    select (Func<string, string>)(x => x.Replace(abbreviation, actual))
).Aggregate((f1, f2) => x => f2(f1(x)));
Then I can do this:
Console.WriteLine(translate("It was me 2 B/C ICISIC, LOL!"));
I get this result:
It was me to, too, or two because I Came, I Saw, I Conquered, Laughing Out Loud!

Proof Reading .CSV per line

CSVHelper and FileHelper are not an option.
I have a .csv export that I need to check for consistency, structured like the below:
Reference,Date,EntryID
ABC123,08/09/2015,123
ABD234,08/09/2015,124
XYZ987,07/09/2015,125
QWE456,08/09/2016,126
I can use ReadLine or ReadAllLines and .Split, which give me entire rows/columns, but I need to select each row and then go through each attribute (separated by ',') for format checking.
I am running into problems here. I cannot single out each value in a row for this check.
It is probably either something simple in the code below:
class Program
{
    static void Main(string[] args)
    {
        string csvFile = @"proof.csv";
        string[] lines = File.ReadAllLines(csvFile);

        var values = lines.Skip(1).Select(l => new { FirstRow = l.Split('\n').First(), Values = l.Split('\n').Select(v => int.Parse(v)) });

        foreach (var value in values)
        {
            Console.WriteLine(string.Format("{0}", value.FirstRow));
        }
    }
}
Or I am going down the wrong path; my searches relate to pulling specific rows or columns (as opposed to checking the individual values within them).
The sample data above has a highlighted example: the date is next year, and I would like to be able to proof that value (just an example, as errors could appear in any column).
I cannot single out each value in a row
That's because you split on \n twice. The values within a row are separated by comma (,).
I'm not sure what all that LINQ is supposed to do, but it's as simple as this:
string[] lines = File.ReadAllLines(csvFile);

foreach (var line in lines.Skip(1))
{
    var values = line.Split(',');
    // access values[0], values[1] ...
}
Instead of reading it as text, read it through OLE DB; the CSV data then comes back in a DataTable and you do not need to split it yourself.
To read the CSV file you can use these OLE DB objects:
System.Data.OleDb.OleDbCommand
System.Data.OleDb.OleDbDataAdapter
System.Data.OleDb.OleDbConnection
and
System.Data.DataTable
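A rough sketch of that approach (the classic Jet text driver shown here only works in 32-bit processes, and the folder path is hypothetical):
string folder = @"C:\data";   // hypothetical folder that contains proof.csv
string connectionString =
    @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
    @";Extended Properties=""text;HDR=Yes;FMT=Delimited""";

var table = new System.Data.DataTable();
using (var connection = new System.Data.OleDb.OleDbConnection(connectionString))
using (var adapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [proof.csv]", connection))
{
    adapter.Fill(table);   // each CSV column becomes a DataTable column
}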

read all lines in text file with separator

I have a file with this content :
1,2,3,4,5#
1,2,3,4,5#
How can I read all lines using ReadLine? The important thing is that I need to separate the values in each line; I mean the first line's values 1,2,3,4,5 should be separated.
Suppose I have an array named myarray that can save all the values in the first line; the array should be like this:
myarray[0]=1
myarray[1]=2
myarray[2]=3
myarray[3]=4
myarray[4]=5
I am new to IO in C#.
Best regards
Using LINQ you can do:
List<string[]> list = File.ReadLines("YourFile.txt")
    .Select(r => r.TrimEnd('#'))
    .Select(line => line.Split(','))
    .ToList();
File.ReadLines would read the file line by line.
.Select(r => r.TrimEnd('#')) would remove the # from the end of the line.
.Select(line => line.Split(',')) would split the line on commas and return an array of string items.
ToList() would give you a List<string[]> back.
You can also use TrimEnd and Split in a single Select statement like below (it would result in the same output):
List<string[]> list = File.ReadLines("YourFile.txt")
    .Select(r => r.TrimEnd('#').Split(','))
    .ToList();
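If the values should end up as numbers rather than strings (as in the myarray example above), an int.Parse step can be added; a sketch:
List<int[]> numbers = File.ReadLines("YourFile.txt")
    .Select(r => r.TrimEnd('#')
                  .Split(',')
                  .Select(int.Parse)
                  .ToArray())
    .ToList();

int[] myarray = numbers[0];   // myarray[0] == 1, myarray[1] == 2, ...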
Try this
string[] readText = File.ReadAllLines(path);
That will return an array of all the lines.
https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx
You can use a StreamReader to read all the lines from a file and split them on a given delimiter (,).
var filename = @"C:\data.txt";

using (var sr = new StreamReader(filename))
{
    var contents = sr.ReadToEnd();
    var myarray = contents.Split(',');
}
Although I do prefer the LINQ approach in the answer further up.

C# Exception Handling continue on error

I have a basic C# console application that reads a text file (CSV format) line by line and puts the data into a Hashtable. The first CSV item in the line is the key (an ID number) and the rest of the line is the value. However, I've discovered that my import file has a few duplicate keys that it shouldn't have. When I try to import the file, the application errors out because you can't have duplicate keys in a Hashtable. I want my program to be able to handle this error, though. When I run into a duplicate key I would like to put that key into an ArrayList and continue importing the rest of the data into the hashtable. How can I do this in C#?
Here is my code:
private static Hashtable importFile(Hashtable myHashtable, String myFileName)
{
    StreamReader sr = new StreamReader(myFileName);
    CSVReader csvReader = new CSVReader();
    ArrayList tempArray = new ArrayList();
    int count = 0;

    while (!sr.EndOfStream)
    {
        String temp = sr.ReadLine();
        if (temp.StartsWith(" "))
        {
            ServMissing.Add(temp);
        }
        else
        {
            tempArray = csvReader.CSVParser(temp);
            Boolean first = true;
            String key = "";
            String value = "";

            foreach (String x in tempArray)
            {
                if (first)
                {
                    key = x;
                    first = false;
                }
                else
                {
                    value += x + ",";
                }
            }
            myHashtable.Add(key, value); //ERROR An item with the same key has already been added.
        }
        count++;
    }
    Console.WriteLine("Import Count: " + count);
    return myHashtable;
}
if (myHashtable.ContainsKey(key))
    duplicates.Add(key);
else
    myHashtable.Add(key, value);
A better solution is to call ContainsKey to check whether the key exists before adding it to the hash table. Throwing an exception on this kind of error is a performance hit and doesn't improve the program flow.
ContainsKey has a constant O(1) overhead for every item, while catching an Exception incurs a performance hit on JUST the duplicate items.
In most situations, I'd say check for the key, but in this case, it's better to catch the exception.
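For completeness, a sketch of the catch-the-exception variant using the same variables as the question:
try
{
    myHashtable.Add(key, value);
}
catch (ArgumentException)
{
    // Hashtable.Add throws ArgumentException when the key already exists.
    duplicates.Add(key);
}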
Here is a solution which avoids multiple hits in the secondary list with a small overhead to all insertions:
Dictionary<string, List<string>> dict = new Dictionary<string, List<string>>();

// Insert item
if (!dict.ContainsKey(key))
    dict[key] = new List<string>();
dict[key].Add(value);
You can wrap the dictionary in a type that hides this, or put it in a method or even an extension method on Dictionary.
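For example, a possible extension method (the name is just an illustration):
public static class DictionaryExtensions
{
    // Adds value to the list stored under key, creating the list on first use.
    public static void AddToList<TKey, TValue>(
        this Dictionary<TKey, List<TValue>> dict, TKey key, TValue value)
    {
        if (!dict.TryGetValue(key, out List<TValue> list))
        {
            list = new List<TValue>();
            dict[key] = list;
        }
        list.Add(value);
    }
}
Usage would then be dict.AddToList(key, value);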
If you have more than 4 (for example) CSV values, it might be worth switching the value variable to a StringBuilder as well, since repeated string concatenation is slow.
Hmm, 1.7 Million lines? I hesitate to offer this for that kind of load.
Here's one way to do this using LINQ.
CSVReader csvReader = new CSVReader();
List<string> source = new List<string>();
using (StreamReader sr = new StreamReader(myFileName))
{
    while (!sr.EndOfStream)
    {
        source.Add(sr.ReadLine());
    }
}

List<string> ServMissing =
    source
    .Where(s => s.StartsWith(" "))
    .ToList();
//--------------------------------------------------
List<IGrouping<string, string>> groupedSource =
(
    from s in source
    where !s.StartsWith(" ")
    let parsed = csvReader.CSVParser(s)
    where parsed.Any()
    let first = parsed.First()
    let rest = String.Join(",", parsed.Skip(1).ToArray())
    select new { first, rest }
)
.GroupBy(x => x.first, x => x.rest) // GroupBy(keySelector, elementSelector)
.ToList();
//--------------------------------------------------
List<string> myExtras = new List<string>();
foreach (IGrouping<string, string> g in groupedSource)
{
    myHashTable.Add(g.Key, g.First());
    if (g.Skip(1).Any())
    {
        myExtras.Add(g.Key);
    }
}
Thank you all.
I ended up using the ContainsKey() method. It takes maybe 30 secs longer, which is fine for my purposes. I'm loading about 1.7 million lines and the program takes about 7 mins total to load up two files, compare them, and write out a few files. It only takes about 2 secs to do the compare and write out the files.
