Convert a txt file to Dictionary<string, string> - C#

I have a text file and I need to put all odd lines into the Dictionary keys and all even lines into the Dictionary values. What is the best solution to my problem?
int count_lines = 1;
string key = null;
Dictionary<string, string> stroka = new Dictionary<string, string>();
foreach (string line in ReadLineFromFile(readFile))
{
    if (count_lines % 2 == 0)
    {
        stroka.Add(key, line); // even line: use it as the value
    }
    else
    {
        key = line; // odd line: remember it as the key
    }
    count_lines++;
}

Try this:
var res = File
    .ReadLines(pathToFile)
    .Select((v, i) => new { Index = i, Value = v })
    .GroupBy(p => p.Index / 2)
    .ToDictionary(g => g.First().Value, g => g.Last().Value);
The idea is to group the lines into pairs. Each group will have exactly two items - the key as the first item and the value as the second - provided the file has an even number of lines; see the sketch below for handling a trailing unpaired line.
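If the file can end with an unpaired line, a sketch of the same idea that simply drops it (an assumption; you may prefer to throw instead), reusing pathToFile from above:
// Pair consecutive lines as key/value and skip a trailing unpaired line, if any.
var dict = File.ReadLines(pathToFile)
    .Select((v, i) => new { Index = i, Value = v })
    .GroupBy(p => p.Index / 2)
    .Where(g => g.Count() == 2)
    .ToDictionary(g => g.First().Value, g => g.Last().Value);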

You probably want to do this:
var array = File.ReadAllLines(filename);
for (var i = 0; i < array.Length; i += 2)
{
    stroka.Add(array[i + 1], array[i]);
}
This reads the file in steps of two instead of every line separately.
I suppose you wanted to use these pairs: (2,1), (4,3), ... . If not, please change this code to suit your needs.
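If you instead want the odd lines as keys and the even lines as values, a sketch of the same loop (reusing filename and stroka from above, and guarding against an odd line count) could be:
var array = File.ReadAllLines(filename);
// Line i is the key, line i + 1 the value; the bound skips a trailing unpaired line.
for (var i = 0; i + 1 < array.Length; i += 2)
{
    stroka.Add(array[i], array[i + 1]);
}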

You can read line by line and add to a Dictionary
public void TextFileToDictionary()
{
    Dictionary<string, string> d = new Dictionary<string, string>();
    using (var sr = new StreamReader("txttodictionary.txt"))
    {
        string line = null;
        // while it reads a key
        while ((line = sr.ReadLine()) != null)
        {
            // add the key and whatever it
            // can read next as the value
            d.Add(line, sr.ReadLine());
        }
    }
}
This way you will get a dictionary, and if the file has an odd number of lines, the last entry will have a null value.
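One caveat: Dictionary.Add throws if the same key line occurs twice in the file. If that can happen, a sketch of the same loop using the indexer instead of Add (so the last value wins), with d being the dictionary from the snippet above:
using (var sr = new StreamReader("txttodictionary.txt"))
{
    string key;
    while ((key = sr.ReadLine()) != null)
    {
        d[key] = sr.ReadLine(); // null if the file has an odd number of lines
    }
}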

String fileName = @"c:\MyFile.txt";
Dictionary<string, string> stroka = new Dictionary<string, string>();
using (TextReader reader = new StreamReader(fileName)) {
    String key = null;
    Boolean isValue = false;
    while (reader.Peek() >= 0) {
        if (isValue)
            stroka.Add(key, reader.ReadLine());
        else
            key = reader.ReadLine();
        isValue = !isValue;
    }
}

Dictionary to return a character list with their indices

I've been tasked with taking a string and returning a dictionary that has a map of characters to a list of their indices in a given string. The output should show which characters occur where in the given string.
This code passes the test:
public class CharacterIndexDictionary
{
    public static Dictionary<string, List<int>> ConcordanceForString(string input)
    {
        var result = new Dictionary<string, List<int>>();
        for (var index = 0; index < input.Length; index++)
        {
            // Get the character positioned at the current index.
            // We could just use input[index] everywhere, but
            // this is a little easier to read.
            string currentCharacter = input[index].ToString();
            // If the dictionary doesn't already have an entry
            // for the current character, add one.
            if (!result.ContainsKey(currentCharacter))
            {
                result.Add(currentCharacter, new List<int>());
            }
            // Add the current index to the list for
            // the current character.
            result[currentCharacter].Add(index);
        }
        return result;
    }
}
If I wanted to index characters I'd use a Dictionary<char, List<int>> instead of using a string as the key, but this uses string because the test requires it.
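For reference, a sketch of that char-keyed variant, which differs only in the key type (the method name ConcordanceForChars is just for illustration):
public static Dictionary<char, List<int>> ConcordanceForChars(string input)
{
    var result = new Dictionary<char, List<int>>();
    for (var index = 0; index < input.Length; index++)
    {
        char currentCharacter = input[index];
        if (!result.ContainsKey(currentCharacter))
        {
            result.Add(currentCharacter, new List<int>());
        }
        result[currentCharacter].Add(index);
    }
    return result;
}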
This version is closer to your code and may be easier to follow:
public Dictionary<string, List<int>> ConcordanceForString(string s)
{
    Dictionary<string, List<int>> newDictionary = new Dictionary<string, List<int>>();
    List<char> charList = new List<char>();
    foreach (var item in s)
    {
        if (!charList.Any(x => x == item))
        {
            charList.Add(item);
            List<int> itemInds = new List<int>();
            for (int i = 0; i < s.Length; i++)
            {
                if (s[i] == item)
                {
                    itemInds.Add(i);
                }
            }
            newDictionary.Add(item.ToString(), itemInds);
        }
    }
    return newDictionary;
}
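For comparison, a hedged LINQ sketch that produces the same string-keyed dictionary:
public static Dictionary<string, List<int>> ConcordanceForString(string input)
{
    // Pair each character with its index, group by the character,
    // and collect the indices for each group. Requires using System.Linq.
    return input
        .Select((c, i) => new { Key = c.ToString(), Index = i })
        .GroupBy(x => x.Key)
        .ToDictionary(g => g.Key, g => g.Select(x => x.Index).ToList());
}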

StreamWriter C# formatting output

Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match of vcard to coverage table. Since I'm running 2k samples, it's hard to identify which file does not have a one-to-one match. I know that both files have unique identifier numbers, so if both folders have files with the same unique number, I treat them as the "same" file.
I made a program that compares two folders and reports the entries unique to each folder. To do so, I made two lists containing the file names unique to each directory.
I want to format the report file (tab delimited .txt file) such that it looks something like below:
Unique in fdr1    Unique in fdr2
file x            file a
file y            file b
file z            file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If the design of the code has to change (e.g. one list instead of two), please let me know.
As requested, this is how I was going to do it (not a working version):
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
    // Write headers
    sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
    // Write unique entries in fdr1
    foreach (string file in fdr1FileList)
    {
        sw.WriteLine(file + "\t");
    }
    // Write unique entries in fdr2
    foreach (string file in fdr2FileList)
    {
        sw.WriteLine(file + "\t");
    }
    sw.Dispose();
}
As requested, here's my code snippet for finding the unique entries:
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add the GL number to the dictionary
for (int i = 0; i < fdr1FileNames.Length; i++)
{
    // Grab only the number from the file name
    string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
    int glNumber;
    // Make sure it is a number
    if (Int32.TryParse(number, out glNumber))
    {
        fdr1Dict[glNumber] = true;
    }
    // If no number is present, raise an exception
    else
    {
        throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
    }
}
// Iterate through the second directory, and add the GL number to the dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
    // Grab only the number from the file name
    string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
    int glNumber;
    // Make sure it is a number
    if (Int32.TryParse(number, out glNumber))
    {
        fdr2Dict[glNumber] = true;
    }
    // If no number is present, raise an exception
    else
    {
        throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
    }
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
    int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
    // If the same file is not present in the second folder, add it to the list
    if (!fdr2Dict.ContainsKey(glNumber))
    {
        fdr1FileList.Add(fdr1FileNames[i]);
    }
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
    int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
    // If the same file is not present in the first folder, add it to the list
    if (!fdr1Dict.ContainsKey(glNumber))
    {
        fdr2FileList.Add(fdr2FileNames[i]);
    }
}
I am quite confident that this will work, as I've tested it:
static void Main(string[] args)
{
    var firstDir = @"Path1";
    var secondDir = @"Path2";
    var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
    var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
    print2Dirs(firstDirFiles, secondDirFiles);
}

private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
    var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
    using (StreamWriter streamWriter = new StreamWriter("result.txt"))
    {
        streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
        for (int i = 0; i < maxIndex; i++)
        {
            streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
                firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
                secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
        }
    }
}
It's a quite simple code but if you need help understanding it just let me know :)
I would construct each line at a time. Something like this:
int row = 0;
string[] fdr1FileList = new string[0]; // replace with your actual lists
string[] fdr2FileList = new string[0];
while (row < fdr1FileList.Length || row < fdr2FileList.Length)
{
    string rowText = "";
    rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
    rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
    // write the assembled row here, e.g. sw.WriteLine(rowText);
    row++;
}
Try something like this:
static void Main(string[] args)
{
    Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
    Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
    var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
    var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
    int f1_size = unique_f1.Length;
    int f2_size = unique_f2.Length;
    int list_length = 0;
    if (f1_size > f2_size)
    {
        list_length = f1_size;
        Array.Resize(ref unique_f2, list_length);
    }
    else
    {
        list_length = f2_size;
        Array.Resize(ref unique_f1, list_length);
    }
    using (StreamWriter writer = new StreamWriter("output.txt"))
    {
        writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
        for (int i = 0; i < list_length; i++)
        {
            writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
        }
    }
}

static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
    Dictionary<int, string> dict = new Dictionary<int, string>();
    for (int i = 0; i < filenames.Length; i++)
    {
        int glNumber;
        string filename = Path.GetFileName(filenames[i]);
        string number = Regex.Match(filename, @"\d+").ToString();
        if (int.TryParse(number, out glNumber))
            dict.Add(glNumber, filename);
    }
    return dict;
}

Remove All Indexes in String

I have a Dictionary<int, int> which holds the index of every parenthesis in a string (the key is the opening parenthesis index, the value is the matching closing parenthesis index).
E.g. in the text
stringX.stringY(())() -> stringX.stringY$($()^)^$()^
$ = openParenthesisStartIndex
^ = closeParenthesisEndIndex
Dictionary items (key = openParenthesisStartIndex, value = closeParenthesisEndIndex):
        key   value
item1   15    19
item2   16    18
item3   19    21
My problem is that when I loop over my dictionary and remove characters from the string, the indexes are no longer valid on the next iteration, because they have already shifted after the removal.
string myText = "stringX.stringY(())()";
Dictionary<int, int> myIndexs = new Dictionary<int, int>();
foreach (var x in myIndexs)
{
myText = myText.Remove(item.Key, 1).Remove(item.Value-1);
}
Question: how can I remove all of these indexes from the string (from startIndex [key] to endIndex [value])?
To prevent the indexes from shifting, one trick is to remove the occurrences starting from the end:
string myText = "stringX.stringY(())()";
Dictionary<int, int> myIndexs = new Dictionary<int, int>();
var allIndexes = myIndexs.Keys.Concat(myIndexs.Values);
foreach (var index in allIndexes.OrderByDescending(i => i))
{
    myText = myText.Remove(index, 1);
}
Note that you probably don't need a dictionary at all. Consider replacing it with a list.
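A sketch of that list idea, assuming all you need is the flat set of positions to delete (the positions below are the question's example values; requires using System.Linq):
// Collect every position to delete, then remove from the highest index down
// so earlier removals cannot shift the remaining positions.
string myText = "stringX.stringY(())()";
var positions = new List<int> { 15, 19, 16, 18, 19, 21 };
foreach (var index in positions.Distinct().OrderByDescending(i => i))
{
    if (index < myText.Length)
        myText = myText.Remove(index, 1);
}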
StringBuilder is better suited to your case, as you are continuously changing the data (see StringBuilder on MSDN).
Ordering the indexes in descending order works here as well for removing all of them.
Another workaround is to place an intermediary character at each required index and then replace all instances of that character at the end:
StringBuilder ab = new StringBuilder("ab(cd)");
ab.Remove(2, 1);
ab.Insert(2, "`");
ab.Remove(5, 1);
ab.Insert(5, "`");
ab.Replace("`", "");
System.Console.Write(ab);
Strings are immutable: whenever you make a change to a string, a new string is created. So what you want is to build a new string without the removed parts. The code below is a little complicated because of how it deals with potential overlap; a better way might be to clean up the indexes first, producing a list of removals in the right order without overlap.
public static string removeAtIndexes(string source)
{
    var indexes = new Tuple<int, int>[]
    {
        new Tuple<int, int>(15, 19),
        new Tuple<int, int>(16, 18),
        new Tuple<int, int>(19, 21)
    };
    var sb = new StringBuilder();
    bool copying = true;
    for (var i = 0; i < source.Length; i++)
    {
        var end = false;
        foreach (var index in indexes)
        {
            if (copying)
            {
                if (index.Item1 <= i)
                {
                    copying = false;
                    break;
                }
            }
            else
            {
                if (index.Item2 < i)
                {
                    end = true;
                }
            }
        }
        if (false == copying && end)
        {
            copying = true;
        }
        if (copying)
        {
            sb.Append(source[i]);
        }
    }
    return sb.ToString();
}
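The cleanup hinted at above could be sketched like this: flatten the key/value pairs into one set of positions and copy everything else (positions outside the string are simply ignored; requires using System.Linq):
public static string RemoveAtIndexes(string source, Dictionary<int, int> pairs)
{
    // Every opening and closing index becomes a position to skip.
    var toRemove = new HashSet<int>(pairs.Keys.Concat(pairs.Values));
    var sb = new StringBuilder(source.Length);
    for (var i = 0; i < source.Length; i++)
    {
        if (!toRemove.Contains(i))
            sb.Append(source[i]);
    }
    return sb.ToString();
}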

How to loop through and compare millions of values in two text files?

I have two text files (TXT) which contain over 2 million distinct file names. I want to loop through all the names in the first file and find those that are also present in the second text file.
I have tried looping through with a StreamReader, but it takes a lot of time. I also tried the code below, but it still takes too much time.
StreamReader first = new StreamReader(path);
string strFirst = first.ReadToEnd();
string[] strarrFirst = strFirst.Split('\n');
bool found = false;
StreamReader second = new StreamReader(path2);
string str = second.ReadToEnd();
string[] strarrSecond = str.Split('\n');
for (int j = 0; j < strarrFirst.Length; j++)
{
    found = false;
    for (int i = 0; i < strarrSecond.Length; i++)
    {
        if (strarrFirst[j] == strarrSecond[i])
        {
            found = true;
            break;
        }
    }
    if (!found)
    {
        Console.WriteLine(strarrFirst[j]);
    }
}
What is a good way to compare the files?
How about this:
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2));
That's O(N + M) instead of your current solution which tests every line in the first file with every line in the second file - O(N * M).
That's assuming you're using .NET 4. Otherwise, you could use File.ReadAllLines, but that will read the whole file into memory. Or you could write the equivalent of File.ReadLines yourself - it's not terribly hard.
Ultimately you're likely to be limited by file IO by the time you've got rid of the O(N * M) problem in your current code - there's not much way to get round that.
EDIT: For .NET 2, first let's implement something like ReadLines:
public static IEnumerable<string> ReadLines(string file)
{
    using (TextReader reader = File.OpenText(file))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
Now we really want to use a HashSet<T>, but that wasn't in .NET 2 - so let's use Dictionary<TKey, TValue> instead:
Dictionary<string, string> map = new Dictionary<string, string>();
foreach (string line in ReadLines(path))
{
    map[line] = line;
}
List<string> intersection = new List<string>();
foreach (string line in ReadLines(path2))
{
    if (map.ContainsKey(line))
    {
        intersection.Add(line);
    }
}
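As an aside, if what you actually want to print is the lines of the first file that are missing from the second (which is what your original loop does), a similar sketch on .NET 4 would use Except:
// Lines present in the first file but not in the second.
var missing = File.ReadLines(path).Except(File.ReadLines(path2));
foreach (var line in missing)
{
    Console.WriteLine(line);
}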
Try something like this to speed it up a bit ...
var path = string.Empty;
var path2 = string.Empty;
var strFirst = string.Empty;
var str = string.Empty;
var strarrFirst = new List<string>();
var strarrSecond = new List<string>();
using (var first = new StreamReader(path))
{
    strFirst = first.ReadToEnd();
}
using (var second = new StreamReader(path2))
{
    str = second.ReadToEnd();
}
strarrFirst.AddRange(strFirst.Split('\n'));
strarrSecond.AddRange(str.Split('\n'));
strarrSecond.Sort();
foreach (var value in strarrFirst)
{
    var found = strarrSecond.BinarySearch(value) >= 0;
    if (!found) Console.WriteLine(value);
}
Just for fun, I've tried Jon Skeet's method and my own:
var guidArray = Enumerable.Range(0, 1000000).Select(x => Guid.NewGuid().ToString()).ToList();
string path = "first.txt";
File.WriteAllLines(path, guidArray);
string path2 = "second.txt";
File.WriteAllLines(path2, guidArray.Select(x=>DateTime.UtcNow.Ticks % 2 == 0 ? x : Guid.NewGuid().ToString()));
var start = DateTime.Now;
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2)).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
start = DateTime.Now;
var lines = File.ReadAllLines(path);
var hashset = new HashSet<string>(lines);
var lines2 = File.ReadAllLines(path2);
var result = lines2.Where(hashset.Contains).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
Console.ReadKey();
Skeet's method was a tiny bit faster (1453.0831 ms vs 1488.0851 ms; iDevForFun's method was quite slow at 12791.7316 ms), so I think the same thing must be happening under the layers as what I was trying to do manually with the HashSet.

My hashtable doesn't work

I am using a hashtable to read data from a file and make clusters. Say the data in the file is:
Say the data in file is:
umair,i,umair
sajid,mark,i , k , i
The output should be like:
[{umair,umair},i]
[sajid,mark,i,i,k]
But my code does not work. Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Collections;

namespace readstringfromfile
{
    class Program
    {
        static void Main()
        {
            /* int i = 0;
            foreach (string line in File.ReadAllLines("newfile.txt"))
            {
                string[] parts = line.Split(',');
                foreach (string part in parts)
                {
                    Console.WriteLine("{0}:{1}", i, part);
                }
                i++; // For demo only
            }*/
            Hashtable hashtable = new Hashtable();
            using (StreamReader r = new StreamReader("newfile.txt"))
            {
                string line;
                while ((line = r.ReadLine()) != null)
                {
                    string[] records = line.Split(',');
                    foreach (string record in records)
                    {
                        if (hashtable[records] == null)
                            hashtable[records] = (int)0;
                        hashtable[records] = (int)hashtable[records] + 1;
                        Console.WriteLine(hashtable.Keys);
                    }
                    ///// this portion is not working /////
                    foreach (DictionaryEntry entry in hashtable)
                    {
                        for (int i = 0; i < (int)hashtable[records]; i++)
                        {
                            Console.WriteLine(entry);
                        }
                    }
                }
            }
        }
    }
}
You're indexing the hashtable with the records array (both when inserting and when reading) instead of using the foreach variable record. Also, in the final loop you iterate based on records instead of the current entry.Key. Finally, you're declaring the hashtable in too wide a scope, causing all rows to be inserted into the same hashtable instead of one per row.
public static void Main() {
    var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
    foreach (var line in lines) {
        var hashtable = new Hashtable();
        var records = line.Split(',');
        foreach (var record in records) {
            if (hashtable[record] == null)
                hashtable[record] = 0;
            hashtable[record] = (Int32)hashtable[record] + 1;
        }
        var str = "";
        foreach (DictionaryEntry entry in hashtable) {
            var count = (Int32)hashtable[entry.Key];
            for (var i = 0; i < count; i++) {
                str += entry.Key;
                if (i < count - 1)
                    str += ",";
            }
            str += ",";
        }
        // Remove last comma.
        str = str.TrimEnd(',');
        Console.WriteLine(str);
    }
    Console.ReadLine();
}
However, you should consider using the generic Dictionary<TKey,TValue> class, and use a StringBuilder if you're building a lot of strings.
public static void Main() {
    var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
    foreach (var line in lines) {
        var dictionary = new Dictionary<String, Int32>();
        var records = line.Split(',');
        foreach (var record in records) {
            if (!dictionary.ContainsKey(record))
                dictionary.Add(record, 1);
            else
                dictionary[record]++;
        }
        var str = "";
        foreach (var entry in dictionary) {
            for (var i = 0; i < entry.Value; i++) {
                str += entry.Key;
                if (i < entry.Value - 1)
                    str += ",";
            }
            str += ",";
        }
        // Remove last comma.
        str = str.TrimEnd(',');
        Console.WriteLine(str);
    }
    Console.ReadLine();
}
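Following that StringBuilder advice, the string concatenation in the inner loops above could be sketched like this (reusing the dictionary built for each line):
// Build each output line with StringBuilder instead of repeated string concatenation.
var sb = new StringBuilder();
foreach (var entry in dictionary)
{
    for (var i = 0; i < entry.Value; i++)
    {
        if (sb.Length > 0)
            sb.Append(',');
        sb.Append(entry.Key);
    }
}
Console.WriteLine(sb.ToString());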
You're attempting to group elements of a sequence. LINQ has a built-in operator for that; it's used as group ... by ... into ... or the equivalent method .GroupBy(...)
That means you can write your code (excluding File I/O etc.) as:
var lines = new[] { "umair,i,umair", "sajid,mark,i,k,i" };
foreach (var line in lines) {
    var groupedRecords =
        from record in line.Split(',')
        group record by record into recordgroup
        from record in recordgroup
        select record;
    Console.WriteLine(string.Join(",", groupedRecords));
}
If you prefer shorter code, the loop can be written equivalently as:
foreach (var line in lines)
    Console.WriteLine(string.Join(",",
        line.Split(',').GroupBy(rec => rec).SelectMany(grp => grp)));
both versions will output...
umair,umair,i
sajid,mark,i,i,k
Note that you really shouldn't be using a Hashtable - that's just a type-unsafe slow version of Dictionary for almost all purposes. Also, the output example you mention includes [] and {} characters - but you didn't specify how or whether they're supposed to be included, so I left those out.
A LINQ group is nothing more than a sequence of elements (here, identical strings) with a Key (here a string). Calling GroupBy thus transforms the sequence of records into a sequence of groups. However, you want to simply concatenate those groups. SelectMany is such a concatenation: from a sequence of items, it concatenates the "contents" of each item into one large sequence.
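As a tiny illustration of that flattening step, using the first line from the question:
// SelectMany concatenates the contents of each group back into one flat sequence.
var groups = "umair,i,umair".Split(',').GroupBy(rec => rec); // groups: {umair, umair}, {i}
var flat = groups.SelectMany(grp => grp);                    // umair, umair, i
Console.WriteLine(string.Join(",", flat));                   // prints "umair,umair,i"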
