How to merge .txt files in c#? [duplicate] - c#

using (StreamWriter writer = File.CreateText(FinishedFile))
{
int lineNum = 0;
while (lineNum < FilesLineCount.Min())
{
for (int i = 0; i <= FilesToMerge.Count() - 1; i++)
{
if (i != FilesToMerge.Count() - 1)
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + ",");
}
else
{
var CurrentFile = File.ReadLines(FilesToMerge[i]).Skip(lineNum).Take(1);
string CurrentLine = string.Join("", CurrentFile);
writer.Write(CurrentLine + "\n");
}
}
lineNum++;
}
}
The current way i am doing this is just too slow. I am merging files that are each 50k+ lines long with various amounts of data.
for ex:
File 1
1
2
3
4
File 2
4
3
2
1
i need this to merge into being a third fileFile 3
1,4
2,3
3,2
4,1P.S. The user can pick as many files as they want from any locations.
Thanks for the help.

You approach is slow because of the Skip and Take in the loops.
You could use a dictionary to collect all line-index' lines:
string[] allFileLocationsToMerge = { "filepath1", "filepath2", "..." };
var mergedLists = new Dictionary<int, List<string>>();
foreach (string file in allFileLocationsToMerge)
{
string[] allLines = File.ReadAllLines(file);
for (int lineIndex = 0; lineIndex < allLines.Length; lineIndex++)
{
bool indexKnown = mergedLists.TryGetValue(lineIndex, out List<string> allLinesAtIndex);
if (!indexKnown)
allLinesAtIndex = new List<string>();
allLinesAtIndex.Add(allLines[lineIndex]);
mergedLists[lineIndex] = allLinesAtIndex;
}
}
IEnumerable<string> mergeLines = mergedLists.Values.Select(list => string.Join(",", list));
File.WriteAllLines("targetPath", mergeLines);

Here's another approach - this implementation only stores in memory one set of lines from each file simultaneously, thus reducing memory pressure significantly (if that is an issue).
public static void MergeFiles(string output, params string[] inputs)
{
var files = inputs.Select(File.ReadLines).Select(iter => iter.GetEnumerator()).ToArray();
StringBuilder line = new StringBuilder();
bool any;
using (var outFile = File.CreateText(output))
{
do
{
line.Clear();
any = false;
foreach (var iter in files)
{
if (!iter.MoveNext())
continue;
if (line.Length != 0)
line.Append(", ");
line.Append(iter.Current);
any = true;
}
if (any)
outFile.WriteLine(line.ToString());
}
while (any);
}
foreach (var iter in files)
{
iter.Dispose();
}
}
This also handles files of different lengths.

Related

StreamWriter C# formatting output

Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be one-to-one match of vcard to coverage table. Since Im running 2k samples, its hard to identify which file is not one-to-one match. I know that both files have unique identifier numbers, hence, if both folders have files that have same unique numbers, i treat that as "same" file
I made a program that compares two folders and reports unique entries in each folder. To do so, I made two list that contains unique file names to each directory.
I want to format the report file (tab delimited .txt file) such that it looks something like below:
Unique in fdr1 Unique in fdr2
file x file a
file y file b
file z file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If design of the code has to change (i.e. one list instead of two), please let me know
As requested by some user, this is how I was going to do (not working version)
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + #"\" + "Report.txt"))
{
// Write headers
sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
// Write unique entries in fdr1
foreach(string file in fdr1FileList)
{
sw.WriteLine(file + "\t");
}
// Write unique entries in fdr2
foreach (string file in fdr2FileList)
{
sw.WriteLine(file + "\t");
}
sw.Dispose();
}
As requested for my approach for finding unique entries, here's my code snippet
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for(int i = 0; i < fdr1FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr1FileNames[i], #"\d+").ToString();
int glNumber;
// Make sure it is a number
if(Int32.TryParse(number, out glNumber))
{
fdr1Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
}
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr2FileNames[i], #"\d+").ToString();
int glNumber;
// Make sure it is a number
if (Int32.TryParse(number, out glNumber))
{
fdr2Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
}
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], #"\d+").Value);
// If same file is not present in the second folder add to the list
if(!fdr2Dict[glNumber])
{
fdr1FileList.Add(fdr1FileNames[i]);
}
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], #"\d+").Value);
// If same file is not present in the first folder add to the list
if (!fdr1Dict[glNumber])
{
fdr2FileList.Add(fdr2FileNames[i]);
}
I am a quite confident that this will work as I've tested it:
static void Main(string[] args)
{
var firstDir = #"Path1";
var secondDir = #"Path2";
var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
print2Dirs(firstDirFiles, secondDirFiles);
}
private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
using (StreamWriter streamWriter = new StreamWriter("result.txt"))
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < maxIndex; i++)
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
}
}
}
It's a quite simple code but if you need help understanding it just let me know :)
I would construct each line at a time. Something like this:
int row = 0;
string[] fdr1FileList = new string[0];
string[] fdr2FileList = new string[0];
while (row < fdr1FileList.Length || row < fdr2FileList.Length)
{
string rowText = "";
rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
row++;
}
Try something like this:
static void Main(string[] args)
{
Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
int f1_size = unique_f1.Length;
int f2_size = unique_f2.Length;
int list_length = 0;
if (f1_size > f2_size)
{
list_length = f1_size;
Array.Resize(ref unique_f2, list_length);
}
else
{
list_length = f2_size;
Array.Resize(ref unique_f1, list_length);
}
using (StreamWriter writer = new StreamWriter("output.txt"))
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < list_length; i++)
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
}
}
}
static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
Dictionary<int, string> dict = new Dictionary<int, string>();
for (int i = 0; i < filenames.Length; i++)
{
int glNumber;
string filename = Path.GetFileName(filenames[i]);
string number = Regex.Match(filename, #"\d+").ToString();
if (int.TryParse(number, out glNumber))
dict.Add(glNumber, filename);
}
return dict;
}

Count the number of words within an Array or List

I need to count the number of words within an array or a list. The reason I say array or list is because I am not sure which would be the best to use in this situation. The data is static and in a .txt file (It's actually a book). I was able to create an array and break down words from the array but for the life of me I can not count! I have tried many different ways to do this and I'm thinking since it is a string it is unable to count. I have even teetered on the edge of just printing the whole book to a listbox and counting from the listbox but, that's ridiculous.
public partial class mainForm : Form
{
//------------------------
//GLOBAL VARIABLES:
//------------------------
List<string> countWords;
string[] fileWords;
string[] fileLines;
char[] delim = new char[] { ' ', ',','.','?','!' };
string path;
public mainForm()
{
InitializeComponent();
}
private void BookTitle() // TiTleAndAuthor Method will pull the Book Title and display it.
{
for (int i = 0; i < 1; i++)
{
bookTitleLabel.Text = fileLines[i];
}
}
private void BookAuthor() // TiTleAndAuthor Method will pull the Book Author and display it.
{
for (int i = 1; i < 2; i++)
{
bookAuthorLabel.Text = fileLines[i];
}
}
private void FirstLines() // FirstTenWords Method pulls the first ten words of any text file and prints the to a ListBox
{
for (int i = 0; i <= 499; i++)
{
wordsListBox.Items.Add(fileWords[i]);
}
}
private void WordCount() // Count all the words in the file.
{
}
private void openFileButton_Click(object sender, EventArgs e)
{
OpenFileDialog inputFile = new OpenFileDialog();
if (inputFile.ShowDialog() == DialogResult.OK) // check the file the user selected
{
path = inputFile.FileName; // save that path of the file to a string variable for later use
StreamReader fileRead = new StreamReader(path); // read a file at the path outlined in the path variable
fileWords = fileRead.ReadToEnd().Split(delim); // Breakdown the text into lines of text to call them at a later date
fileLines = File.ReadAllLines(path);
countWords = File.ReadLines(path).ToList();
wordsListBox.Items.Clear();
BookTitle();
BookAuthor();
FirstLines();
WordCount();
}
else
{
MessageBox.Show("Not a valid file, please select a text file");
}
}
}
Maybe this is useful:
static void Main(string[] args)
{
string[] lines = File_ReadAllLines();
List<string> words = new List<string>();
foreach(var line in lines)
{
words.AddRange(line.Split(' '));
}
Console.WriteLine(words.Count);
}
private static string[] File_ReadAllLines()
{
return new[] {
"The one book",
"written by gnarf",
"once upon a time ther werent any grammer",
"iso 1-12122-445",
"(c) 2012 under the hills"
};
}
Before I get to the answer, a quick observation on some of the loops:
for (int i = 1; i < 2; i++)
{
bookAuthorLabel.Text = fileLines[i];
}
This'll only run once, so it's pointless to have it in a loop (unless you intended this to actually loop through the whole list, in which case it's a bug). If this is the expected behavior, you might as well just do
bookAuthorLabel.Text = fileLines[1];
You have something similar here:
for (int i = 0; i < 1; i++)
{
bookTitleLabel.Text = fileLines[i];
}
Again, this is pointless.
Now for the answer itself. I'm not sure if you're trying to get total word count or count of individual words, so here's a code sample for doing both:
private static void CountWords()
{
const string fileName = "CountWords.txt";
// Create a dummy file
using (var sw = new StreamWriter(fileName))
{
sw.WriteLine("This is a short sentence");
sw.WriteLine("This is a long sentence");
}
string text = File.ReadAllText(fileName);
string[] result = text.Split(new[] { " ", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
// Total word count
Console.WriteLine("Total count: " + result.Count().ToString());
// Now to illustrate getting the count of individual words
var dictionary = new Dictionary<string, int>();
foreach (string word in result)
{
if (dictionary.ContainsKey(word))
{
dictionary[word]++;
}
else
{
dictionary[word] = 1;
}
}
foreach (string key in dictionary.Keys)
{
Console.WriteLine(key + ": " + dictionary[key].ToString());
}
}
This should be easy to adapt to your particular needs in this case.
Read text file line by line. split by empty character and remove unnecessary spaces. sum this count to total.
var totalWords = 0;
using (StreamReader sr = new StreamReader("abc.txt"))
{
while (!sr.EndOfStream)
{
int count = sr
.ReadLine()
.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries).Count();
totalWords += count;
}
You can also use the below code:
totalWords = fileRead.ReadToEnd().Split(delim, StringSplitOptions.RemoveEmptyEntries).Length;

c# How to run a application faster

I am creating a word list of possible uppercase letters to prove how insecure 8 digit passwords are this code will write aaaaaaaa to aaaaaaab to aaaaaaac etc. until zzzzzzzz using this code:
class Program
{
static string path;
static int file = 0;
static void Main(string[] args)
{
new_file();
var alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var q = alphabet.Select(x => x.ToString());
int size = 3;
int counter = 0;
for (int i = 0; i < size - 1; i++)
{
q = q.SelectMany(x => alphabet, (x, y) => x + y);
}
foreach (var item in q)
{
if (counter >= 20000000)
{
new_file();
counter = 0;
}
if (File.Exists(path))
{
using (StreamWriter sw = File.AppendText(path))
{
sw.WriteLine(item);
Console.WriteLine(item);
/*if (!(Regex.IsMatch(item, #"(.)\1")))
{
sw.WriteLine(item);
counter++;
}
else
{
Console.WriteLine(item);
}*/
}
}
else
{
new_file();
}
}
}
static void new_file()
{
path = #"C:\" + "list" + file + ".txt";
if (!File.Exists(path))
{
using (StreamWriter sw = File.CreateText(path))
{
}
}
file++;
}
}
The Code is working fine but it takes Weeks to run it. Does anyone know a way to speed it up or do I have to wait? If anyone has a idea please tell me.
Performance:
size 3: 0.02s
size 4: 1.61s
size 5: 144.76s
Hints:
removed LINQ for combination generation
removed Console.WriteLine for each password
removed StreamWriter
large buffer (128k) for file writing
const string alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var byteAlphabet = alphabet.Select(ch => (byte)ch).ToArray();
var alphabetLength = alphabet.Length;
var newLine = new[] { (byte)'\r', (byte)'\n' };
const int size = 4;
var number = new byte[size];
var password = Enumerable.Range(0, size).Select(i => byteAlphabet[0]).Concat(newLine).ToArray();
var watcher = new System.Diagnostics.Stopwatch();
watcher.Start();
var isRunning = true;
for (var counter = 0; isRunning; counter++)
{
Console.Write("{0}: ", counter);
Console.Write(password.Select(b => (char)b).ToArray());
using (var file = System.IO.File.Create(string.Format(#"list.{0:D5}.txt", counter), 2 << 16))
{
for (var i = 0; i < 2000000; ++i)
{
file.Write(password, 0, password.Length);
var j = size - 1;
for (; j >= 0; j--)
{
if (number[j] < alphabetLength - 1)
{
password[j] = byteAlphabet[++number[j]];
break;
}
else
{
number[j] = 0;
password[j] = byteAlphabet[0];
}
}
if (j < 0)
{
isRunning = false;
break;
}
}
}
}
watcher.Stop();
Console.WriteLine(watcher.Elapsed);
}
Try the following modified code. In LINQPad it runs in < 1 second. With your original code I gave up after 40 seconds. It removes the overhead of opening and closing the file for every WriteLine operation. You'll need to test and ensure it gives the same results because I'm not willing to run your original code for 24 hours to ensure the output is the same.
class Program
{
static string path;
static int file = 0;
static void Main(string[] args)
{
new_file();
var alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ123456789+-*_!$£^=<>§°ÖÄÜöäü.;:,?{}[]";
var q = alphabet.Select(x => x.ToString());
int size = 3;
int counter = 0;
for (int i = 0; i < size - 1; i++)
{
q = q.SelectMany(x => alphabet, (x, y) => x + y);
}
StreamWriter sw = File.AppendText(path);
try
{
foreach (var item in q)
{
if (counter >= 20000000)
{
sw.Dispose();
new_file();
counter = 0;
}
sw.WriteLine(item);
Console.WriteLine(item);
}
}
finally
{
if(sw != null)
{
sw.Dispose();
}
}
}
static void new_file()
{
path = #"C:\temp\list" + file + ".txt";
if (!File.Exists(path))
{
using (StreamWriter sw = File.CreateText(path))
{
}
}
file++;
}
}
your alphabet is missing 0
With that fixed there would be 89 chars in your set. Let's call it 100 for simplicity. The set you are looking for is all the 8 character length strings drawn from that set. There are 100^8 of these, i.e. 10,000,000,000,000,000.
The disk space they will take up depends on how you encode them, lets be generous - assume you use some 8 bit char set that contains the these characters, and you don't put in carriage returns, so one byte per char, so 10,000,000,000,000,000 bytes =~ 10 peta byes?
Do you have 10 petabytes of disk? (10000 TB)?
[EDIT] In response to 'this is not an answer':
The original motivation is to create the list? The shows how large the list would be. Its hard to see what could be DONE with the list if it was actualised, i.e. it would always be quicker to reproduce it than to load it. Surely whatever point could be made by producing the list can also be made by simply knowing it's size, which the above shows how to work it out.
There are LOTS of inefficiencies in you code, but if your questions is 'how can i quickly produce this list and write it to disk' the answer is 'you literally cannot'.
[/EDIT]

How can I write a strings to file with linebreaks inbetween?

I want to be able to write some values to a file whilst creating blank lines in between. Here is the code that I have so far:
TextWriter w_Test = new StreamWriter(file_test);
foreach (string results in searchResults)
{
w_Test.WriteLine(Path.GetFileNameWithoutExtension(results));
var list1 = File.ReadAllLines(results).Skip(10);
foreach (string listResult in list1)
{
w_Test.WriteLine(listResult);
}
}
w_Test.Close();
This creates 'Test' with the following output:
result1
listResult1
listResult2
result2
listResult3
result3
result4
I want to write the results so that each result block is 21 lines in size before writing the next, e.g.
result1
(20 lines even if no 'listResult' found)
result2
(20 lines even if no 'listResult' found)
etc.......
What would be the best way of doing this??
TextWriter w_Test = new StreamWriter(file_test);
foreach (string results in searchResults)
{
int noLinesOutput = 0;
w_Test.WriteLine(Path.GetFileNameWithoutExtension(results));
noLinesOutput++;
var list1 = File.ReadAllLines(results).Skip(10);
foreach (string listResult in list1)
{
w_Test.WriteLine(listResult);
noLinesOutput++;
}
for ( int i = 20; i > noLinesOutput; i-- )
w_Test.WriteLine();
}
w_Test.Close();
Here's a simple helper method I use in such cases:
// pad the sequence with 'elem' until it's 'count' elements long
static IEnumerable<T> PadRight<T>(IEnumerable<T> enm,
T elem,
int count)
{
int ii = 0;
foreach(var elem in enm)
{
yield return elem;
ii += 1;
}
for (; ii < count; ++ii)
{
yield return elem;
}
}
Then
foreach (string listResult in
PadRight(list1, "", 20))
{
w_Test.WriteLine(listResult);
}
should do the trick.
Perhaps with this loop:
var lines = 20;
foreach(string fullPath in searchResults)
{
List<string> allLines = new List<string>();
allLines.Add(Path.GetFileNameWithoutExtension(fullPath));
int currentLine = 0;
foreach(string line in File.ReadLines(fullPath).Skip(10))
{
if(++currentLine > lines) break;
allLines.Add(line);
}
while (currentLine++ < lines)
allLines.Add(String.Empty);
File.WriteAllLines(fullPath, allLines);
}

How to loop through and compare millions of values in two text files?

I have two text files files (TXT) which contain over 2 million distinct file names. I want to loop through all the names in the first file and find those that are also present in the second text file.
I have tried looping through the StreamReader but it takes a lot of time. I also tried the code below, but it still takes too much time.
StreamReader first = new StreamReader(path);
string strFirst = first.ReadToEnd();
string[] strarrFirst = strFirst.Split('\n');
bool found = false;
StreamReader second = new StreamReader(path2);
string str = second.ReadToEnd();
string[] strarrSecond = str.Split('\n');
for (int j = 0; j < (strarrFirst.Length); j++)
{
found = false;
for (int i = 0; i < (strarrSecond .Length); i++)
{
if (strarrFirst[j] == strarrSecond[i])
{
found = true;
break;
}
}
if (!found)
{
Console.WriteLine(strarrFirst[j]);
}
}
What is a good way to compare the files?
How about this:
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2));
That's O(N + M) instead of your current solution which tests every line in the first file with every line in the second file - O(N * M).
That's assuming you're using .NET 4. Otherwise, you could use File.ReadAllLines, but that will read the whole file into memory. Or you could write the equivalent of File.ReadLines yourself - it's not terribly hard.
Ultimately you're likely to be limited by file IO by the time you've got rid of the O(N * M) problem in your current code - there's not much way to get round that.
EDIT: For .NET 2, first let's implement something like ReadLines:
public static IEnumerable<string> ReadLines(string file)
{
using (TextReader reader = File.OpenText(file))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
Now we really want to use a HashSet<T>, but that wasn't in .NET 2 - so let's use Dictionary<TKey, TValue> instead:
Dictionary<string, string> map = new Dictionary<string, string>();
foreach (string line in ReadLines(path))
{
map[line] = line;
}
List<string> intersection = new List<string>();
foreach (string line in ReadLines(path2))
{
if (map.ContainsKey(line))
{
intersection.Add(line);
}
}
Try something like this to speed it up a bit ...
var path = string.Empty;
var path2 = string.Empty;
var strFirst = string.Empty;
var str = string.Empty;
var strarrFirst = new List<string>();
var strarrSecond = new List<string>();
using (var first = new StreamReader(path))
{
strFirst = first.ReadToEnd();
}
using (var second = new StreamReader(path2))
{
str = second.ReadToEnd();
}
strarrFirst.AddRange(strFirst.Split('\n'));
strarrSecond.AddRange(str.Split('\n'));
strarrSecond.Sort();
foreach(var value in strarrFirst)
{
var found = strarrSecond.BinarySearch(value) >= 0;
if (!found) Console.WriteLine(value);
}
Just for fun, I've tried Jon Skeet method and own:
var guidArray = Enumerable.Range(0, 1000000).Select(x => Guid.NewGuid().ToString()).ToList();
string path = "first.txt";
File.WriteAllLines(path, guidArray);
string path2 = "second.txt";
File.WriteAllLines(path2, guidArray.Select(x=>DateTime.UtcNow.Ticks % 2 == 0 ? x : Guid.NewGuid().ToString()));
var start = DateTime.Now;
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2)).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
start = DateTime.Now;
var lines = File.ReadAllLines(path);
var hashset = new HashSet<string>(lines);
var lines2 = File.ReadAllLines(path2);
var result = lines2.Where(hashset.Contains).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
Console.ReadKey();
And Skeet's method was tiny bit faster (1453.0831 vs 1488.0851, iDevForFun method was quite slow - 12791.7316), so i think under layers should happen same thing as I was trying to do manually with hashset.

Categories

Resources