Check for existing file - c#

I need a way to check whether a picture is already in lbl_Dias. If it is not, add the picture; if it is, move on to the next picture.
This is to build a list of images in random order, but without duplicates.
What I have so far is this:
protected void DiasShow()
{
string[] getFiles = Directory.GetFiles(HttpContext.Current.Server.MapPath("/CSS/Design/Page_Design/Dias/1920x1080/"));
for (int i = 0; i <= getFiles.Length; i++)
{
Random FindRandom = new Random();
string RandomFileName = getFiles[FindRandom.Next(getFiles.Length)];
FileInfo ImageName = new FileInfo(RandomFileName);
string FileType = ImageName.Name.Substring(ImageName.Name.Length - 4);
if ((FileType.ToUpper() == ".JPG") || (FileType.ToUpper() == "JPEG"))
{
lbl_Dias.Text += "<img src=\"CSS/Design/Page_Design/Dias/1920x1080/" + ImageName.Name + "\" />";
}
}
}
I hope you can help; I'm kind of stuck ^^

First of all, get rid of NumberOfImages, as it's pointless.
The whole foreach-loop is horrible since you're iterating through the collection where it isn't needed.
Secondly, you can use the Extension property of FileInfo to check for the extension string of a file - no need to substring, etc.
Thirdly, what exactly are you trying to do here?
You do realize that you will probably get random duplicates at the end of the loop, since you are not removing used images from the collection.
In the end, you don't need to check if a file exists at all, since you got it from a function that returns files that exist.
protected void DiasShow()
{
var mapPath = HttpContext.Current.Server.MapPath("/CSS/Design/Page_Design/Dias/1920x1080/");
var images =
Directory.GetFiles(mapPath).Select(
file => new FileInfo(file)).Where(fi =>
fi.Extension.EndsWith("jpg", StringComparison.OrdinalIgnoreCase) ||
fi.Extension.EndsWith("jpeg", StringComparison.OrdinalIgnoreCase)).ToList();
var rand = new Random();
while (images.Count > 0)
{
var i = rand.Next(images.Count);
lbl_Dias.Text += "<img src=\"CSS/Design/Page_Design/Dias/1920x1080/" + images[i].Name + "\"/>";
images.RemoveAt(i);
}
}
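The remove-as-you-draw loop above could also be written as an in-place Fisher-Yates shuffle, which avoids shifting list elements on every RemoveAt. This is only a sketch: the ShuffleSketch class and the Shuffle method are names made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleSketch
{
    // Hypothetical helper: shuffles the list in place (Fisher-Yates).
    // Each element ends up in the result exactly once, so no duplicates can appear.
    public static void Shuffle<T>(IList<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // random index with 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

After shuffling the filtered list of images once, a plain foreach over it emits each image tag exactly once, in random order.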

I presume you just want to list the files in a random order:
protected void DiasShow()
{
var getFiles = Directory.GetFiles(HttpContext.Current.Server.MapPath("/CSS/Design/Page_Design/Dias/1920x1080/")); // Find all files in a folder
var random = RandomiseList(getFiles);
var txt = new StringBuilder();
foreach (var randomFileName in random)
{
var fileType = System.IO.Path.GetExtension(randomFileName).ToUpper();
if ((fileType == ".JPG") || (fileType == ".JPEG"))
{
var imageName = System.IO.Path.GetFileName(randomFileName);
txt.Append("<img src=\"CSS/Design/Page_Design/Dias/1920x1080/" + imageName+ "\" />");
}
}
lbl_Dias.Text += txt.ToString();
}
public static T[] RandomiseList<T>(T[] source)
{
var rand = new Random(); //no need for own seed
var list = new List<T>(source); //copy to a new list which we can remove from
var result = new T[source.Length];
for (int i = 0; i < source.Length; i++)
{
var listIndex = rand.Next(list.Count());
result[i]= list[listIndex];
list.RemoveAt(listIndex);
}
return result;
}

Since there is no code where you are checking for the file, I assume you are checking file names.
Use Contains() or a Regex match after extracting the text from lbl_Dias.

You can keep a list of filenames.
Each time you pick a filename, check whether it is already in the list: if not, add it to the list; if it is, look for a different file.

To verify that something is not already in the text of your label, keep a corresponding dictionary and check it before adding to the label, such as:
Dictionary<string, int> _filenames = new Dictionary<string,int>();
....
if (_filenames.ContainsKey( ImageName.Name ) == false)
{
_filenames.Add(ImageName.Name, 0);
lbl_Dias.Text += "<img src=\"CSS/Design/Page_Design/Dias/1920x1080/" + ImageName.Name + "\" />";
}
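Since the dictionary values are never used, the same "have I seen this name?" check could also be sketched with a HashSet<string>, whose Add method returns false when the item is already present. The SeenSketch class and KeepFirstOccurrences method below are illustrative names, and the case-insensitive comparer is an assumption about how file names should be compared.

```csharp
using System;
using System.Collections.Generic;

public static class SeenSketch
{
    // Hypothetical helper: returns the names in their original order,
    // skipping any name that has already been seen.
    public static List<string> KeepFirstOccurrences(IEnumerable<string> names)
    {
        // Assumption: file names are compared case-insensitively.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (var name in names)
        {
            // Add returns false if the name was already in the set.
            if (seen.Add(name))
            {
                result.Add(name);
            }
        }
        return result;
    }
}
```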

Related

Fastest way to split a large list of integers into List of strings in C#

My goal is to split a list of 24043 integers into strings like:
"?ids=" + "1,2,3...198,199,200"
Can you think of a better solution than mine in terms of performance?
public List<string> ZwrocListeZapytan(List<int> listaId)
{
var listaZapytan = new List<string>();
if (listaId.Count == 0) return listaZapytan;
var zapytanie = "?ids=";
var licznik = 1;
for (var i = 0; i < listaId.Count; i++)
{
if (licznik == 200 || i == listaId.Count - 1)
{
listaZapytan.Add(zapytanie + listaId[i]);
zapytanie = "?ids=";
licznik = 1;
}
else
{
zapytanie += listaId[i] + ",";
licznik++;
}
}
return listaZapytan;
}
Using Linq:
Assuming listaId is the list of integers that has to be converted:
var result = listaId.GroupBy(x => x / 200)
.Select(y => "?ids=" + string.Join(",", y)).ToList();
.GroupBy() helps take 200 at a time
.Select() is used to combine them together in the format like the OP suggested i.e ?ids=1,2,... using string.Join()
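Note that x / 200 groups by the value of each id, which gives batches of 200 only when the ids are consecutive integers (and a first batch of 199 if they start at 1). If the ids are arbitrary, a variant that groups by position instead might look like the sketch below; QueryChunks and Build are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryChunks
{
    // Hypothetical helper: groups ids by their position (index / size),
    // so every batch except possibly the last holds exactly 'size' ids.
    public static List<string> Build(IEnumerable<int> ids, int size)
    {
        return ids
            .Select((id, index) => new { id, index })
            .GroupBy(x => x.index / size, x => x.id)
            .Select(g => "?ids=" + string.Join(",", g))
            .ToList();
    }
}
```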
Can you think of a better solution than mine in terms of performance?
In terms of performance, the only enhancement to your code that comes to mind is to use a StringBuilder when you concatenate the string:
public List<string> ZwrocListeZapytan(List<int> listaId)
{
var listaZapytan = new List<string>();
if (listaId.Count == 0) return listaZapytan;
StringBuilder sb = new StringBuilder();
sb.Append("?ids=");
var licznik = 1;
for (var i = 0; i < listaId.Count; i++)
{
if (licznik == 200 || i == listaId.Count - 1)
{
listaZapytan.Add(sb.ToString() +listaId[i]);
sb.Clear();
sb.Append("?ids=");
licznik = 1;
}
else
{
sb.Append(listaId[i] + ",");
licznik++;
}
}
return listaZapytan;
}
Otherwise you could make the for-loop run in steps of 200. At each step, take the numbers from the given range and use String.Join to create the string:
// TEST DATA
List<int> listaId = Enumerable.Range(1, 420).ToList();
List<string> listaZapytan = new List<string>();
int stepsize = 200;
for (int i = 0; i < listaId.Count; i +=stepsize)
{
listaZapytan.Add("?ids=" + String.Join(",", listaId.Skip(i).Take(stepsize)));
}
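Depending on the runtime version, Skip(i) may walk the list from the beginning on every step. Because the input is already a List<int>, a variant using List<T>.GetRange could be sketched as follows; RangeChunks and Build are names invented for this example, and no performance measurement is claimed.

```csharp
using System;
using System.Collections.Generic;

public static class RangeChunks
{
    // Hypothetical helper: copies each step-sized slice with GetRange
    // and joins it into one "?ids=..." string.
    public static List<string> Build(List<int> ids, int stepSize)
    {
        var result = new List<string>();
        for (int i = 0; i < ids.Count; i += stepSize)
        {
            // The last slice may be shorter than stepSize.
            int count = Math.Min(stepSize, ids.Count - i);
            result.Add("?ids=" + string.Join(",", ids.GetRange(i, count)));
        }
        return result;
    }
}
```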
Please try this and let me know whether it solves your issue:
List<int> listaId = Enumerable.Range(0, 24043).ToList();
var items = String.Join("", Enumerable.Range(0, 24043)
.Select((x,i)=>i%200==0?
"\n?ids=" + x.ToString():
"," + x.ToString()));
Here we use Enumerable.Range to generate 24043 consecutive numbers starting from 0. Select, with its element index, then starts a new "?ids=" group every 200 items, so the first group holds 0-199 (200 numbers); change the modulus to change the group size. Note that removing the String.Join and adding .ToList() at the end of the query yields a list of the individual fragments ("\n?ids=0", ",1", ...), not one string per group; to get one string per group, split the joined result on '\n'.

StreamWriter C# formatting output

Problem Statement
In order to run gene annotation software, I need to prepare two types of files, vcard files and coverage tables, and there has to be a one-to-one match of vcard file to coverage table. Since I'm running 2k samples, it's hard to identify which files lack a match. I know that both kinds of file have unique identifier numbers; hence, if both folders have files with the same unique number, I treat them as the "same" file.
I made a program that compares two folders and reports the unique entries in each folder. To do so, I made two lists that contain the file names unique to each directory.
I want to format the report file (tab delimited .txt file) such that it looks something like below:
Unique in fdr1    Unique in fdr2
file x            file a
file y            file b
file z            file c
I find this difficult to do because I have to iterate twice (since I have two lists), but there is no way of going back to the previous line in StreamWriter as far as I know. Basically, once I iterate through the first list and fill the first column, how can I fill the second column with the second list?
Can someone help me out with this?
Thanks
If design of the code has to change (i.e. one list instead of two), please let me know
As requested by a user, this is how I was going to do it (non-working version):
// Write report
using (StreamWriter sw = new StreamWriter(dest_txt.Text + @"\" + "Report.txt"))
{
// Write headers
sw.WriteLine("Unique Entries in Folder1" + "\t" + "Unique Entries in Folder2");
// Write unique entries in fdr1
foreach(string file in fdr1FileList)
{
sw.WriteLine(file + "\t");
}
// Write unique entries in fdr2
foreach (string file in fdr2FileList)
{
sw.WriteLine(file + "\t");
}
sw.Dispose();
}
As requested for my approach for finding unique entries, here's my code snippet
Dictionary<int, bool> fdr1Dict = new Dictionary<int, bool>();
Dictionary<int, bool> fdr2Dict = new Dictionary<int, bool>();
List<string> fdr1FileList = new List<string>();
List<string> fdr2FileList = new List<string>();
string fdr1Path = folder1_txt.Text;
string fdr2Path = folder2_txt.Text;
// File names in the specified directory; path not included
string[] fdr1FileNames = Directory.GetFiles(fdr1Path).Select(Path.GetFileName).ToArray();
string[] fdr2FileNames = Directory.GetFiles(fdr2Path).Select(Path.GetFileName).ToArray();
// Iterate through the first directory, and add GL number to dictionary
for(int i = 0; i < fdr1FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr1FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if(Int32.TryParse(number, out glNumber))
{
fdr1Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr1FileNames[i]));
}
}
// Iterate through the second directory, and add GL number to dictionary
for (int i = 0; i < fdr2FileNames.Length; i++)
{
// Grabs only the number from the file name
string number = Regex.Match(fdr2FileNames[i], @"\d+").ToString();
int glNumber;
// Make sure it is a number
if (Int32.TryParse(number, out glNumber))
{
fdr2Dict[glNumber] = true;
}
// If number not present, raise exception
else
{
throw new Exception(String.Format("GL Number not found in: {0}", fdr2FileNames[i]));
}
}
// Iterate through the first directory, and find files that are unique to it
for (int i = 0; i < fdr1FileNames.Length; i++)
{
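int glNumber = Int32.Parse(Regex.Match(fdr1FileNames[i], @"\d+").Value);
// If same file is not present in the second folder add to the list
// (ContainsKey avoids the KeyNotFoundException that the indexer throws for a missing key)
if(!fdr2Dict.ContainsKey(glNumber))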
{
fdr1FileList.Add(fdr1FileNames[i]);
}
}
// Iterate through the second directory, and find files that are unique to it
for (int i = 0; i < fdr2FileNames.Length; i++)
{
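int glNumber = Int32.Parse(Regex.Match(fdr2FileNames[i], @"\d+").Value);
// If same file is not present in the first folder add to the list
// (ContainsKey avoids the KeyNotFoundException that the indexer throws for a missing key)
if (!fdr1Dict.ContainsKey(glNumber))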
{
fdr2FileList.Add(fdr2FileNames[i]);
}
}
I am quite confident that this will work, as I've tested it:
static void Main(string[] args)
{
var firstDir = @"Path1";
var secondDir = @"Path2";
var firstDirFiles = System.IO.Directory.GetFiles(firstDir);
var secondDirFiles = System.IO.Directory.GetFiles(secondDir);
print2Dirs(firstDirFiles, secondDirFiles);
}
private static void print2Dirs(string[] firstDirFile, string[] secondDirFiles)
{
var maxIndex = Math.Max(firstDirFile.Length, secondDirFiles.Length);
using (StreamWriter streamWriter = new StreamWriter("result.txt"))
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < maxIndex; i++)
{
streamWriter.WriteLine(string.Format("{0,-150}{1,-150}",
firstDirFile.Length > i ? firstDirFile[i] : string.Empty,
secondDirFiles.Length > i ? secondDirFiles[i] : string.Empty));
}
}
}
It's quite simple code, but if you need help understanding it, just let me know :)
I would construct each line at a time. Something like this:
int row = 0;
string[] fdr1FileList = new string[0];
string[] fdr2FileList = new string[0];
while (row < fdr1FileList.Length || row < fdr2FileList.Length)
{
string rowText = "";
rowText += (row >= fdr1FileList.Length ? "\t" : fdr1FileList[row] + "\t");
rowText += (row >= fdr2FileList.Length ? "\t" : fdr2FileList[row]);
sw.WriteLine(rowText); // sw: the StreamWriter from the question, assumed to be open here
row++;
}
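The line-at-a-time idea can be sketched as a small self-contained helper that writes both lists side by side in tab-delimited form. TwoColumnReport and its Write method are hypothetical names; the helper takes a TextWriter so that it works with a StreamWriter for the report file as well as with a StringWriter in a quick test.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TwoColumnReport
{
    // Hypothetical helper: writes a header row, then one row per index,
    // leaving a column empty once its list has run out of entries.
    public static void Write(TextWriter writer, IList<string> left, IList<string> right)
    {
        writer.WriteLine("Unique in fdr1\tUnique in fdr2");
        int rows = Math.Max(left.Count, right.Count);
        for (int row = 0; row < rows; row++)
        {
            string l = row < left.Count ? left[row] : string.Empty;
            string r = row < right.Count ? right[row] : string.Empty;
            writer.WriteLine(l + "\t" + r);
        }
    }
}
```

With the question's variables, this could be called as TwoColumnReport.Write(sw, fdr1FileList, fdr2FileList) inside the using block.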
Try something like this:
static void Main(string[] args)
{
Dictionary<int, string> fdr1Dict = FilesToDictionary(Directory.GetFiles("path1"));
Dictionary<int, string> fdr2Dict = FilesToDictionary(Directory.GetFiles("path2"));
var unique_f1 = fdr1Dict.Where(f1 => !fdr2Dict.ContainsKey(f1.Key)).ToArray();
var unique_f2 = fdr2Dict.Where(f2 => !fdr1Dict.ContainsKey(f2.Key)).ToArray();
int f1_size = unique_f1.Length;
int f2_size = unique_f2.Length;
int list_length = 0;
if (f1_size > f2_size)
{
list_length = f1_size;
Array.Resize(ref unique_f2, list_length);
}
else
{
list_length = f2_size;
Array.Resize(ref unique_f1, list_length);
}
using (StreamWriter writer = new StreamWriter("output.txt"))
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", "Unique in fdr1", "Unique in fdr2"));
for (int i = 0; i < list_length; i++)
{
writer.WriteLine(string.Format("{0,-30}{1,-30}", unique_f1[i].Value, unique_f2[i].Value));
}
}
}
static Dictionary<int, string> FilesToDictionary(string[] filenames)
{
Dictionary<int, string> dict = new Dictionary<int, string>();
for (int i = 0; i < filenames.Length; i++)
{
int glNumber;
string filename = Path.GetFileName(filenames[i]);
string number = Regex.Match(filename, @"\d+").ToString();
if (int.TryParse(number, out glNumber))
dict.Add(glNumber, filename);
}
return dict;
}

Count the number of words within an Array or List

I need to count the number of words within an array or a list. The reason I say array or list is because I am not sure which would be the best to use in this situation. The data is static and in a .txt file (It's actually a book). I was able to create an array and break down words from the array but for the life of me I can not count! I have tried many different ways to do this and I'm thinking since it is a string it is unable to count. I have even teetered on the edge of just printing the whole book to a listbox and counting from the listbox but, that's ridiculous.
public partial class mainForm : Form
{
//------------------------
//GLOBAL VARIABLES:
//------------------------
List<string> countWords;
string[] fileWords;
string[] fileLines;
char[] delim = new char[] { ' ', ',','.','?','!' };
string path;
public mainForm()
{
InitializeComponent();
}
private void BookTitle() // TiTleAndAuthor Method will pull the Book Title and display it.
{
for (int i = 0; i < 1; i++)
{
bookTitleLabel.Text = fileLines[i];
}
}
private void BookAuthor() // TiTleAndAuthor Method will pull the Book Author and display it.
{
for (int i = 1; i < 2; i++)
{
bookAuthorLabel.Text = fileLines[i];
}
}
private void FirstLines() // FirstTenWords Method pulls the first ten words of any text file and prints the to a ListBox
{
for (int i = 0; i <= 499; i++)
{
wordsListBox.Items.Add(fileWords[i]);
}
}
private void WordCount() // Count all the words in the file.
{
}
private void openFileButton_Click(object sender, EventArgs e)
{
OpenFileDialog inputFile = new OpenFileDialog();
if (inputFile.ShowDialog() == DialogResult.OK) // check the file the user selected
{
path = inputFile.FileName; // save that path of the file to a string variable for later use
StreamReader fileRead = new StreamReader(path); // read a file at the path outlined in the path variable
fileWords = fileRead.ReadToEnd().Split(delim); // Breakdown the text into lines of text to call them at a later date
fileLines = File.ReadAllLines(path);
countWords = File.ReadLines(path).ToList();
wordsListBox.Items.Clear();
BookTitle();
BookAuthor();
FirstLines();
WordCount();
}
else
{
MessageBox.Show("Not a valid file, please select a text file");
}
}
}
Maybe this is useful:
static void Main(string[] args)
{
string[] lines = File_ReadAllLines();
List<string> words = new List<string>();
foreach(var line in lines)
{
words.AddRange(line.Split(' '));
}
Console.WriteLine(words.Count);
}
private static string[] File_ReadAllLines()
{
return new[] {
"The one book",
"written by gnarf",
"once upon a time ther werent any grammer",
"iso 1-12122-445",
"(c) 2012 under the hills"
};
}
Before I get to the answer, a quick observation on some of the loops:
for (int i = 1; i < 2; i++)
{
bookAuthorLabel.Text = fileLines[i];
}
This'll only run once, so it's pointless to have it in a loop (unless you intended this to actually loop through the whole list, in which case it's a bug). If this is the expected behavior, you might as well just do
bookAuthorLabel.Text = fileLines[1];
You have something similar here:
for (int i = 0; i < 1; i++)
{
bookTitleLabel.Text = fileLines[i];
}
Again, this is pointless.
Now for the answer itself. I'm not sure if you're trying to get total word count or count of individual words, so here's a code sample for doing both:
private static void CountWords()
{
const string fileName = "CountWords.txt";
// Create a dummy file
using (var sw = new StreamWriter(fileName))
{
sw.WriteLine("This is a short sentence");
sw.WriteLine("This is a long sentence");
}
string text = File.ReadAllText(fileName);
string[] result = text.Split(new[] { " ", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
// Total word count
Console.WriteLine("Total count: " + result.Count().ToString());
// Now to illustrate getting the count of individual words
var dictionary = new Dictionary<string, int>();
foreach (string word in result)
{
if (dictionary.ContainsKey(word))
{
dictionary[word]++;
}
else
{
dictionary[word] = 1;
}
}
foreach (string key in dictionary.Keys)
{
Console.WriteLine(key + ": " + dictionary[key].ToString());
}
}
This should be easy to adapt to your particular needs in this case.
Read the text file line by line, split each line on spaces while removing empty entries, and add each line's count to the total.
var totalWords = 0;
using (StreamReader sr = new StreamReader("abc.txt"))
{
while (!sr.EndOfStream)
{
int count = sr
.ReadLine()
.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries).Count();
totalWords += count;
}
}
You can also use the code below:
totalWords = fileRead.ReadToEnd().Split(delim, StringSplitOptions.RemoveEmptyEntries).Length;
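One thing to watch when splitting the whole file at once: the delimiter list in the question does not include line breaks, so the last word of one line and the first word of the next would be counted as a single word. A minimal sketch that adds the line-break characters is shown below; WordCounter, CountWords, and CountEachWord are illustrative names, and treating words case-insensitively is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // The question's delimiters plus tab and line-break characters.
    private static readonly char[] Delimiters = { ' ', '\t', '\r', '\n', ',', '.', '?', '!' };

    // Total number of words in the text.
    public static int CountWords(string text)
    {
        return text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Number of occurrences of each word (assumption: case-insensitive).
    public static Dictionary<string, int> CountEachWord(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var word in text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

For the book in the question, CountWords(File.ReadAllText(path)) would give the total.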

C# Variable not getting all values outside for loop

I have two values in the dictionary, but when I try to get both values outside the loop I only get one. The value of the locationDesc variable is being overwritten. Is there a better way to create unique variables to handle this issue?
There are two keys, location-1 and location-2. I am trying to figure out how to get both values outside the loop. Am I doing it wrong?
string locationDesc = "";
string locationAddress = "";
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
locationDesc = locationDataRow[0];
locationAddress = locationDataRow[1];
}
}
// Only getting location-2 value outside this loop since locationDesc is not unique.
Debug.WriteLine("Location Desc from dictionary is : " + locationDesc);
Debug.WriteLine("Location Add from dictionary is : " + locationAddress);
What I would like to get here is get both the values like locationDesc1 and locationDesc2 instead of locationDesc
What I am looking for is to create locationDesc and locationAddress unique so I can access both the values outside the for loop.
More Explanation as I was not very clear:
I have a dynamic table that is created in the front end. Every time a location is created, I create a cookie, e.g. location-1, location-2 ... location-n, with the location description and location address as the values in the cookie. I am trying to access these values in the backend by creating a dictionary, so that I can assign all the values to unique variables, which will make it easier for me to pass them to an API call. I think I am overcomplicating a simple issue and might be doing it wrong.
My api call will be something like this:
<field="" path="" value=locationDesc1>
<field="" path="" value=locationDesc2>
The problem with your loop is that you are relying on the position of the entry in the dictionary matching the index within your loop. Your first line of code pretty much has it though:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
What this tells me is that you are looking for all entries in your dictionary where the key starts with "location-". So why not do that directly:
var values = dictionary.Where(d => d.Key.StartsWith("location-"));
And to do the extraction/string splitting at the same time:
var values = dictionary
.Where(d => d.Key.StartsWith("location-"))
.Select(d => d.Value.Split(':'))
.Select(s => new
{
LocationDesc = s[0],
LocationAddress = s[1]
});
This will give you an IEnumerable of LocationDesc/LocationAddress pairs which you can loop over:
foreach(var pair in values)
{
Debug.WriteLine(pair.LocationDesc);
Debug.WriteLine(pair.LocationAddress);
}
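Put together as a self-contained sketch, the filter-and-split approach might look like the following. The Location class, LocationParser, and the sample values in the usage are assumptions made for illustration. Two details go beyond the answer: entries are sorted by the number after "location-" (a Dictionary does not guarantee enumeration order), and the value is split only at the first ':' so that an address containing a colon stays intact.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one parsed entry.
public sealed class Location
{
    public string Desc { get; set; }
    public string Address { get; set; }
}

public static class LocationParser
{
    private const string Prefix = "location-";

    // Number after "location-", or int.MaxValue if the suffix is not numeric.
    private static int SuffixNumber(string key)
    {
        int n;
        return int.TryParse(key.Substring(Prefix.Length), out n) ? n : int.MaxValue;
    }

    public static List<Location> Parse(IDictionary<string, string> dictionary)
    {
        return dictionary
            .Where(kv => kv.Key.StartsWith(Prefix, StringComparison.Ordinal))
            .OrderBy(kv => SuffixNumber(kv.Key))
            .Select(kv => kv.Value.Split(new[] { ':' }, 2)) // split at the first ':' only
            .Where(parts => parts.Length == 2)              // skip malformed values
            .Select(parts => new Location { Desc = parts[0], Address = parts[1] })
            .ToList();
    }
}
```

The resulting list can be indexed or looped over outside the original loop, for example to emit one field per location in the API call.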
Try this:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
Dictionary<string, string> values = new Dictionary<string, string>();
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
values.Add(locationDataRow[0],locationDataRow[1]);
}
}
foreach (var item in values)
{
Debug.WriteLine(item.Key + " : " + item.Value);
}
As you are dealing with multiple values, you should use a container in which you can store all of them.
If you are dealing with only two unique values, use the code below.
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
string[] locationDesc = new string[2];
string[] locationAddress = new string[2];
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
locationDesc[i-1] = locationDataRow[0];
locationAddress[i-1] = locationDataRow[1];
}
}
for (int i = 0; i <= locationDesc.Length-1; i++)
{
Debug.WriteLine("Location Desc from dictionary is : " + locationDesc[i]);
Debug.WriteLine("Location Add from dictionary is : " + locationAddress[i]);
}
If the number of unique values is not fixed, go with an ArrayList:
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
ArrayList locationDesc = new ArrayList();
ArrayList locationAddress = new ArrayList();
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
locationDesc.Add(locationDataRow[0]);
locationAddress.Add(locationDataRow[1]);
}
}
for (int i = 0; i < locationDesc.Count; i++)
{
Debug.WriteLine("Location Desc from dictionary is : " + locationDesc[i]);
Debug.WriteLine("Location Add from dictionary is : " + locationAddress[i]);
}
A simple one: if you only want to show the result using Debug.WriteLine, go with the code below.
int count = dictionary.Count(D => D.Key.StartsWith("location-"));
for (int i = 1; i <= count; i++)
{
if (dictionary.ContainsKey("location-"+i))
{
string locationData = dictionary["location-"+i];
string[] locationDataRow = locationData.Split(':');
Debug.WriteLine("Location Desc from dictionary is : " + locationDataRow[0]);
Debug.WriteLine("Location Add from dictionary is : " + locationDataRow[1]);
}
}
I am not able to check this code in Visual Studio at the moment, so there may be some syntax errors.
It is hard to judge what you are even trying to do. I would not dump objects you already have into other objects just for fun. If you are just trying to expose values in a loop for use with another function, you can use LINQ to iterate over the dictionary. If you want a specific value, just add a Where LINQ expression. LINQ should be available in .NET Framework 3.5 and later, I believe.
public static void ApiMock(string s)
{
Console.WriteLine($"I worked on {s}!");
}
static void Main(string[] args)
{
var d = new Dictionary<int, string> {
{ 1, "location-1" },
{ 2, "location-2" },
{ 3, "location-3" }
};
d.ToList().ForEach(x => ApiMock(x.Value));
//I just want the second one
d.Where(x => x.Value.Contains("-2")).ToList().ForEach(x => ApiMock(x.Value));
//Do you want a concatenated string
var holder = string.Empty;
d.ToList().ForEach(x => holder += x.Value + ", ");
holder = holder.Substring(0, holder.Length - 2);
Console.WriteLine(holder);
}

How to loop through and compare millions of values in two text files?

I have two text files (TXT) which contain over 2 million distinct file names. I want to loop through all the names in the first file and find those that are also present in the second text file.
I have tried looping through the StreamReader but it takes a lot of time. I also tried the code below, but it still takes too much time.
StreamReader first = new StreamReader(path);
string strFirst = first.ReadToEnd();
string[] strarrFirst = strFirst.Split('\n');
bool found = false;
StreamReader second = new StreamReader(path2);
string str = second.ReadToEnd();
string[] strarrSecond = str.Split('\n');
for (int j = 0; j < (strarrFirst.Length); j++)
{
found = false;
for (int i = 0; i < (strarrSecond .Length); i++)
{
if (strarrFirst[j] == strarrSecond[i])
{
found = true;
break;
}
}
if (!found)
{
Console.WriteLine(strarrFirst[j]);
}
}
What is a good way to compare the files?
How about this:
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2));
That's O(N + M) instead of your current solution which tests every line in the first file with every line in the second file - O(N * M).
That's assuming you're using .NET 4. Otherwise, you could use File.ReadAllLines, but that will read the whole file into memory. Or you could write the equivalent of File.ReadLines yourself - it's not terribly hard.
Ultimately you're likely to be limited by file IO by the time you've got rid of the O(N * M) problem in your current code - there's not much way to get round that.
EDIT: For .NET 2, first let's implement something like ReadLines:
public static IEnumerable<string> ReadLines(string file)
{
using (TextReader reader = File.OpenText(file))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
Now we really want to use a HashSet<T>, but that wasn't in .NET 2 - so let's use Dictionary<TKey, TValue> instead:
Dictionary<string, string> map = new Dictionary<string, string>();
foreach (string line in ReadLines(path))
{
map[line] = line;
}
List<string> intersection = new List<string>();
foreach (string line in ReadLines(path2))
{
if (map.ContainsKey(line))
{
intersection.Add(line);
}
}
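On .NET 3.5 or later, where HashSet<T> is available, the same lookup idea can be sketched as below. LineSets, InBoth, and OnlyInFirst are hypothetical names. The second method covers the case the question's code actually prints, namely names from the first file that are missing from the second. Both take any sequence of lines, so File.ReadLines(path) or the ReadLines helper above can be passed in.

```csharp
using System;
using System.Collections.Generic;

public static class LineSets
{
    // Lines of 'first' that also occur in 'second'; each shared line is reported once.
    public static List<string> InBoth(IEnumerable<string> first, IEnumerable<string> second)
    {
        var lookup = new HashSet<string>(second, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var line in first)
        {
            // Remove returns true only if the line was still in the set.
            if (lookup.Remove(line))
            {
                result.Add(line);
            }
        }
        return result;
    }

    // Lines of 'first' that do not occur in 'second'.
    public static List<string> OnlyInFirst(IEnumerable<string> first, IEnumerable<string> second)
    {
        var lookup = new HashSet<string>(second, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var line in first)
        {
            if (!lookup.Contains(line))
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

Each file is read once and every lookup is a hash probe, which is where the O(N + M) behaviour comes from.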
Try something like this to speed it up a bit ...
var path = string.Empty;
var path2 = string.Empty;
var strFirst = string.Empty;
var str = string.Empty;
var strarrFirst = new List<string>();
var strarrSecond = new List<string>();
using (var first = new StreamReader(path))
{
strFirst = first.ReadToEnd();
}
using (var second = new StreamReader(path2))
{
str = second.ReadToEnd();
}
strarrFirst.AddRange(strFirst.Split('\n'));
strarrSecond.AddRange(str.Split('\n'));
strarrSecond.Sort();
foreach(var value in strarrFirst)
{
var found = strarrSecond.BinarySearch(value) >= 0;
if (!found) Console.WriteLine(value);
}
Just for fun, I tried Jon Skeet's method and my own:
var guidArray = Enumerable.Range(0, 1000000).Select(x => Guid.NewGuid().ToString()).ToList();
string path = "first.txt";
File.WriteAllLines(path, guidArray);
string path2 = "second.txt";
File.WriteAllLines(path2, guidArray.Select(x=>DateTime.UtcNow.Ticks % 2 == 0 ? x : Guid.NewGuid().ToString()));
var start = DateTime.Now;
var commonNames = File.ReadLines(path).Intersect(File.ReadLines(path2)).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
start = DateTime.Now;
var lines = File.ReadAllLines(path);
var hashset = new HashSet<string>(lines);
var lines2 = File.ReadAllLines(path2);
var result = lines2.Where(hashset.Contains).ToList();
Console.WriteLine((DateTime.Now - start).TotalMilliseconds);
Console.ReadKey();
Skeet's method was a tiny bit faster (1453.0831 ms vs 1488.0851 ms; iDevForFun's method was quite slow at 12791.7316 ms), so I think the same thing happens under the hood as what I was doing manually with the HashSet.
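For timings like these, System.Diagnostics.Stopwatch is generally a better fit than subtracting DateTime.Now values, since it is designed for measuring elapsed time. A minimal sketch, with Timing and Measure as made-up names:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs the action once and returns the elapsed time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Each of the two approaches above could be wrapped in a lambda and passed to Measure, printing the TotalMilliseconds of the result. A single run of each is still only a rough comparison.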
