I want to create a text file containing one name on each line, compute the number of times each name occurs, and output one line for each distinct name in the file, printing the number of occurrences followed by the name.
I can open the file using this code:
private void button1_Click(object sender, EventArgs e)
{
using (OpenFileDialog dlgOpen = new OpenFileDialog())
{
try
{
// Available file extensions
openFileDialog1.Filter = "All files(*.*)|*.*";
// Initial directory
openFileDialog1.InitialDirectory = "D:";
// OpenFileDialog title
openFileDialog1.Title = "Open";
// Show OpenFileDialog box
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
// Create new StreamReader
StreamReader sr = new StreamReader(openFileDialog1.FileName, Encoding.Default);
// Get all text from the file
string str = sr.ReadToEnd();
// Close the StreamReader
sr.Close();
// Show the text in the rich textbox rtbMain
}
}
catch (Exception errorMsg)
{
MessageBox.Show(errorMsg.Message);
}
}
}
But what I want is to use the same button to read the file and display the result in a GroupBox.
As this is homework, I am not going to give you code, but hopefully enough info to point you in the right direction.
I suggest you use File.ReadAllLines to read the file into an array of strings, each item in the array is one line in the file. This means you do not have to split the file contents up yourself. Then you can loop over the string array, and add each line to a Dictionary, where the key is the line read from the file, and the value is the number of occurrences. You need to check whether the key is already in the Dictionary - if not add it with a count of 1, otherwise update the existing count (+1). After that loop, have a second loop which loops over the Dictionary contents, updating your textbox with the names and their counts.
(Assuming this is homework.) I used File.ReadAllLines and Dictionary<TKey, TValue>:
var nameCount = new Dictionary<string, int>();
foreach (String s in File.ReadAllLines("filename"))
{
if (nameCount.ContainsKey(s))
{
nameCount[s] = nameCount[s] + 1;
}
else
{
nameCount.Add(s, 1);
}
}
// and printing
foreach (var pair in nameCount)
{
Console.WriteLine("{0} count:{1}", pair.Key, pair.Value);
}
You can do that using LINQ, without having to increment an int variable, and still end up with a dictionary containing the names and their counts:
string[] names = File.ReadAllLines("filename");
Dictionary<string, int> namesAndCount = new Dictionary<string, int>();
foreach(var name in names)
{
if(namesAndCount.ContainsKey(name))
continue;
var count = (from n in names
where n == name
select n).Count();
namesAndCount.Add(name, count);
}
Okay, a function like this will build you distinct names with counts.
private static IDictionary<string, int> ParseNameFile(string filename)
{
var names = new Dictionary<string, int>();
using (var reader = new StreamReader(filename))
{
var line = reader.ReadLine();
while (line != null)
{
if (names.ContainsKey(line))
{
names[line]++;
}
else
{
names.Add(line, 1);
}
line = reader.ReadLine();
}
}
return names;
}
Or you could do something flashy with LINQ and File.ReadAllLines.
private static IDictionary<string, int> ParseNameFile(string filename)
{
return File.ReadAllLines(filename)
.OrderBy(n => n)
.GroupBy(n => n)
.ToDictionary(g => g.Key, g => g.Count());
}
The first option does have the advantage of not loading the whole file into memory.
As for outputting the information,
var output = new StringBuilder();
foreach (var valuePair in ParseNameFile(openFileDialog1.FileName))
{
output.AppendFormat("{0} {1}\n", valuePair.Key, valuePair.Value);
}
Then you call ToString() on output to put the data anywhere you want. If there will be very many rows, a StreamWriter approach would be preferred.
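For the many-rows case mentioned above, a minimal sketch of the StreamWriter approach might look like the following. The class name NameCountWriter, the method WriteCounts and the output path "counts.txt" are hypothetical names chosen for illustration; the dictionary is assumed to come from something like ParseNameFile. The sketch prints the count first and the name second, as the question asks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NameCountWriter
{
    // Hypothetical helper: writes one "count name" line per entry,
    // so the full output never has to be held in memory as one string.
    public static void WriteCounts(IDictionary<string, int> counts, TextWriter writer)
    {
        foreach (KeyValuePair<string, int> pair in counts)
        {
            writer.WriteLine("{0} {1}", pair.Value, pair.Key);
        }
    }

    public static void Main()
    {
        // Sample data standing in for the result of ParseNameFile.
        var counts = new Dictionary<string, int> { { "Alice", 2 }, { "Bob", 1 } };

        // "counts.txt" is an assumed output path.
        using (var writer = new StreamWriter("counts.txt"))
        {
            WriteCounts(counts, writer);
        }
    }
}
```

Because WriteCounts accepts any TextWriter, the same method could also write to a StringWriter when the text is needed for a control instead of a file.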
A similar question has been asked before:
A method to count occurrences in a list
In my opinion, a LINQ query is a good option.
string[] file = File.ReadAllLines(openFileDialog1.FileName, Encoding.Default);
IEnumerable<IGrouping<string, string>> groupQuery =
from name in file
group name by name into g
orderby g.Key
select g;
foreach (var g in groupQuery)
{
MessageBox.Show(g.Count() + " " + g.Key);
}
In class I have a task to make a game, and I have to make a leaderboard/high-score list using a .txt file and a .txt file only.
My file looks like this:
name 1, 6
name 2, 3
name 3, 9
And this is what I want it to look like:
name 3, 9
name 1, 6
name 2, 3
I can display the text file in a ListBox, but I can't sort it before displaying it.
Here is the code snippet:
private void Leaderboard_Load(object sender, EventArgs e)
{
string[] scores = File.ReadAllLines(filepath); // filepath is equal to @"database.txt"
var orderedScores = scores.OrderBy(x => int.Parse(x.Split(',')[1]));
foreach (var entry in orderedScores)
{
Console.WriteLine(scores);
}
StreamReader sr = new StreamReader(@"database.txt");
while (line != null)
{
line = sr.ReadLine();
if (line != null)
{
scoreboard.Items.Add(line);
}
}
sr.Close();
}
I have tried modifying the code and using different symbols to split the words (; ? $) instead of a comma, but it hasn't worked. There are no errors when running; it just doesn't sort the file. Is there something I am missing?
You're not actually sorting the file.
var orderedScores = scores.OrderBy(x => int.Parse(x.Split(',')[1]));
This sorts the lines read from the file into a new collection, which you then proceed to print out.
StreamReader sr = new StreamReader(@"database.txt");
This is where you get the source for your scoreboard. The problem is that it reads from the same file, and you never wrote the sorted lines back.
If you need to write the sorted list back to the file, you'd need to do something like:
var fileText = String.Join(System.Environment.NewLine, orderedScores);
File.WriteAllText(filepath, fileText);
But, you don't have to write it back to the file. If all you want to do is display the sorted list, and not write it back to the file, all you have to do is loop through orderedScores; like this:
private void Leaderboard_Load(object sender, EventArgs e)
{
string[] scores = File.ReadAllLines(filepath); // filepath is equal to @"database.txt"
var orderedScores = scores.OrderBy(x => int.Parse(x.Split(',')[1]));
foreach (var entry in orderedScores)
{
Console.WriteLine(entry);
if (entry != null)
{
scoreboard.Items.Add(entry);
}
}
}
And technically, since the ListBox's Items collection supports AddRange (which expects an array), all you have to do is:
private void Leaderboard_Load(object sender, EventArgs e)
{
string[] scores = File.ReadAllLines(filepath);
scoreboard.Items.AddRange(scores.OrderBy(x => int.Parse(x.Split(',')[1])).ToArray());
}
And as mentioned by @PaulF below, you are writing out the unordered scores array on each iteration through the ordered scores. Console.WriteLine(scores) prints the array's type name (System.String[]) once per entry rather than the sorted entries, which makes it appear as if your code is doing nothing, even though it is actually sorting the list.
You can fix this by replacing Console.WriteLine(scores) with Console.WriteLine(entry).
Lastly, this still won't give you your desired result: it sorts the file or the ListBox in ascending order, and you need it sorted in descending order:
private void Leaderboard_Load(object sender, EventArgs e)
{
string[] scores = File.ReadAllLines(filepath);
scoreboard.Items.AddRange(scores.OrderByDescending(x => int.Parse(x.Split(',')[1])).ToArray());
}
You need to use the OrderByDescending method.
private void Leaderboard_Load(object sender, EventArgs e)
{
string[] scores = File.ReadAllLines(filepath); // filepath is equal to @"database.txt"
var orderedScores = scores.OrderByDescending(x => int.Parse(x.Split(',')[1]));
foreach (var entry in orderedScores)
{
Console.WriteLine(entry);
}
StreamReader sr = new StreamReader(@"database.txt");
while (line != null)
{
line = sr.ReadLine();
if (line != null)
{
scoreboard.Items.Add(line);
}
}
sr.Close();
}
I am writing a forms application that displays all the PDF files from a directory in a DataGridView.
The file name format is usually 12740-250-B-File Name (so basically XXXXX-XXX-X-XXXXXXX).
The first number is the project number, the second number (after the dash) is the series number, and the letter is the revision of the file.
I would like to have a button that, when pressed, finds the files with the same series number (XXXXX-Series No-Revision-XXXXXX) and shows me the latest revision, which is the one with the highest letter. So between 12763-200-A-HelloWorld and 12763-200-B-HelloWorld, I want 12763-200-B-HelloWorld to be the result of my query.
This is what I got so far:
private void button1_Click(object sender, EventArgs e)
{
}
private void button2_Click(object sender, EventArgs e)
{
String[] files = Directory.GetFiles(@"M:\Folder Directory","*.pdf*", SearchOption.AllDirectories);
DataTable table = new DataTable();
table.Columns.Add("File Name");
for (int i = 0; i < files.Length; i++)
{
FileInfo file = new FileInfo(files[i]);
table.Rows.Add(file.Name);
}
dataGridView1.DataSource = table;
}
Thanks in advance.
Note:
In the end, the files with the latest revision will be inserted in an excel spreadsheet.
Assuming your collection is a list of file names returned by Directory.GetFiles(""), the LINQ query below will work. For it to work you need to be certain of the file name format, as the splitting is very sensitive to that specific format.
var seriesNumber = "200";
var files = new List<string> { "12763-200-A-HelloWorld", "12763-200-B-HelloWorld" };
var matching = files.Where(x => x.Split('-')[1] == seriesNumber)
.OrderByDescending(x => x.Split('-')[2])
.FirstOrDefault();
Result:
Matching: "12763-200-B-HelloWorld"
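If the latest revision is needed for every series number at once rather than for a single hard-coded one, the same idea can be combined with GroupBy. This is only a sketch under the same file name assumption (Project-Series-Revision-Title); the class name LatestRevisions and the method Find are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestRevisions
{
    // Hypothetical helper: returns one file name per series number,
    // choosing the ordinally greatest revision segment.
    // Assumes every name has the form Project-Series-Revision-Title.
    public static List<string> Find(IEnumerable<string> fileNames)
    {
        return fileNames
            .GroupBy(name => name.Split('-')[1])
            .Select(series => series
                .OrderByDescending(name => name.Split('-')[2], StringComparer.Ordinal)
                .First())
            .ToList();
    }

    public static void Main()
    {
        var files = new List<string>
        {
            "12763-200-A-HelloWorld",
            "12763-200-B-HelloWorld",
            "12763-203-C-HelloWorld"
        };

        foreach (string latest in Find(files))
        {
            Console.WriteLine(latest);
        }
    }
}
```

The ordinal comparer is a deliberate choice here: it compares revision letters by character value, so the result does not depend on the current culture.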
You can try the following:
string dirPath = @"M:\Folder Directory";
string filePattern = "*.pdf";
DirectoryInfo di = new DirectoryInfo(dirPath);
FileInfo[] files = di.GetFiles(filePattern, SearchOption.AllDirectories);
Dictionary<string, FileInfo> matchedFiles = new Dictionary<string, FileInfo>();
foreach (FileInfo file in files)
{
string filename = file.Name;
string[] seperatedFilename = filename.Split('-');
// We are assuming that filenames are consistent
// As such,
// the value at seperatedFilename[1] will always be Series No
// the value at seperatedFilename[2] will always be Revision
// If this is not the case in every scenario, the following code should be expanded to allow other cases
string seriesNo = seperatedFilename[1];
string revision = seperatedFilename[2];
if (matchedFiles.ContainsKey(seriesNo))
{
FileInfo matchedFile = matchedFiles[seriesNo];
string matchedRevision = matchedFile.Name.Split('-')[2];
// Compare on the char values - https://learn.microsoft.com/en-us/dotnet/api/system.string.compareordinal?view=netframework-4.7.2
// If the value is int, then it can be cast to integer for comparison
if (String.CompareOrdinal(revision, matchedRevision) > 0)
{
// This file is higher than the previous
matchedFiles[seriesNo] = file;
}
} else
{
// Record does not exist, so it is added by default
matchedFiles.Add(seriesNo, file);
}
}
// We have a list of all files which match our criteria
foreach (FileInfo file in matchedFiles.Values)
{
// TODO : Determine if the directory path is also required for the file
Console.WriteLine(file.FullName);
}
It splits the filename into component parts and compares the revision where the series names match; storing the result in a dictionary for further processing later.
This seems to be a good situation to use a dictionary in my opinion! You could try the following:
String[] files = new string[5];
//group of files with the same series number
files[0] = "12763-200-A-HelloWorld";
files[1] = "12763-200-X-HelloWorld";
files[2] = "12763-200-C-HelloWorld";
//another group of files with the same series number
files[3] = "12763-203-C-HelloWorld";
files[4] = "12763-203-Z-HelloWorld";
//all the distinct series numbers; Split('-')[1] is the second segment of every file name
var distinctSeriesNumbers = files.Select(f => f.Split('-')[1]).Distinct();
Dictionary<String, List<String>> filesDictionary = new Dictionary<string, List<String>>();
//for each series number, we will try to get all the files and add them to dictionary
foreach (var serieNumber in distinctSeriesNumbers)
{
var filesWithSerieNumber = files.Where(f => f.Split('-')[1] == serieNumber).ToList();
filesDictionary.Add(serieNumber, filesWithSerieNumber);
}
List<String> listOfLatestSeries = new List<string>();
//here we go through the dictionary and get the latest file of each series number
foreach (KeyValuePair<String, List<String>> entry in filesDictionary)
{
listOfLatestSeries.Add(entry.Value.OrderByDescending(d => d.Split('-')[2]).First());
}
//now the list holds the file with the latest revision for each series number
MessageBox.Show(listOfLatestSeries[0]); //result : "12763-200-X-HelloWorld"
MessageBox.Show(listOfLatestSeries[1]); //result : "12763-203-Z-HelloWorld";
I converted an Excel file into a CSV file. The file contains over 100k records. I want to find and return duplicate rows by searching the full name column. If the full names match, I want the program to return the entire rows of the duplicates. I started with code that returns a list of full names, but that's about it.
I've listed the code that I have now below:
public static void readCells()
{
var dictionary = new Dictionary<string, int>();
Console.WriteLine("started");
var counter = 1;
var readText = File.ReadAllLines(path);
var duplicatedValues = dictionary.GroupBy(fullName => fullName.Value).Where(fullName => fullName.Count() > 1);
foreach (var s in readText)
{
var values = s.Split(new Char[] { ',' });
var fullName = values[3];
if (!dictionary.ContainsKey(fullName))
{
dictionary.Add(fullName, 1);
}
else
{
dictionary[fullName] += 1;
}
Console.WriteLine("Full Name Is: " + values[3]);
counter++;
}
}
}
I changed the dictionary to use the full name as the key and to store every row for that name:
public static void readCells()
{
var dictionary = new Dictionary<string, List<List<string>>>();
Console.WriteLine("started");
var counter = 1;
var readText = File.ReadAllLines(path);
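// A name is duplicated when more than one row is stored for it. The query is deferred,
// so it sees the rows added by the loop below once it is enumerated after the loop.
var duplicatedValues = dictionary.Where(entry => entry.Value.Count > 1);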
foreach (var s in readText)
{
List<string> values = s.Split(new Char[] { ',' }).ToList();
string fullName = values[3];
if (!dictionary.ContainsKey(fullName))
{
List<List<string>> newList = new List<List<string>>();
newList.Add(values);
dictionary.Add(fullName, newList);
}
else
{
dictionary[fullName].Add(values);
}
Console.WriteLine("Full Name Is: " + values[3]);
counter++;
}
}
I've found that using Microsoft's built-in TextFieldParser (which you can use in C# even though it lives in the Microsoft.VisualBasic.FileIO namespace) can simplify reading and parsing CSV files.
Using this type, your method ReadCells() can be rewritten as the following method, which relies on a small extension method to stream the rows:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;
public static class TextFieldParserExtensions
{
public static List<IGrouping<string, string[]>> ReadCellsWithDuplicatedCellValues(string path, int keyCellIndex, int nRowsToSkip = 0)
{
using (var stream = File.OpenRead(path))
using (var parser = new TextFieldParser(stream))
{
parser.SetDelimiters(new string[] { "," });
var values = parser.ReadAllFields()
// If your CSV file contains header row(s) you can skip them by passing a value for nRowsToSkip
.Skip(nRowsToSkip)
.GroupBy(row => row.ElementAtOrDefault(keyCellIndex))
.Where(g => g.Count() > 1)
.ToList();
return values;
}
}
public static IEnumerable<string[]> ReadAllFields(this TextFieldParser parser)
{
if (parser == null)
throw new ArgumentNullException();
while (!parser.EndOfData)
yield return parser.ReadFields();
}
}
Which you would call like:
var groups = TextFieldParserExtensions.ReadCellsWithDuplicatedCellValues(path, 3);
Notes:
TextFieldParser correctly handles cells with escaped, embedded commas which s.Split(new Char[] { ',' }) will not.
Since your CSV file has over 100k records I adopted a streaming strategy to avoid the intermediate string[] readText memory allocation.
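To illustrate the first note, here is a small sketch comparing a naive split with TextFieldParser on a row that contains a quoted comma. The sample row, the class name QuotedFieldDemo and the method ParseLine are made up for illustration; the parser is fed from a StringReader so the example needs no file.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldDemo
{
    // Hypothetical helper: parses a single CSV line with TextFieldParser.
    public static string[] ParseLine(string csvLine)
    {
        using (var parser = new TextFieldParser(new StringReader(csvLine)))
        {
            parser.SetDelimiters(",");
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        // Made-up sample row: the second field contains a comma inside quotes.
        string line = "1,\"Smith, John\",Boston";

        // The naive split cuts the quoted name in two.
        Console.WriteLine(line.Split(',').Length);

        // TextFieldParser keeps the quoted field together.
        Console.WriteLine(ParseLine(line).Length);
    }
}
```

With this row the naive split yields four pieces while the parser yields three fields, which matters when the key column index (3 in the question) is counted from the left.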
You can try Cinchoo ETL, an open source library, to parse the CSV file and identify the duplicates with a few lines of code.
Sample CSV file (EmpDuplicates.csv) below
Id,Name
1,Tom
2,Mark
3,Lou
3,Lou
4,Austin
4,Austin
4,Austin
Here is how you can parse and identify the duplicate records
using (var parser = new ChoCSVReader("EmpDuplicates.csv").WithFirstLineHeader())
{
foreach (dynamic c in parser.GroupBy(r => r.Id).Where(g => g.Count() > 1).Select(g => g.FirstOrDefault()))
Console.WriteLine(c.DumpAsJson());
}
Output:
{
"Id": 3,
"Name": "Lou"
}
{
"Id": 4,
"Name": "Austin"
}
Hope this helps.
For more detailed usage of this library, visit CodeProject article at https://www.codeproject.com/Articles/1145337/Cinchoo-ETL-CSV-Reader
I have a dictionary, shown below, where I store a list of file names with keys generated as CSV1, CSV2, and so on, based on the number of files.
I have a string array like this:
string[] files = { "SampleCSVFile_5300kb1.csv,SampleCSVFile_5300kb2.csv", "SampleCSVFile_5300kb3.csv"};
int counter=1;
var dictionary = new Dictionary<string, string>();
foreach (var file in files)
{
dictionary.Add("CSV" + counter, file);
counter++;
}
foreach (var file in files)
{
string myValue;
if (dictionary.TryGetValue(file, out myValue)) // getting null in out value
}
When I try to search for SampleCSVFile_5300kb1.csv, I get null in my myValue variable.
Update: I realized that I was adding the wrong key, so I changed it as shown below, but I am still unable to find CSV1 for SampleCSVFile_5300kb1.csv:
foreach (var file in files)
{
dictionary.Add(file,"CSV" + counter);
counter++;
}
Based on the comment you left on Amir Popoviches answer, I think you should alter how you construct the dictionary.
That way you create a mapping from each of the .csv files to its "CSV1", "CSV2", etc. string.
var files = new[] { "SampleCSVFile_5300kb1.csv,SampleCSVFile_5300kb2.csv", "SampleCSVFile_5300kb3.csv" };
var counter = 1;
var dictionary = new Dictionary<string, string>();
foreach (var file in files)
{
if (string.IsNullOrWhiteSpace(file))
{
continue;
}
foreach (var item in file.Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries))
{
dictionary.Add(item, "CSV" + counter);
}
counter++;
}
And as you said in the comments, you want to find which "CSVX" entry each of your files belongs to, so we simulate looking up a match for these files. Notice that this array holds all the file names separately; in the array above, some values were comma-separated, so those files are grouped under the same "CSVX" value.
var files2 = new[] { "SampleCSVFile_5300kb1.csv", "SampleCSVFile_5300kb2.csv", "SampleCSVFile_5300kb3.csv" };
foreach (var file in files2)
{
string csvValue;
if (dictionary.TryGetValue(file, out csvValue))
{
Console.WriteLine("{0} -> {1}", file, csvValue);
}
}
This should output:
SampleCSVFile_5300kb1.csv -> CSV1
SampleCSVFile_5300kb2.csv -> CSV1
SampleCSVFile_5300kb3.csv -> CSV2
The first argument of TryGetValue is the key, so you should pass "CSV" + counter there to make it work.
https://msdn.microsoft.com/pl-pl/library/bb347013(v=vs.110).aspx
You are adding items to the dictionary with the following keys:
"CSV" + counter -> CSV1, CSV2...
And you are trying to find different values (e.g. "SampleCSVFile_5300kb1.csv,SampleCSVFile_5300kb2.csv") here:
foreach (var file in files)
{
string myValue;
if (dictionary.TryGetValue(file, out myValue)) // getting null in out value
}
Try below updated code:
string[] files = { "SampleCSVFile_5300kb1.csv,SampleCSVFile_5300kb2.csv", "SampleCSVFile_5300kb3.csv" };
int counter = 1;
var dictionary = new Dictionary<string, string>();
foreach (var file in files)
{
dictionary.Add("CSV" + counter, file);
counter++;
}
counter = 1;
foreach (var file in files)
{
string myValue;
//You need to pass key name here but you are passing value of it
//Need to update here
string keyName = "CSV" + counter;
if (dictionary.TryGetValue(keyName, out myValue))
{
// myValue now holds the file name(s) stored under keyName
Console.WriteLine("{0} -> {1}", keyName, myValue);
}
counter++;
}
Iterate over the dictionary and find your desired value by splitting on the comma. You will get "SampleCSVFile_5300kb1.csv" and "SampleCSVFile_5300kb2.csv" in the fileNames array for the same myvalKey.
foreach (KeyValuePair<string, string> entry in dictionary)
{
string myvalKey = entry.Key;
string myval = entry.Value;
if (myval.Contains(',')) {
string[] fileNames = myval.Split(',');
}
}
From what I understand, you seem to be looking for a way to match only part of a key. While I suggest following the answer of Janne Matikainen and adding the parts of your key separately with the same value, here is a way to match on a partial key using a bit of LINQ.
string resultValue = null;
string resultKey = dictionary.Keys.FirstOrDefault(k => k.Contains(file));
if(resultKey != null)
resultValue = dictionary[resultKey];
This assumes only the first match is wanted; if you want all matching keys, replace FirstOrDefault with Where.
Beware that while this code is simple, it is not suitable when performance is critical, because you iterate over the keys, basically using the dictionary as a List<Tuple<string,string>>.
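For the all-matches variant, a minimal sketch could look like this. The class name PartialKeyLookup and the method FindAll are hypothetical, and the dictionary is assumed to be keyed by the comma-separated file names, as in the question's update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialKeyLookup
{
    // Hypothetical helper: returns the values of all entries whose key
    // contains the given fragment. This scans every key, so it is O(n).
    public static List<string> FindAll(Dictionary<string, string> dictionary, string fragment)
    {
        return dictionary
            .Where(entry => entry.Key.Contains(fragment))
            .Select(entry => entry.Value)
            .ToList();
    }

    public static void Main()
    {
        var dictionary = new Dictionary<string, string>
        {
            { "SampleCSVFile_5300kb1.csv,SampleCSVFile_5300kb2.csv", "CSV1" },
            { "SampleCSVFile_5300kb3.csv", "CSV2" }
        };

        foreach (string value in FindAll(dictionary, "SampleCSVFile_5300kb2.csv"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Note that Contains is a plain substring test, so a fragment such as "kb1.csv" would also match a key like "Otherkb1.csv"; splitting the keys up front avoids that ambiguity.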
I am trying to compare the value in the 0 index of an array on one line and the 0 index on the following line. Imagine a CSV where I have a unique identifier in the first column, a corresponding value in the second column.
USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V
I would like to check the value of [0] the next line (or previous if that's easier) to compare and if they are the same (as they are in the example) concatenate it to index 1. That is, the data should read as
USER1, 1P, 3G
USER2, 1P
USER3, 1V
before it gets passed onto the next function. So far I have
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null)
{
break;
}
contact.ContactId = parts[0];
long nextLine;
nextLine = parser.LineNumber+1;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
Does anyone have any suggestions? Thank you.
How about saving the array into a variable:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
string[] oldParts = new string[] { string.Empty };
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Length < 1)
{
break;
}
if (oldParts[0] == parts[0])
{
// concat logic goes here
}
else
{
contact.ContactId = parts[0];
}
long nextLine;
nextLine = parser.LineNumber+1;
oldParts = parts;
//if line1 parts[0] == line2 parts[0] etc.
}
}
}
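One way the concat step could be filled in is sketched below. It is separated from the parser so it can run on its own: the rows are assumed to be the string[] results of ReadFields(), and rows with the same identifier are assumed to be adjacent, as in the sample data. The class name AdjacentRowMerger and the method Merge are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentRowMerger
{
    // Hypothetical helper: merges consecutive rows that share the same first field.
    public static List<string> Merge(IEnumerable<string[]> rows)
    {
        var merged = new List<string>();
        string currentId = null;
        var currentValues = new List<string>();

        foreach (string[] parts in rows)
        {
            if (parts == null || parts.Length < 2)
            {
                continue; // skip blank or incomplete rows
            }

            // The identifier changed, so the previous group is complete.
            if (currentId != null && parts[0] != currentId)
            {
                merged.Add(currentId + ", " + string.Join(", ", currentValues));
                currentValues.Clear();
            }

            currentId = parts[0];
            currentValues.Add(parts[1]);
        }

        // Flush the last group, which has no following row to trigger it.
        if (currentId != null)
        {
            merged.Add(currentId + ", " + string.Join(", ", currentValues));
        }

        return merged;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "USER1", "1P" },
            new[] { "USER1", "3G" },
            new[] { "USER2", "1P" },
            new[] { "USER3", "1V" }
        };

        foreach (string line in Merge(rows))
        {
            Console.WriteLine(line);
        }
    }
}
```

Comparing against the previous row rather than peeking at the next one keeps the loop a single forward pass, at the cost of needing the final flush after the loop.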
If I understand you correctly, what you are asking is essentially "how do I group the values in the second column based on the values in the first column?".
A quick and quite succinct way of doing this would be to Group By using LINQ:
var linesGroupedByUser =
from line in File.ReadAllLines(path)
let elements = line.Split(',')
// Trim removes the space that follows each comma in the sample data
let user = new {Name = elements[0].Trim(), Value = elements[1].Trim()}
group user by user.Name into users
select users;
foreach (var user in linesGroupedByUser)
{
string valuesAsString = String.Join(", ", user.Select(x => x.Value));
Console.WriteLine(user.Key + ", " + valuesAsString);
}
I have left out the use of your TextFieldParser class, but you can easily use that instead. This approach does, however, require that you can afford to load all of the data into memory. You don't mention whether this is viable.
The easiest way to do something like this is to convert each line to an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you or you can iterate each line and parse to an object. It is a great tool and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use Linq to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a string. Then, you can use the new, grouped by, collection for your end goal (write it to file or use it for whatever you need).
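As a rough sketch of the GroupBy and Aggregate step described above, assuming the rows have already been turned into objects (by CsvHelper or by hand): the UserCode class, its property names and the Combine method are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for one CSV row.
public class UserCode
{
    public string UserId { get; set; }
    public string Code { get; set; }
}

public static class UserCodeGrouping
{
    // Groups the records by UserId and aggregates the codes into one string per user.
    public static List<string> Combine(IEnumerable<UserCode> records)
    {
        return records
            .GroupBy(record => record.UserId)
            .Select(group => group.Key + ", " +
                group.Select(record => record.Code)
                     .Aggregate((left, right) => left + ", " + right))
            .ToList();
    }

    public static void Main()
    {
        var records = new List<UserCode>
        {
            new UserCode { UserId = "USER1", Code = "1P" },
            new UserCode { UserId = "USER1", Code = "3G" },
            new UserCode { UserId = "USER2", Code = "1P" },
            new UserCode { UserId = "USER3", Code = "1V" }
        };

        foreach (string line in Combine(records))
        {
            Console.WriteLine(line);
        }
    }
}
```

Aggregate without a seed is safe here because a group always contains at least one record; string.Join would be an equally reasonable choice for building the combined string.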
You're basically finding all the unique entries, so put them into a dictionary with the contact id as the key, as follows:
private void csvParse(string path)
{
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.Delimiters = new string[] { "," };
Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
if (parts == null || parts.Count() != 2)
{
break;
}
//if contact id not present in dictionary add
if (!uniqueContacts.ContainsKey(parts[0]))
uniqueContacts.Add(parts[0],new List<string>());
//now there's definitely an existing contact in dic (the one
//we've just added or a previously added one) so add to the
//list of strings for that contact
uniqueContacts[parts[0]].Add(parts[1]);
}
//now do something with that dictionary of unique user names and
// lists of strings, for example dump them to console in the
//format you specify:
foreach (var contactId in uniqueContacts.Keys)
{
// string.Join puts ", " between the values without a trailing separator
Console.WriteLine(contactId + ", " + string.Join(", ", uniqueContacts[contactId]));
}
}
}