C# string query: getting the latest file revision

I am writing a Windows Forms application that displays all the PDF files from a directory in a DataGridView.
The file names usually follow the format 12740-250-B-File Name (so basically XXXXX-XXX-X-XXXXXXX).
The first number is the project number, the second number (after the first dash) is the series number, and the letter is the revision of the file.
I would like a button that, when pressed, finds the files with the same series number (XXXXX-Series No-Revision-XXXXXX) and shows me the latest revision, which is the one with the highest letter. So between 12763-200-A-HelloWorld and 12763-200-B-HelloWorld, I want 12763-200-B-HelloWorld to be the result of my query.
This is what I have so far:
private void button1_Click(object sender, EventArgs e)
{
}
private void button2_Click(object sender, EventArgs e)
{
String[] files = Directory.GetFiles(@"M:\Folder Directory","*.pdf*", SearchOption.AllDirectories);
DataTable table = new DataTable();
table.Columns.Add("File Name");
for (int i = 0; i < files.Length; i++)
{
FileInfo file = new FileInfo(files[i]);
table.Rows.Add(file.Name);
}
dataGridView1.DataSource = table;
}
Thanks in advance.
Note:
In the end, the files with the latest revision will be inserted into an Excel spreadsheet.

Assuming your collection is a list of file names (for example, taken from the result of Directory.GetFiles), the LINQ below will work. For it to work you need to be certain of the file name format, because the splitting is very sensitive to that exact format.
var seriesNumber = "200";
var files = new List<string> { "12763-200-A-HelloWorld", "12763-200-B-HelloWorld" };
var matching = files.Where(x => x.Split('-')[1] == seriesNumber)
.OrderByDescending(x => x.Split('-')[2])
.FirstOrDefault();
Result:
Matching: "12763-200-B-HelloWorld"
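The query above handles one series number at a time. As a rough sketch of the same idea applied to every series at once, the names can be grouped by their second dash-separated part and the highest revision taken from each group. The class and method names (RevisionFinder, LatestPerSeries) are made up for illustration, and the code assumes single-letter revisions and the XXXXX-XXX-X-Name pattern from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class RevisionFinder
{
    // Hypothetical helper: maps each series number to the path whose file name
    // carries the highest revision. Assumes names like "12763-200-B-HelloWorld".
    public static Dictionary<string, string> LatestPerSeries(IEnumerable<string> paths)
    {
        return paths
            .Select(p => new { FullPath = p, Parts = Path.GetFileName(p).Split('-') })
            .Where(x => x.Parts.Length >= 4)   // skip names that do not fit the pattern
            .GroupBy(x => x.Parts[1])          // group by series number
            .ToDictionary(
                g => g.Key,
                g => g.OrderByDescending(x => x.Parts[2], StringComparer.Ordinal)
                      .First().FullPath);
    }
}
```

It could be called as RevisionFinder.LatestPerSeries(Directory.GetFiles(folder, "*.pdf")), where folder is whatever directory is being scanned. Ordinal comparison ranks "B" above "AA", so multi-letter revisions would need a different ordering rule.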

You can try the following:
string dirPath = @"M:\Folder Directory";
string filePattern = "*.pdf";
DirectoryInfo di = new DirectoryInfo(dirPath);
FileInfo[] files = di.GetFiles(filePattern, SearchOption.AllDirectories);
Dictionary<string, FileInfo> matchedFiles = new Dictionary<string, FileInfo>();
foreach (FileInfo file in files)
{
string filename = file.Name;
string[] seperatedFilename = filename.Split('-');
// We are assuming that filenames are consistent
// As such,
// the value at seperatedFilename[1] will always be Series No
// the value at seperatedFilename[2] will always be Revision
// If this is not the case in every scenario, the following code should be expanded to allow other cases
string seriesNo = seperatedFilename[1];
string revision = seperatedFilename[2];
if (matchedFiles.ContainsKey(seriesNo))
{
FileInfo matchedFile = matchedFiles[seriesNo];
string matchedRevision = matchedFile.Name.Split('-')[2];
// Compare on the char values - https://learn.microsoft.com/en-us/dotnet/api/system.string.compareordinal?view=netframework-4.7.2
// If the value is int, then it can be cast to integer for comparison
if (String.CompareOrdinal(revision, matchedRevision) > 0)
{
// This file is higher than the previous
matchedFiles[seriesNo] = file;
}
} else
{
// Record does not exist - so it is added by default
matchedFiles.Add(seriesNo, file);
}
}
// We have a list of all files which match our criteria
foreach (FileInfo file in matchedFiles.Values)
{
// TODO : Determine if the directory path is also required for the file
Console.WriteLine(file.FullName);
}
It splits the filename into component parts and compares the revision where the series names match; storing the result in a dictionary for further processing later.

This seems to be a good situation to use a dictionary in my opinion! You could try the following:
String[] files = new string[5];
//group of files with the same series number
files[0] = "12763-200-A-HelloWorld";
files[1] = "12763-200-X-HelloWorld";
files[2] = "12763-200-C-HelloWorld";
//another group of files with the same series number
files[3] = "12763-203-C-HelloWorld";
files[4] = "12763-203-Z-HelloWorld";
//all the distinct series numbers; after the split on '-', the series number is at the second position of every string
var distinctSeriesNumbers = files.Select(f => f.Split('-')[1]).Distinct();
Dictionary<String, List<String>> filesDictionary = new Dictionary<string, List<String>>();
//for each series number, we will try to get all the files and add them to dictionary
foreach (var serieNumber in distinctSeriesNumbers)
{
var filesWithSerieNumber = files.Where(f => f.Split('-')[1] == serieNumber).ToList();
filesDictionary.Add(serieNumber, filesWithSerieNumber);
}
List<String> listOfLatestSeries = new List<string>();
//here we will go through the dictionary and get the latest file of each series number
foreach (KeyValuePair<String, List<String>> entry in filesDictionary)
{
listOfLatestSeries.Add(entry.Value.OrderByDescending(d => d.Split('-')[2]).First());
}
//now the list holds the file with the latest revision for each series number
MessageBox.Show(listOfLatestSeries[0]); //result : "12763-200-X-HelloWorld"
MessageBox.Show(listOfLatestSeries[1]); //result : "12763-203-Z-HelloWorld";

Related

Get all files not present in a DataTable

I have a MySQL table with a list of file names.
I would like to get a list of all files in a directory, but only those whose names are not present in the table.
I can load the database's file list into a DataTable and write something like:
string[] files = Directory.GetFiles(directory);
foreach (DataRow row in dataTable.Rows)
for (int i = 0; i < files.Length; i++)
if (row[0].Equals(files[i])) {
files[i].delete(); // pseudo-code: drop this entry from the list
break;
}
The code above is only a pseudo-example. Can't I use Directory.GetFiles(directory) directly with some kind of filter, so that I don't have to write all the iteration myself?
Please find a code snippet below.
I decided to do it in steps to keep the code more maintainable.
void Main()
{
// given a list of files from db
DataTable dataTable = new DataTable("x");
dataTable.Columns.Add("file", typeof(string));
dataTable.Rows.Add("HaxLogs.txt");
dataTable.Rows.Add("swapfile.sys");
dataTable.Rows.Add("four.txt");
var directory = "c:\\";
// an anonymous type is used here because no FileEntry class is defined in this snippet
var directoryFilesWithPaths = Directory.GetFiles(directory)
.Select(x => new { FullPath = x, FileName = Path.GetFileName(x) });
var directoryFiles = directoryFilesWithPaths.Select(x => x.FileName).ToList();
var filesList = (from DataRow dr in dataTable.Rows
select dr[0].ToString()).ToList();
var filesToProcess = directoryFiles.Except(filesList);
foreach (var file in filesToProcess)
{
// process file here
Console.WriteLine(file);
}
}
A LINQ solution is:
Directory.GetFiles(directory)
.Where(x => !dataTable.AsEnumerable()
.Select(row => row[0].ToString())
.Contains(Path.GetFileName(x))); // GetFiles returns full paths, the table holds names
This is my solution:
ArrayList files = new ArrayList();
files.AddRange(Directory.GetFiles(directory, "*.*", SearchOption.AllDirectories));
foreach (DataRow row in tableFiles.Rows)
{
for (int i = 0; i < files.Count; i++)
if (files[i].ToString().EndsWith(row[0].ToString()))
{
files.RemoveAt(i);
break;
}
}
I also tried Path.GetFileName(files[i].ToString()) so that I could use Equals instead of EndsWith, but with 8500 files the EndsWith solution takes 2 seconds, while the GetFileName version takes 10 seconds.
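The nested loops above compare every file against every row. One alternative, sketched here under the assumption that the table column holds bare file names, is to load the names into a HashSet once so that each file needs only a single lookup. FileFilter and NotInTable are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileFilter
{
    // Hypothetical helper: returns the paths whose file name does not appear
    // in knownNames. Names are compared case-insensitively (an assumption that
    // suits Windows file systems).
    public static List<string> NotInTable(IEnumerable<string> paths, IEnumerable<string> knownNames)
    {
        var known = new HashSet<string>(knownNames, StringComparer.OrdinalIgnoreCase);
        return paths.Where(p => !known.Contains(Path.GetFileName(p))).ToList();
    }
}
```

With a DataTable, the names could be supplied as dataTable.Rows.Cast<DataRow>().Select(r => r[0].ToString()) and the paths as Directory.EnumerateFiles(directory).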

Trying to query many text files in the same folder with linq

I need to search a folder containing CSV files. The records I'm interested in have 3 fields: Rec, Country and Year. My job is to search the files and see if any of the files has records for more than a single year. Below is the code I have so far:
// Get each individual file from the folder.
string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var queryMatchingFiles =
from file in fileList
where (file.Extension == ".dat" || file.Extension == ".csv")
select file;
Then I came up with this code to read the year field from each file and find the files where the year count is more than 1 (the count part was not successfully implemented):
public void GetFileData(string filesname, char sep)
{
using (StreamReader reader = new StreamReader(filesname))
{
var recs = (from line in reader.Lines(sep.ToString())
let parts = line.Split(sep)
select parts[2]);
}
}
below a sample file:
REC,IE,2014
REC,DE,2014
REC,FR,2015
Now I am struggling to combine these two ideas to solve my problem in a single query. The query should list the files that have records for more than one year.
Thanks in advance
Something along these lines:
string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var fileData =
from file in fileList
where (file.Extension == ".dat" || file.Extension == ".csv")
let name = GetFileData(file.FullName, ',')
where name != null
select name;
// Returns the file name if the file contains more than one distinct year, otherwise null
public string GetFileData(string filesname, char sep)
{
// File.ReadLines replaces the Lines() call, which StreamReader does not provide
var recs = from line in File.ReadLines(filesname)
let parts = line.Split(sep)
select parts[2];
var multipleyears = recs.Distinct().Count();
return multipleyears > 1 ? filesname : null;
}
Not on my development machine, so this might not compile "as is", but here's a direction:
var lines = File.ReadAllLines(path); // path: full name of the file being checked
var years = from line in lines
let parts = line.Split(new [] {','})
select parts[2];
var distinct_years = years.Distinct();
if (distinct_years.Count() > 1)
{
// this file has several years
}
"My job is to search the files and see if any of the files has records
for more then a single year."
This specifies that you want a Boolean result, one that says if any of the files has those records.
For fun I'll extend it a little bit more:
My job is to get the collection of files where any of the records is about more than a single year.
You were almost there. Let's first declare a class with the records in your file:
public class MyRecord
{
public string Rec { get; set; }
public string CountryCode { get; set; }
public int Year { get; set; }
}
I'll make an extension method of the class FileInfo that will read the file and returns the sequence of MyRecords that is in it.
For extension methods see MSDN Extension Methods (C# Programming Guide)
public static class FileInfoExtension
{
public static IEnumerable<MyRecord> ReadMyRecords(this FileInfo file, char separator)
{
var records = new List<MyRecord>();
using (var reader = new StreamReader(file.FullName))
{
var lineToProcess = reader.ReadLine();
while (lineToProcess != null)
{
var splitLines = lineToProcess.Split(new char[] { separator }, 3);
if (splitLines.Length < 3) throw new InvalidDataException();
var record = new MyRecord()
{
Rec = splitLines[0],
CountryCode = splitLines[1],
Year = Int32.Parse(splitLines[2]),
};
records.Add(record);
lineToProcess = reader.ReadLine();
}
}
return records;
}
}
I could have used string instead of FileInfo, but IMHO a string is something completely different than a filename.
After the above you can write the following:
string startFolder = @"C:\MyFileFolder\";
var directoryInfo = new DirectoryInfo(startFolder);
var allFiles = directoryInfo.EnumerateFiles("*.*", SearchOption.AllDirectories);
var sequenceOfFileRecordCollections = allFiles.Select(file => file.ReadMyRecords(','));
So now you have, per file, the sequence of MyRecords in that file. You want to know which files have more than one year, so let's add another extension method to the class FileInfoExtension:
public static bool IsMultiYear(this FileInfo file, char separator)
{
// read the file, only return true if there are any records,
// and if any record has a different year than the first record
var myRecords = file.ReadMyRecords(separator);
if (myRecords.Any())
{
int firstYear = myRecords.First().Year;
return myRecords.Any(record => record.Year != firstYear);
}
else
{
return false;
}
}
The sequence of files that have more than one year in them is:
allFiles.Where(file => file.IsMultiYear(','));
Put everything in one statement:
var allFilesWithMultiYear = new DirectoryInfo(@"C:\MyFileFolder\")
.EnumerateFiles("*.*", SearchOption.AllDirectories)
.Where(file => file.IsMultiYear(','));
By creating two fairly simple extension methods your problem became one highly readable statement.
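If only the yes/no answer per file is needed, the check can also be sketched without materializing the records: a lazy Distinct followed by Skip(1).Any() stops reading as soon as a second year turns up. YearCheck and HasMultipleYears are illustrative names, and the year is assumed to be the third separated field, as in the sample file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YearCheck
{
    // Hypothetical helper: true as soon as a second distinct year is seen.
    // Blank lines and lines with fewer than three fields are ignored.
    public static bool HasMultipleYears(IEnumerable<string> lines, char separator)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(separator))
            .Where(parts => parts.Length >= 3)
            .Select(parts => parts[2].Trim())
            .Distinct()
            .Skip(1)   // drop the first distinct year
            .Any();    // anything left means at least two years
    }
}
```

For a file on disk, the lines could come from File.ReadLines(file.FullName), which reads lazily, so large files are not loaded in full when the answer is found early.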

multiple foreach loops inside while loop

Is it possible to include multiple "foreach" statements inside one of the looping constructs, like while or for? I want to open the .wav files from two different directories simultaneously so that I can compare files from both.
Here is what I am trying to do, but it is certainly wrong. Any help is appreciated.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
while ( foreach(string fileName1 in fileEntries1) && foreach(string fileName2 in fileEntries2))
Grammatically speaking, no. This is because a foreach construct is a statement, whereas the test in a while statement must be an expression.
Your best bet is to nest the foreach blocks:
foreach(string fileName1 in fileEntries1)
{
foreach(string fileName2 in fileEntries2)
{ /* compare fileName1 and fileName2 here */ }
}
I like to write this kind of thing as a single statement. So even though most of the answers here are correct, here is my version.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach (var fileExistsInBoth in fileEntries1.Where(fe1 => fileEntries2.Contains(fe1)))
{
// here you will have the entries that exist in both lists
// (GetFiles returns full paths, so compare Path.GetFileName values when the folders differ)
}
Something like this since you only need to validate same file names:
IEnumerable<string> fileEntries1 = Directory.GetFiles(folder1, "*.wav").Select(x => Path.GetFileName(x));
IEnumerable<string> fileEntries2 = Directory.GetFiles(folder2, "*.wav").Select(x => Path.GetFileName(x));
IEnumerable<string> filesToIterate = (fileEntries1.Count() > fileEntries2.Count()) ? fileEntries1 : fileEntries2;
// Pick the other collection, so that equal counts do not select the same list twice
IEnumerable<string> filesToValidate = (fileEntries1.Count() > fileEntries2.Count()) ? fileEntries2 : fileEntries1;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
// Find the files in smaller collection
if (filesToValidate.Contains(fileName))
{
// Get actual file and compare
}
else
{
// File does not exist in another list. Handle appropriately
}
}
.NET 2.0 based solution:
List<string> fileEntries1 = new List<string>(Directory.GetFiles(folder1, "*.wav"));
List<string> fileEntries2 = new List<string>(Directory.GetFiles(folder2, "*.wav"));
List<string> filesToIterate = (fileEntries1.Count > fileEntries2.Count) ? fileEntries1 : fileEntries2;
// Pick the other list, so that equal counts do not select the same list twice
filesToValidate = (fileEntries1.Count > fileEntries2.Count) ? fileEntries2 : fileEntries1;
string iteratorFileName;
string validatorFilePath;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
iteratorFileName = Path.GetFileName(fileName);
// Find the files in smaller collection
if ((validatorFilePath = FindFile(iteratorFileName)) != null)
{
// Compare fileName and validatorFilePath files here
}
else
{
// File does not exist in another list. Handle appropriately
}
}
FindFile method:
static List<string> filesToValidate;
private static string FindFile(string fileToFind)
{
string returnValue = null;
foreach (string filePath in filesToValidate)
{
if (string.Compare(Path.GetFileName(filePath), fileToFind, true) == 0)
{
// Found the file
returnValue = filePath;
break;
}
}
if (returnValue != null)
{
// File was found in smaller list. Remove this file from the list since we do not need to look for it again
filesToValidate.Remove(returnValue);
}
return returnValue;
}
You may or may not choose to make fields and methods static based on your needs.
If you want to iterate all pairs of files in both paths respectively, you can do it as follows.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach(string fileName1 in fileEntries1)
{
foreach(string fileName2 in fileEntries2)
{
// do the actual comparison
}
}
This is what I would suggest, using LINQ:
using System.Linq;
var fileEntries1 = Directory.GetFiles(folder1, "*.wav");
var fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach (var entry1 in fileEntries1)
{
var entries = fileEntries2.Where(x => Equals(entry1, x));
if (entries.Any())
{
//We have matches
//entries is a list of matches in fileentries2 for entry1
}
}
If you want to enumerate both collections "in parallel", use their iterators like this:
var fileEntriesIterator1 = Directory.EnumerateFiles(folder1, "*.wav").GetEnumerator();
var fileEntriesIterator2 = Directory.EnumerateFiles(folder11, "*.wav").GetEnumerator();
while(fileEntriesIterator1.MoveNext() && fileEntriesIterator2.MoveNext())
{
var file1 = fileEntriesIterator1.Current;
var file2 = fileEntriesIterator2.Current;
}
If one collection is shorter than the other, this loop will end when the shorter collection has no more elements.
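The same position-by-position walk can be sketched with LINQ's Zip, which also stops at the shorter sequence. Because the order in which the file system returns names is not guaranteed, this sketch sorts both sequences by file name first. That sorting step and the names FilePairing and PairByPosition are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FilePairing
{
    // Hypothetical helper: sorts both sequences by file name, then pairs them
    // up position by position. Pairing ends when the shorter sequence ends.
    public static List<Tuple<string, string>> PairByPosition(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        var sortedFirst = first.OrderBy(p => Path.GetFileName(p), StringComparer.OrdinalIgnoreCase);
        var sortedSecond = second.OrderBy(p => Path.GetFileName(p), StringComparer.OrdinalIgnoreCase);
        return sortedFirst.Zip(sortedSecond, (a, b) => Tuple.Create(a, b)).ToList();
    }
}
```

Positional pairing only makes sense when both folders are expected to hold matching files in the same order. If a file can be missing on one side, a lookup by file name is the safer choice.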

Find new file in two folders with a cross check

I am trying to merge two folders into a patch folder by finding which files are new in the new folder and marking them as new, so that I can transfer only those files. I don't care about dates or hash changes, just about which files are in the new folder but not in the old folder.
Somehow the line
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
always returns false. Is there something wrong with my cross-checking logic?
List<string> newPatch = DirectorySearch(_newFolder);
List<string> oldPatch = DirectorySearch(_oldFolder);
foreach (string f in newPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_newFolder, "") + @"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
nPatch.Files.Add(pf);
}
foreach (string f in oldPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_oldFolder, "") + @"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
if (!nPatch.Files.Exists(item => item.Dir == pf.Dir &&
item.FName == pf.FName))
{
nPatch.removeFiles.Add(pf);
}
}
I don't have the classes you are using (like DirectorySearch and PatchFile), so I can't compile your code, but IMO the line oldPatch.FindAll(... doesn't return anything because you are comparing the full path (c:\oldpatch\filea.txt is not c:\newpatch\filea.txt) and not just the file name. IMO your algorithm could be simplified to something like this pseudocode (using Contains instead of List.FindAll):
var _newFolder = "d:\\temp\\xml\\b";
var _oldFolder = "d:\\temp\\xml\\a";
List<FileInfo> missing = new List<FileInfo>();
List<FileInfo> nPatch = new List<FileInfo>();
List<FileInfo> newPatch = new DirectoryInfo(_newFolder).GetFiles().ToList();
List<FileInfo> oldPatch = new DirectoryInfo(_oldFolder).GetFiles().ToList();
// take all files in new patch
foreach (var f in newPatch)
{
nPatch.Add(f);
}
// search for hits in old patch
foreach (var f in oldPatch)
{
if (!nPatch.Select (p => p.Name.ToLower()).Contains(f.Name.ToLower()))
{
missing.Add(f);
}
}
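// missing now holds the files that exist only in the old folder;
// swap newPatch and oldPatch above to collect the files that exist only in the new folder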
One possible solution with less code would be to select the file names, put them into lists, and use the LINQ Except or, if needed, Intersect methods. That way, the question of which files are in A but not in B can be answered quickly, like this:
var locationA = "d:\\temp\\xml\\a";
var locationB = "d:\\temp\\xml\\b";
// takes file names from A and B and put them into lists
var filesInA = new DirectoryInfo(locationA).GetFiles().Select (n => n.Name).ToList();
var filesInB = new DirectoryInfo(locationB).GetFiles().Select (n => n.Name).ToList();
// Except retrieves all files that are in A but not in B
foreach (var file in filesInA.Except(filesInB).ToList())
{
Console.WriteLine(file);
}
I have 1.xml, 2.xml, 3.xml in A and 1.xml, 3.xml in B. The output is 2.xml - missing in B.
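Except compares strings case-sensitively by default, so 1.XML and 1.xml would count as different files. A small sketch that passes a case-insensitive comparer instead (the names FolderDiff and OnlyInFirst are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FolderDiff
{
    // Hypothetical helper: names present in namesA but not in namesB,
    // ignoring differences in letter case.
    public static List<string> OnlyInFirst(IEnumerable<string> namesA, IEnumerable<string> namesB)
    {
        return namesA.Except(namesB, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

To find the files that are new in the new folder, pass the new folder's names as namesA and the old folder's names as namesB.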

how to read text file and count the same names

I want to create a text file containing one name on each line, compute the number of times each name occurs, and output one line per name, with the number of occurrences followed by the name.
I can open the file using this code:
private void button1_Click(object sender, EventArgs e)
{
using (OpenFileDialog dlgOpen = new OpenFileDialog())
{
try
{
// Available file extensions
openFileDialog1.Filter = "All files(*.*)|*.*";
// Initial directory
openFileDialog1.InitialDirectory = "D:";
// OpenFileDialog title
openFileDialog1.Title = "Open";
// Show OpenFileDialog box
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
// Create new StreamReader
StreamReader sr = new StreamReader(openFileDialog1.FileName, Encoding.Default);
// Get all text from the file
string str = sr.ReadToEnd();
// Close the StreamReader
sr.Close();
// Show the text in the rich textbox rtbMain
}
}
catch (Exception errorMsg)
{
MessageBox.Show(errorMsg.Message);
}
}
}
But what I want is to use the same button to read the file and display the result in a group box.
As this is homework, I am not going to give you code, but hopefully enough info to point you in the right direction.
I suggest you use File.ReadAllLines to read the file into an array of strings, each item in the array is one line in the file. This means you do not have to split the file contents up yourself. Then you can loop over the string array, and add each line to a Dictionary, where the key is the line read from the file, and the value is the number of occurrences. You need to check whether the key is already in the Dictionary - if not add it with a count of 1, otherwise update the existing count (+1). After that loop, have a second loop which loops over the Dictionary contents, updating your textbox with the names and their counts.
(Assuming this is homework.) I used File.ReadAllLines and Dictionary<TKey, TValue>:
var nameCount = new Dictionary<string, int>();
foreach (String s in File.ReadAllLines("filename"))
{
if (nameCount.ContainsKey(s))
{
nameCount[s] = nameCount[s] + 1;
}
else
{
nameCount.Add(s, 1);
}
}
// and printing
foreach (var pair in nameCount)
{
Console.WriteLine("{0} count:{1}", pair.Key, pair.Value);
}
You can do that using LINQ, without having to increment an int variable, and end up with a dictionary containing names and counts:
string[] names = File.ReadAllLines(openFileDialog1.FileName);
Dictionary<string, int> namesAndCount = new Dictionary<string, int>();
foreach(var name in names)
{
if(namesAndCount.ContainsKey(name))
continue;
var count = (from n in names
where n == name
select n).Count();
namesAndCount.Add(name, count);
}
Okay, a function like this will build you distinct names with counts.
private static IDictionary<string, int> ParseNameFile(string filename)
{
var names = new Dictionary<string, int>();
using (var reader = new StreamReader(filename))
{
var line = reader.ReadLine();
while (line != null)
{
if (names.ContainsKey(line))
{
names[line]++;
}
else
{
names.Add(line, 1);
}
line = reader.ReadLine();
}
}
return names;
}
Or you could do something flashy with LINQ and ReadAllLines.
private static IDictionary<string, int> ParseNameFile(string filename)
{
return File.ReadAllLines(filename)
.OrderBy(n => n)
.GroupBy(n => n)
.ToDictionary(g => g.Key, g => g.Count());
}
The first option does have the advantage of not loading the whole file into memory.
As for outputting the information,
var output = new StringBuilder();
foreach (var valuePair in ParseNameFile(openFileDialog1.FileName))
{
output.AppendFormat("{0} {1}\n", valuePair.Key, valuePair.Value);
}
Then you call ToString() on output to put the data anywhere you want. If there will be very many rows, a StreamWriter approach would be preferred.
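As a rough sketch of that writer-based approach, the method below writes one "count name" line per distinct name to any TextWriter (a StreamWriter for a file, a StringWriter for a text box). NameReport and Write are made-up names, and sorting the output by name is an assumption rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NameReport
{
    // Hypothetical helper: writes "<count> <name>" for each distinct name,
    // sorted by name. Empty lines are skipped.
    public static void Write(IEnumerable<string> names, TextWriter writer)
    {
        var groups = names
            .Where(n => n.Length > 0)
            .GroupBy(n => n)
            .OrderBy(g => g.Key, StringComparer.Ordinal);
        foreach (var group in groups)
        {
            writer.WriteLine("{0} {1}", group.Count(), group.Key);
        }
    }
}
```

For a file on disk, this could be called with File.ReadLines(path) as the names and a new StreamWriter(outputPath) inside a using block, where both paths are placeholders.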
A similar question has been asked before:
A method to count occurrences in a list
In my opinion, using a LINQ query is a good option.
string[] file = File.ReadAllLines(openFileDialog1.FileName, Encoding.Default);
IEnumerable<IGrouping<string, string>> groupQuery =
from name in file
group name by name into g
orderby g.Key
select g;
foreach (var g in groupQuery)
{
MessageBox.Show(g.Count() + " " + g.Key);
}
