Get all files not present int a datatable

Get all files not present int a datatable - c#

I have a MySQL table with a list of file name.
I would like to get a list of all file in a directory only if their names are not present in the table.
I can put the list of the database's file in a Datatable and write something like:
string[] files = Directory.GetFiles(directory);
foreach (Datarow row in dataTable.Rows)
for (int i=0; i<files.Length; i++)
if (row[0].equals(files[i]) {
files[i].delete();
break;
}
The upper code is only a pseudo-example. Can't I directly use Directory.GetFiles(directory) by specifying a filter in order to don't write all the iteraction?

please find code snippet below
decided to do it in steps - to have more maintainable code
void Main()
{
// given a list of files from db
DataTable dataTable = new DataTable("x");
dataTable.Columns.Add("file", typeof(string));
dataTable.Rows.Add("HaxLogs.txt");dataTable.Rows.Add("swapfile.sys");dataTable.Rows.Add("four.txt");
var directory = "c:\\";
var directoryFilesWithPaths = Directory.GetFiles(directory)
.Select( x=> new FileEntry { Path = x, FileName = Path.GetFileName(x)});
var directoryFiles = directoryFilesWithPaths.Select(x => x.FileName).ToList();
var filesList = (from DataRow dr in dataTable.Rows
select dr[0].ToString()).ToList();
var filesToProcess = directoryFiles.Except(filesList);
foreach (var file in filesToProcess)
{
// process file here
Console.WriteLine(file);
}
}

A linq solution is:
Directory.GetFiles(directory)
.Where(x => !dataTable.AsEnumerable()
.Select(row => row[0].ToString())
.Contains(x))

This is my solution:
ArrayList files = new ArrayList();
files.AddRange(Directory.GetFiles(directory, "*.*", SearchOption.AllDirectories));
foreach (DataRow row in tableFiles.Rows)
{
for (int i = 0; i < files.Count; i++)
if (files[i].ToString().EndsWith(row[0].ToString()))
{
files.RemoveAt(i);
break;
}
}
I also tried with Path.GetFileName(files[i].ToString() in order to use the Equals instead of EndsWith, but with 8500 files, this solution takes 2 seconds, with GetFileName 10 seconds.

Related

C# To return File Path's Last Written Date

I have an action in Blueprism that will bring back all files from a base directory (including sub folders). I also need to return the last written date. Is this possible?
public DataTable Get_Files_ALL(string Path, string Pattern, bool Recursive)
{
DataTable Paths = new DataTable("Paths");
DataColumn PathId = new DataColumn("Path", typeof(String));
Paths.Columns.Add(PathId);
var String_Array = new string[] {};
if (Recursive)
String_Array = Directory.GetFiles(Path, Pattern, System.IO.SearchOption.AllDirectories);
else
String_Array = Directory.GetFiles(Path, Pattern, System.IO.SearchOption.TopDirectoryOnly);
int count = String_Array.Length;
foreach (string Path_Array in String_Array)
{
DataRow row1 = Paths.NewRow();
row1["Path"] = Path_Array;
Paths.Rows.Add(row1);
};
return Paths;
}

You can use the following to get the last modified data of a particular file:
System.IO.File.GetLastWriteTime(path)

Adding dictionary values to csv file

I'm playing about with working with a dictionary and and adding the contents of it within an existing csv file. This is what I have so far:
List<string> files = new List<string>();
files.Add("test1");
files.Add("test2");
Dictionary<string, List<string>> data = new Dictionary<string, List<string>>();
data.Add("Test Column", files.ToList());
foreach ( var columnData in data.Keys)
{
foreach (var rowData in data[columnData])
{
var csv = File.ReadLines(filePath.ToString()).Select((line, index) => index == 0
? line + "," + columnData.ToString()
: line + "," + rowData.ToString()).ToList();
File.WriteAllLines(filePath.ToString(), csv);
}
}
This sort of works but not the way I'm intending. What I would like the output to be is something along the lines
but what I'm actually getting is:
as you'll be able to see I'm getting 2 columns instead of just 1 with a column each for both list values and the values repeating on every single row. How can I fix it so that it's like how I've got in the first image? I know it's something to do with my foreach loop and the way I'm inputting the data into the file but I'm just not sure how to fix it
Edit:
So I have the read, write and AddToCsv methods and when I try it like so:
File.WriteAllLines("file.csv", new string[] { "Col0,Col1,Col2", "0,1,2", "1,2,3", "2,3,4", "3,4,5" });
var filePath = "file.csv";
foreach (var line in File.ReadLines(filePath))
Console.WriteLine(line);
Console.WriteLine("\n\n");
List<string> files = new List<string>() { "test1", "test2" };
List<string> numbers = new List<string>() { "one", "two", "three", "four", "five" };
Dictionary<string, List<string>> newData = new Dictionary<string, List<string>>() {
{"Test Column", files},
{"Test2", numbers}
};
var data1 = ReadCsv(filePath);
AddToCsv(data1, newData);
WriteCsv(filePath.ToString(), data1);
It works perfectly but when I have the file path as an already created file like so:
var filePath = exportFile.ToString();
I get the error:
Message :Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')
Source :System.Private.CoreLib
Stack : at System.Collections.Generic.List1.get_Item(Int32 index) at HMHExtract.Runner.ReadCsv(String path) in C:\tfs\Agility\Client\HMH Extract\HMHExtract\Runner.cs:line 194 at HMHExtract.Runner.Extract(Nullable1 ct) in C:\tfs\Agility\Client\HMH Extract\HMHExtract\Runner.cs:line 68
Target Site :Void ThrowArgumentOutOfRange_IndexException()
The lines in question are:
line 194 - var col = colNames[i]; of the ReadCsv method
line 68 - var data1 = ReadCsv(filePath);
Edit:
So after debugging I've figured out where the issue has come from.
In the csv I am trying to update there are 17 columns so obviously 17 rows of values. So the colNames count is 17. csvRecord Count = 0 and i goes up to 16.
However when it reaches a row where in one of the fields there are 2 values separated by a comma, it counts it s 2 row values instead of just 1 so for the row value instead of being string{17} it becomes string{18} and that causes the out of range error.
To clarify, for the row it gets to which causes the error one of the fields has the values Chris Jones, Malcolm Clark. Now instead of counting them as just 1 row, the method counts them as 2 separate ones, how can I change so it doesn't count them as 2 separate rows?

The best way is to read the csv file first into a list of records, and then add columns to each record. A record is a single row of the csv file, read as a Dictionary<string, string>. The keys of this dict are the column names, and the values are the elements of the row in that column.
public static void AddToCsv(string path, Dictionary<string, List<string>> newData)
{
var fLines = File.ReadLines(path);
var colNames = fLines.First().Split(',').ToList(); // col names in first line
List<Dictionary<string, string>> rowData = new List<Dictionary<string, string>>(); // A list of records for all other rows
foreach (var line in fLines.Skip(1)) // Iterate over second through last lines
{
var row = line.Split(',');
Dictionary<string, string> csvRecord = new Dictionary<string, string>();
// Add everything from this row to the record dictionary
for (int i = 0; i < row.Length; i++)
{
var col = colNames[i];
csvRecord[col] = row[i];
}
rowData.Add(csvRecord);
}
// Now, add new data
foreach (var newColName in newData.Keys)
{
var colData = newData[newColName];
for (int i = 0; i < colData.Count; i++)
{
if (i < rowData.Count) // If the row record already exists, add the new column to it
rowData[i].Add(newColName, colData[i]);
else // Add a row record with only this column
rowData.Add(new Dictionary<string, string>() { {newColName, colData[i]} });
}
colNames.Add(newColName);
}
// Now, write all the data
StreamWriter sw = new StreamWriter(path);
// Write header
sw.WriteLine(String.Join(",", colNames));
foreach (var row in rowData)
{
var line = new List<string>();
foreach (var colName in colNames) // Iterate over columns
{
if (row.ContainsKey(colName)) // If the row contains this column, add it to the line
line.Add(row[colName]);
else // Else add an empty string
line.Add("");
}
// Join all elements in the line with a comma, then write to file
sw.WriteLine(String.Join(",", line));
}
sw.Close();
}
To use this, let's create the following CSV file file.csv:
Col0,Col1,Col2
0,1,2
1,2,3
2,3,4
3,4,5
List<string> files = new List<string>() {"test1", "test2"};
List<string> numbers = new List<string>() {"one", "two", "three", "four", "five"};
Dictionary<string, List<string>> newData = new Dictionary<string, List<string>>() {
{"Test Column", files},
{"Test2", numbers}
}
AddToCsv("file.csv", newData);
And this results in file.csv being modified to:
Col0,Col1,Col2,Test Column,Test2
0,1,2,test1,one
1,2,3,test2,two
2,3,4,,three
3,4,5,,four
,,,,five
To make this more organized, I defined a struct CsvData to hold the column names and row records, and a function ReadCsv() that reads the file into this struct, and WriteCsv() that writes the struct to a file. Then separate responsibilities -- ReadCsv() only reads the file, WriteCsv() only writes the file, and AddToCsv() only adds to the file.
public struct CsvData
{
public List<string> ColNames;
public List<Dictionary<string, string>> RowData;
}
public static CsvData ReadCsv(string path)
{
List<string> colNames = new List<string>();
List<Dictionary<string, string>> rowData = new List<Dictionary<string, string>>(); // A list of records for all other rows
if (!File.Exists(path)) return new CsvData() {ColNames = colNames, RowData = rowData };
var fLines = File.ReadLines(path);
var firstLine = fLines.FirstOrDefault(); // Read the first line
if (firstLine != null) // Only try to parse the file if the first line actually exists.
{
colNames = firstLine.Split(',').ToList(); // col names in first line
foreach (var line in fLines.Skip(1)) // Iterate over second through last lines
{
var row = line.Split(',');
Dictionary<string, string> csvRecord = new Dictionary<string, string>();
// Add everything from this row to the record dictionary
for (int i = 0; i < row.Length; i++)
{
var col = colNames[i];
csvRecord[col] = row[i];
}
rowData.Add(csvRecord);
}
}
return new CsvData() {ColNames = colNames, RowData = rowData};
}
public static void WriteCsv(string path, CsvData data)
{
StreamWriter sw = new StreamWriter(path);
// Write header
sw.WriteLine(String.Join(",", data.ColNames));
foreach (var row in data.RrowData)
{
var line = new List<string>();
foreach (var colName in data.ColNames) // Iterate over columns
{
if (row.ContainsKey(colName)) // If the row contains this column, add it to the line
line.Add(row[colName]);
else // Else add an empty string
line.Add("");
}
// Join all elements in the line with a comma, then write to file
sw.WriteLine(String.Join(",", line));
}
sw.Close();
}
public static void AddToCsv(CsvData data, Dictionary<string, List<string>> newData)
{
foreach (var newColName in newData.Keys)
{
var colData = newData[newColName];
for (int i = 0; i < colData.Count; i++)
{
if (i < data.RowData.Count) // If the row record already exists, add the new column to it
data.RowData[i].Add(newColName, colData[i]);
else // Add a row record with only this column
data.RowData.Add(new Dictionary<string, string>() { {newColName, colData[i]} });
}
data.ColNames.Add(newColName);
}
}
Then, to use this, you do:
var data = ReadCsv(path);
AddToCsv(data, newData);
WriteCsv(path, data);

I managed to figure out a way that worked for me, might not be the most efficient but it does work. It involves using csvHelper
public static void AppendFile(FileInfo fi, List<string> newColumns, DataTable newRows)
{
var settings = new CsvConfiguration(new CultureInfo("en-GB"))
{
Delimiter = ";"
};
var dt = new DataTable();
using (var reader = new StreamReader(fi.FullName))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
using (var dataReader = new CsvDataReader(csv))
{
dt.Load(dataReader);
foreach (var title in newColumns)
{
dt.Columns.Add(title);
}
dt.Rows.Clear();
foreach (DataRow row in newRows.Rows)
{
dt.Rows.Add(row.ItemArray);
}
}
}
using var streamWriter = new StreamWriter(fi.FullName);
using var csvWriter = new CsvWriter(streamWriter, settings);
// Write columns
foreach (DataColumn column in dt.Columns)
{
csvWriter.WriteField(column.ColumnName);
}
csvWriter.NextRecord();
// Write row values
foreach (DataRow row in dt.Rows)
{
for (var i = 0; i < dt.Columns.Count; i++)
{
csvWriter.WriteField(row[i]);
}
csvWriter.NextRecord();
}
}
I start by getting the contents of the csv file into a data table and then adding in the new columns that I need. I then clear all the rows in the datatable and add new ones in (the data that is removed is added back in via the newRows parameter) and then write the datatable to the csv file

c# String Query - Getting latest file revision

I am writing a form application where it displays all the PDF files from a directory in a Datagridview.
Now the files name format is usually 12740-250-B-File Name (So basically XXXXX-XXX-X-XXXXXXX).
So the first number is the project number, the second number followed after the dash is the series number- and the letter is the revision of the file.
I would like to have a button when pressed, it will find the files with the same series number (XXXXX-Series No - Revision - XXXXXX) and show me the latest revision which it will be the biggest letter, So between 12763-200-A-HelloWorld and from 12763-200-B-HelloWorld I want the 12763-200-B-HelloWorld to be the result of my query.
This is what I got so far:
private void button1_Click(object sender, EventArgs e)
{
}
private void button2_Click(object sender, EventArgs e)
{
String[] files = Directory.GetFiles(#"M:\Folder Directory","*.pdf*", SearchOption.AllDirectories);
DataTable table = new DataTable();
table.Columns.Add("File Name");
for (int i = 0; i < files.Length; i++)
{
FileInfo file = new FileInfo(files[i]);
table.Rows.Add(file.Name);
}
dataGridView1.DataSource = table;
}
Thanks in advance.
Note:
In the end, the files with the latest revision will be inserted in an excel spreadsheet.

Assuming your collection is a list of file names which is the result of Directory.GetFiles(""); The below linq will work. For the below to work you need to be certain of the file format as the splitting is very sensitive to a specific file format.
var seriesNumber = "200";
var files = new List<string> { "12763-200-A-HelloWorld", "12763-200-B-HelloWorld" };
var matching = files.Where(x => x.Split('-')[1] == seriesNumber)
.OrderByDescending(x => x.Split('-')[2])
.FirstOrDefault();
Result:
Matching: "12763-200-B-HelloWorld"

You can try the following:
string dirPath = #"M:\Folder Directory";
string filePattern = "*.pdf";
DirectoryInfo di = new DirectoryInfo(dirPath);
FileInfo[] files = di.GetFiles(filePattern, SearchOption.AllDirectories);
Dictionary<string, FileInfo> matchedFiles = new Dictionary<string, FileInfo>();
foreach (FileInfo file in files)
{
string filename = file.Name;
string[] seperatedFilename = filename.Split('-');
// We are assuming that filenames are consistent
// As such,
// the value at seperatedFilename[1] will always be Series No
// the value at seperatedFilename[2] will always be Revision
// If this is not the case in every scenario, the following code should expanded to allow other cases
string seriesNo = seperatedFilename[1];
string revision = seperatedFilename[2];
if (matchedFiles.ContainsKey(seriesNo))
{
FileInfo matchedFile = matchedFiles[seriesNo];
string matchedRevision = matchedFile.Name.Split('-')[2];
// Compare on the char values - https://learn.microsoft.com/en-us/dotnet/api/system.string.compareordinal?view=netframework-4.7.2
// If the value is int, then it can be cast to integer for comparison
if (String.CompareOrdinal(matchedRevision, seriesNo) > 0)
{
// This file is higher than the previous
matchedFiles[seriesNo] = file;
}
} else
{
// Record does not exist - so its is added by default
matchedFiles.Add(seriesNo, file);
}
}
// We have a list of all files which match our criteria
foreach (FileInfo file in matchedFiles.Values)
{
// TODO : Determine if the directory path is also required for the file
Console.WriteLine(file.FullName);
}
It splits the filename into component parts and compares the revision where the series names match; storing the result in a dictionary for further processing later.

This seems to be a good situation to use a dictionary in my opinion! You could try the following:
String[] files = new string[5];
//group of files with the same series number
files[0] = "12763-200-A-HelloWorld";
files[1] = "12763-200-X-HelloWorld";
files[2] = "12763-200-C-HelloWorld";
//another group of files with the same series number
files[3] = "12763-203-C-HelloWorld";
files[4] = "12763-203-Z-HelloWorld";
//all the discting series numbers, since split will the de second position of every string after the '-'
var distinctSeriesNumbers = files.Select(f => f.Split('-')[1]).Distinct();
Dictionary<String, List<String>> filesDictionary = new Dictionary<string, List<String>>();
//for each series number, we will try to get all the files and add them to dictionary
foreach (var serieNumber in distinctSeriesNumbers)
{
var filesWithSerieNumber = files.Where(f => f.Split('-')[1] == serieNumber).ToList();
filesDictionary.Add(serieNumber, filesWithSerieNumber);
}
List<String> listOfLatestSeries = new List<string>();
//here we will go through de dictionary and get the latest file of each series number
foreach (KeyValuePair<String, List<String>> entry in filesDictionary)
{
listOfLatestSeries.Add(entry.Value.OrderByDescending(d => d.Split('-')[2]).First());
}
//now we have the file with the last series number in the list
MessageBox.Show(listOfLatestSeries[0]); //result : "12763-200-X-HelloWorld"
MessageBox.Show(listOfLatestSeries[1]); //result : "12763-203-Z-HelloWorld";

Given string array of column names, how do I read a .csv file to a DataTable?

Assume I have a .csv file with 70 columns, but only 5 of the columns are what I need. I want to be able to pass a method a string array of the columns names that I want, and for it to return a datatable.
private void method(object sender, EventArgs e) {
string[] columns =
{
#"Column21",
#"Column48"
};
DataTable myDataTable = Get_DT(columns);
}
public DataTable Get_DT(string[] columns) {
DataTable ret = new DataTable();
if (columns.Length > 0)
{
foreach (string column in columns)
{
ret.Columns.Add(column);
}
string[] csvlines = File.ReadAllLines(#"path to csv file");
csvlines = csvlines.Skip(1).ToArray(); //ignore the columns in the first line of the csv file
//this is where i need help... i want to use linq to read the fields
//of the each row with only the columns name given in the string[]
//named columns
}
return ret;
}

Read the first line of the file, line.Split(',') (or whatever your delimiter is), then get the index of each column name and store that.
Then for each other line, again do a var values = line.Split(','), then get the values from the columns.
Quick and dirty version:
string[] csvlines = File.ReadAllLines(#"path to csv file");
//select the indices of the columns we want
var cols = csvlines[0].Split(',').Select((val,i) => new { val, i }).Where(x => columns.Any(c => c == x.val)).Select(x => x.i).ToList();
//now go through the remaining lines
foreach (var line in csvlines.Skip(1))
{
var line_values = line.Split(',').ToList();
var dt_values = line_values.Where(x => cols.Contains(line_values.IndexOf(x)));
//now do something with the values you got for this row, add them to your datatable
}

You can look at https://joshclose.github.io/CsvHelper/
Think Reading individual fields is what you are looking for
var csv = new CsvReader( textReader );
while( csv.Read() )
{
var intField = csv.GetField<int>( 0 );
var stringField = csv.GetField<string>( 1 );
var boolField = csv.GetField<bool>( "HeaderName" );
}

We can easily do this without writing much code.
Exceldatareader is an awesome dll for that, it will directly as a datable from the excel sheet with just one method.
here is the links for example:http://www.c-sharpcorner.com/blogs/using-iexceldatareader1
http://exceldatareader.codeplex.com/
Hope it was useful kindly let me know your thoughts or feedbacks
Thanks
Karthik

var data = File.ReadAllLines(#"path to csv file");
// the expenses row
var query = data.Single(d => d[0] == "Expenses");
//third column
int column21 = 3;
return query[column21];

As others have stated a library like CsvReader can be used for this. As for linq, I don't think its suitable for this kind of job.
I haven't tested this but it should get you through
using (TextReader textReader = new StreamReader(filePath))
{
using (var csvReader = new CsvReader(textReader))
{
var headers = csvReader.FieldHeaders;
for (int rowIndex = 0; csvReader.Read(); rowIndex++)
{
var dataRow = dataTable.NewRow();
for (int chosenColumnIndex = 0; chosenColumnIndex < columns.Count(); chosenColumnIndex++)
{
for (int headerIndex = 0; headerIndex < headers.Length; headerIndex++)
{
if (headers[headerIndex] == columns[chosenColumnIndex])
{
dataRow[chosenColumnIndex] = csvReader.GetField<string>(headerIndex);
}
}
}
dataTable.Rows.InsertAt(dataRow, rowIndex);
}
}
}

LINQ: How i can user where condition multi value by array string?

I want to use the linq where condition multi value by array string is split(',')
I list data from data file in folder. (not in database)
Code c#
public List<sFile> GettingFiles(string path)
{
//Read File in folder
List<sFile> allfile = new List<sFile>();
DirectoryInfo di = new DirectoryInfo(path);
FileInfo[] fileinfo = di.GetFiles("*.*");
foreach (FileInfo item in fileinfo)
{
allfile.Add(new sFile
{
FileName = item.Name,
Seq = int.Parse(item.Name.Substring(12, item.Name.Length - 12)),
PmnCode = item.Name.Substring(7, item.Name.Length - 12),
Path = item.DirectoryName,
Size = formatSize(item.Length),
SizeInt = int.Parse(item.Length.ToString())
});
}
return allfile;
}
public void btnQuery_Click(object sender, EventArgs e)
{
List<sFile> allFiles = GettingFiles(path); //List file in Folder
string pmnCode = txtPMNCode.Text.ToString(); //AAAA, BBBBB, CCCCC, DDDDD
string[] subPmnCode = pmnCode.Split(',');
string totalPmnCode = string.Empty;
foreach (string item2 in subPmnCode)
{
var queryData = from d in allFiles.AsQueryable()
where (d.PmnCode.Contains(item2))
select d;
//Add Column
DataTable dt = new DataTable();
dt.Columns.Add(enmField.NAME.ToString());
dt.Columns.Add(enmField.SIZE.ToString());
dt.Columns.Add(enmField.MODIFY_DATE.ToString());
dt.Columns.Add(enmField.PATH.ToString());
DataRow myRow = dt.NewRow();
foreach (sFile item in queryData.ToList())
{
myRow = dt.NewRow();
myRow[enmField.NAME.ToString()] = item.FileName.Trim();
myRow[enmField.SIZE.ToString()] = item.Size.Trim();
myRow[enmField.MODIFY_DATE.ToString()] = item.Date;
myRow[enmField.PATH.ToString()] = item.Path.Trim() + "\\" + item.FileName.Trim();
dt.Rows.Add(myRow);
}
gvDetail.DataSource = dt;
gvDetail.DataBind();
}
}
Example Data
Pmn Code
AAAAA
BBBBB
CCCCC
DDDDD
I want query wheer condition by pmn_code is AAAAA,BBBBB, DDDDD
I want show data
var queryData = from d in allFiles.AsQueryable()
where (d.PmnCode.Contains("AAAAA") &&
d.PmnCode.Contains("BBBBB") &&
d.PmnCode.Contains("DDDDD")
)
select d;
But i can not query array string by result this.
How can i use array linq?
please help me. Thanks advance ;)

Maybe you can try:
var queryData = from p in allFiles.AsQueryable()
where subPmnCode.Any(val => p.PmnCode.Contains(val))
select p;

var queryData = from d in allFiles.AsQueryable()
where (subPmnCode.Any(s => s.Trim().Equals(d)))
select d;
Checks if d matches any of the elements in the subPmnCode array. I use trim to ensure that we ignore blank spaces generated by splitting the string into an array using the comma delimiter

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Get all files not present int a datatable - c#

A linq solution is: Directory.GetFiles(directory) .Where(x => !dataTable.AsEnumerable() .Select(row => row[0].ToString()) .Contains(x))

Related

C# To return File Path's Last Written Date

Adding dictionary values to csv file

c# String Query - Getting latest file revision

Given string array of column names, how do I read a .csv file to a DataTable?

LINQ: How i can user where condition multi value by array string?

Categories

Resources