Find new file in two folders with a cross check - c#

I am trying to sort two folders in to a patched folder, finding which file is new in the new folder and marking it as new, so i can transfer that file only. i dont care about dates or hash changes. just what file is in the new folder that is not in the old folder.
somehow the line
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
is always returning false. is there something wrong with my logic of cross checking?
List<string> newPatch = DirectorySearch(_newFolder);
List<string> oldPatch = DirectorySearch(_oldFolder);
foreach (string f in newPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_newFolder, "") + #"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
nPatch.Files.Add(pf);
}
foreach (string f in oldPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_oldFolder, "") + #"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
if (!nPatch.Files.Exists(item => item.Dir == pf.Dir &&
item.FName == pf.FName))
{
nPatch.removeFiles.Add(pf);
}
}

I don't have the classes you are using (like DirectorySearch and PatchFile), so i can't compile your code, but IMO the line _oldPatch.FindAll(... doesn't return anything because you are comparing the full path (c:\oldpatch\filea.txt is not c:\newpatch\filea.txt) and not the file name only. IMO your algorithm could be simplified, something like this pseudocode (using List.Contains instead of List.FindAll):
var _newFolder = "d:\\temp\\xml\\b";
var _oldFolder = "d:\\temp\\xml\\a";
List<FileInfo> missing = new List<FileInfo>();
List<FileInfo> nPatch = new List<FileInfo>();
List<FileInfo> newPatch = new DirectoryInfo(_newFolder).GetFiles().ToList();
List<FileInfo> oldPatch = new DirectoryInfo(_oldFolder).GetFiles().ToList();
// take all files in new patch
foreach (var f in newPatch)
{
nPatch.Add(f);
}
// search for hits in old patch
foreach (var f in oldPatch)
{
if (!nPatch.Select (p => p.Name.ToLower()).Contains(f.Name.ToLower()))
{
missing.Add(f);
}
}
// new files are in missing
One possible solution with less code would be to select the file names, put them into a list an use the predefined List.Except or if needed List.Intersect methods. This way a solution to which file is in A but not in B could be solved fast like this:
var locationA = "d:\\temp\\xml\\a";
var locationB = "d:\\temp\\xml\\b";
// takes file names from A and B and put them into lists
var filesInA = new DirectoryInfo(locationA).GetFiles().Select (n => n.Name).ToList();
var filesInB = new DirectoryInfo(locationB).GetFiles().Select (n => n.Name).ToList();
// Except retrieves all files that are in A but not in B
foreach (var file in filesInA.Except(filesInB).ToList())
{
Console.WriteLine(file);
}
I have 1.xml, 2.xml, 3.xml in A and 1.xml, 3.xml in B. The output is 2.xml - missing in B.

Related

Trying to query many text files in the same folder with linq

I need to search a folder containing csv files. The records i'm interested in have 3 fields: Rec, Country and Year. My job is to search the files and see if any of the files has records for more then a single year. Below the code i have so far:
// Get each individual file from the folder.
string startFolder = #"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var queryMatchingFiles =
from file in fileList
where (file.Extension == ".dat" || file.Extension == ".csv")
select file;
Then i'm came up with this code to read year field from each file and find those where year count is more than 1(The count part was not successfully implemented)
public void GetFileData(string filesname, char sep)
{
using (StreamReader reader = new StreamReader(filesname))
{
var recs = (from line in reader.Lines(sep.ToString())
let parts = line.Split(sep)
select parts[2]);
}
below a sample file:
REC,IE,2014
REC,DE,2014
REC,FR,2015
Now i'am struggling to combine these 2 ideas to solve my problem in a single query. The query should list those files that have record for more than a year.
Thanks in advance
Something along these lines:
string startFolder = #"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var fileData =
from file in fileList
where (file.Extension == ".dat" || file.Extension == ".csv")
select GetFileData(file, ',')
;
public string GetFileData(string filesname, char sep)
{
using (StreamReader reader = new StreamReader(filesname))
{
var recs = (from line in reader.Lines(sep.ToString())
let parts = line.Split(sep)
select parts[2]);
var multipleyears = recs.Distinct().Count();
if(multipleyears > 1)
return filename;
}
}
Not on my develop machine, so this might not compile "as is", but here's a direction
var lines = // file.readalllines();
var years = from line in lines
let parts = line.Split(new [] {','})
select parts[2]);
var distinct_years = years.Distinct();
if (distinct_years >1 )
// this file has several years
"My job is to search the files and see if any of the files has records
for more then a single year."
This specifies that you want a Boolean result, one that says if any of the files has those records.
For fun I'll extend it a little bit more:
My job is to get the collection of files where any of the records is about more than a single year.
You were almost there. Let's first declare a class with the records in your file:
public class MyRecord
{
public string Rec { get; set; }
public string CountryCode { get; set; }
public int Year { get; set; }
}
I'll make an extension method of the class FileInfo that will read the file and returns the sequence of MyRecords that is in it.
For extension methods see MSDN Extension Methods (C# Programming Guide)
public static class FileInfoExtension
{
public static IEnumerable<MyRecord> ReadMyRecords(this FileInfo file, char separator)
{
var records = new List<MyRecord>();
using (var reader = new StreamReader(file.FullName))
{
var lineToProcess = reader.ReadLine();
while (lineToProcess != null)
{
var splitLines = lineToProcess.Split(new char[] { separator }, 3);
if (splitLines.Length < 3) throw new InvalidDataException();
var record = new MyRecord()
{
Rec = splitLines[0],
CountryCode = splitLines[1],
Year = Int32.Parse(splitLines[2]),
};
records.Add(record);
lineToProcess = reader.ReadLine();
}
}
return records;
}
}
I could have used string instead of FileInfo, but IMHO a string is something completely different than a filename.
After the above you can write the following:
string startFolder = #"C:\MyFileFolder\";
var directoryInfo = new DirectoryInfo(startFolder);
var allFiles = directoryInfo.EnumerateFiles("*.*", SearchOption.AllDirectories);
var sequenceOfFileRecordCollections = allFiles.ReadMyRecords(',');
So now you have per file a sequence of the MyRecords in the file. You want to know which files have more than one year, Let's add another extension method to class FileInfoExtension:
public static bool IsMultiYear(this FileInfo file, char separator)
{
// read the file, only return true if there are any records,
// and if any record has a different year than the first record
var myRecords = file.ReadMyRecords(separator);
if (myRecords.Any())
{
int firstYear = myRecords.First().Year;
return myRecords.Any(record => record.Year != firstYear);
}
else
{
return false;
}
}
The sequence of file that have more than one year in it is:
allFiles.Where(file => file.IsMultiYear(',');
Put everything in one line:
var allFilesWithMultiYear = new DirectoryInfo(#"C:\MyFileFolder\")
.EnumerateFiles("*.*", SearchOption.AllDirectories)
.Where(file => file.IsMultiYear(',');
By creating two fairly simple extension methods your problem became one highly readable statement.

Whats the best way to search files in a directory for multiple extensions and get the last write time according to filename

I am having a hard time mixing types with linq in the forloop. Basically i need to search a directory with a dbname, not knowing if the file will be .bak or .7z. If there are multiple files with the same dbname i need to get the one with extention .7z. If there are multiple files with same dbname and extention .7z I need to get the file with the last write time. This is what i have so far.
string [] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
foreach (var fileName in files)
{
var dbName = "Test";
var extention7 = ".7z";
var extentionBak = ".bak";
if (fileName.Contains(dbName) && (fileName.Contains(extention7) || fileName.Contains(extentionBak)))
{
Console.WriteLine(fileName);
}
}
I wouldn't create a LINQ only solution for this - it will be too hard to understand.
Here is what I would do:
string GetDatabaseFile(string folder, string dbName)
{
var files =
Directory.EnumerateFiles(folder, dbName + "*.*")
.Select(x => new { Path = x, Extension = Path.GetExtension(x) })
.Where(x => x.Extension == ".7z" || x.Extension == ".bak")
.ToArray();
if(files.Length == 0)
return null;
if(files.Length == 1)
return files[0].Path;
var zippedFiles = files.Where(x => x.Extension == ".7z").ToArray();
if(zippedFiles.Length == 1)
return zippedFiles[0].Path;
return zippedFiles.OrderByDescending(x => File.GetLastWriteTime(x.Path))
.First().Path;
}
Please note that this doesn't take into account the case where there are no .7z files but multiple .bak files for a DB. If this scenario can occur, you need to extend the method accordingly.
Get files in directory:
var sourceFilePaths = Directory.EnumerateFiles(sourceDirectory).Where(f => Path.GetExtension(f).ToLower() == ".exe" ||
Path.GetExtension(f).ToLower() == ".dll" ||
Path.GetExtension(f).ToLower() == ".config");
.
.
.
File compare:
var sourceFileInfo = new FileInfo(filePath);
var destinationFileInfo = new FileInfo(destinationFilePath);
var isNewer = sourceFileInfo.LastWriteTime.CompareTo(destinationFileInfo.LastWriteTime) > 0;
Instead of packing everything in one if condition you should handle all cases separate:
var dbName = "Test";
var extention7 = ".7z";
var extentionBak = ".bak";
foreach (var fileName in files)
{
if (!fileName.Contains(dbName)) continue; // wrong base name
if (File.GetExtension(filename) == extention7)
{
// handle this case:
// extract file date
// remember latest file
}
else if(File.GetExtension(filename) == extentionBak)
{
// handle this case
}
}

multiple foreach loops inside while loop

is it possible to include multiple "foreach" statements inside any of the looping constructs like while or for ... i want to open the .wav files from two different directories simultaneously so that i can compare files from both.
here is what i am trying to so but it is certainly wrong.. any help in this regard is appreciated.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
while ( foreach(string fileName1 in fileEntries1) && foreach(string fileName2 in fileEntries2))
Gramatically speaking no. This is because a foreach construct is a statement whereas the tests in a while statement must be expressions.
Your best bet is to nest the foreach blocks:
foreach(string fileName1 in fileEntries1)
{
foreach(string fileName2 in fileEntries2)
I like this kind of statements in one line. So even though most of the answers here are correct, I give you this.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach( var fileExistsInBoth in fileEntries1.Where(fe1 => fileEntries2.Contains(fe1) )
{
/// here you will have the records which exists in both of the lists
}
Something like this since you only need to validate same file names:
IEnumerable<string> fileEntries1 = Directory.GetFiles(folder1, "*.wav").Select(x => Path.GetFileName(x));
IEnumerable<string> fileEntries2 = Directory.GetFiles(folder2, "*.wav").Select(x => Path.GetFileName(x));
IEnumerable<string> filesToIterate = (fileEntries1.Count() > fileEntries2.Count()) ? fileEntries1 : fileEntries2;
IEnumerable<string> filesToValidate = (fileEntries1.Count() < fileEntries2.Count()) ? fileEntries1 : fileEntries2;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
// Find the files in smaller collection
if (filesToValidate.Contains(fileName))
{
// Get actual file and compare
}
else
{
// File does not exist in another list. Handle appropriately
}
}
.Net 2.0 based solution:
List<string> fileEntries1 = new List<string>(Directory.GetFiles(folder1, "*.wav"));
List<string> fileEntries2 = new List<string>(Directory.GetFiles(folder2, "*.wav"));
List<string> filesToIterate = (fileEntries1.Count > fileEntries2.Count) ? fileEntries1 : fileEntries2;
filesToValidate = (fileEntries1.Count < fileEntries2.Count) ? fileEntries1 : fileEntries2;
string iteratorFileName;
string validatorFilePath;
// Iterate the bigger collection
foreach (string fileName in filesToIterate)
{
iteratorFileName = Path.GetFileName(fileName);
// Find the files in smaller collection
if ((validatorFilePath = FindFile(iteratorFileName)) != null)
{
// Compare fileName and validatorFilePath files here
}
else
{
// File does not exist in another list. Handle appropriately
}
}
FindFile method:
static List<string> filesToValidate;
private static string FindFile(string fileToFind)
{
string returnValue = null;
foreach (string filePath in filesToValidate)
{
if (string.Compare(Path.GetFileName(filePath), fileToFind, true) == 0)
{
// Found the file
returnValue = filePath;
break;
}
}
if (returnValue != null)
{
// File was found in smaller list. Remove this file from the list since we do not need to look for it again
filesToValidate.Remove(returnValue);
}
return returnValue;
}
You may or may not choose to make fields and methods static based on your needs.
If you want to iterate all pairs of files in both paths respectively, you can do it as follows.
string[] fileEntries1 = Directory.GetFiles(folder1, "*.wav");
string[] fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach(string fileName1 in fileEntries1)
{
foreach(string fileName2 in fileEntries2)
{
// to the actual comparison
}
}
This is what I would suggest, using linq
using System.Linq;
var fileEntries1 = Directory.GetFiles(folder1, "*.wav");
var fileEntries2 = Directory.GetFiles(folder11, "*.wav");
foreach (var entry1 in fileEntries1)
{
var entries = fileEntries2.Where(x => Equals(entry1, x));
if (entries.Any())
{
//We have matches
//entries is a list of matches in fileentries2 for entry1
}
}
If you want to enable both collections "in parallel", then use their iterators like this:
var fileEntriesIterator1 = Directory.EnumerateFiles(folder1, "*.wav").GetEnumerator();
var fileEntriesIterator2 = Directory.EnumerateFiles(folder11, "*.wav").GetEnumerator();
while(fileEntriesIterator1.MoveNext() && fileEntriesIterator2.MoveNext())
{
var file1 = fileEntriesIterator1.Current;
var file2 = fileEntriesIterator2.Current;
}
If one collection is shorter than the other, this loop will end when the shorter collection has no more elements.

Get Sub-Files with folders' IDs - Client Object Model

I want to get some folders - sub files. I have all folders SharePoint ID in a list. My code is working but it's performance very bad because there is a lot of context.ExecuteQuery;
I want to make it maybe with a Caml Query.
using (var context = new ClientContext("http://xxx/haberlesme/"))
{
var web = context.Web;
var list = context.Web.Lists.GetById(new Guid(target));
int[] siraliIdArray;
//siraliIdArray = loadSharePointIDList(); think like this
for (var j = 0; j < siraliIdArray.Count; j++)
{
var folderName = listItemCollection[j]["Title"].ToString();//Folder Name
var currentFolder = web.GetFolderByServerRelativeUrl("/haberlesme/Notice/" + folderName);
var currentFiles = currentFolder.Files;
context.Load(currentFiles);
//I don't want to execute for n folder n times. I want to execute n folder 1 time.
context.ExecuteQuery();
var ek = new LDTOTebligEk();
//I don't want to add one - one
foreach (var file1 in currentFiles)
{
ek.DokumanPath = urlPrefix + folderName + "/" + file1.Name;
ek.DokumanAd = file1.Name;
ekler.Add(ek);
}
}
}
For example I have 100 folder but I want to get 10 folders sub folder in one Execution
Since CSOM API supports Request Batching:
The CSOM programming model is built around request batching. When you
work with the CSOM, you can perform a series of data operations on the
ClientContext object. These operations are submitted to the server in
a single request when you call the ClientContext.BeginExecuteQuery
method.
you could refactor your code as demonstrated below:
var folders = new Dictionary<string,Microsoft.SharePoint.Client.Folder>();
var folderNames = new[] {"Orders","Requests"};
foreach (var folderName in folderNames)
{
var folderKey = string.Format("/Shared Documents/{0}", folderName);
folders[folderKey] = context.Web.GetFolderByServerRelativeUrl(folderKey);
context.Load(folders[folderKey],f => f.Files);
}
context.ExecuteQuery(); //execute request only once
//print all files
var allFiles = folders.SelectMany(folder => folder.Value.Files);
foreach (var file in allFiles)
{
Console.WriteLine(file.Name);
}
Use Caml Query:
Microsoft.SharePoint.Client.List list = clientContext.Web.Lists.GetByTitle("Main Folder");
Microsoft.SharePoint.Client.CamlQuery caml = new Microsoft.SharePoint.Client.CamlQuery();
caml.ViewXml = #"<View><Query><Where><Eq><FieldRef Name='FileLeafRef'/><Value Type='Folder'>SubFolderName</Value></Eq></Where></Query></View>";
caml.FolderServerRelativeUrl = " This line should be added if the main folder is not in the site layer";
Microsoft.SharePoint.Client.ListItemCollection items = list.GetItems(caml);
clientContext.Load(items);
//Get your folder using items[0]

Directory.GetFiles get today's files only

There is nice function in .NET Directory.GetFiles, it's simple to use it when I need to get all files from directory.
Directory.GetFiles("c:\\Files")
But how (what pattern) can I use to get only files that created time have today if there are a lot of files with different created time?
Thanks!
For performance, especially if the directory search is likely to be large, the use of Directory.EnumerateFiles(), which lazily enumerates over the search path, is preferable to Directory.GetFiles(), which eagerly enumerates over the search path, collecting all matches before filtering any:
DateTime today = DateTime.Now.Date ;
FileInfo[] todaysFiles = new DirectoryInfo(#"c:\foo\bar")
.EnumerateFiles()
.Select( x => {
x.Refresh();
return x;
})
.Where( x => x.CreationTime.Date == today || x.LastWriteTime == today )
.ToArray()
;
Note that the the properties of FileSystemInfo and its subtypes can be (and are) cached, so they do not necessarily reflect current reality on the ground. Hence, the call to Refresh() to ensure the data is correct.
Try this:
var todayFiles = Directory.GetFiles("path_to_directory")
.Where(x => new FileInfo(x).CreationTime.Date == DateTime.Today.Date);
You need to get the directoryinfo for the file
public List<String> getTodaysFiles(String folderPath)
{
List<String> todaysFiles = new List<String>();
foreach (String file in Directory.GetFiles(folderPath))
{
DirectoryInfo di = new DirectoryInfo(file);
if (di.CreationTime.ToShortDateString().Equals(DateTime.Now.ToShortDateString()))
todaysFiles.Add(file);
}
return todaysFiles;
}
You could use this code:
var directory = new DirectoryInfo("C:\\MyDirectory");
var myFile = (from f in directory.GetFiles()
orderby f.LastWriteTime descending
select f).First();
// or...
var myFile = directory.GetFiles()
.OrderByDescending(f => f.LastWriteTime)
.First();
see here: How to find the most recent file in a directory using .NET, and without looping?
using System.Linq;
DirectoryInfo info = new DirectoryInfo("");
FileInfo[] files = info.GetFiles().OrderBy(p => p.CreationTime).ToArray();
foreach (FileInfo file in files)
{
// DO Something...
}
if you wanted to break it down to a specific date you could try this using a filter
var files = from c in directoryInfo.GetFiles()
where c.CreationTime >dateFilter
select c;
You should be able to get through this:
var loc = new DirectoryInfo("C:\\");
var fileList = loc.GetFiles().Where(x => x.CreationTime.ToString("dd/MM/yyyy") == currentDate);
foreach (FileInfo fileItem in fileList)
{
//Process the file
}
var directory = new DirectoryInfo(Path.GetDirectoryName(#"--DIR Path--"));
DateTime from_date = DateTime.Now.AddDays(-5);
DateTime to_date = DateTime.Now.AddDays(5);
//For Today
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file.CreationTime.Date == DateTime.Now.Date ).ToArray();
//For date range + specific file extension
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.CreationTime.Date >= from_date.Date && file.CreationTime.Date <= to_date.Date && file.Extension == ".txt").ToArray();
//To get ReadOnly files from directory
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.IsReadOnly == true).ToArray();
//To get files based on it's size
int fileSizeInKB = 100;
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => (file.Length)/1024 > fileSizeInKB).ToArray();

Categories

Resources