C# Comparing two lists (FileData)

I'm creating a program that walks a folder structure. If something has changed, I want to write it to a list. My problem is that I don't know how to save the changes in lstChanges when comparing the two lists. What is the syntax for the if statement? This is what I have so far:
public static void GoThroughFileSystem(DirectoryInfo x)
{
foreach (DirectoryInfo d in x.GetDirectories())
{
//Console.WriteLine("Folder: {0}", d.Name);
GoThroughFileSystem(d);
}
foreach (FileInfo f in x.GetFiles())
{
lstNew.Add(new FileData { path = f.FullName, ChangingDate = f.LastWriteTime });
if (!lstOld.Contains(new FileData { path = f.FullName, ChangingDate = f.LastWriteTime }))
{
lstChanges.Add(new FileData { path = f.FullName, ChangingDate = f.LastWriteTime });
}
}
}

Assuming lstOld holds the List<FileData> built during the previous run, you can change your if statement to
//using System.Linq;
if (!lstOld.Any(old => old.path == f.FullName && old.ChangingDate == f.LastWriteTime))
List<T>.Contains uses the default equality comparer. So creating a new FileData and passing it to Contains will not work unless FileData overrides Equals or implements IEquatable<FileData> properly; without that, the comparison falls back to reference equality, and a freshly created object never equals a stored one.
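To make the original Contains call work as written, the element type needs value equality. The following is a minimal sketch, assuming FileData is a simple class with the path and ChangingDate members seen in the question (its real definition is not shown) and that HashCode.Combine is available (.NET Core 2.1 or later):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reconstruction of the question's FileData class, extended with value equality.
public sealed class FileData : IEquatable<FileData>
{
    public string path { get; set; }
    public DateTime ChangingDate { get; set; }

    // Two entries are equal when both the path and the timestamp match.
    public bool Equals(FileData other) =>
        other != null && path == other.path && ChangingDate == other.ChangingDate;

    // Keep object.Equals and GetHashCode consistent with the typed Equals.
    public override bool Equals(object obj) => Equals(obj as FileData);

    public override int GetHashCode() => HashCode.Combine(path, ChangingDate);
}
```

With this in place, lstOld.Contains(new FileData { path = f.FullName, ChangingDate = f.LastWriteTime }) compares by value rather than by reference.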

You can also try an old-fashioned left outer join :)
var lParent = x.GetFiles();
var lChild = lstOld;
var differences = lParent.GroupJoin(
    lChild,
    p => p.FullName,   // key of the current file
    c => c.path,       // key of the stored entry (must be the same type as the key above)
    (p, g) => g
        .Select(c => new { FullName = p.FullName, LastWriteTime = (DateTime?)c.ChangingDate })
        // files with no stored entry get a null timestamp
        .DefaultIfEmpty(new { FullName = p.FullName, LastWriteTime = (DateTime?)null }))
    .SelectMany(g => g);
If your goal is to gather the values that appear in only one of the two collections, what you need resembles a full outer join. For two identically typed collections you can just take the union and remove the common part:
lParent.Union(lChild).Except(lParent.Intersect(lChild));
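The union-minus-intersection expression computes what is usually called the symmetric difference, and HashSet<T> offers that operation directly. Here is a small sketch, assuming both collections have already been reduced to path strings and that paths should compare case-insensitively (a Windows-style assumption); PathSets and SymmetricDifference are illustrative names, not part of the original code:

```csharp
using System;
using System.Collections.Generic;

public static class PathSets
{
    // Returns the paths that appear in exactly one of the two sequences.
    public static HashSet<string> SymmetricDifference(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        // Case-insensitive comparison is an assumption; drop the comparer for exact matching.
        var result = new HashSet<string>(first, StringComparer.OrdinalIgnoreCase);
        result.SymmetricExceptWith(second);
        return result;
    }
}
```

For example, PathSets.SymmetricDifference(oldPaths, newPaths) would contain every path that was added or removed between the two runs, but none that exist in both.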

Related

Remove duplicate files in different directories

I'm using Directory.EnumerateFiles to list files in two separate directories. Some of the files exist in both folders. How can I remove any duplicate files from the combined list?
try
{
corporateFiles = Directory.EnumerateFiles(@"\\" + corporateServer, "*.pdf", SearchOption.AllDirectories).ToList();
}
catch
{
corporateFiles = new List<string>();
}
try {
functionalFiles = Directory.EnumerateFiles(@"\\" + functionalServer, "*.pdf", SearchOption.AllDirectories).ToList();
}
catch
{
functionalFiles = new List<String>();
}
var combinedFiles = corporateFiles.Concat(functionalFiles);
It seems I cannot satisfy my lust for LINQ.
Here's a one-liner:
var combinedFiles = corporateFiles.Concat(functionalFiles.Where(x => !(corporateFiles.Select(y => y.Split('\\').Last()).ToList().Intersect(functionalFiles.Select(y => y.Split('\\').Last()))).Contains(x.Split('\\').Last())));
This keeps the filepaths from corporateFiles. You can swap them if you prefer otherwise.
I'll attempt to format this to be more readable.
EDIT: Here's the code abstracted out a bit, hopefully more readable:
// Get common file names:
var duplicateFileNames = corporateFiles.Select(y => y.Split('\\').Last()).ToList().Intersect(functionalFiles.Select(y => y.Split('\\').Last()));
// Remove entries in 'functionalFiles' that are duplicates:
var functionalFilesWithoutDuplicates = functionalFiles.Where(x => !duplicateFileNames.Contains(x.Split('\\').Last()));
// Combine the un-touched 'corporateFiles' with the filtered 'functionalFiles':
var combinedFiles = corporateFiles.Concat(functionalFilesWithoutDuplicates);
Use Union instead of Concat:
var combinedFiles = corporateFiles.Union(functionalFiles);
You can use the overload passing an IEqualityComparer<string> to compare using only the name part:
var combined = corporateFiles.Union(functionalFiles, new FileNameComparer());
class FileNameComparer : EqualityComparer<string>
{
public override bool Equals(string x, string y)
{
var name1 = Path.GetFileName(x);
var name2 = Path.GetFileName(y);
return name1 == name2;
}
public override int GetHashCode(string obj)
{
var name = Path.GetFileName(obj);
return name.GetHashCode();
}
}

Optimization of nested loops using LINQ

Can you please suggest how to write an optimized LINQ query for the following operation?
foreach (DataRow entry1 in table1.Rows)
{
var columnA = entry1["ColumnA"] as string;
if (!string.IsNullOrEmpty(columnA))
{
foreach (string entry2 in table2)
{
var dataExists = table3.Any(rows3 =>
!string.IsNullOrEmpty(rows3[entry2] as string)
&& columnA.IsEqual(rows3["ColumnB"] as string));
if (dataExists)
{
entry1[entry2] = Compute(columnA, entry2);
}
}
}
}
I tried with this, but the results don't match in terms of the unique iteration counts.
var t2t3Pair = from entry2 in table2
let entry3 = table3.FirstOrDefault(x =>
!string.IsNullOrEmpty(x[entry2] as string))
where entry3 != null
select new { entry2, entry3 };
var t1t3Pair = from pair in t2t3Pair
from entry1 in table1.AsEnumerable()
let columnA = entry1["ColumnA"] as string
where !string.IsNullOrEmpty(columnA)
&& columnA.IsEqual(pair.entry3["ColumnB"] as string)
select new { Entry1Alias = entry1, Entry2Alias = pair.entry2 };
foreach (var pair in t1t3Pair)
{
var columnA = (string)pair.Entry1Alias["ColumnA"];
pair.Entry1Alias[pair.Entry2Alias] = Compute(columnA, pair.Entry2Alias);
}
Note: IsEqual is my extension method to compare string without case sensitivity.
Apparently the bottleneck is the line
var dataExists = table3.Any(rows3 =>
!string.IsNullOrEmpty(rows3[entry2] as string)
&& columnA.IsEqual(rows3["ColumnB"] as string));
which is executed inside the innermost loop.
As usual, it can be optimized by preparing a fast lookup data structure in advance and using it inside the critical loop.
For your case, I would suggest something like this:
var dataExistsMap = table3.AsEnumerable()
.GroupBy(r => r["ColumnB"] as string)
.Where(g => !string.IsNullOrEmpty(g.Key))
.ToDictionary(g => g.Key, g => new HashSet<string>(
table2.Where(e => g.Any(r => !string.IsNullOrEmpty(r[e] as string)))
// Include the proper comparer if your IsEqual method is using non default string comparison
//, StringComparer.OrdinalIgnoreCase
)
);
foreach (DataRow entry1 in table1.Rows)
{
var columnA = entry1["ColumnA"] as string;
if (string.IsNullOrEmpty(columnA)) continue;
HashSet<string> dataExistsSet;
if (!dataExistsMap.TryGetValue(columnA, out dataExistsSet)) continue;
foreach (string entry2 in table2.Where(dataExistsSet.Contains))
entry1[entry2] = Compute(columnA, entry2);
}

Find First File in a Branching Directory

I'm trying to find the first .dcm file in each directory of a directory tree and get its full path (e.g. a/a/a/123.dcm), ignoring directories in which no .dcm file is found.
example:
a/a/a/123.dcm
a/a/a/1234.dcm
a/a/a/12345.dcm
a/a/b/23.dcm
a/a/b/234.dcm
a/a/b/2345.dcm
a/a/c/23.dcm
a/a/c/234.dcm
a/a/c/2345.dcm
Answer should be: a/a/a/123.dcm, a/a/b/23.dcm and a/a/c/23.dcm
I tried:
var files = Directory.GetFiles(inputDir, "*.*", SearchOption.AllDirectories)
.Where(s => s.EndsWith(".dcm")).ToArray();
var dir = Directory.GetDirectories(inputDir, "*.*", SearchOption.AllDirectories).ToArray();
var biggest = files.First();
foreach (var item in dir)
{
DirectoryInfo di = new DirectoryInfo(item);
var q = from i in di.GetFiles("*.dcm", SearchOption.AllDirectories)
select i.Name;
var qq = q.First();
foreach (var items in qq)
{
Console.WriteLine(items);
}
}
However what I get is the answer for five directories. Answer:
a/a/a/123.dcm
a/a/a/123.dcm
a/a/a/123.dcm
a/a/b/23.dcm
a/a/c/23.dcm
I’m just wondering if there’s a simpler way to do this using LINQ or something else? Thank you so much for your help. Cheers.
Here's a LINQ version:
var inputDir = @"c:\temp";
var files = Directory
.EnumerateFiles(inputDir, "*.dcm", SearchOption.AllDirectories)
.Select(f => new FileInfo(f))
.GroupBy(f => f.Directory.FullName, d => d, (d, f) => new { Directory = d, FirstFile = f.First() })
.ToList();
files.ForEach(f => Console.WriteLine("{0} {1}", f.Directory, f.FirstFile));

Find new file in two folders with a cross check

I am trying to sort two folders into a patched folder, finding which files are new in the new folder and marking them as new, so I can transfer only those files. I don't care about dates or hash changes, just which files are in the new folder but not in the old folder.
Somehow the line
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
always returns false. Is there something wrong with my cross-checking logic?
List<string> newPatch = DirectorySearch(_newFolder);
List<string> oldPatch = DirectorySearch(_oldFolder);
foreach (string f in newPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_newFolder, "") + @"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
pf.NFile = !( oldPatch.FindAll(s => s.Equals(f)).Count() == 0);
nPatch.Files.Add(pf);
}
foreach (string f in oldPatch)
{
string filename = Path.GetFileName(f);
string Dir = (Path.GetDirectoryName(f).Replace(_oldFolder, "") + @"\");
PatchFile pf = new PatchFile();
pf.Dir = Dir;
pf.FName = filename;
if (!nPatch.Files.Exists(item => item.Dir == pf.Dir &&
item.FName == pf.FName))
{
nPatch.removeFiles.Add(pf);
}
}
I don't have the code you are using (like DirectorySearch and PatchFile), so I can't compile your example, but in my opinion the call oldPatch.FindAll(...) finds nothing because you are comparing full paths (c:\oldpatch\filea.txt is not c:\newpatch\filea.txt) rather than file names only. I also think your algorithm could be simplified, something like this sketch (using Contains instead of List.FindAll):
var _newFolder = "d:\\temp\\xml\\b";
var _oldFolder = "d:\\temp\\xml\\a";
List<FileInfo> missing = new List<FileInfo>();
List<FileInfo> nPatch = new List<FileInfo>();
List<FileInfo> newPatch = new DirectoryInfo(_newFolder).GetFiles().ToList();
List<FileInfo> oldPatch = new DirectoryInfo(_oldFolder).GetFiles().ToList();
// take all files in new patch
foreach (var f in newPatch)
{
nPatch.Add(f);
}
// search for hits in old patch
foreach (var f in oldPatch)
{
if (!nPatch.Select (p => p.Name.ToLower()).Contains(f.Name.ToLower()))
{
missing.Add(f);
}
}
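// 'missing' now holds the files that exist in the old folder but not in the new one;
// swap the roles of the two lists to collect the files that are new instead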
One possible solution with less code would be to select the file names, put them into lists, and use the LINQ Except method (or Intersect, if needed). This way, the question of which files are in A but not in B can be answered quickly like this:
var locationA = "d:\\temp\\xml\\a";
var locationB = "d:\\temp\\xml\\b";
// takes file names from A and B and put them into lists
var filesInA = new DirectoryInfo(locationA).GetFiles().Select (n => n.Name).ToList();
var filesInB = new DirectoryInfo(locationB).GetFiles().Select (n => n.Name).ToList();
// Except retrieves all files that are in A but not in B
foreach (var file in filesInA.Except(filesInB).ToList())
{
Console.WriteLine(file);
}
I have 1.xml, 2.xml, 3.xml in A and 1.xml, 3.xml in B. The output is 2.xml - missing in B.
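The same Except idea can be extended to subfolders by comparing paths relative to each root instead of bare file names, which is closer to the Dir plus FName pair in the question. This is only a sketch under stated assumptions: PatchDiff and NewRelativePaths are hypothetical names, Path.GetRelativePath requires .NET Core 2.0 or later, and the case-insensitive comparison assumes a Windows-style file system:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PatchDiff
{
    // Returns the relative paths that exist under newRoot but not under oldRoot.
    public static List<string> NewRelativePaths(
        string oldRoot, string newRoot,
        IEnumerable<string> oldFiles, IEnumerable<string> newFiles)
    {
        // Index the old files by their path relative to the old root.
        var oldRelative = new HashSet<string>(
            oldFiles.Select(f => Path.GetRelativePath(oldRoot, f)),
            StringComparer.OrdinalIgnoreCase);

        // Keep only the new files whose relative path has no counterpart in the old folder.
        return newFiles
            .Select(f => Path.GetRelativePath(newRoot, f))
            .Where(relative => !oldRelative.Contains(relative))
            .ToList();
    }
}
```

The file lists could come from Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories) for each root.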

Directory.GetFiles get today's files only

There is a nice function in .NET, Directory.GetFiles; it's simple to use when I need to get all the files in a directory.
Directory.GetFiles("c:\\Files")
But how (with what pattern) can I get only the files that were created today, when there are a lot of files with different creation times?
Thanks!
For performance, especially if the directory search is likely to be large, the use of Directory.EnumerateFiles(), which lazily enumerates over the search path, is preferable to Directory.GetFiles(), which eagerly enumerates over the search path, collecting all matches before filtering any:
DateTime today = DateTime.Now.Date ;
FileInfo[] todaysFiles = new DirectoryInfo(@"c:\foo\bar")
    .EnumerateFiles()
    .Select( x => {
        x.Refresh();
        return x;
    })
    // compare calendar dates on both sides; a full timestamp would only match exactly at midnight
    .Where( x => x.CreationTime.Date == today || x.LastWriteTime.Date == today )
    .ToArray()
    ;
Note that the properties of FileSystemInfo and its subtypes can be (and are) cached, so they do not necessarily reflect current reality on the ground. Hence the call to Refresh() to ensure the data is correct.
Try this:
var todayFiles = Directory.GetFiles("path_to_directory")
.Where(x => new FileInfo(x).CreationTime.Date == DateTime.Today.Date);
You need to get the FileInfo for each file:
public List<String> getTodaysFiles(String folderPath)
{
    List<String> todaysFiles = new List<String>();
    foreach (String file in Directory.GetFiles(folderPath))
    {
        // FileInfo (rather than DirectoryInfo) is the type that describes a file
        FileInfo fi = new FileInfo(file);
        // compare calendar dates directly instead of formatted strings
        if (fi.CreationTime.Date == DateTime.Today)
            todaysFiles.Add(file);
    }
    return todaysFiles;
}
You could use this code:
var directory = new DirectoryInfo("C:\\MyDirectory");
var myFile = (from f in directory.GetFiles()
orderby f.LastWriteTime descending
select f).First();
// or...
var myFile = directory.GetFiles()
.OrderByDescending(f => f.LastWriteTime)
.First();
see here: How to find the most recent file in a directory using .NET, and without looping?
using System.Linq;
DirectoryInfo info = new DirectoryInfo("");
FileInfo[] files = info.GetFiles().OrderBy(p => p.CreationTime).ToArray();
foreach (FileInfo file in files)
{
// DO Something...
}
If you want to narrow it down to a specific date, you could try this using a filter:
var files = from c in directoryInfo.GetFiles()
where c.CreationTime >dateFilter
select c;
You should be able to get through this:
var loc = new DirectoryInfo("C:\\");
var fileList = loc.GetFiles().Where(x => x.CreationTime.ToString("dd/MM/yyyy") == currentDate);
foreach (FileInfo fileItem in fileList)
{
//Process the file
}
var directory = new DirectoryInfo(Path.GetDirectoryName(@"--DIR Path--"));
DateTime from_date = DateTime.Now.AddDays(-5);
DateTime to_date = DateTime.Now.AddDays(5);
//For Today
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.CreationTime.Date == DateTime.Now.Date).ToArray();
//For date range + specific file extension
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.CreationTime.Date >= from_date.Date && file.CreationTime.Date <= to_date.Date && file.Extension == ".txt").ToArray();
//To get ReadOnly files from directory
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.IsReadOnly == true).ToArray();
//To get files based on their size
int fileSizeInKB = 100;
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => (file.Length)/1024 > fileSizeInKB).ToArray();
