Find file name not present in a directory - c#

I have a list of file names (not paths) in a List<string>. I need to find a List<string> of the files that are NOT present in a directory.
Right now I am iterating over the names one by one and checking each against all the files in the folder.
Is there a LINQ way of achieving the same thing?

You can use Enumerable.Except like:
List<string> compareList = new List<string>();
//.... items in the list
DirectoryInfo di = new DirectoryInfo("C:\\");
var fileArray = di.GetFiles().Select(r => r.Name).ToArray();
var filesNotPresent = compareList.Except(fileArray);

Depending on how many names you have in your list (and how many files are in the destination), you may find that the iterative approach is still the most efficient (but with some LINQ enhancements):
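// File.Exists needs a path, so combine each bare name with the target folder ("C:\\" here, as in the example above)
var missingFiles = names.Where(x => !File.Exists(Path.Combine("C:\\", x)));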
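The Except approach can be wrapped in a small helper. The following is only a sketch: MissingFileFinder and FindMissing are hypothetical names, and it assumes file names should be compared case-insensitively, as on a typical Windows file system. Note that Except has set semantics, so duplicate names in the input are reported once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class MissingFileFinder
{
    // Returns the names from wantedNames that have no matching file
    // directly inside directory (sub-directories are not searched).
    public static List<string> FindMissing(IEnumerable<string> wantedNames, string directory)
    {
        // Reduce each full path to its bare file name so it can be compared with the list.
        var present = Directory.EnumerateFiles(directory)
                               .Select(f => Path.GetFileName(f));

        // Assumption: case-insensitive comparison, as on a typical Windows volume.
        return wantedNames.Except(present, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

A call such as MissingFileFinder.FindMissing(compareList, "C:\\") would then return only the names that were not found.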

Related

Adding two fileinfo[] together

FileInfo[] comList = new FileInfo[]{};
FileInfo[] files;
DirectoryInfo dInfo;
string[] folderList = path.Split(',');
foreach (string folder in folderList){
    dInfo = new DirectoryInfo(folder);
    files = dInfo.GetFiles().Where(f => extensions.Contains(f.Extension.ToLower())).ToArray();
    comList.Concat(files);
}
I am trying to read multiple folders and get all the files into one FileInfo[], but after doing the Concat on comList, my comList is still empty.
The path input is something like string path = "pathA,pathB,pathC".
If this is not the way to do it, what is a better way to get all the files from different directories into one array?
According to the Microsoft documentation, Concat returns a new sequence without modifying the existing one:
public static System.Collections.Generic.IEnumerable<TSource> Concat<TSource> (
this System.Collections.Generic.IEnumerable<TSource> first,
System.Collections.Generic.IEnumerable<TSource> second);
So in your scenario, assign the result back to comList (with ToArray(), because Concat returns an IEnumerable<FileInfo> rather than an array) and you're good to go:
comList = comList.Concat(files).ToArray();
That said, if comList doesn't have to be an array, consider using a List<FileInfo> instead, which achieves the same thing without the extra conversion:
List<FileInfo> comList = new List<FileInfo>();
...
foreach (string folder in folderList) {
var dInfo = new DirectoryInfo(folder);
var files = dInfo.GetFiles().Where(f =>
extensions.Contains(
f.Extension.ToLower()));
comList.AddRange(files);
}
You need to use the return of comList.Concat(files), like:
comList = comList.Concat(files).ToArray();
The ToArray() method is needed because Concat() returns an IEnumerable.
Alternatively you can make comList an actual List<FileInfo> and use its AddRange method in each iteration:
comList.AddRange(files);
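If the loop itself is not needed, the whole collection step can also be written as a single LINQ query with SelectMany. This is a sketch under assumptions rather than a drop-in replacement: FolderCollector and CollectFiles are hypothetical names, the Trim() call is an addition to tolerate spaces around the commas, and the extension set is assumed to be built with a case-insensitive comparer so that ToLower() is unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FolderCollector
{
    // path is a comma-separated list of folders, e.g. "pathA,pathB,pathC".
    public static FileInfo[] CollectFiles(string path, ISet<string> extensions)
    {
        return path.Split(',')
            .Select(folder => new DirectoryInfo(folder.Trim()))
            // Flatten the per-folder file sequences into one sequence.
            .SelectMany(dir => dir.EnumerateFiles())
            .Where(f => extensions.Contains(f.Extension))
            .ToArray();
    }
}
```

For example, passing new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".txt", ".log" } as extensions matches ".TXT" and ".txt" alike.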

C# - Faster way to check if a directory and sub-directories contain at least 1 file from a list of allowed file extensions?

Part of my program involves validating that 2 directories each contain at least 1 file with an extension from a given list of allowed extensions.
if (DoesDirectoryHaveValidFiles(directory1) &&
DoesDirectoryHaveValidFiles(directory2))
My current issue is that, given a directory tree that is deep enough or has too many sub-directories, my current implementation takes too much time for what it does.
Can anyone help me speed up this check?
bool DoesDirectoryHaveValidFiles(string directory)
{
var allowedExtensions = new string[] {".aaa", ".bbb", ".ccc"};
var directoryInfo = new DirectoryInfo(directory);
var fileInfos = directoryInfo.GetFiles("*.*", SearchOption.AllDirectories)
.Where(file => allowedExtensions.Any(file.FullName.ToLower().EndsWith));
return fileInfos.Count() > 0;
}
UPDATE: After SBFrancies's help with changing .Where to .Any, and Dour High Arch's help with file.Extension, I have replaced it with the following code, and I think it is now fast enough for what I need.
But if you have other comments about my implementation, please don't hesitate to tell me, as I'm still learning this area.
bool DoesDirectoryHaveValidFiles(string directory)
{
var allowedExtensions = new string[] {".aaa", ".bbb", ".ccc"};
var directoryInfo = new DirectoryInfo(directory);
var hasValidFiles = directoryInfo.GetFiles("*.*", SearchOption.AllDirectories)
.Any(file => allowedExtensions.Contains(file.Extension));
return hasValidFiles;
}
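One further option, shown here only as a sketch: GetFiles builds the complete array of files before Any can inspect the first one, whereas Directory.EnumerateFiles (available since .NET 4) yields paths lazily, so Any can stop at the first match. DirectoryValidator is a hypothetical class name, and the case-insensitive HashSet is an assumption that mirrors the ToLower() call in the original version. Like GetFiles, EnumerateFiles throws if it reaches a sub-directory it is not allowed to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DirectoryValidator
{
    // Assumption: extensions are compared case-insensitively.
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".aaa", ".bbb", ".ccc" };

    public static bool DoesDirectoryHaveValidFiles(string directory)
    {
        // EnumerateFiles is lazy, so Any returns as soon as one allowed file is seen.
        return Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories)
            .Any(file => AllowedExtensions.Contains(Path.GetExtension(file)));
    }
}
```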

How can I return a directory's files/folders without the directory path included?

I am trying to extract a list of files within a folder and am currently using:
string[] files = Directory.GetFiles(txtbxNewFolder.Text);
But that returns things like "C:\Users\Dahlia\Desktop\New Folder\jerry.txt". Is there a way to return only "jerry.txt", or do I need to do some sort of split on the array strings?
I am also trying to return a list of folders within a directory and am currently using:
string[] folders = Directory.GetDirectories(txtbxOldFolder.Text);
But that returns things like "C:\Users\Dahlia\Desktop\New Folder\folder1". Is there a way to return only "folder1", or do I need to do some sort of split on the array strings?
Using LINQ you can get a list of just the files:
Directory.GetFiles(txtbxNewFolder.Text).Select(f => Path.GetFileName(f));
Though rather than GetFiles I'd probably use:
Directory.EnumerateFiles(txtbxNewFolder.Text).Select(f => Path.GetFileName(f));
It isn't as simple to get the directory name, but this should work (untested):
Directory.GetDirectories(txtbxOldFolder.Text)
.Select(d => new DirectoryInfo(d).Name);
Similarly, there is a:
Directory.EnumerateDirectories(txtbxOldFolder.Text)
.Select(d => new DirectoryInfo(d).Name);
You could use Path.GetFileName and LINQ
e.g.:
string[] files = Directory.GetFiles(txtbxNewFolder.Text)
.Select(f => Path.GetFileName(f))
.ToArray();
Have a look at the FileInfo and DirectoryInfo classes.
You can do:
foreach (String file in files) {
var fi = new FileInfo(file);
Console.Out.WriteLine(fi.Name);
}
Similar for DirectoryInfo.

How to retrieve list of files in directory, sorted by name

I am trying to get a list of all files in a folder from C#. Easy enough:
Directory.GetFiles(folder)
But I need the result sorted alphabetically-reversed, as they are all numbers and I need to know the highest number in the directory. Of course I could grab them into an array/list object and then do a sort, but I was wondering if there is some filter/parameter instead?
They are all named with leading zeros. Like:
00000000001.log
00000000002.log
00000000003.log
00000000004.log
..
00000463245.log
00000853221.log
00024323767.log
What's the easiest way? I don't need to get the other files, just the "biggest/latest" number.
var files = Directory.EnumerateFiles(folder)
.OrderByDescending(filename => filename);
(The EnumerateFiles method is new in .NET 4; you can still use GetFiles if you're using an earlier version.)
EDIT: actually you don't need to sort the file names, if you use the MaxBy method defined in MoreLinq:
var lastFile = Directory.EnumerateFiles(folder).MaxBy(filename => filename);
var files = from file in Directory.GetFiles(folder)
orderby file descending
select file;
var biggest = files.First();
If you are really after the highest number, and the log files are numbered consecutively from 1 with no gaps, how about:
Directory.GetFiles(folder).Length
Extending what @Thomas said, if you only need the top X files, you can do this:
int x = 10;
var files = Directory.EnumerateFiles(folder)
.OrderByDescending(filename => filename)
.Take(x);
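If only the single highest name is needed, the built-in Max() on a sequence of strings avoids both the sort and the MoreLinq dependency. The sketch below assumes the names are zero-padded to the same width, as in the question, so that string order matches numeric order; LatestLog and FindHighestName are hypothetical names, and the "*.log" filter is an assumption taken from the sample file names.

```csharp
using System;
using System.IO;
using System.Linq;

static class LatestLog
{
    // Returns the greatest file name in folder, or null if no .log file exists.
    public static string FindHighestName(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.log")
            .Select(f => Path.GetFileName(f))
            // Single pass over the names; no sorting is performed.
            .Max();
    }
}
```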

Using Directory.GetFiles with regex-like filter

I have a folder with two files:
Awesome.File.20091031_123002.txt
Awesome.File.Summary.20091031_123152.txt
Additionally, a third-party app handles the files as follows:
Reads a folderPath and a searchPattern out of a database
Executes Directory.GetFiles(folderPath, searchPattern), processing whatever files match the filter in bulk, then moving the files to an archive folder.
It turns out that I have to move my two files into different archive folders, so I need to handle them separately by providing different searchPatterns to select them individually. Please note that I can't modify the third-party app, but I can modify the searchPattern and file destinations in my database.
What searchPattern will allow me to select Awesome.File.20091031_123002.txt without including Awesome.File.Summary.20091031_123152.txt?
If you were going to use LINQ then...
var regexTest = new Func<string, bool>(i => Regex.IsMatch(i, @"Awesome.File.(Summary)?.[\d]+_[\d]+.txt", RegexOptions.Compiled | RegexOptions.IgnoreCase));
var files = Directory.GetFiles(@"c:\path\to\folder").Where(regexTest);
Awesome.File.????????_??????.txt
The question mark (?) acts as a single character place holder.
I wanted to try my meager linq skills here... I'm sure there is a more elegant solution, but here's mine:
string pattern = ".SUMMARY.";
string[] awesomeFiles = System.IO.Directory.GetFiles("path\\to\\awesomefiles");
IEnumerable<string> sum_files = from file in awesomeFiles
where file.ToUpper().Contains(pattern)
select file;
IEnumerable<string> other_files = from file in awesomeFiles
where !file.ToUpper().Contains(pattern)
select file;
This assumes there aren't any other files in the directory other than the two, but you can adjust the pattern here to suit your needs (i.e. add "Awesome.File" to the pattern start.)
When you iterate the collection of each, you should get what you need.
According to the documentation, searchPattern only supports the * and ? wildcards. You would need to write your own regex filter that takes the results of Directory.GetFiles and applies further filtering logic.
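Such a filter might look like the sketch below. It only helps where you control the calling code, which is not the case for the third-party app in the question. RegexFileFilter is a hypothetical helper name, and matching against the bare file name (rather than the full path) is a design assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class RegexFileFilter
{
    // Returns the full paths of the files in folderPath whose file name matches namePattern.
    public static IEnumerable<string> GetFiles(string folderPath, string namePattern)
    {
        var regex = new Regex(namePattern, RegexOptions.IgnoreCase);
        return Directory.EnumerateFiles(folderPath)
            // Test only the file name so that folder names cannot cause a match.
            .Where(path => regex.IsMatch(Path.GetFileName(path)));
    }
}
```

With the pattern @"^Awesome\.File\.\d{8}_\d{6}\.txt$", this selects Awesome.File.20091031_123002.txt and skips the Summary file, because "Summary" is not a run of eight digits.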
If you don't want to use Linq, here's one way.
public void FileChecker(string filePath)
{
DirectoryInfo di = new DirectoryInfo(filePath);
int _MatchCounter = 0; // must be initialized before += is used
string RegexPattern = "^[a-zA-Z_a-zA-Z_a-zA-Z_0-9_0-9_0-9.csv]*$";
Regex RegexPatternMatch = new Regex(RegexPattern, RegexOptions.IgnoreCase);
foreach (FileInfo matchingFile in di.GetFiles())
{
Match m = RegexPatternMatch.Match(matchingFile.Name);
if ((m.Success))
{
MessageBox.Show(matchingFile.Name);
_MatchCounter += 1;
}
}
}
