How can I optimize this method? - c#

I would like to optimize the following method, which returns the total file count of the specified folder and all subfolders, for speed and memory usage; any advice is appreciated.
Thanks.
private int countfiles(string srcdir)
{
    try
    {
        DirectoryInfo dir = new DirectoryInfo(srcdir);

        //if the source dir doesn't exist, throw an exception
        if (!dir.Exists)
            throw new ArgumentException("source dir doesn't exist -> " + srcdir);

        int count = dir.GetFiles().Length;

        //loop through each sub directory in the current dir
        foreach (DirectoryInfo subdir in dir.GetDirectories())
        {
            //recursively call this function over and over again
            count += countfiles(subdir.FullName);
        }

        //cleanup
        dir = null;
        return count;
    }
    catch (Exception exc)
    {
        MessageBox.Show(exc.Message);
        return 0;
    }
}
So I did some benchmarking with the suggestions that were proposed. Here are my findings:
My method, with recursion, is the slowest, finding 9062 files in a directory tree in 6.234 seconds.
@Matthew's answer, using SearchOption.AllDirectories, is the fastest, finding the same 9062 files in 4.546 seconds.
@Jeffery's answer, using LINQ, is in the middle of the pack, finding the same 9062 files in 5.562 seconds.
Thank you everyone for your suggestions.

Could you not change the entire method to:
int count = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).Length;
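If allocating the full array is a concern, a minimal variant (assuming .NET 4.0 or later, where Directory.EnumerateFiles is available) counts the results lazily instead:

using System.IO;
using System.Linq;

// Counts files without materializing the whole array first.
// Note: like GetFiles, this still throws if any subdirectory is inaccessible.
int count = Directory.EnumerateFiles(path, "*.*", SearchOption.AllDirectories).Count();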

It looks pretty good to me, but I'd use a LINQ expression to get the count.
Try this:
int count = dir.GetFiles().Length + dir.GetDirectories().Sum(subdir => countfiles(subdir.FullName));
Hope that helps!

I have used the approach described in the article below in the past; it shows versions with and without recursion, and the non-recursive one is faster. Hope this helps ;-)
How to: Iterate Through a Directory Tree
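For reference, a minimal sketch of the kind of non-recursive, stack-based traversal that article describes, adapted here to count files (the method name countFilesIterative is my own, not from the article):

using System.Collections.Generic;
using System.IO;

private static int countFilesIterative(string srcdir)
{
    int count = 0;
    // Use an explicit stack of pending directories instead of recursion.
    Stack<string> pending = new Stack<string>();
    pending.Push(srcdir);
    while (pending.Count > 0)
    {
        string current = pending.Pop();
        count += Directory.GetFiles(current).Length;
        foreach (string subdir in Directory.GetDirectories(current))
            pending.Push(subdir);
    }
    return count;
}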

If there are exceptions, your user could end up seeing numerous message boxes, since each recursive call could show one. I'd consolidate them into a single report, let the user cancel the operation, or let the exception propagate all the way back to the initial caller.
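For example, a sketch (my own, not from the question) that collects error messages into a list and reports them once at the end instead of showing a box per failure:

using System.Collections.Generic;
using System.IO;

private int countfiles(string srcdir, List<string> errors)
{
    int count = 0;
    try
    {
        DirectoryInfo dir = new DirectoryInfo(srcdir);
        count = dir.GetFiles().Length;
        foreach (DirectoryInfo subdir in dir.GetDirectories())
            count += countfiles(subdir.FullName, errors);
    }
    catch (Exception exc)
    {
        // Record the problem and keep going; show one summary to the user later.
        errors.Add(srcdir + ": " + exc.Message);
    }
    return count;
}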

If you are using .NET 4.0, this is slightly faster but not by much.
static int RecurCount(string source)
{
    int count = 0;
    try
    {
        var dirs = Directory.EnumerateDirectories(source);
        count = Directory.EnumerateFiles(source).Count();
        foreach (string dir in dirs)
        {
            count += RecurCount(dir);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    return count;
}

Related

C# Fastest way to grab all of the subdirectories from a given path [duplicate]


Improve the performance for enumerating files and folders using .NET

I have a base directory that contains several thousand folders. Inside of these folders there can be between 1 and 20 subfolders that contains between 1 and 10 files. I'd like to delete all files that are over 60 days old. I was using the code below to get the list of files that I would have to delete:
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
    dirInfo.GetFiles("*.*", SearchOption.AllDirectories)
           .Where(t => t.CreationTime < DateTime.Now.AddDays(-60)).ToArray();
But I let this run for about 30 minutes and it still hasn't finished. I'm curious if anyone can see anyway that I could potentially improve the performance of the above line or if there is a different way I should be approaching this entirely for better performance? Suggestions?
This is (probably) as good as it's going to get:
DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
    dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
           .AsParallel()
           .Where(fi => fi.CreationTime < sixtyLess).ToArray();
Changes:
Computed the 60-days-ago DateTime once up front, so it is not recalculated for every file, reducing CPU load.
Used EnumerateFiles.
Made the query parallel.
Should run in a smaller amount of time (not sure how much smaller).
Here is another solution which might be faster or slower than the first, it depends on the data:
DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles =
    dirInfo.EnumerateDirectories()
           .AsParallel()
           .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories)
                               .Where(fi => fi.CreationTime < sixtyLess))
           .ToArray();
This moves the parallelism to the top-level folder enumeration. Most of the changes from above apply here too.
A possibly faster alternative is to use the WINAPI FindNextFile. There is an excellent Faster Directory Enumeration Tool for this, which can be used as follows:
HashSet<FileData> GetPast60(string dir)
{
    DateTime retval = DateTime.Now.AddDays(-60);
    HashSet<FileData> oldFiles = new HashSet<FileData>();
    FileData[] files = FastDirectoryEnumerator.GetFiles(dir);
    for (int i = 0; i < files.Length; i++)
    {
        if (files[i].LastWriteTime < retval)
        {
            oldFiles.Add(files[i]);
        }
    }
    return oldFiles;
}
EDIT
So, based on comments below, I decided to benchmark the solutions suggested here, as well as others I could think of. It was interesting to see that EnumerateFiles seemed to out-perform FindNextFile in C#, while EnumerateFiles with AsParallel was by far the fastest, followed, surprisingly, by the command prompt count. However, note that the AsParallel version wasn't getting the complete file count, or was missing some files counted by the others, so you could say the command prompt method is the best.
Applicable Config:
Windows 7 Service Pack 1 x64
Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz 2.50GHz
RAM: 6GB
Platform Target: x64
No Optimization (NB: Compiling with optimization will produce drastically poor performance)
Allow UnSafe Code
Start Without Debugging
Three screenshots of the results accompanied the original post. I have included my test code below:
static void Main(string[] args)
{
    Console.Title = "File Enumeration Performance Comparison";
    Stopwatch watch = new Stopwatch();
    watch.Start();
    var allfiles = GetPast60("C:\\Users\\UserName\\Documents");
    watch.Stop();
    Console.WriteLine("Total time to enumerate using WINAPI =" + watch.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles);

    Stopwatch watch1 = new Stopwatch();
    watch1.Start();
    var allfiles1 = GetPast60Enum("C:\\Users\\UserName\\Documents\\");
    watch1.Stop();
    Console.WriteLine("Total time to enumerate using EnumerateFiles =" + watch1.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles1);

    Stopwatch watch2 = new Stopwatch();
    watch2.Start();
    var allfiles2 = Get1("C:\\Users\\UserName\\Documents\\");
    watch2.Stop();
    Console.WriteLine("Total time to enumerate using Get1 =" + watch2.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles2);

    Stopwatch watch3 = new Stopwatch();
    watch3.Start();
    var allfiles3 = Get2("C:\\Users\\UserName\\Documents\\");
    watch3.Stop();
    Console.WriteLine("Total time to enumerate using Get2 =" + watch3.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles3);

    Stopwatch watch4 = new Stopwatch();
    watch4.Start();
    var allfiles4 = RunCommand(@"dir /a: /b /s C:\Users\UserName\Documents");
    watch4.Stop();
    Console.WriteLine("Total time to enumerate using Command Prompt =" + watch4.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles4);

    Console.WriteLine("Press Any Key to Continue...");
    Console.ReadLine();
}
private static int RunCommand(string command)
{
    var process = new Process()
    {
        StartInfo = new ProcessStartInfo("cmd")
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            CreateNoWindow = true,
            Arguments = String.Format("/c \"{0}\"", command),
        }
    };
    int count = 0;
    process.OutputDataReceived += delegate { count++; };
    process.Start();
    process.BeginOutputReadLine();
    process.WaitForExit();
    return count;
}
static int GetPast60Enum(string dir)
{
    return new DirectoryInfo(dir).EnumerateFiles("*.*", SearchOption.AllDirectories).Count();
}

private static int Get2(string myBaseDirectory)
{
    DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
    return dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
                  .AsParallel().Count();
}

private static int Get1(string myBaseDirectory)
{
    DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
    return dirInfo.EnumerateDirectories()
                  .AsParallel()
                  .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories))
                  .Count() + dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly).Count();
}

private static int GetPast60(string dir)
{
    return FastDirectoryEnumerator.GetFiles(dir, "*.*", SearchOption.AllDirectories).Length;
}
NB: I concentrated on count in the benchmark, not modified date.
I realize this is very late to the party, but if someone else is looking for this, you can speed things up by orders of magnitude by directly parsing the MFT or FAT of the file system. This requires admin privileges, as I think it will return all files regardless of security, but it can probably take your 30 minutes down to 30 seconds for the enumeration stage at least.
A library for NTFS is here: https://github.com/LordMike/NtfsLib. There is also https://discutils.codeplex.com/, which I haven't personally used.
I would only use these methods for initial discovery of files over x days old and then verify them individually before deleting; it might be overkill, but I'm cautious like that.
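A rough sketch of that verify-then-delete step, using only the standard File APIs (the candidatePaths collection is a placeholder for whatever the fast MFT/FAT scan returned):

using System;
using System.IO;

DateTime cutoff = DateTime.Now.AddDays(-60);
foreach (string candidate in candidatePaths) // paths discovered by the fast scan
{
    try
    {
        // Re-check the timestamp with the normal API before trusting the fast scan.
        if (File.GetCreationTime(candidate) < cutoff)
            File.Delete(candidate);
    }
    catch (IOException) { /* file in use or already gone; skip it */ }
    catch (UnauthorizedAccessException) { /* no permission; skip it */ }
}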
The method Get1 in the answer above (@itsnotalie & @Chibueze Opata) is missing the count of the files in the root directory, so it should read:
private static int Get1(string myBaseDirectory)
{
    DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
    return dirInfo.EnumerateDirectories()
                  .AsParallel()
                  .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories))
                  .Count() + dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly).Count();
}
When using SearchOption.AllDirectories, EnumerateFiles took ages to return the first item. After reading several good answers here, I have for now ended up with the function below. By having it work on one directory at a time and calling itself recursively, it now returns the first item almost immediately.
But I must admit that I'm not totally sure of the correct way to use .AsParallel(), so don't use this blindly.
Instead of working with arrays, I would strongly suggest working with enumeration instead.
Some mention that the speed of the disk is the limiting factor and that threads won't help. In terms of total time that is very likely true as long as nothing is cached by the OS, but by using multiple threads you can get the cached data returned first, whereas otherwise the cache might be pruned to make space for the new results.
Recursive calls can affect the stack, but most file systems have a limit on how deeply directories can be nested, so this should not become a real issue.
private static IEnumerable<FileInfo> EnumerateFilesParallel(DirectoryInfo dir)
{
    return dir.EnumerateDirectories()
              .AsParallel()
              .SelectMany(EnumerateFilesParallel)
              .Concat(dir.EnumerateFiles("*", SearchOption.TopDirectoryOnly).AsParallel());
}
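A short usage sketch for that helper, applied to the original 60-day question (myBaseDirectory is the placeholder from the question):

DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo baseDir = new DirectoryInfo(myBaseDirectory);

// Files start streaming back as soon as the first directory has been read.
foreach (FileInfo fi in EnumerateFilesParallel(baseDir))
{
    if (fi.CreationTime < sixtyLess)
        Console.WriteLine(fi.FullName);
}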
You are using LINQ. It would be faster if you wrote your own method to search directories recursively, tailored to your special case.
// Assumed fields (not declared in the original snippet): the error log and the list of old files.
public static List<string> log = new List<string>();
public static List<System.IO.FileInfo> oldFiles = new List<System.IO.FileInfo>();
public static DateTime retval = DateTime.Now.AddDays(-60);

public static void WalkDirectoryTree(System.IO.DirectoryInfo root)
{
    System.IO.FileInfo[] files = null;
    System.IO.DirectoryInfo[] subDirs = null;

    // First, process all the files directly under this folder
    try
    {
        files = root.GetFiles("*.*");
    }
    // This is thrown if even one of the files requires permissions greater
    // than the application provides.
    catch (UnauthorizedAccessException e)
    {
        // This code just writes out the message and continues to recurse.
        // You may decide to do something different here. For example, you
        // can try to elevate your privileges and access the file again.
        log.Add(e.Message);
    }
    catch (System.IO.DirectoryNotFoundException e)
    {
        Console.WriteLine(e.Message);
    }

    if (files != null)
    {
        foreach (System.IO.FileInfo fi in files)
        {
            if (fi.LastWriteTime < retval)
            {
                oldFiles.Add(fi);
            }
            Console.WriteLine(fi.FullName);
        }

        // Now find all the subdirectories under this directory.
        subDirs = root.GetDirectories();
        foreach (System.IO.DirectoryInfo dirInfo in subDirs)
        {
            // Recursive call for each subdirectory.
            WalkDirectoryTree(dirInfo);
        }
    }
}
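A possible way to drive it, sketched under the assumption that you delete after the walk rather than during it (myBaseDirectory is the placeholder from the question):

WalkDirectoryTree(new System.IO.DirectoryInfo(myBaseDirectory));
foreach (System.IO.FileInfo fi in oldFiles)
{
    try { fi.Delete(); }
    catch (System.IO.IOException) { /* file locked; skip it */ }
}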
If you really want to improve performance, get your hands dirty and use NtQueryDirectoryFile, which is internal to Windows, with a large buffer size.
FindFirstFile is already slow, and while FindFirstFileEx is a bit better, the best performance will come from calling the native function directly.

Unauthorized Access Exception DirectoryInfo.GetFiles() method

I wrote a program (on Windows 7) that calls the method DirectoryInfo.GetFiles(), and in the folder "Documents and Settings" I get an UnauthorizedAccessException.
I tried lots of solutions, like:
creating a manifest with
`<requestedExecutionLevel level="highestAvailable" uiAccess="false" />`
and also with this
DirectorySecurity dSecurity = Directory.GetAccessControl(dir.FullName);
dSecurity.AddAccessRule(new FileSystemAccessRule("Luca", FileSystemRights.FullControl, AccessControlType.Allow));
Directory.SetAccessControl(dir.FullName, dSecurity);
What could be the issue?
First off, you should be using DirectoryInfo.EnumerateFiles(...) instead of GetFiles(...). EnumerateFiles(...) lets you avoid materializing the entire list before you actually need it.
I ran into this issue a while back and ended up implementing a replacement IEnumerable so that I could complete an enumeration over folders to which I only have partial access.
You can see the result of my research in the following thread: DirectoryInfo.EnumerateFiles(...) causes UnauthorizedAccessException (and other exceptions)
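The general shape of such a replacement enumerator, as a rough sketch (my own illustration, not the code from that thread): recurse one directory at a time and skip the directories that throw.

using System;
using System.Collections.Generic;
using System.IO;

static IEnumerable<FileInfo> SafeEnumerateFiles(DirectoryInfo root)
{
    FileInfo[] files = null;
    DirectoryInfo[] subDirs = null;
    try
    {
        files = root.GetFiles();
        subDirs = root.GetDirectories();
    }
    catch (UnauthorizedAccessException) { }   // skip folders we cannot read
    catch (DirectoryNotFoundException) { }

    if (files != null)
        foreach (FileInfo file in files)
            yield return file;

    if (subDirs != null)
        foreach (DirectoryInfo dir in subDirs)
            foreach (FileInfo file in SafeEnumerateFiles(dir))
                yield return file;
}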
Just a quick copy-paste, because I just had the same problem.
Adjust the code to your needs (I calculate the total size, count all the files, and save all the files I want to copy in a list).
After you have all the files in your list, you can start copying them or do whatever else you want with them:
private double CalculateSize(string sourcePath, Progress state, List<FileInfo> filesToCopy)
{
    // fullSizeToCopy is a field on the containing class (not shown here).
    int _fileCount = 0;
    DirectoryInfo sourceDirectory = new DirectoryInfo(sourcePath);

    FileInfo[] files = null;
    try
    {
        files = sourceDirectory.GetFiles();
    }
    catch (UnauthorizedAccessException ex)
    {
        // DO SOME LOGGING-MAGIC IN HERE...
    }
    if (files != null)
    {
        foreach (FileInfo file in files)
        {
            fullSizeToCopy += file.Length;
            filesToCopy.Add(file);
            _fileCount++;
        }
    }

    DirectoryInfo[] directories = null;
    try
    {
        directories = sourceDirectory.GetDirectories();
    }
    catch (UnauthorizedAccessException ex)
    {
        // Do more logging magic in here...
    }
    if (directories != null)
        foreach (DirectoryInfo directory in directories)
        {
            CalculateSize(directory.FullName, state, filesToCopy);
        }

    state.FileCount = _fileCount;
    return fullSizeToCopy;
}
Your best bet might be to put a try/catch block around the call and ignore any directories you don't have access to. Maybe not the best solution, but it would at least make your method get all the directories you do have access to. Something like this:
try
{
    directory.GetFiles();
}
catch (UnauthorizedAccessException)
{
    string logMsg = string.Format("Unable to access directory {0}", directory.FullName);
    //Handle any desired logging here
}
Just like below, use EnumerateDirectories rather than DirectoryInfo.GetFiles:
private void ScanEmptyDirs(string dir, ref int cnt, CancellationToken token)
{
    if (String.IsNullOrEmpty(dir))
    {
        throw new ArgumentException("Starting directory is a null reference or an empty string: dir");
    }
    try
    {
        foreach (var d in Directory.EnumerateDirectories(dir))
        {
            if (token.IsCancellationRequested)
            {
                token.ThrowIfCancellationRequested();
            }
            ScanEmptyDirs(d, ref cnt, token);
        }
        EmptyJudge(dir, ref cnt);
    }
    catch (UnauthorizedAccessException) { }
}
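A possible way to call it (EmptyJudge is the answerer's own helper; the CancellationTokenSource and path here are just illustrative):

using System.Threading;

int emptyDirCount = 0;
using (var cts = new CancellationTokenSource())
{
    // Cancel from another thread or a UI button via cts.Cancel().
    ScanEmptyDirs(@"C:\SomeFolder", ref emptyDirCount, cts.Token);
}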

Get all files and directories in specific path fast

I am creating a backup application where C# scans a directory. I used to have something like this in order to get all the files and subfiles in a directory:
DirectoryInfo di = new DirectoryInfo("A:\\");
var directories = di.GetFiles("*", SearchOption.AllDirectories);
foreach (FileInfo d in directories)
{
    //Add files to a list so that later they can be compared to see if each file
    // needs to be copied or not
}
The only problem with that is that sometimes a file cannot be accessed and I get several errors, such as an UnauthorizedAccessException.
As a result, I created a recursive method that scans all the files in the current directory. If there are directories in that directory, the method is called again, passing in that directory. The nice thing about this method is that I can place the file access inside a try/catch block, which gives me the option of adding those files to a list if there were no errors, and adding the directory to another list if there were errors.
try
{
    files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
    //info of this folder could not be retrieved
    lstFilesErrors.Add(sDir(di));
    return;
}
So this method works great; the only problem is that when I scan a large directory it takes too much time. How could I speed up this process? My actual method is this, in case you need it:
private void startScan(DirectoryInfo di)
{
    //lstFilesErrors is a list of MyFile objects
    // I created that class because I wanted to store more specific information
    // about a file such as its comparePath name and other properties that I need
    // in order to compare it with another list
    // lstFiles is a list of MyFile objects that stores all the files
    // contained in the path that I want to scan
    FileInfo[] files = null;
    DirectoryInfo[] directories = null;
    string searchPattern = "*.*";

    try
    {
        files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
    }
    catch
    {
        //info of this folder could not be retrieved
        lstFilesErrors.Add(sDir(di));
        return;
    }
    // if there are files in the directory then add those files to the list
    if (files != null)
    {
        foreach (FileInfo f in files)
        {
            lstFiles.Add(sFile(f));
        }
    }

    try
    {
        directories = di.GetDirectories(searchPattern, SearchOption.TopDirectoryOnly);
    }
    catch
    {
        lstFilesErrors.Add(sDir(di));
        return;
    }
    // if that directory has more directories then add them to the list and
    // execute this function again
    if (directories != null)
        foreach (DirectoryInfo d in directories)
        {
            FileInfo[] subFiles = null;
            DirectoryInfo[] subDir = null;
            bool isThereAnError = false;
            try
            {
                subFiles = d.GetFiles();
                subDir = d.GetDirectories();
            }
            catch
            {
                isThereAnError = true;
            }
            if (isThereAnError)
                lstFilesErrors.Add(sDir(d));
            else
            {
                lstFiles.Add(sDir(d));
                startScan(d);
            }
        }
}
And the problem, if I try to handle the exception with something like:
DirectoryInfo di = new DirectoryInfo("A:\\");
FileInfo[] directories = null;
try
{
    directories = di.GetFiles("*", SearchOption.AllDirectories);
}
catch (UnauthorizedAccessException e)
{
    Console.WriteLine("There was an error with UnauthorizedAccessException");
}
catch
{
    Console.WriteLine("There was another error");
}
is that if an exception occurs, I get no files at all.
This method is much faster. You can really only tell the difference when a directory holds a lot of files. My A:\ external hard drive contains almost 1 terabyte, so it makes a big difference when dealing with a lot of files.
static void Main(string[] args)
{
    DirectoryInfo di = new DirectoryInfo("A:\\");
    FullDirList(di, "*");
    Console.WriteLine("Done");
    Console.Read();
}

static List<FileInfo> files = new List<FileInfo>();             // List that will hold the files and subfiles in path
static List<DirectoryInfo> folders = new List<DirectoryInfo>(); // List that holds directories that cannot be accessed

static void FullDirList(DirectoryInfo dir, string searchPattern)
{
    // Console.WriteLine("Directory {0}", dir.FullName);
    // list the files
    try
    {
        foreach (FileInfo f in dir.GetFiles(searchPattern))
        {
            //Console.WriteLine("File {0}", f.FullName);
            files.Add(f);
        }
    }
    catch
    {
        Console.WriteLine("Directory {0} \n could not be accessed!!!!", dir.FullName);
        return; // We already got an error trying to access dir, so don't try to access it again
    }

    // process each directory
    // If I have been able to see the files in the directory I should also be able
    // to look at its directories, so I don't think I should place this in a try/catch block
    foreach (DirectoryInfo d in dir.GetDirectories())
    {
        folders.Add(d);
        FullDirList(d, searchPattern);
    }
}
By the way, I got this thanks to your comment, Jim Mischel.
In .NET 4.0 there's the Directory.EnumerateFiles method, which returns an IEnumerable<string> and does not load all the files into memory at once. It's only once you start iterating over the returned collection that files are returned and exceptions can be handled.
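A minimal sketch of that lazy usage (the path is a placeholder); note that the enumeration still stops if the underlying MoveNext throws, so the try/catch here wraps the whole loop:

using System;
using System.IO;

int count = 0;
try
{
    foreach (string file in Directory.EnumerateFiles(@"A:\", "*", SearchOption.AllDirectories))
    {
        count++;   // each name is produced lazily; nothing is buffered into a big array
    }
}
catch (UnauthorizedAccessException ex)
{
    Console.WriteLine("Stopped early: " + ex.Message);
}
Console.WriteLine(count);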
There is a long history of the .NET file enumeration methods being slow. The issue is there is not an instantaneous way of enumerating large directory structures. Even the accepted answer here has its issues with GC allocations.
The best I've been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshalling.
The usage is simple enough, and the RaiseOnAccessDenied property will skip the directories and files the user does not have access to:
private static long SizeOf(string directory)
{
    var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    fcounter.RaiseOnAccessDenied = false;

    long size = 0, total = 0;
    fcounter.FileFound +=
        (o, e) =>
        {
            if (!e.IsDirectory)
            {
                Interlocked.Increment(ref total);
                size += e.Length;
            }
        };

    Stopwatch sw = Stopwatch.StartNew();
    fcounter.Find();
    Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
                      total, size, sw.Elapsed.TotalSeconds);
    return size;
}
For my local C:\ drive this outputs the following:
Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.
Your mileage may vary by drive speed, but this is the fastest method I've found of enumerating files in managed code. The event parameter is a mutating class of type FindFile.FileFoundEventArgs, so be sure you do not keep a reference to it, as its values will change for each event raised.
I know this is old, but... Another option may be to use the FileSystemWatcher like so:
void SomeMethod()
{
    System.IO.FileSystemWatcher m_Watcher = new System.IO.FileSystemWatcher();
    m_Watcher.Path = path;
    m_Watcher.Filter = "*.*";
    m_Watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
    m_Watcher.Created += new FileSystemEventHandler(OnChanged);
    m_Watcher.EnableRaisingEvents = true;
}
private void OnChanged(object sender, FileSystemEventArgs e)
{
    string path = e.FullPath;
    lock (listLock)
    {
        pathsToUpload.Add(path);
    }
}
This would allow you to watch the directories for file changes with an extremely lightweight process, that you could then use to store the names of the files that changed so that you could back them up at the appropriate time.
(copied this piece from my other answer in your other question)
Show progress when searching all files in a directory
Fast files enumeration
Of course, as you already know, there are a lot of ways of doing the enumeration itself... but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project on CodePlex: MFT Scanner in VB.NET... it found all the files on my IDE SATA (not SSD) drive in less than 15 seconds, and it found 311000 files.
You will have to filter the files by path, so that only the files inside the path you are looking at are returned. But that is the easy part of the job!
Maybe this will be helpful for you.
You could use the DirectoryInfo.EnumerateFiles method and handle the UnauthorizedAccessException as you need:
using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        DirectoryInfo diTop = new DirectoryInfo(@"d:\");
        try
        {
            foreach (var fi in diTop.EnumerateFiles())
            {
                try
                {
                    // Display each file over 10 MB;
                    if (fi.Length > 10000000)
                    {
                        Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
                    }
                }
                catch (UnauthorizedAccessException UnAuthTop)
                {
                    Console.WriteLine("{0}", UnAuthTop.Message);
                }
            }

            foreach (var di in diTop.EnumerateDirectories("*"))
            {
                try
                {
                    foreach (var fi in di.EnumerateFiles("*", SearchOption.AllDirectories))
                    {
                        try
                        {
                            // Display each file over 10 MB;
                            if (fi.Length > 10000000)
                            {
                                Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
                            }
                        }
                        catch (UnauthorizedAccessException UnAuthFile)
                        {
                            Console.WriteLine("UnAuthFile: {0}", UnAuthFile.Message);
                        }
                    }
                }
                catch (UnauthorizedAccessException UnAuthSubDir)
                {
                    Console.WriteLine("UnAuthSubDir: {0}", UnAuthSubDir.Message);
                }
            }
        }
        catch (DirectoryNotFoundException DirNotFound)
        {
            Console.WriteLine("{0}", DirNotFound.Message);
        }
        catch (UnauthorizedAccessException UnAuthDir)
        {
            Console.WriteLine("UnAuthDir: {0}", UnAuthDir.Message);
        }
        catch (PathTooLongException LongPath)
        {
            Console.WriteLine("{0}", LongPath.Message);
        }
    }
}
You can use this to get all directories and sub-directories. Then simply loop through to process the files.
string[] folders = System.IO.Directory.GetDirectories(@"C:\My Sample Path\", "*", System.IO.SearchOption.AllDirectories);
foreach(string f in folders)
{
//call some function to get all files in folder
}
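For example, a sketch of what that loop body could do (counting files per folder is just an illustration):

foreach (string f in folders)
{
    try
    {
        // One non-recursive call per folder, so one inaccessible folder doesn't abort the whole scan.
        string[] filesInFolder = System.IO.Directory.GetFiles(f);
        Console.WriteLine("{0}: {1} files", f, filesInFolder.Length);
    }
    catch (UnauthorizedAccessException) { /* skip folders we cannot read */ }
}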

How do I get a directory size (files in the directory) in C#?

I want to be able to get the size of one of the local directories using C#. I'm trying to avoid the following (pseudocode-like), although in the worst-case scenario I will have to settle for it:
int GetSize(Directory)
{
    int Size = 0;
    foreach (File in Directory)
    {
        FileInfo fInfo of File;
        Size += fInfo.Size;
    }
    foreach (SubDirectory in Directory)
    {
        Size += GetSize(SubDirectory);
    }
    return Size;
}
Basically, is there a Walk() available somewhere so that I can walk through the directory tree? Which would save the recursion of going through each sub-directory.
A very succinct way to get a folder size in .NET 4.0 is below. It still suffers from the limitation of having to traverse all files recursively, but it doesn't load a potentially huge array of filenames, and it's only two lines of code. Make sure to use the namespaces System.IO and System.Linq.
private static long GetDirectorySize(string folderPath)
{
    DirectoryInfo di = new DirectoryInfo(folderPath);
    return di.EnumerateFiles("*.*", SearchOption.AllDirectories).Sum(fi => fi.Length);
}
If you use Directory.GetFiles you can do a recursive seach (using SearchOption.AllDirectories), but this is a bit flaky anyway (especially if you don't have access to one of the sub-directories) - and might involve a huge single array coming back (warning klaxon...).
I'd be happy with the recursion approach unless I could show (via profiling) a bottleneck; and then I'd probably switch to (single-level) Directory.GetFiles, using a Queue<string> to emulate recursion.
Note that .NET 4.0 introduces some enumerator-based file/directory listing methods which save on the big arrays.
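A rough sketch of that queue-based, single-level approach (my own illustration of the suggestion, not code from the answer):

using System.Collections.Generic;
using System.IO;

static long GetDirectorySizeQueued(string root)
{
    long size = 0;
    var pending = new Queue<string>();
    pending.Enqueue(root);
    while (pending.Count > 0)
    {
        string dir = pending.Dequeue();
        foreach (string file in Directory.GetFiles(dir))       // single level only
            size += new FileInfo(file).Length;
        foreach (string sub in Directory.GetDirectories(dir))
            pending.Enqueue(sub);                               // emulate recursion with the queue
    }
    return size;
}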
Here is my .NET 4.0 approach:
public static long GetFileSizeSumFromDirectory(string searchDirectory)
{
    var files = Directory.EnumerateFiles(searchDirectory);
    // get the size of all files in the current directory
    var currentSize = (from file in files let fileInfo = new FileInfo(file) select fileInfo.Length).Sum();

    var directories = Directory.EnumerateDirectories(searchDirectory);
    // get the size of all files in all subdirectories
    var subDirSize = (from directory in directories select GetFileSizeSumFromDirectory(directory)).Sum();

    return currentSize + subDirSize;
}
Or even nicer:
// get IEnumerable from all files in the current dir and all sub dirs
var files = Directory.EnumerateFiles(searchDirectory, "*", SearchOption.AllDirectories);
// get the size of all files
long sum = (from file in files let fileInfo = new FileInfo(file) select fileInfo.Length).Sum();
As Gabriel pointed out this will fail if you have a restricted directory under the searchDirectory!
You could hide your recursion behind an extension method (to avoid the issues Marc has highlighted with the GetFiles() method):
public static class UserExtension
{
public static IEnumerable<FileInfo> Walk(this DirectoryInfo directory)
{
foreach(FileInfo file in directory.GetFiles())
{
yield return file;
}
foreach(DirectoryInfo subDirectory in directory.GetDirectories())
{
foreach(FileInfo file in subDirectory.Walk())
{
yield return file;
}
}
}
}
(You probably want to add some exception handling to this for protected folders etc.)
Then:
using static UserExtension;
long totalSize = 0L;
var startFolder = new DirectoryInfo("<path to folder>");

// iteration
foreach (FileInfo file in startFolder.Walk())
{
    totalSize += file.Length;
}

// linq
totalSize = startFolder.Walk().Sum(s => s.Length);
Basically the same code, but maybe a little neater...
First, forgive my poor English ;o)
I had a problem that took me to this page: enumerate the files of a directory and its subdirectories without blocking on an UnauthorizedAccessException, and, like the new .NET 4 DirectoryInfo.Enumerate... methods, get the first result before the end of the entire query.
With the help of various examples found here and there on the web, I finally wrote this method:
// Assumed field (not shown in the original): an empty array used when a directory cannot be enumerated.
private static readonly FileSystemInfo[] EmptyFileSystemInfos = new FileSystemInfo[0];

public static IEnumerable<FileInfo> EnumerateFiles_Recursive(this DirectoryInfo directory, string searchPattern, SearchOption searchOption, Func<DirectoryInfo, Exception, bool> handleExceptionAccess)
{
    Queue<DirectoryInfo> subDirectories = new Queue<DirectoryInfo>();
    IEnumerable<FileSystemInfo> entries = null;

    // Try to get an enumerator on the fileSystemInfos of the directory
    try
    {
        entries = directory.EnumerateFileSystemInfos(searchPattern, SearchOption.TopDirectoryOnly);
    }
    catch (Exception e)
    {
        // If there's a callback delegate and this delegate returns true, we don't throw the exception
        if (handleExceptionAccess == null || !handleExceptionAccess(directory, e))
            throw;
        // If the exception wasn't thrown, we make entries reference an empty collection
        entries = EmptyFileSystemInfos;
    }

    // Yield return the file entries of the directory and enqueue the subdirectories
    foreach (FileSystemInfo entry in entries)
    {
        if (entry is FileInfo)
            yield return (FileInfo)entry;
        else if (entry is DirectoryInfo)
            subDirectories.Enqueue((DirectoryInfo)entry);
    }

    // If this is a recursive search, make a recursive call to yield return the entries of the subdirectories.
    if (searchOption == SearchOption.AllDirectories)
    {
        DirectoryInfo subDir = null;
        while (subDirectories.Count > 0)
        {
            subDir = subDirectories.Dequeue();
            foreach (FileInfo file in subDir.EnumerateFiles_Recursive(searchPattern, searchOption, handleExceptionAccess))
            {
                yield return file;
            }
        }
    }
    else
        subDirectories.Clear();
}
I use a Queue and a recursive method to keep the traditional order (the content of a directory, then the content of its first subdirectory and that subdirectory's own subdirectories, then the content of the second subdirectory...). The parameter "handleExceptionAccess" is just a function called when an exception is thrown for a directory; the function must return true to indicate that the exception should be ignored.
With this method, you can write:
DirectoryInfo dir = new DirectoryInfo("c:\\temp");
long size = dir.EnumerateFiles_Recursive("*", SearchOption.AllDirectories, (d, ex) => true).Sum(f => f.Length);
And here we are: all exceptions thrown while trying to enumerate a directory will be ignored!
Hope this helps.
Lionel
PS: for a reason I can't explain, my method is quicker than the .NET 4 one...
PPS: you can get my test solutions with the source for those methods here: TestDirEnumerate. I wrote EnumerateFiles_Recursive, EnumerateFiles_NonRecursive (which uses a queue to avoid recursion) and EnumerateFiles_NonRecursive_TraditionalOrder (which uses a stack of queues to avoid recursion and keep the traditional order). Keeping all three methods has no real interest; I wrote them only to test which one was best. I think I'll keep only the last one.
I also wrote the equivalent for EnumerateFileSystemInfos and EnumerateDirectories.
Have a look at this post:
http://social.msdn.microsoft.com/forums/en-US/vbgeneral/thread/eed54ebe-facd-4305-b64b-9dbdc65df04e
Basically there is no clean .NET way, but there is a quite straightforward COM approach so if you're happy with using COM interop and being tied to Windows, this could work for you.
The solution is already here: https://stackoverflow.com/a/12665904/1498669
As shown in the duplicate How do I Get Folder Size in C#?, you can do this in C# as well.
First, add the COM reference "Microsoft Scripting Runtime" to your project and use:
var fso = new Scripting.FileSystemObject();
var folder = fso.GetFolder(@"C:\Windows");
double sizeInBytes = folder.Size;

// cleanup COM
System.Runtime.InteropServices.Marshal.ReleaseComObject(folder);
System.Runtime.InteropServices.Marshal.ReleaseComObject(fso);
Remember to clean up the COM references.
I was looking some time ago for a function like the one you ask for, and from what I found on the Internet and in MSDN forums, there is no such function.
The recursive way is the only one I found to obtain the size of a folder, considering all the files and subfolders it contains.
You should make it easy on yourself. Make a method and pass in the location of the directory.
private static long GetDirectorySize(string location)
{
    return new DirectoryInfo(location).GetFiles("*.*", SearchOption.AllDirectories).Sum(file => file.Length);
}
-G
