Get all files and directories in specific path fast - c#

I am creating a backup application where C# scans a directory. Previously I used something like this to get all the files and subfiles in a directory:
DirectoryInfo di = new DirectoryInfo("A:\\");
var directories= di.GetFiles("*", SearchOption.AllDirectories);
foreach (FileInfo d in directories)
{
// Add files to a list so that later they can be compared to see if each file
// needs to be copied or not
}
The only problem with that is that sometimes a file cannot be accessed and I get several errors.
As a result I created a recursive method that scans all files in the current directory. If there are directories in that directory, the method is called again with that directory. The nice thing about this method is that I can place the file access inside a try/catch block, which lets me add the files to a list if there were no errors and add the directory to another list if there were.
try
{
files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
// the contents of this folder could not be read
lstFilesErrors.Add(sDir(di));
return;
}
This method works great; the only problem is that it takes too much time when I scan a large directory. How could I speed up this process? My actual method is below in case you need it.
private void startScan(DirectoryInfo di)
{
//lstFilesErrors is a list of MyFile objects
// I created that class because I wanted to store more specific information
// about a file such as its comparePath name and other properties that I need
// in order to compare it with another list
// lstFiles is a list of MyFile objects that store all the files
// that are contained in path that I want to scan
FileInfo[] files = null;
DirectoryInfo[] directories = null;
string searchPattern = "*.*";
try
{
files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
// the contents of this folder could not be read
lstFilesErrors.Add(sDir(di));
return;
}
// if there are files in the directory then add those files to the list
if (files != null)
{
foreach (FileInfo f in files)
{
lstFiles.Add(sFile(f));
}
}
try
{
directories = di.GetDirectories(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
lstFilesErrors.Add(sDir(di));
return;
}
// if that directory has more directories then add them to the list then
// execute this function
if (directories != null)
foreach (DirectoryInfo d in directories)
{
FileInfo[] subFiles = null;
DirectoryInfo[] subDir = null;
bool isThereAnError = false;
try
{
subFiles = d.GetFiles();
subDir = d.GetDirectories();
}
catch
{
isThereAnError = true;
}
if (isThereAnError)
lstFilesErrors.Add(sDir(d));
else
{
lstFiles.Add(sDir(d));
startScan(d);
}
}
}
And the problem, if I try to handle the exception with something like:
DirectoryInfo di = new DirectoryInfo("A:\\");
FileInfo[] directories = null;
try
{
directories = di.GetFiles("*", SearchOption.AllDirectories);
}
catch (UnauthorizedAccessException e)
{
Console.WriteLine("There was an error with UnauthorizedAccessException");
}
catch
{
Console.WriteLine("There was another error");
}
is that if an exception occurs, I get no files at all.

This method is much faster. You can only tell when the directory contains a lot of files. My A:\ external hard drive contains almost 1 terabit, so it makes a big difference when dealing with a lot of files.
static void Main(string[] args)
{
DirectoryInfo di = new DirectoryInfo("A:\\");
FullDirList(di, "*");
Console.WriteLine("Done");
Console.Read();
}
static List<FileInfo> files = new List<FileInfo>(); // Holds the files and subfiles found under the path
static List<DirectoryInfo> folders = new List<DirectoryInfo>(); // Holds every subdirectory that was encountered
static void FullDirList(DirectoryInfo dir, string searchPattern)
{
// Console.WriteLine("Directory {0}", dir.FullName);
// list the files
try
{
foreach (FileInfo f in dir.GetFiles(searchPattern))
{
//Console.WriteLine("File {0}", f.FullName);
files.Add(f);
}
}
catch
{
Console.WriteLine("Directory {0} \n could not be accessed!!!!", dir.FullName);
return; // We already got an error trying to access dir, so don't try to access it again
}
// process each directory
// If I have been able to see the files in the directory I should also be able
// to look at its directories, so I don't think I need to place this in a try/catch block
foreach (DirectoryInfo d in dir.GetDirectories())
{
folders.Add(d);
FullDirList(d, searchPattern);
}
}
By the way, I got this thanks to your comment, Jim Mischel.

.NET 4.0 introduced the Directory.EnumerateFiles method, which returns an IEnumerable<string> and does not load all the files into memory. Files are returned only once you start iterating over the returned collection, and that is the point at which exceptions can be handled.
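A minimal sketch of how that lazy iteration could be combined with per-directory error handling. The names `LazyFileWalker`, `EnumerateFilesSafe` and `EnumerateSafe` are hypothetical, and the sketch assumes that inaccessible directories should simply be skipped. It walks one directory at a time, because an exception thrown partway through an `AllDirectories` enumeration typically ends that enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LazyFileWalker
{
    // Hypothetical helper: lazily yields every accessible file under root.
    public static IEnumerable<string> EnumerateFilesSafe(string root, string pattern)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();

            foreach (string file in EnumerateSafe(() => Directory.EnumerateFiles(dir, pattern)))
                yield return file;

            foreach (string sub in EnumerateSafe(() => Directory.EnumerateDirectories(dir)))
                pending.Push(sub);
        }
    }

    // Iterates a lazy sequence and stops quietly at the first access or I/O error.
    // C# does not allow "yield return" inside a try block that has a catch clause,
    // so only the calls that can throw are wrapped.
    private static IEnumerable<string> EnumerateSafe(Func<IEnumerable<string>> open)
    {
        IEnumerator<string> e = null;
        try { e = open().GetEnumerator(); }
        catch (UnauthorizedAccessException) { }
        catch (IOException) { }
        if (e == null) yield break;

        using (e)
        {
            while (true)
            {
                try
                {
                    if (!e.MoveNext()) break;
                }
                catch (UnauthorizedAccessException) { break; }
                catch (IOException) { break; }

                yield return e.Current;
            }
        }
    }
}
```

A caller would iterate it with `foreach (string f in LazyFileWalker.EnumerateFilesSafe("A:\\", "*"))`; nothing is read from disk until the loop starts.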

There is a long history of the .NET file enumeration methods being slow. The issue is there is not an instantaneous way of enumerating large directory structures. Even the accepted answer here has its issues with GC allocations.
The best I've been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshalling.
The usage is simple enough, and setting the RaiseOnAccessDenied property to false skips the directories and files the user does not have access to:
private static long SizeOf(string directory)
{
var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
fcounter.RaiseOnAccessDenied = false;
long size = 0, total = 0;
fcounter.FileFound +=
(o, e) =>
{
if (!e.IsDirectory)
{
Interlocked.Increment(ref total);
size += e.Length;
}
};
Stopwatch sw = Stopwatch.StartNew();
fcounter.Find();
Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
total, size, sw.Elapsed.TotalSeconds);
return size;
}
For my local C:\ drive this outputs the following:
Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.
Your mileage may vary by drive speed, but this is the fastest method I've found of enumerating files in managed code. The event parameter is a mutating class of type FindFile.FileFoundEventArgs, so be sure you do not keep a reference to it, as its values will change for each event raised.

I know this is old, but... Another option may be to use the FileSystemWatcher like so:
void SomeMethod()
{
System.IO.FileSystemWatcher m_Watcher = new System.IO.FileSystemWatcher();
m_Watcher.Path = path;
m_Watcher.Filter = "*.*";
m_Watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
m_Watcher.Created += new FileSystemEventHandler(OnChanged);
m_Watcher.EnableRaisingEvents = true;
}
private void OnChanged(object sender, FileSystemEventArgs e)
{
string path = e.FullPath;
lock (listLock)
{
pathsToUpload.Add(path);
}
}
This would allow you to watch the directories for file changes with an extremely lightweight process, that you could then use to store the names of the files that changed so that you could back them up at the appropriate time.

(copied this piece from my other answer in your other question)
Show progress when searching all files in a directory
Fast files enumeration
Of course, as you already know, there are many ways of doing the enumeration itself, but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project on CodePlex: MFT Scanner in VB.NET. It found all 311,000 files on my IDE SATA (not SSD) drive in less than 15 seconds.
You will have to filter the files by path, so that only the files inside the path you are interested in are returned. But that is the easy part of the job!
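The path filtering step might look like the following sketch. It is not part of the MFT Scanner project: `PathFilter` and `UnderDirectory` are hypothetical names, and the sketch assumes the scanner hands back full paths and that case-insensitive comparison is wanted (as on a typical Windows volume).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PathFilter
{
    // Hypothetical helper: keeps only the paths located under the given root directory.
    public static IEnumerable<string> UnderDirectory(IEnumerable<string> allPaths, string root)
    {
        // Normalize the root and make sure it ends with a separator, so that
        // "C:\Data" does not also match "C:\Database".
        string prefix = Path.GetFullPath(root).TrimEnd(Path.DirectorySeparatorChar)
                        + Path.DirectorySeparatorChar;

        // Assumption: case-insensitive matching, as on a typical Windows file system.
        return allPaths.Where(p => p.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

Appending the separator before comparing is the important detail; a plain prefix test on the directory name alone would also accept sibling directories whose names merely start with the same characters.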

Maybe this will be helpful for you.
You could use the DirectoryInfo.EnumerateFiles method and handle UnauthorizedAccessException as you need.
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
DirectoryInfo diTop = new DirectoryInfo(@"d:\");
try
{
foreach (var fi in diTop.EnumerateFiles())
{
try
{
// Display each file over 10 MB;
if (fi.Length > 10000000)
{
Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
}
}
catch (UnauthorizedAccessException UnAuthTop)
{
Console.WriteLine("{0}", UnAuthTop.Message);
}
}
foreach (var di in diTop.EnumerateDirectories("*"))
{
try
{
foreach (var fi in di.EnumerateFiles("*", SearchOption.AllDirectories))
{
try
{
// Display each file over 10 MB;
if (fi.Length > 10000000)
{
Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
}
}
catch (UnauthorizedAccessException UnAuthFile)
{
Console.WriteLine("UnAuthFile: {0}", UnAuthFile.Message);
}
}
}
catch (UnauthorizedAccessException UnAuthSubDir)
{
Console.WriteLine("UnAuthSubDir: {0}", UnAuthSubDir.Message);
}
}
}
catch (DirectoryNotFoundException DirNotFound)
{
Console.WriteLine("{0}", DirNotFound.Message);
}
catch (UnauthorizedAccessException UnAuthDir)
{
Console.WriteLine("UnAuthDir: {0}", UnAuthDir.Message);
}
catch (PathTooLongException LongPath)
{
Console.WriteLine("{0}", LongPath.Message);
}
}
}

You can use this to get all directories and sub-directories. Then simply loop through to process the files.
string[] folders = System.IO.Directory.GetDirectories(@"C:\My Sample Path\", "*", System.IO.SearchOption.AllDirectories);
foreach(string f in folders)
{
//call some function to get all files in folder
}

Related

File.Copy slow on 20000 files

I'm trying to copy more than 20,000 files from one directory to another by calling File.Copy in three separate passes. The files range from 26 KB to 3 MB. The reason for using three passes is that I need to copy the files in the following order:
*.EXT1 => *.EXT2 => *.EXT3
The whole copy routine took 2-3 hours in total. Am I doing something wrong here, or is there a faster way of copying from one directory to another on the same network?
utility.GetFilesFromSource("*.EXT1");
utility.CopyFilesToPath(srcPath);
utility.GetFilesFromSource("*.EXT2");
utility.CopyFilesToPath(srcPath);
utility.GetFilesFromSource("*.EXT3");
utility.CopyFilesToPath(srcPath);
public static void GetFilesFromSource(string fileExtension)
{
try
{
// Assign the shared Files member that CopyFilesToPath reads, not a new local variable
Files = SourceDirectory.GetFiles(fileExtension)
.Where(f => f.Name.Contains(suffixName)).ToArray();
}
catch
{
throw;
}
}
public static void CopyFilesToPath(string path)
{
try
{
foreach (FileInfo file in Files)
{
File.Copy(file.FullName, Path.Combine(path, file.Name));
}
}
catch
{
throw;
}
}

C# File Permissions

I am currently writing a program in C# that will copy all user profile files to an external device (in this case, my home server).
When my code iterates through my files and folders, it throws an UnauthorizedAccessException.
I have Googled this and searched StackOverflow, but I am unable to find a clear solution that doesn't involve terminating my process. The idea is that it should copy the files and folders that have read permissions.
I had this when I first started, but I easily fixed it by limiting what directories I would backup (though I would prefer a full backup).
Here is my code:
FileInfo f = new FileInfo(_configuration.Destination);
if (!f.Directory.Exists)
{
f.Directory.Create();
}
string[] backupDirectories = new string[]
{
"Desktop",
"Documents",
"Downloads",
"Favorites",
"Links",
"Pictures",
"Saved Games",
"Searches",
"Videos",
".git",
".android",
".IdealC15",
".nuget",
".oracle_jre_usage",
".vs",
"Contacts"
};
foreach (string dirPath in backupDirectories)
{
DirectoryInfo dirInfo = new DirectoryInfo(_path + "\\" + dirPath);
if (dirInfo.Exists)
{
foreach (string dirP in Directory.GetDirectories(dirInfo.FullName, "*", SearchOption.AllDirectories))
{
DirectoryInfo dirI = new DirectoryInfo(dirP);
if (dirI.Exists)
{
string dir = dirP.Replace(_path, _configuration.Destination);
try
{
Directory.CreateDirectory(dir);
textBox2.Invoke((MethodInvoker) delegate
{
textBox2.AppendText("Create Directory: " + dir + Environment.NewLine);
});
} catch (Exception e)
{
textBox2.Invoke((MethodInvoker) delegate
{
textBox2.AppendText("Could NOT Create Directory: " + dir + Environment.NewLine);
});
continue;
}
foreach (FileInfo theFile in dirInfo.GetFiles("*", SearchOption.AllDirectories))
{
string newPath = theFile.FullName;
string file = newPath.Replace(_path, _configuration.Destination);
try
{
File.Copy(newPath, file, true);
textBox2.Invoke((MethodInvoker) delegate
{
textBox2.AppendText("Create File: " + file + Environment.NewLine);
});
} catch (Exception ex)
{
textBox2.Invoke((MethodInvoker) delegate
{
textBox2.AppendText("Could NOT Create File: " + file + Environment.NewLine);
});
}
}
}
}
}
}
I apologise if the code is messy, but I will describe sort of what it is doing. The first bit checks if the backup folder exists on the external drive.
The second part says what folders I need to backup (if you're able to fix this and make it backup all directories with permissions, please help me in doing so).
The first loop iterates over each of the backupDirectories. The second loop iterates over each of the directories inside the backup directory. The third loop iterates over each of the files in the backup directory.
The exception is thrown at Directory.GetDirectories(dirInfo.FullName, "*", SearchOption.AllDirectories), and it is trying to access C:\Users\MyName\Documents\My Music. Attempting to access it in explorer does give me a permissions error, though it isn't listed in explorer when I try going to "Documents" (I am in Windows 10 Pro).
As I recommended, since the operating system's authority is higher than the application's, you likely cannot do more than the operating system allows (that is, access or not access a given folder).
Thus, folder accessibility is best solved at the operating system level.
But you could still do two things at the program level to minimize the damage when you search for the items:
1. Use Directory.GetAccessControl to find out the access level of a directory before you run any query on it. This is not so easy, and there are detailed answers about it in other questions.
2. Minimize the damage caused by unauthorized access by using SearchOption.TopDirectoryOnly instead of SearchOption.AllDirectories, combined with a recursive search over all the accessible directories. This is how you can code it:
public static List<string> GetAllAccessibleDirectories(string path, string searchPattern) {
List<string> dirPathList = new List<string>();
try {
List<string> childDirPathList = Directory.GetDirectories(path, searchPattern, SearchOption.TopDirectoryOnly).ToList(); //use TopDirectoryOnly
if (childDirPathList == null || childDirPathList.Count <= 0) //this directory has no child
return null;
foreach (string childDirPath in childDirPathList) { //foreach child directory, do recursive search
dirPathList.Add(childDirPath); //add the path
List<string> grandChildDirPath = GetAllAccessibleDirectories(childDirPath, searchPattern);
if (grandChildDirPath != null && grandChildDirPath.Count > 0) //this child directory has children and nothing has gone wrong
dirPathList.AddRange(grandChildDirPath.ToArray()); //add the grandchildren to the list
}
return dirPathList; //return the whole list found at this level
} catch {
return null; //something has gone wrong, return null
}
}
The function above limits the damage caused by unauthorized access to the sub-directories that have the issue. All other accessible directories are still returned.

Unauthorized Access Exception DirectoryInfo.GetFiles() method

I wrote a program (on Windows 7) that calls the method DirectoryInfo.GetFiles(), and in the folder "documents and settings" I get an UnauthorizedAccessException.
I tried lots of solutions, like:
create a manifest with
`<requestedExecutionLevel level="highestAvailable" uiAccess="false" />`
and also with this
DirectorySecurity dSecurity = Directory.GetAccessControl(dir.FullName);
dSecurity.AddAccessRule(new FileSystemAccessRule("Luca", FileSystemRights.FullControl, AccessControlType.Allow));
Directory.SetAccessControl(dir.FullName, dSecurity);
What could be the issue?
First off, you should be using DirectoryInfo.EnumerateFiles(...) instead of GetFiles(...). EnumerateFiles(...) keeps you from having to get the entire list until you actually need to.
I ran into this issue a while back and found that I ended up needing to implement a replacement IEnumerable in order to be able to complete an enumeration over folders that I may only have selected access to.
You can see the result of my research in the following thread. DirectoryInfo.EnumerateFiles(...) causes UnauthorizedAccessException (and other exceptions)
Just a quick copy-paste, because I just had the same problem.
Adjust the code to your needs (I calculate the size, count all files, and "save" all the files I want to copy in a list).
Once you have all the files in your list, you can start copying them or do whatever else you want with them:
private double CalculateSize(string sourcePath, Progress state, List<FileInfo> filesToCopy)
{
int _fileCount = 0;
DirectoryInfo sourceDirectory = new DirectoryInfo(sourcePath);
FileInfo[] files = null;
try
{
files = sourceDirectory.GetFiles();
}
catch(UnauthorizedAccessException ex)
{
// DO SOME LOGGING-MAGIC IN HERE...
}
if (files != null)
{
foreach (FileInfo file in files)
{
fullSizeToCopy += file.Length;
filesToCopy.Add(file);
_fileCount++;
}
}
DirectoryInfo[] directories = null;
try
{
directories = sourceDirectory.GetDirectories();
}
catch(UnauthorizedAccessException ex)
{
// Do more logging Magic in here...
}
if (directories != null)
foreach (DirectoryInfo direcotry in directories)
{
CalculateSize(direcotry.FullName, state, filesToCopy);
}
state.FileCount = _fileCount;
return fullSizeToCopy;
}
Your best bet might be to put a try/catch block around the call and ignore any directories you don't have access to. Maybe not the best solution, but it would at least make your method get all the directories you do have access to. Something like this:
try
{
directory.GetFiles();
}
catch (UnauthorizedAccessException)
{
string logMsg = string.Format("Unable to access directory {0}", directory.FullName);
//Handle any desired logging here
}
As shown below, use Directory.EnumerateDirectories rather than DirectoryInfo.GetFiles:
private void ScanEmptyDirs(string dir, ref int cnt, CancellationToken token)
{
if (String.IsNullOrEmpty(dir))
{
throw new ArgumentException("Starting directory is a null reference or an empty string: dir");
}
try
{
foreach (var d in Directory.EnumerateDirectories(dir))
{
if (token.IsCancellationRequested)
{
token.ThrowIfCancellationRequested();
}
ScanEmptyDirs(d, ref cnt, token);
}
EmptyJudge(dir, ref cnt);
}
catch (UnauthorizedAccessException) { }
}

How to get only the files that are possible to copy?

I have this line of code: (using LINQ)
//string folder <-- folder browser dialog.
listFiles = Directory.GetFiles(folder, "*.xml",
SearchOption.AllDirectories).Select(
fileName => Path.GetFullPath(fileName)).ToList();
But sometimes my program finds protected files, such as system files or even system folders that can't be opened.
How can I get around this problem?
I only want the file names of files and folders that are open/free.
You can't tell, you just have to catch the exception.
What if the file is free when doing the free check, but in use when processing?
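The race described above is why attempting the operation and handling the failure tends to be more reliable than checking first. A minimal sketch of that idea, assuming the work to be done is a copy; `CopyHelper` and `TryCopy` are hypothetical names:

```csharp
using System;
using System.IO;

public static class CopyHelper
{
    // Hypothetical helper: tries to copy a file and reports whether it succeeded,
    // instead of testing beforehand whether the file is free.
    public static bool TryCopy(string source, string destination)
    {
        try
        {
            File.Copy(source, destination, true); // true = overwrite an existing file
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            return false; // no permission to read the source or write the destination
        }
        catch (IOException)
        {
            return false; // file in use, missing, or another I/O failure
        }
    }
}
```

The caller can collect the paths for which `TryCopy` returns false and retry or report them later.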
That can be a problem. If it throws an exception when going through the directories, it stops.
If you want to ignore those directories and keep going, you have to write a recursive method to do it:
List<string> GetFiles(string folder, string filter)
{
List<string> files = new List<string>();
try
{
// get all of the files in this directory
files.AddRange(Directory.GetFiles(folder, filter));
// Now recursively visit the directories
foreach (var dir in Directory.GetDirectories(folder))
{
files.AddRange(GetFiles(dir, filter));
}
}
catch (UnauthorizedAccessException)
{
// problem accessing this directory.
// ignore it and move on.
}
return files;
}
A somewhat more memory efficient version would be:
private List<string> GetFiles(string folder, string filter)
{
var files = new List<string>();
// To create a recursive Action, you have to initialize it to null,
// and then reassign it. Otherwise the compiler complains that you're
// using an unassigned variable.
Action<string> getFilesInDir = null;
getFilesInDir = new Action<string>(dir =>
{
try
{
// get all the files in this directory
files.AddRange(Directory.GetFiles(dir, filter));
// and recursively visit the directories
foreach (var subdir in Directory.GetDirectories(dir))
{
getFilesInDir(subdir);
}
}
catch (UnauthorizedAccessException)
{
// ignore exception
}
});
getFilesInDir(folder);
return files;
}
You could use something like this, though you may have to tweak the attribute check:
Directory.GetFiles(folder, "*.xml", SearchOption.AllDirectories)
.Select(fileName => new FileInfo(Path.GetFullPath(fileName)))
.Where(n => !n.Attributes.HasFlag(FileAttributes.System))
.Select(n => n.FullName)
.ToList();

Solving UnauthorizedAccessException issue for listing files

Listing all files in a drive other than my system drive throws an UnauthorizedAccessException.
How can I solve this problem?
Is there a way to grant my application the access it needs?
My code:
Directory.GetFiles("S:\\", ...)
Here's a class that will work:
public static class FileDirectorySearcher
{
public static IEnumerable<string> Search(string searchPath, string searchPattern)
{
IEnumerable<string> files = GetFileSystemEntries(searchPath, searchPattern);
foreach (string file in files)
{
yield return file;
}
IEnumerable<string> directories = GetDirectories(searchPath);
foreach (string directory in directories)
{
files = Search(directory, searchPattern);
foreach (string file in files)
{
yield return file;
}
}
}
private static IEnumerable<string> GetDirectories(string directory)
{
IEnumerable<string> subDirectories = null;
try
{
subDirectories = Directory.EnumerateDirectories(directory, "*.*", SearchOption.TopDirectoryOnly);
}
catch (UnauthorizedAccessException)
{
}
if (subDirectories != null)
{
foreach (string subDirectory in subDirectories)
{
yield return subDirectory;
}
}
}
private static IEnumerable<string> GetFileSystemEntries(string directory, string searchPattern)
{
IEnumerable<string> files = null;
try
{
files = Directory.EnumerateFileSystemEntries(directory, searchPattern, SearchOption.TopDirectoryOnly);
}
catch (UnauthorizedAccessException)
{
}
if (files != null)
{
foreach (string file in files)
{
yield return file;
}
}
}
}
You can then use it like this:
IEnumerable<string> filesOrDirectories = FileDirectorySearcher.Search(@"C:\", "*.txt");
foreach (string fileOrDirectory in filesOrDirectories)
{
// Do something here.
}
It's recursive, but the use of yield gives it a low memory footprint (under 10KB in my testing). If you want only files that match the pattern and not directories as well just replace EnumerateFileSystemEntries with EnumerateFiles.
Are you allowed to access the drive? Can the program access the drive when it's not run from Visual Studio? Are restrictive permissions defined in the project's Security page ("Security Page, Project Designer")?
In .NET Core you can do something like the code below. It searches all subdirectories recursively with good performance and ignores paths it has no access to.
I also tried other methods found in
How to quickly check if folder is empty (.NET)? and
Is there a faster way than this to find all the files in a directory and all sub directories? and
https://www.codeproject.com/Articles/1383832/System-IO-Directory-Alternative-using-WinAPI
public static IEnumerable<string> ListFiles(string baseDir)
{
EnumerationOptions opt = new EnumerationOptions();
opt.RecurseSubdirectories = true;
opt.ReturnSpecialDirectories = false;
//opt.AttributesToSkip = FileAttributes.Hidden | FileAttributes.System;
opt.AttributesToSkip = 0;
opt.IgnoreInaccessible = true;
var tmp = Directory.EnumerateFileSystemEntries(baseDir, "*", opt);
return tmp;
}
I solved the problem. Not really, but at least I found the source.
It was the SearchOption.AllDirectories option that caused the exception.
When I just list the immediate files using Directory.GetFiles, it works.
This is good enough for me.
Any way to solve the recursive listing problem?
