File.Copy slow on 20000 files - c#

I'm trying to copy around +20000 files from one directory to another calling three separate File.Copy method. These files can range anywhere from 26KB to 3MB. The reason for using three methods is because I need to copy these files in the following order.
*.EXT1 => *.EXT2 => *.EXT3
The whole copy routine took a total of 2-3 hours. Am I doing something wrong here or is there a faster way of copying from one directory to another on the same network?
utility.GetFilesFromSource("*.EXT1");
utility.CopyFilesToPath(srcPath);
utility.GetFilesFromSource("*.EXT2");
utility.CopyFilesToPath(srcPath);
utility.GetFilesFromSource("*.EXT3");
utility.CopyFilesToPath(srcPath);
public static void GetFilesFromSource(string fileExtension)
{
try
{
FileInfo[] Files = SourceDirectory.GetFiles(fileExtension)
.Where(f => f.Name.Contains(suffixName)).ToArray();
}
catch
{
throw;
}
}
public static void CopyFilesToPath(string path)
{
try
{
foreach (FileInfo file in Files)
{
File.Copy(file.FullName, Path.Combine(path, file.Name));
}
}
catch
{
throw;
}
}

Related

Recursive Deletion of Folders in C#

I am trying to recursively delete all empty subdirectories in a root directory and (not files)using C#. I have referred this post (Recursive delete of files and directories in C#)
and wrote the function as follows.
private void DeleteEmptySubFolders()
{
DirectoryInfo root = new DirectoryInfo(folderPath);
RecursiveDelete(root,true);
}
public static void RecursiveDelete(DirectoryInfo baseDir,bool isRootDir)
{
if (!baseDir.Exists)
return;
foreach (var dir in baseDir.EnumerateDirectories())
{
RecursiveDelete(dir,false);
}
try
{
if (!isRootDir)
baseDir.Delete(false);
}
catch(Exception ex)
{
}
}
My question is baseDir.Delete(false); will give me an exception when a directory is not empty, and I am just passing that exception without handling it(I just want to skip those ones, don't have to log). Is there a better way to do this?
Instead of try/catch, check if any files exist:
bool isEmpty = !Directory.EnumerateFileSystemEntries(baseDir.FullName).Any();
Now you can avoid trying to delete non-empty folders:
if (!isRootDir && isEmpty)
baseDir.Delete(false);

get all files within path (recursion)

I wrote a method that needed to find all files within a path, and I want to get all the files using recursion. Here's my current method:
public void doStart(DirectoryInfo dir, string filePattern)
{
try
{
foreach (FileInfo fileInfo in dir.GetFiles(filePattern))
{
if (fileFound != null)
{
fileFound(fileInfo);
}
}
}
catch (Exception)
{
}
try
{
foreach (DirectoryInfo dirInfo in dir.GetDirectories())
{
doStart(dirInfo, filePattern);
}
}
catch (Exception)
{
}
}
public void Start(string path, string filePattern)
{
doStart(new DirectoryInfo(path), filePattern);
}
Is there is better way to write this kind of recursion or is this good enough ?
Try something like this:
string[] filePaths = Directory.GetFiles(#dir, "*.filetype", SearchOption.AllDirectories);
This would recursively look through the directory, finding all files with a certain filetype ('.filetype') and returns a string array containing all found files.
Also, I'd recommend not to use empty catch blocks, as your application won't let you know if something went wrong. Either show a message box (or something similar) or log it to a database or something.
Further, what would your DoStart() method do if there is a subdirectory in a subdirectory? From what I'm seeing, I'd say it only searches on 1 sublevel.
Don't swallow all exceptions. If you need to ignore specific exceptions, catch those but let others bubble up
(style) Methods should be PascalCased (e.g. DoStart and `FileFound'
(style) Create an OnFileFound method instead of calling FileFound directly (I assume fileFound is an event handler?)
Other than that it looks fine to me.
Here is an example of true recursion. This will search until there are no more sub-directories to find, unlike Directory.GetFiles SearchOption.AllDirectories. You can modify this to add search filters as a parameter.
public IEnumerable<string> GetFilesRecursive(string ParentDirectory)
{
string[] subDirectories = Directory.GetDirectories(ParentDirectory);
foreach (string file in Directory.GetFiles(ParentDirectory))
{
yield return file;
}
foreach (string subDirectory in subDirectories)
{
foreach (string file in GetFilesRecursive(subDirectory))
{
yield return file;
}
}
}

C# code to copy files, can this snippet be improved?

While copying around 50 GB of data via local LAN share, due to connectivity issue copy failed at around 10 GB copied.
I have renamed copied 10GB of data directory to localRepository and then written a C# program to copy files from the remote server to destination, only if it is not found in local repository. If found move file from local repository to destination folder.
Although the code worked fine and accomplishes the task very well. I wonder, have I written the most efficient code? Can you find any improvements?
string destinationFolder = #"C:\DataFolder";
string remoteRepository = #"\\RemoteComputer\DataFolder";
string localRepository = #"\\LocalComputer\LocalRepository";
protected void Page_Load(object sender, EventArgs e)
{
foreach (string remoteSrcFile in Directory.EnumerateFiles(remoteRepository, "*.*", SearchOption.AllDirectories))
{
bool foundInLocalRepo = false; ;
foreach (var localSrcFile in Directory.EnumerateFiles(localRepository, "*.*", SearchOption.AllDirectories))
{
if (Path.GetFileName(remoteSrcFile).Equals(Path.GetFileName(localSrcFile)))
{
FileInfo localFile = new FileInfo(localSrcFile);
FileInfo remoteFile = new FileInfo(remoteSrcFile);
//copy this file from local repository
if (localFile.Length == remoteFile.Length)
{
try
{
File.Move(localSrcFile, PrepareDestinationPath(remoteSrcFile));
Debug.WriteLine(remoteSrcFile + " moved from local repo");
}
catch (Exception ex)
{
Debug.WriteLine(remoteSrcFile + " did not move");
}
foundInLocalRepo = true;
break;
}
}
}
if (!foundInLocalRepo)
{
//copy this file from remote repository
try
{
File.Copy(remoteSrcFile, PrepareDestinationPath(remoteSrcFile), false);
Debug.WriteLine(remoteSrcFile + " copied from remote repo");
}
catch (Exception ex)
{
Debug.WriteLine(remoteSrcFile + " did not copy");
}
}
}
}
private string PrepareDestinationPath(string remoteSrcFile)
{
string relativePath = remoteSrcFile.Split(new string[] { "DataFolder" }, StringSplitOptions.None)[1];
string copyPath = Path.GetFullPath(destinationFolder + relativePath);
Directory.CreateDirectory(Path.GetDirectoryName(copyPath));
return copyPath;
}
EDIT:
Based on answer given by Thomas I am attempting to zip the file.
Traditionally as an end user we use to zip a file and then copy. As a programmer can we zip and copy the file parallel? I mean the portion which has been zipped send it over the wire?
You are doing far too much work with the nested loop.
You should remove the inner "foreach" and replace it with some code that:
(1) Constructs the name of the file that you are looking for and
(2) Uses File.Exists() to see if exists, then
(3) Continues with the same block of code that you currently have following the "if (Path.GetFileName(remoteSrcFile)..." condition.
Something like this:
foreach (string remoteSrcFile in Directory.EnumerateFiles(remoteRepository, "*.*", SearchOption.AllDirectories))
{
string localSrcFile = Path.Combine(localRepository, Path.GetFileName(remoteSrcFile));
if (File.Exists(localSrcFile))
{
...
}
}
I would suggest zipping the files before moving. Try take a look at the very simple http://dotnetzip.codeplex.com/
Try zipping 1000 files a time, in that way, you don't have to run the for-loop that many times and establish new connections etc each time.

Scanning Computer and displaying files in listbox?

Ok, i have some code that will scan my computer and find .txt files and display them in a listbox:
private void button2_Click(object sender, EventArgs e)
{
IEnumerable<string> files = System.IO.Directory.EnumerateFiles(#"C:\", "*.txt*", System.IO.SearchOption.AllDirectories);
foreach (var f in files)
{
listBox1.Items.Add(String.Format("{0}", f));
}
}
I get an error every time i run this. It says i do not authorization to the trash bin. I do not care weather it scans the trash or not. It there any way to exclude the trash bin out of the scan? Also, can someone help me improve my code, if you see anything wrong! Thanks!
Quickest way is to put them under a try-catch block because EnumerateFiles function does not have access to the restricted files because of the operating system permissions.
private void SearchDrives()
{
foreach (String drive in Directory.GetLogicalDrives())
{
try
{
// Search for folders into the drive.
SearchFolders(drive);
}
catch (Exception) { }
}
}
//---------------------------------------------------------------------------
private void SearchFolders(String prmPath)
{
try
{
foreach (String folder in Directory.GetDirectories(prmPath))
{
// Recursive call for each subdirectory.
SearchFolders(folder);
// Create the list of files.
SearchFiles(folder);
}
}
catch (Exception) { }
}
//---------------------------------------------------------------------------
private void SearchFiles(String prmPath)
{
try
{
foreach (String file in Directory.GetFiles(prmPath))
{
FileInfo info = new FileInfo(file);
if (info.Extension == ".txt")
{
listBox1.Items.Add(info.Name);
}
}
}
catch (Exception) { }
}
//---------------------------------------------------------------------------
Not just the recycle bin, it will also fail to read the file header of several files into your system direcotry.
In general you can do it so that you do recursive calls for every folder and just use try/catch blocks to see witch ones you can or can't access. But as Andras suggested I would also go with what allready exists will save you time
Another aproach on your example

Get all files and directories in specific path fast

I am creating a backup application where c# scans a directory. Before I use to have something like this in order to get all the files and subfiles in a directory:
DirectoryInfo di = new DirectoryInfo("A:\\");
var directories= di.GetFiles("*", SearchOption.AllDirectories);
foreach (FileInfo d in directories)
{
//Add files to a list so that later they can be compared to see if each file
// needs to be copid or not
}
The only problem with that is that sometimes a file could not be accessed and I get several errors. an example of an error that I get is:
As a result I created a recursive method that will scan all files in the current directory. If there where directories in that directory then the method will be called again passing that directory. The nice thing about this method is that I could place the files inside a try catch block giving me the option to add those files to a List if there where no errors and adding the directory to another list if I had errors.
try
{
files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
//info of this folder was not able to get
lstFilesErrors.Add(sDir(di));
return;
}
So this method works great the only problem is that when I scan a large directory it takes to much times. How could I speed up this process? My actual method is this in case you need it.
private void startScan(DirectoryInfo di)
{
//lstFilesErrors is a list of MyFile objects
// I created that class because I wanted to store more specific information
// about a file such as its comparePath name and other properties that I need
// in order to compare it with another list
// lstFiles is a list of MyFile objects that store all the files
// that are contained in path that I want to scan
FileInfo[] files = null;
DirectoryInfo[] directories = null;
string searchPattern = "*.*";
try
{
files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
//info of this folder was not able to get
lstFilesErrors.Add(sDir(di));
return;
}
// if there are files in the directory then add those files to the list
if (files != null)
{
foreach (FileInfo f in files)
{
lstFiles.Add(sFile(f));
}
}
try
{
directories = di.GetDirectories(searchPattern, SearchOption.TopDirectoryOnly);
}
catch
{
lstFilesErrors.Add(sDir(di));
return;
}
// if that directory has more directories then add them to the list then
// execute this function
if (directories != null)
foreach (DirectoryInfo d in directories)
{
FileInfo[] subFiles = null;
DirectoryInfo[] subDir = null;
bool isThereAnError = false;
try
{
subFiles = d.GetFiles();
subDir = d.GetDirectories();
}
catch
{
isThereAnError = true;
}
if (isThereAnError)
lstFilesErrors.Add(sDir(d));
else
{
lstFiles.Add(sDir(d));
startScan(d);
}
}
}
Ant the problem if I try to handle the exception with something like:
DirectoryInfo di = new DirectoryInfo("A:\\");
FileInfo[] directories = null;
try
{
directories = di.GetFiles("*", SearchOption.AllDirectories);
}
catch (UnauthorizedAccessException e)
{
Console.WriteLine("There was an error with UnauthorizedAccessException");
}
catch
{
Console.WriteLine("There was antother error");
}
Is that if an exception occurs then I get no files.
This method is much faster. You can only tel when placing a lot of files in a directory. My A:\ external hard drive contains almost 1 terabit so it makes a big difference when dealing with a lot of files.
static void Main(string[] args)
{
DirectoryInfo di = new DirectoryInfo("A:\\");
FullDirList(di, "*");
Console.WriteLine("Done");
Console.Read();
}
static List<FileInfo> files = new List<FileInfo>(); // List that will hold the files and subfiles in path
static List<DirectoryInfo> folders = new List<DirectoryInfo>(); // List that hold direcotries that cannot be accessed
static void FullDirList(DirectoryInfo dir, string searchPattern)
{
// Console.WriteLine("Directory {0}", dir.FullName);
// list the files
try
{
foreach (FileInfo f in dir.GetFiles(searchPattern))
{
//Console.WriteLine("File {0}", f.FullName);
files.Add(f);
}
}
catch
{
Console.WriteLine("Directory {0} \n could not be accessed!!!!", dir.FullName);
return; // We alredy got an error trying to access dir so dont try to access it again
}
// process each directory
// If I have been able to see the files in the directory I should also be able
// to look at its directories so I dont think I should place this in a try catch block
foreach (DirectoryInfo d in dir.GetDirectories())
{
folders.Add(d);
FullDirList(d, searchPattern);
}
}
By the way I got this thanks to your comment Jim Mischel
In .NET 4.0 there's the Directory.EnumerateFiles method which returns an IEnumerable<string> and is not loading all the files in memory. It's only once you start iterating over the returned collection that files will be returned and exceptions could be handled.
There is a long history of the .NET file enumeration methods being slow. The issue is there is not an instantaneous way of enumerating large directory structures. Even the accepted answer here has its issues with GC allocations.
The best I've been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshalling.
The usage is simple enough, and the RaiseOnAccessDenied property will skip the directories and files the user does not have access to:
private static long SizeOf(string directory)
{
var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
fcounter.RaiseOnAccessDenied = false;
long size = 0, total = 0;
fcounter.FileFound +=
(o, e) =>
{
if (!e.IsDirectory)
{
Interlocked.Increment(ref total);
size += e.Length;
}
};
Stopwatch sw = Stopwatch.StartNew();
fcounter.Find();
Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
total, size, sw.Elapsed.TotalSeconds);
return size;
}
For my local C:\ drive this outputs the following:
Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.
Your mileage may vary by drive speed, but this is the fastest method I've found of enumerating files in managed code. The event parameter is a mutating class of type FindFile.FileFoundEventArgs so be sure you do not keep a reference to it as it's values will change for each event raised.
I know this is old, but... Another option may be to use the FileSystemWatcher like so:
void SomeMethod()
{
System.IO.FileSystemWatcher m_Watcher = new System.IO.FileSystemWatcher();
m_Watcher.Path = path;
m_Watcher.Filter = "*.*";
m_Watcher.NotifyFilter = m_Watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
m_Watcher.Created += new FileSystemEventHandler(OnChanged);
m_Watcher.EnableRaisingEvents = true;
}
private void OnChanged(object sender, FileSystemEventArgs e)
{
string path = e.FullPath;
lock (listLock)
{
pathsToUpload.Add(path);
}
}
This would allow you to watch the directories for file changes with an extremely lightweight process, that you could then use to store the names of the files that changed so that you could back them up at the appropriate time.
(copied this piece from my other answer in your other question)
Show progress when searching all files in a directory
Fast files enumeration
Of course, as you already know, there are a lot of ways of doing the enumeration itself... but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project in CodePlex: MFT Scanner in VB.NET... it found all the files in my IDE SATA (not SSD) drive in less than 15 seconds, and found 311000 files.
You will have to filter the files by path, so that only the files inside the path you are looking are returned. But that is the easy part of the job!
Maybe it will be helpfull for you.
You could use "DirectoryInfo.EnumerateFiles" method and handle UnauthorizedAccessException as you need.
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
DirectoryInfo diTop = new DirectoryInfo(#"d:\");
try
{
foreach (var fi in diTop.EnumerateFiles())
{
try
{
// Display each file over 10 MB;
if (fi.Length > 10000000)
{
Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
}
}
catch (UnauthorizedAccessException UnAuthTop)
{
Console.WriteLine("{0}", UnAuthTop.Message);
}
}
foreach (var di in diTop.EnumerateDirectories("*"))
{
try
{
foreach (var fi in di.EnumerateFiles("*", SearchOption.AllDirectories))
{
try
{
// Display each file over 10 MB;
if (fi.Length > 10000000)
{
Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
}
}
catch (UnauthorizedAccessException UnAuthFile)
{
Console.WriteLine("UnAuthFile: {0}", UnAuthFile.Message);
}
}
}
catch (UnauthorizedAccessException UnAuthSubDir)
{
Console.WriteLine("UnAuthSubDir: {0}", UnAuthSubDir.Message);
}
}
}
catch (DirectoryNotFoundException DirNotFound)
{
Console.WriteLine("{0}", DirNotFound.Message);
}
catch (UnauthorizedAccessException UnAuthDir)
{
Console.WriteLine("UnAuthDir: {0}", UnAuthDir.Message);
}
catch (PathTooLongException LongPath)
{
Console.WriteLine("{0}", LongPath.Message);
}
}
}
You can use this to get all directories and sub-directories. Then simply loop through to process the files.
string[] folders = System.IO.Directory.GetDirectories(#"C:\My Sample Path\","*", System.IO.SearchOption.AllDirectories);
foreach(string f in folders)
{
//call some function to get all files in folder
}

Categories

Resources