Trying to list all the directories and files on a machine and sort them by size.
I get a list of the file names and their sizes, but it won't put them in order... Any suggestions greatly appreciated! Cheers
//create an instance of the drive which contains the files
DriveInfo di = new DriveInfo(@"C:\");
//find the root directory path
DirectoryInfo dirInfo = di.RootDirectory;
try
{
    //EnumerateFiles improves performance; OrderBy sorts the results by size
    foreach (var fi in dirInfo.EnumerateFiles().OrderBy(f => f.Length).ToList())
    {
        try
        {
            //Display each file
            Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length);
        }
        catch (UnauthorizedAccessException unAuthTop)
        {
            Console.WriteLine("{0}", unAuthTop.Message);
        }
    }
}
catch (UnauthorizedAccessException unAuthOuter)
{
    Console.WriteLine("{0}", unAuthOuter.Message);
}
You could try something like this
// get your folder
DirectoryInfo di = new DirectoryInfo(yourPath);
// create a list of files from that folder
List<FileInfo> fi = di.GetFiles().ToList();
// pass the files in a sorted order
var files = fi.Where(f => f.FullName != null).OrderByDescending(f => f.Length);
In this example, files will contain the files from the current folder level, sorted by Length in descending order.
You might want to check that fi is not empty before building files. Then you can iterate over files with a foreach, as sketched below.
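A minimal sketch of that iteration, assuming the files variable from the snippet above:
    // print each file, largest first, with its size in bytes
    foreach (FileInfo f in files)
    {
        Console.WriteLine("{0}\t{1} bytes", f.FullName, f.Length);
    }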
[ UPDATE ]
As @Abion47 points out, there doesn't seem to be much difference between the OP's code and my solution. From what I read in the question, the OP is not getting a sorted list, which is the desired result.
What might make a difference is that with EnumerateFiles you start enumerating and can act on each file's info before the entire collection has been returned. That makes it great for handling enormous numbers of files, and more efficient than GetFiles when you perform operations on individual files as they become available.
Since that is the case, you cannot sort the returned files properly until the complete collection has been enumerated.
With GetFiles, you have to wait for the whole collection to be returned, which makes it easier to sort.
I don't think GetFiles is ideal for huge collections, though; in that case I would divide the work into steps or use some other approach. The sketch below illustrates the sorting point.
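As a rough illustration of why sorting negates the streaming benefit (a sketch only, assuming the dirInfo variable from the question's code): OrderBy must buffer every element before it can yield the first one.
    // EnumerateFiles yields entries lazily as they are found...
    var entries = dirInfo.EnumerateFiles();
    // ...but OrderBy consumes the whole sequence before producing any output,
    // so the first iteration of this loop still waits for the full directory scan.
    foreach (var file in entries.OrderBy(f => f.Length))
    {
        Console.WriteLine("{0}\t{1}", file.FullName, file.Length);
    }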
Related
For my WPF project, I have to calculate the total file size in a single directory (which could have sub directories).
Sample 1
DirectoryInfo di = new DirectoryInfo(path);
var totalLength = di.EnumerateFiles("*.*", SearchOption.AllDirectories).Sum(fi => fi.Length);
if (totalLength / 1000000 >= size)
return true;
Sample 2
var sizeOfHtmlDirectory = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
long totalLength = 0;
foreach (var file in sizeOfHtmlDirectory)
{
totalLength += new FileInfo(file).Length;
if (totalLength / 1000000 >= size)
return true;
}
Both samples work.
Sample 1 completes in massively less time. I've not timed this accurately, but on my PC, using the same folder with the same content/file sizes, Sample 1 takes a few seconds while Sample 2 takes a few minutes.
EDIT
I should point out that the bottleneck in Sample 2 is within the foreach loop! GetFiles returns quickly and the foreach loop is entered quickly.
My question is, how do I find out why this is the case?
Contrary to what the other answers indicate, the main difference is not EnumerateFiles vs GetFiles; it's DirectoryInfo vs Directory. In the latter case you only have strings and have to create new FileInfo instances separately, which is very costly.
DirectoryInfo returns FileInfo instances that use cached information, whereas constructing a new FileInfo directly does not; more details here and here.
Relevant quote (via "The Old New Thing"):
In NTFS, file system metadata is a property not of the directory entry
but rather of the file, with some of the metadata replicated into the
directory entry as a tweak to improve directory enumeration
performance. Functions like FindFirstFile report the directory
entry, and by putting the metadata that FAT users were accustomed to
getting "for free", they could avoid being slower than FAT for
directory listings. The directory-enumeration functions report the
last-updated metadata, which may not correspond to the actual metadata
if the directory entry is stale.
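Concretely, a sketch of Sample 2 rewritten to stay in the DirectoryInfo world (reusing the path and size variables from the question), so no extra FileInfo objects are constructed per file:
    DirectoryInfo di = new DirectoryInfo(path);
    long totalLength = 0;
    foreach (FileInfo fi in di.EnumerateFiles("*.*", SearchOption.AllDirectories))
    {
        totalLength += fi.Length;          // Length comes from the cached enumeration data
        if (totalLength / 1000000 >= size) // rather than a separate per-file lookup
            return true;
    }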
EnumerateFiles is lazily evaluated, streaming results as they are found, whereas GetFiles waits until all files have been enumerated before returning the collection. This will have a big effect on your result.
Let's say I have a list of over 100 folder paths. I would like to retrieve just one file path from each folder path. Here is the way I am doing it, or plan to do it:
var Files = new List<String>();
var Directories = Directory.GetDirectories("C:\\Firstfolder\\Secondfolder\\");
Array.ForEach(Directories, D => Files.Add(Directory.GetFiles(D).FirstOrDefault()));
Now, is this the most efficient way? My program will execute this code every time it starts.
Instead of Directory.GetFiles, use Directory.EnumerateFiles to avoid loading all file paths into memory. This quote from the documentation explains the difference:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
If you are using .NET 4.0 you should do this instead...
var Files = Directories.Select(x => Directory.EnumerateFiles(x).FirstOrDefault()).ToList();
(Note: Select rather than SelectMany; SelectMany would flatten each returned path string into its individual characters.)
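Since FirstOrDefault() returns null for a folder that contains no files, you may also want to drop those entries; a small sketch under that assumption:
    // skip directories that contain no files at all
    var Files = Directories
        .Select(d => Directory.EnumerateFiles(d).FirstOrDefault())
        .Where(f => f != null)
        .ToList();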
I need to get files from a directory on a network drive. The problem is that this directory could contain 500k files or more.
The normal ways:
Directory.GetFiles(@"L:\cs\fromSQL\Data", "*.dat",
SearchOption.TopDirectoryOnly);
or
DirectoryInfo dir = new DirectoryInfo(@"L:\cs\fromSQL\Data");
var files =
dir.GetFiles("*.dat", SearchOption.TopDirectoryOnly)
are taking way too long. They always parse the whole directory.
Example: a network directory containing ~130k files takes 15 minutes with the first option.
Is there a way to get just a number of files (for example the oldest ones), or some other approach that's faster?
Thanks!
Greetings
Christoph
You can give the DirectoryInfo.EnumerateFiles method a try.
As MSDN says:
Returns an enumerable collection of file information in the current directory.
It is IEnumerable, so it can stream entries rather than buffer them all.
For example:
foreach(var file in Directory.EnumerateFiles(path)) {
// ...
}
More details on MSDN:
The EnumerateFiles and GetFiles methods differ as follows: When you
use EnumerateFiles, you can start enumerating the collection of
FileInfo objects before the whole collection is returned; when you use
GetFiles, you must wait for the whole array of FileInfo objects to be
returned before you can access the array. Therefore, when you are
working with many files and directories, EnumerateFiles can be more
efficient.
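One practical consequence of the lazy enumeration, sketched below: you can stop early and pull only a limited batch without walking all 500k entries. (Finding the oldest files, however, would still require a full scan, since ordering needs to see every entry.)
    // grab only the first 100 .dat entries; enumeration stops as soon as Take is satisfied
    var firstBatch = new DirectoryInfo(@"L:\cs\fromSQL\Data")
        .EnumerateFiles("*.dat", SearchOption.TopDirectoryOnly)
        .Take(100)
        .ToList();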
Use Directory.EnumerateFiles instead:
var count = Directory.EnumerateFiles(@"L:\cs\fromSQL\Data", "*.dat",
SearchOption.TopDirectoryOnly).Count();
If you want to filter some files, then use DirectoryInfo.EnumerateFiles and filter the files using Where:
var di = new DirectoryInfo(#"L:\cs\fromSQL\Data");
var count = di.EnumerateFiles("*.dat",SearchOption.TopDirectoryOnly)
.Where(file => /* your condition */)
.Count();
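For instance, a hypothetical condition that counts only .dat files last written more than 30 days ago (the cutoff is just an example):
    var di = new DirectoryInfo(@"L:\cs\fromSQL\Data");
    var cutoff = DateTime.Now.AddDays(-30); // example threshold, adjust as needed
    var count = di.EnumerateFiles("*.dat", SearchOption.TopDirectoryOnly)
                  .Where(file => file.LastWriteTime < cutoff)
                  .Count();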
Any idea how to easily support file search patterns in your software, like **, *, ?
For example, subfolder/**/?svn searches all levels of subfolder for files/folders whose names end in "svn" and are 4 characters in total.
full description: http://nant.sourceforge.net/release/latest/help/types/fileset.html
If you load the directory as a DirectoryInfo, e.g.
DirectoryInfo directory = new DirectoryInfo(folder);
then do a search for files like this:
IEnumerable<FileInfo> fileInfo = directory.GetFiles("*.svn", SearchOption.AllDirectories);
this should get you a list of FileInfo objects which you can manipulate.
To get all subdirectories you can do the same, e.g.
IEnumerable<DirectoryInfo> dirInfo = directory.GetDirectories("*svn", SearchOption.AllDirectories);
Anyway, that should give an idea of how I'd do it. Also, because fileInfo and dirInfo are IEnumerable, you can add LINQ Where queries etc. to filter the results.
A mix of regex and recursion should do the trick; a rough sketch follows at the end of this answer.
Another trick might be to spawn a thread for every folder, or set of folders, and have each thread proceed one level down. This could speed up the process a bit.
The reason I say this is that checking folders is a highly I/O-bound process, so many threads let you submit more disk requests at once, improving the speed.
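A minimal sketch of the regex idea (files only; all names here are hypothetical): translate the NAnt-style pattern into a regular expression and test it against each path relative to the search root.
    // Requires: using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
    static Regex GlobToRegex(string pattern)
    {
        // escape everything, then re-introduce the wildcard semantics
        string escaped = Regex.Escape(pattern)
            .Replace(@"\*\*/", "(.*/)?")   // "**/" spans any number of directory levels
            .Replace(@"\*", "[^/]*")       // "*" matches within a single path segment
            .Replace(@"\?", "[^/]");       // "?" matches exactly one character
        return new Regex("^" + escaped + "$", RegexOptions.IgnoreCase);
    }

    static IEnumerable<string> FindMatches(string root, string pattern)
    {
        Regex regex = GlobToRegex(pattern);
        foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            // compare against the path relative to the root, normalised to forward slashes
            string relative = path.Substring(root.Length).TrimStart('\\', '/').Replace('\\', '/');
            if (regex.IsMatch(relative))
                yield return path;
        }
    }
For the example pattern, FindMatches(rootFolder, "subfolder/**/?svn") would return files whose names are four characters ending in "svn", at any depth below subfolder.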
This might sound silly, but have you considered downloading the nant source code to see how they did it?
I have seen questions like What is the best way to empty a directory?
But I need to know,
what is the fastest way of deleting all the files found within a directory, except for any .zip files?
Smells like LINQ here... or what?
By fastest way, I mean the fastest execution time.
If you are using .NET 4 you can benefit from the smart way .NET now parallelizes your functions. This code is the fastest way to do it, and it scales with the number of cores on your processor too.
DirectoryInfo di = new DirectoryInfo(yourDir);
var files = di.GetFiles();
files.AsParallel().Where(f => f.Extension != ".zip").ForAll((f) => f.Delete());
By fastest, are you asking for the fewest lines of code or the quickest execution time? Here is a sample using LINQ with a parallel foreach loop to delete them quickly.
string[] files = System.IO.Directory.GetFiles("c:\\temp", "*.*", System.IO.SearchOption.TopDirectoryOnly);
List<string> del = (
    from string s in files
    where !s.EndsWith(".zip")
    select s).ToList();
Parallel.ForEach(del, (string s) => { System.IO.File.Delete(s); });
At the time of writing, none of the previous answers used Directory.EnumerateFiles(), which allows you to carry out operations on the files while the list is still being constructed.
Code:
Parallel.ForEach(Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).AsParallel(), Item =>
{
if(!string.Equals(Path.GetExtension(Item), ".zip",StringComparison.OrdinalIgnoreCase))
File.Delete(Item);
});
As far as I know, the performance gain from using AsParallel() shouldn't be significant (if present at all) in this case; however, it did make a difference in my case.
I compared the time it takes to delete all but the .zip files in a list of 4689 files, of which 10 were zip files, using five approaches.
Results (elapsed time, lower is better):
1. foreach: 1545
2. Parallel.ForEach: 1015
3. IEnumerable().AsParallel().ForAll: 1103
4. Parallel.ForEach over IEnumerable().AsParallel(), as illustrated above: 839
5. a plain foreach using Directory.GetFiles(): 2266
Of course the results aren't conclusive; as far as I know, to do proper benchmarking you would need to use a RAM drive instead of an HDD.
Note that the performance difference between EnumerateFiles and GetFiles becomes more apparent as the number of files increases.
Here's plain old C#
foreach(string file in Directory.GetFiles(Server.MapPath("~/yourdirectory")))
{
if(Path.GetExtension(file) != ".zip")
{
File.Delete(file);
}
}
And here's LINQ
var files = from f in Directory.GetFiles("")
where Path.GetExtension(f) != ".zip"
select f;
foreach(string file in files)
File.Delete(file);