File/directory search - c#

Any idea how to easily support file search patterns in your software, like **, *, ?
For example, subfolder/**/?svn - search all levels of subfolder for files/folders whose names end in "svn" and are 4 characters long in total.
full description: http://nant.sourceforge.net/release/latest/help/types/fileset.html

If you load the directory as a DirectoryInfo, e.g.
DirectoryInfo directory = new DirectoryInfo(folder);
then do a search for files like this:
IEnumerable<FileInfo> fileInfo = directory.GetFiles("*.svn", SearchOption.AllDirectories);
This should get you a collection of FileInfo objects that you can manipulate.
To get all subdirectories you can do the same, e.g.
IEnumerable<DirectoryInfo> dirInfo = directory.GetDirectories("*svn", SearchOption.AllDirectories);
Anyway, that should give an idea of how I'd do it. Also, because fileInfo and dirInfo are IEnumerable, you can add LINQ Where queries etc. to filter the results.
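For instance, to approximate the ?svn part of the pattern (a name exactly four characters long ending in "svn"), you could filter the enumerated results with a Where clause. A minimal sketch, assuming the directory variable from above and a using System.Linq directive:
// four-character names ending in "svn", e.g. ".svn" or "_svn"
var svnDirs = directory.GetDirectories("*svn", SearchOption.AllDirectories)
                       .Where(d => d.Name.Length == 4 && d.Name.EndsWith("svn", StringComparison.OrdinalIgnoreCase));
foreach (var d in svnDirs)
    Console.WriteLine(d.FullName);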

A mix of regex and recursion should do the trick.
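Here is a rough sketch of that idea: translate the wildcard name (*, ?) into a regex and walk the tree recursively. The helper names (GlobToRegex, FindMatches) are just illustrative and the pattern translation is deliberately simplified; it needs System.IO and System.Text.RegularExpressions.
// Illustrative only: translate a simple glob for one path segment (*, ?) into a regex.
static Regex GlobToRegex(string glob)
{
    string pattern = "^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    return new Regex(pattern, RegexOptions.IgnoreCase);
}

// Recursively yield every file or folder under root whose name matches the regex.
static IEnumerable<string> FindMatches(string root, Regex nameRegex)
{
    foreach (string file in Directory.GetFiles(root))
        if (nameRegex.IsMatch(Path.GetFileName(file)))
            yield return file;

    foreach (string dir in Directory.GetDirectories(root))
    {
        if (nameRegex.IsMatch(Path.GetFileName(dir)))
            yield return dir;
        foreach (string match in FindMatches(dir, nameRegex))   // recurse one level down
            yield return match;
    }
}

// Usage: everything under "subfolder", at any depth, whose name matches ?svn
foreach (string hit in FindMatches("subfolder", GlobToRegex("?svn")))
    Console.WriteLine(hit);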
Another trick might be to spawn a thread for every folder, or set of folders, and have each thread check one more level down. This could speed the process up a bit.
The reason I say this is that checking folders is a highly I/O-bound process, so many threads let you submit more disk requests at once, which can improve the speed.
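A rough sketch of that idea using Parallel.ForEach rather than hand-rolled threads (note that GetDirectories with AllDirectories still walks the tree up front; only the per-folder check is parallelized here, and the "subfolder" path and the ends-with-"svn" check are just placeholders):
var results = new ConcurrentBag<string>();   // System.Collections.Concurrent
var subDirs = Directory.GetDirectories("subfolder", "*", SearchOption.AllDirectories);
Parallel.ForEach(subDirs, dir =>
{
    // placeholder check: keep anything whose name ends in "svn"
    if (Path.GetFileName(dir).EndsWith("svn", StringComparison.OrdinalIgnoreCase))
        results.Add(dir);
});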

This might sound silly, but have you considered downloading the nant source code to see how they did it?

Related

display directories and files by size order c#

Trying to list all the directories and files on a machine and sort them by size.
I get a list of the file names and their sizes, but they won't come out in order... Any suggestions greatly appreciated! Cheers
// create an instance of the drive which contains the files
DriveInfo di = new DriveInfo(@"C:\");
// find the root directory path
DirectoryInfo dirInfo = di.RootDirectory;
try
{
    // EnumerateFiles improves performance; sort the results by length
    foreach (var fi in dirInfo.EnumerateFiles().OrderBy(f => f.Length).ToList())
    {
        try
        {
            // display each file
            Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length);
        }
        catch (UnauthorizedAccessException UnAuthTop)
        {
            Console.WriteLine("{0}", UnAuthTop.Message);
        }
    }
}
catch (UnauthorizedAccessException UnAuthOuter)
{
    // directories we cannot read at all end up here
    Console.WriteLine("{0}", UnAuthOuter.Message);
}
You could try something like this:
// get your folder
DirectoryInfo di = new DirectoryInfo(@"your path here");
// create a list of files from that folder
List<FileInfo> fi = di.GetFiles().ToList();
// return the files in sorted order
var files = fi.Where(f => f.FullName != null).OrderByDescending(f => f.Length);
In this example, files will contain the files from the current folder level, sorted (descending) by file Length.
You might want to check that fi is not null before assigning it to files. Then you can iterate over files with a foreach.
[ UPDATE ]
As @Abion47 points out, there doesn't seem to be much difference between the OP's code and my solution. From what I read in the question, the OP is not getting a sorted list, which is the desired result.
What I see that might make a difference is that, by using EnumerateFiles, you start enumerating and can act on each FileInfo before the entire collection of files has been returned. That is great for handling enormous numbers of files, and it is more efficient than GetFiles for performing operations on individual files as they become available.
Since that is the case, you might not be able to sort the returned files properly until the complete collection has been enumerated.
By using GetFiles, you wait for the whole collection to be returned, which makes it easier to sort.
I don't think GetFiles is ideal for handling huge collections, though. In that case, I would divide the work into steps or use some other approach.
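For the OP's whole-drive scenario, one way to combine enumeration with a final sort is to collect what you can, skipping directories you are not allowed to read, and sort once at the end. A rough sketch, not the OP's exact code:
var all = new List<FileInfo>();
var pending = new Stack<DirectoryInfo>();
pending.Push(new DriveInfo(@"C:\").RootDirectory);
while (pending.Count > 0)
{
    DirectoryInfo current = pending.Pop();
    try
    {
        all.AddRange(current.EnumerateFiles());
        foreach (DirectoryInfo sub in current.EnumerateDirectories())
            pending.Push(sub);
    }
    catch (UnauthorizedAccessException)
    {
        // skip directories we cannot read instead of aborting the whole scan
    }
}
foreach (var fi in all.OrderBy(f => f.Length))
    Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length);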

Most efficient way to retrieve one file path from a list of folders

Let's say I have a list of over 100 folder paths. I would like to retrieve just one file path from each folder path. Here is the way I am doing it, or plan to do it:
var Files = new List<String>();
var Directories = Directory.GetDirectories("C:\\Firstfolder\\Secondfolder\\");
Array.ForEach(Directories, D => Files.Add(Directory.GetFiles(D).FirstOrDefault()));
Now, is this the most efficient way? Because my program will execute this code every time it starts.
Instead of Directory.GetFiles, use Directory.EnumerateFiles to avoid loading all file paths into memory. This quote from the documentation explains the difference:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
If you are using .NET 4.0 you should do this instead...
var Files = Directories.Select(x => Directory.EnumerateFiles(x).FirstOrDefault())
                       .Where(f => f != null)   // FirstOrDefault returns null for empty folders
                       .ToList();
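If the list of folders itself is large, you could keep the directory enumeration lazy as well; a sketch along the same lines, assuming the same root path as in the question:
var files = Directory.EnumerateDirectories("C:\\Firstfolder\\Secondfolder\\")
                     .Select(dir => Directory.EnumerateFiles(dir).FirstOrDefault())
                     .Where(path => path != null)   // skip folders with no files
                     .ToList();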

Does DirectoryInfo.EnumerateDirectories sort items?

I have an application that has to enumerate a few folders and process the files inside them.
It has to support resuming, meaning it has to start from the folder it was processing the last time.
I was thinking of using the DirectoryInfo.EnumerateDirectories method. I'd save the name of the last processed dir in a file, skip the enumeration until I meet that dir name and continue processing from there.
However, the documentation does not say anything about the order in which files are enumerated.
Is it safe to assume that using this method the program will always process the remaining directories? Or is it possible that the next time the program runs the directories will be enumerated in a different order, making it possible to leave some unprocessed and process others twice?
If this method is not safe, what would be a good alternative?
Internally, EnumerateDirectories() uses the Win32 API FindNextFile() (source code). From MSDN:
"The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system."
DirectoryInfo.EnumerateDirectories() returns IEnumerable<DirectoryInfo> according to the MSDN docs. I don't think it is sorted, and even if it were, there is the question of which field or property of DirectoryInfo it would be sorted by.
You can do the sorting yourself in the query.
// Create a DirectoryInfo for the Program Files directory.
DirectoryInfo dirPrograms = new DirectoryInfo(@"c:\program files");
DateTime StartOf2009 = new DateTime(2009, 01, 01);
// LINQ query for all directories created before 2009, ordered by name.
var dirs = (from dir in dirPrograms.EnumerateDirectories()
            where dir.CreationTimeUtc < StartOf2009
            orderby dir.Name
            select dir).ToList();
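To make the resuming idea from the question safe, you can impose your own order and then skip everything up to the last processed name. A sketch, where laststate.txt is a hypothetical file holding the last processed directory name:
string lastProcessedName = File.ReadAllText("laststate.txt");   // hypothetical state file
var remaining = dirPrograms.EnumerateDirectories()
                           .OrderBy(d => d.Name, StringComparer.OrdinalIgnoreCase)
                           .SkipWhile(d => string.Compare(d.Name, lastProcessedName,
                                                          StringComparison.OrdinalIgnoreCase) <= 0);
foreach (DirectoryInfo dir in remaining)
{
    // process dir here, then persist its name as the new resume point
    File.WriteAllText("laststate.txt", dir.Name);
}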

What is the fastest way of deleting files in a directory? (Except specific file extension)

I have seen questions like What is the best way to empty a directory?
But I need to know,
what is the fastest way of deleting all the files found within the directory, except any .zip files found.
Smells like LINQ here... or what?
By saying fastest way, I mean the Fastest execution time.
If you are using .NET 4 you can benefit from the smart way .NET now parallelizes your functions. This code is the fastest way to do it, and it scales with the number of cores on the processor too.
DirectoryInfo di = new DirectoryInfo(yourDir);
var files = di.GetFiles();
files.AsParallel().Where(f => f.Extension != ".zip").ForAll((f) => f.Delete());
By fastest, are you asking for the fewest lines of code or the quickest execution time? Here is a sample using LINQ with a parallel foreach loop to delete them quickly.
string[] files = System.IO.Directory.GetFiles("c:\\temp", "*.*", System.IO.SearchOption.TopDirectoryOnly);
List<string> del = (
    from string s in files
    where !s.EndsWith(".zip")
    select s).ToList();
Parallel.ForEach(del, (string s) => { System.IO.File.Delete(s); });
At the time of writing this answer, none of the previous answers used Directory.EnumerateFiles(), which lets you operate on the list of files while the list is still being constructed.
Code:
Parallel.ForEach(Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).AsParallel(), Item =>
{
    if (!string.Equals(Path.GetExtension(Item), ".zip", StringComparison.OrdinalIgnoreCase))
        File.Delete(Item);
});
As far as I know, the performance gain from using AsParallel() shouldn't be significant (if any) in this case; however, it did make a difference in my case.
I compared the time it takes to delete all but the .zip files in a list of 4689 files, of which 10 were zip files, using:
1 - foreach
2 - Parallel.ForEach
3 - IEnumerable().AsParallel().ForAll
4 - Parallel.ForEach over IEnumerable().AsParallel(), as illustrated above
Results:
1 - 1545
2 - 1015
3 - 1103
4 - 839
The fifth and last case was a plain foreach over Directory.GetFiles():
5 - 2266
Of course the results weren't conclusive; as far as I know, proper benchmarking would need a RAM drive instead of an HDD.
Note that the performance difference between EnumerateFiles and GetFiles becomes more apparent as the number of files increases.
Here's plain old C#
foreach (string file in Directory.GetFiles(Server.MapPath("~/yourdirectory")))
{
    if (Path.GetExtension(file) != ".zip")
    {
        File.Delete(file);
    }
}
And here's LINQ
var files = from f in Directory.GetFiles("")
            where Path.GetExtension(f) != ".zip"
            select f;
foreach (string file in files)
    File.Delete(file);

App to analyze folder sizes? C# .NET

I have built a small app that allows me to choose a directory and count the total size of the files in that directory and its subdirectories.
It allows me to select a drive, which populates a tree control with the drive's immediate folders, whose sizes I can then count.
It is written in .NET and simply loops over the directories and, for each directory, adds up the file sizes.
It brings my PC to a halt when it runs on, say, the Windows or Program Files folders.
I had thought of multithreading, but I haven't done this before.
Any ideas to increase performance?
Thanks
Your code is really going to slog since you're just using strings to refer to directories and files. Use a DirectoryInfo on your root directory; get a list of FileSystemInfos from that one using DirectoryInfo.GetFileSystemInfos(); iterate on that list, recursing in for DirectoryInfo objects and just adding the size for FileInfo objects. That should be a LOT faster.
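A rough sketch of that recursive approach (the helper name is illustrative, and directories you cannot read are simply skipped):
static long GetDirectorySize(DirectoryInfo root)
{
    long total = 0;
    FileSystemInfo[] entries;
    try
    {
        entries = root.GetFileSystemInfos();   // files and subdirectories in one call
    }
    catch (UnauthorizedAccessException)
    {
        return 0;   // skip directories we cannot read
    }
    foreach (FileSystemInfo entry in entries)
    {
        FileInfo file = entry as FileInfo;
        if (file != null)
            total += file.Length;                              // a file: add its size
        else
            total += GetDirectorySize((DirectoryInfo)entry);   // a directory: recurse
    }
    return total;
}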
I'd simply suggest using a BackgroundWorker to perform the work. You'll probably want to make sure controls that shouldn't be usable are disabled, but anything that can stay usable should.
Google: http://www.google.com/search?q=background+worker
This allows your application to be multi-threaded without some of the complexity of multiple threads. Everything has been packaged up and it's convenient to use.
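A minimal sketch of what that could look like in WinForms, assuming the control names (resultLabel, scanButton) and a GetDirectorySize routine like the one you already have; BackgroundWorker lives in System.ComponentModel:
BackgroundWorker worker = new BackgroundWorker();
worker.DoWork += (s, e) =>
{
    // runs on a background thread; e.Argument is whatever was passed to RunWorkerAsync
    string folder = (string)e.Argument;
    e.Result = GetDirectorySize(folder);    // your existing size-counting routine
};
worker.RunWorkerCompleted += (s, e) =>
{
    // back on the UI thread: safe to touch controls here
    resultLabel.Text = string.Format("{0} bytes", e.Result);
    scanButton.Enabled = true;
};

scanButton.Enabled = false;                 // disable what shouldn't be used during the scan
worker.RunWorkerAsync(@"C:\Program Files");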
Do you want to increase performance or increase system responsiveness?
You can increase RESPONSIVENESS by instructing the spidering application to run its message queue loop periodically, which handles screen repaints, etc. This would allow you to give a progress update as it executes the scan, while actually decreasing performance (because you're yielding CPU priority).
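In WinForms, running the message queue loop periodically usually means calling Application.DoEvents() every so often inside the scan loop; a sketch of the idea (the folder path and the interval of 100 files are arbitrary):
int processed = 0;
foreach (string name in Directory.GetFiles(@"C:\Windows", "*.*"))
{
    // ... add the file's size to the running total here ...
    processed++;
    if (processed % 100 == 0)
        Application.DoEvents();   // pump the message loop so the UI can repaint
}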
This gets sub-directories:
string[] directories = Directory.GetDirectories(node.FullPath);
foreach (string dir in directories)
{
    TreeNode nd = node.Nodes.Add(dir, dir.Substring(dir.LastIndexOf("\\")).Replace("\\", ""), 3);
    if (showItsChildren)
        ShowChildDirectories(nd, true);
    size += GetDirectorySize(nd.FullPath);
}
This counts file sizes:
long b = 0;
// Get an array of all file names.
string[] a = Directory.GetFiles(p, "*.*");
// Calculate the total bytes of all files in a loop.
foreach (string name in a)
{
    // Use FileInfo to get the length of each file.
    FileInfo info = new FileInfo(name);
    b += info.Length;
    IncrementCount();
}
Try commenting out all the parts that update the UI. If it's still slow, it's the disk I/O and there's not much you can do; if it gets faster, you can update the UI every X files to save UI work.
You can make your UI responsive by doing all the work in a worker thread, but it will make the scan slightly slower.
Disk I/O is relatively slow, and it is also often needed by other applications (swap file, temp files, ...). Multithreading won't help much either: all the files are on the same physical disk, and it's likely that disk I/O is the bottleneck.
Just a guess, but I bet your performance hit involves the UI and not the file scan. Comment out the code that creates the TreeNode.
Try to not make your tree paint until after you complete your scan:
Make sure that the root tree node for all of your files is NOT added to the tree. Add all the children, and then add the "top" node/nodes at the very end of your processing. See how that works.
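A small sketch of that idea (treeView1 is the assumed TreeView control); wrapping the work in BeginUpdate/EndUpdate also suppresses repainting while nodes are added:
treeView1.BeginUpdate();                     // stop the tree from painting while we build it
try
{
    TreeNode root = new TreeNode(@"C:\");    // not attached to the TreeView yet
    foreach (string dir in Directory.GetDirectories(@"C:\"))
        root.Nodes.Add(new TreeNode(Path.GetFileName(dir)));

    treeView1.Nodes.Add(root);               // attach the fully built subtree in one go
}
finally
{
    treeView1.EndUpdate();                   // resume painting
}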
