Recursive search method in C#

I decided not to keep the reports in the application as embedded resources anymore, and to move them to a local server instead. The advantages are obvious, but I also want to organize the directories (common reports, letterheads, etc.) in a legible way, so I need a way to load reports by name alone, using a recursive search method. My plan is either to build a cache at application startup and then search in that list, or to use a method that looks up the report name each time I need it. Any ideas, ideally with a code example in C#, are very welcome!
The folder structure could look like this:
\\webserver\Reports (report files with unique names)
\\webserver\Reports\Common (report files with unique names)
\\webserver\Reports\Manager1 (report files with unique names)
\\webserver\Reports\Manager1\Invoices (report files with unique names)
\\webserver\Reports\ManagerN (report files with unique names)
I hope this example is useful. Thank you!

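If you want to search directories recursively for a certain extension:
var d = new DirectoryInfo("\\\\webserver\\Reports");
// The search pattern needs a wildcard; ".rpt" alone would only match a file named exactly ".rpt"
var files = d.GetFiles("*.rpt", SearchOption.AllDirectories);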

How about instead of searching every time, you record the exact location of the reports in a section of your application's configuration file? Maybe that is what you mean by building a cache.

Please refer to the Microsoft article "How to recursively search directories by using Visual C#".
It explains how to write your own recursive function (which is very simple in C#).
If your directory tree is not too large (fewer than 100 directories and fewer than, say, 50 files in each directory), then a cache is not necessary, in my humble opinion. If you build a cache, you have to maintain it (i.e. handle the case where it has to be updated, etc.).
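// txtFile and lstFilesFound are WinForms controls on the article's sample form.
void DirSearch(string sDir)
{
    try
    {
        // Match files in the current directory first, so the root itself is not skipped.
        foreach (string f in Directory.GetFiles(sDir, txtFile.Text))
        {
            lstFilesFound.Items.Add(f);
        }
        foreach (string d in Directory.GetDirectories(sDir))
        {
            DirSearch(d); // recursive call
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}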
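The cache idea from the question could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the class name ReportCache, the method TryGetPath, and the "*.rpt" default pattern are made up for illustration, and report names are assumed to be unique across the whole tree (as stated in the question), so a duplicate is treated as an error.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: maps report names to full paths, built once at startup.
public sealed class ReportCache
{
    private readonly Dictionary<string, string> _pathsByName =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Walks rootDirectory recursively and indexes every matching file by its
    // name without extension. The "*.rpt" default is an assumption.
    public ReportCache(string rootDirectory, string searchPattern = "*.rpt")
    {
        foreach (string path in Directory.EnumerateFiles(
                     rootDirectory, searchPattern, SearchOption.AllDirectories))
        {
            string name = Path.GetFileNameWithoutExtension(path);
            if (_pathsByName.ContainsKey(name))
            {
                throw new InvalidOperationException("Duplicate report name: " + name);
            }
            _pathsByName.Add(name, path);
        }
    }

    // Returns true and the full path when a report with that name was indexed.
    public bool TryGetPath(string reportName, out string fullPath)
    {
        return _pathsByName.TryGetValue(reportName, out fullPath);
    }
}
```

Typical use would be to create one instance at startup, e.g. new ReportCache(@"\\webserver\Reports"), and call TryGetPath("Invoice", out path) whenever a report is needed. If reports can be added while the application runs, the cache has to be rebuilt or refreshed, which is the maintenance cost mentioned above.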

Related

display directories and files by size order c#

I'm trying to list all the directories and files on a machine and sort them by size.
I get a list of the file names and their sizes, but they don't come out in order. Any suggestions greatly appreciated! Cheers
// Create an instance of the drive that contains the files
DriveInfo di = new DriveInfo(@"C:\");
// Find the root directory path
DirectoryInfo dirInfo = di.RootDirectory;
try
{
    // EnumerateFiles streams the entries; OrderBy then sorts them by size
    foreach (var fi in dirInfo.EnumerateFiles().OrderBy(f => f.Length).ToList())
    {
        try
        {
            // Display each file
            Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length);
        }
        catch (UnauthorizedAccessException UnAuthTop)
        {
            Console.WriteLine("{0}", UnAuthTop.Message);
        }
    }
}
catch (UnauthorizedAccessException unAuthDir)
{
    // Closing handler assumed; the original snippet was cut off before this point
    Console.WriteLine("{0}", unAuthDir.Message);
}
You could try something like this:
// Get your folder
DirectoryInfo di = new DirectoryInfo("your path here");
// Create a list of files from that folder
List<FileInfo> fi = di.GetFiles().ToList();
// Pass the files on in sorted order
var files = fi.Where(f => f.FullName != null).OrderByDescending(f => f.Length);
In this example, files will contain the files from the current folder level only, sorted by length in descending order.
You might want to check that fi is not null before passing it to the variable files. Then you can iterate over files with a foreach.
[ UPDATE ]
As @Abion47 points out, there doesn't seem to be much difference between the OP's code and my solution. From what I read in the question, the OP is not getting a sorted list, which is the desired result.
What I see that might make a difference is that with EnumerateFiles you start enumerating and can act on file info before the entire collection of files is returned. That is great for handling enormous numbers of files, and more efficient than GetFiles for performing operations on individual files as they become available.
Any sorting, though, can only happen once the complete collection has been enumerated: OrderBy buffers the whole sequence before it yields its first element, so the streaming benefit of EnumerateFiles is lost as soon as you sort.
With GetFiles, you have to wait for it to return the whole collection, which makes the sorting step more obvious.
I don't think that GetFiles is ideal for handling huge collections. In that case, I would divide the work into steps or use some other approach.
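Since the question asks for all files on the machine rather than one folder level, a recursive variant might look like the sketch below. It assumes .NET Core 2.1 or later, where EnumerationOptions is available (it is not part of .NET Framework); the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FilesBySize
{
    // Lists every reachable file under root, largest first.
    // Assumes .NET Core 2.1+ for EnumerationOptions.
    public static List<FileInfo> ListLargestFirst(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            // Skip entries that would otherwise throw UnauthorizedAccessException
            IgnoreInaccessible = true
        };
        return new DirectoryInfo(root)
            .EnumerateFiles("*", options)
            .OrderByDescending(f => f.Length) // buffers the full sequence, then sorts
            .ToList();
    }
}
```

Each FileInfo in the result can then be printed with Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length) as in the question. On a whole drive the sorted list can get large, since every entry has to be held in memory before the first one is printed.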

C#, .getDirectories(string file) issue retrieving directories

I am trying to retrieve the subdirectories of a path I pass in. It processes the path and gives me half of the subdirectories, but for the other half it shows a "?" when debugging. I do not know what is causing this.
Here is what I have:
string root = @"C:\Users\Documents\Meta Consumer";
string[] subDir = Directory.GetDirectories(root);
When Debugging:
1: (good)
2: (good)
3: (good)
.. ..
?: (this is where 14 is)
?: (15 is here)
.. ..
?: ?
I'm not sure what the overall goal is: whether you intend to search for a specific item or to manipulate the directory in some way. One thing I do see is that you haven't specified a search pattern or search option for your array. I believe the results can be affected by deep nesting or permission issues.
Resolution One: Ensure that you have valid permissions to do recursive searches within the specified directory.
Resolution Two: You can attempt to search for all items with a wildcard and then force it to search all directories. This may help with deep nesting issues you may encounter.
Resolution Three: Try the code below and see if it solves the issue.
string root = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
string[] subDir = Directory.GetDirectories(root, "*", SearchOption.AllDirectories);
foreach (string s in subDir)
{
    Console.WriteLine(s);
}
See if that returns the information that was missing before. Some folders in your Library, although considered public to the user, are still locked because they reside in the user profile, so permissions are a good thing to check.
Running Visual Studio as an Administrator will also help in your troubleshooting. You should also check whether there are any inner exceptions, which can help identify the cause.
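To check whether permissions are the culprit, one option is to walk the tree manually and report the folders that cannot be read instead of letting a single exception abort the whole call. The following is a sketch only; SafeDirectoryWalker and its onSkipped callback are hypothetical names, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeDirectoryWalker
{
    // Returns every reachable subdirectory of root. Folders that cannot be
    // read are reported through onSkipped instead of aborting the walk.
    public static List<string> GetAllSubdirectories(
        string root, Action<string> onSkipped = null)
    {
        var found = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] children;
            try
            {
                children = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                if (onSkipped != null) onSkipped(current);
                continue; // keep walking the rest of the tree
            }
            foreach (string child in children)
            {
                found.Add(child);
                pending.Push(child);
            }
        }
        return found;
    }
}
```

Calling GetAllSubdirectories(root, Console.WriteLine) prints each folder that was skipped, which shows directly whether the missing entries are locked folders.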

Fastest way to delete files that are not in a data table?

I need to write code in C# that will select a list of file names from a data table and delete every file in a folder that is not in this list.
One possibility would be to have both ordered by name, then loop through my table results and, for each result, loop through my files and delete them until I find a file that matches the current result or is alphabetically bigger, and then move to the next result without resetting the current file index.
I haven't tried to actually implement this, but it seems to me that this would be O(n), since each list would be looped through just once (ignoring the cost of sorting both lists). The only thing I'm not sure about is whether I can be 100% sure that the file system and the database engine will sort in exactly the same way (will they both consider "_" smaller than "-", and so on). If not, the algorithm above just wouldn't work at all. (By the way, this is a Jet Engine database.)
But since this is probably not such an uncommon problem, you might already know a better solution. I tried searching the web but couldn't find anything. Perhaps a more effective solution would be to put each list into a HashSet and find their difference.
1. Get the folder content into folderFiles (IEnumerable<FileInfo>).
2. Get the files you want to keep into filesToKeep (IEnumerable<string>, full paths).
3. Get the list of "not in list" files.
4. Delete these files.
Code sample:
IEnumerable<FileInfo> folderFiles = new List<FileInfo>(); // Fill me.
IEnumerable<string> filesToKeep = new List<string>(); // Fill me.
// Except (from System.Linq) yields the paths that are in the folder but not in filesToKeep
foreach (string fileToDelete in folderFiles.Select(fi => fi.FullName).Except(filesToKeep))
{
    File.Delete(fileToDelete);
}
Here is my suggestion. It assumes that filesInDatabase contains the list of files that are in the database and that pathOfDirectory contains the path of the directory holding the files to compare.
foreach (var fileToDelete in Directory.EnumerateFiles(pathOfDirectory).Where(item => !filesInDatabase.Contains(item)))
{
    File.Delete(fileToDelete);
}
EDIT:
This requires using System.Linq;, because it uses LINQ.
I think hashing is the way to go, but you don't really need two HashSets. Only one HashSet is needed to store the standardized file names from the datatable; the other container can be any collection data type.
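A minimal sketch of the single-HashSet idea, under a few assumptions: the data table holds bare file names (not full paths), names are compared case-insensitively as on Windows, and FolderCleaner and DeleteFilesNotIn are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderCleaner
{
    // Deletes every file in directory whose name is not in namesToKeep.
    // Returns the number of files deleted.
    public static int DeleteFilesNotIn(string directory, IEnumerable<string> namesToKeep)
    {
        // Assumption: names are compared case-insensitively, as on Windows.
        var keep = new HashSet<string>(namesToKeep, StringComparer.OrdinalIgnoreCase);
        int deleted = 0;
        // GetFiles takes a snapshot, so deleting inside the loop is safe.
        foreach (string path in Directory.GetFiles(directory))
        {
            if (!keep.Contains(Path.GetFileName(path)))
            {
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }
}
```

Each lookup in the set is O(1) on average, so the whole pass is roughly linear in the number of files plus the number of names, and no sorting is needed, which sidesteps the question of whether Jet and the file system order names the same way.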
First off, .NET allows you to define cultures that can be used in sorting, but I'm not all that familiar with the mechanism, so I'll let Google give you pointers on the subject.
Second, to avoid the whole culture mess, you can use a different algorithm with an idea similar to radix sort (only without the sort). Its time complexity is O(n * length_longest_file_name). File name lengths are limited (as far as I know, almost no file system will allow a file name longer than 256 characters), so I'm assuming that n is dramatically larger than the file name lengths. If n is smaller than the maximum file name length, just use an O(n^2) method and avoid the work (iterating over lists this small is near instant anyway).
Note: This method does not require sorting.
The idea is to create an array of the symbols that can be used as file name characters (about 60-70 characters, if this is a case-sensitive search), and another flag array with a flag for each character in the first array.
Now, you create a loop over each character position in the file names of the list from the DB (from 1 to length_longest_file_name).
In each iteration (i) you go over the i-th character of each file name in the DB list. For every character you see, you set its flag to true.
When all flags are set, you go over the second list and delete every file for which the i-th character of its name is not flagged.
The implementation might be complex, and the overhead of the two arrays might make it slower when n is small, but you can optimize it (for instance, by removing files whose names are shorter than the current i from both lists so they are not iterated again).
Hope this helps.
I have another idea that might be faster.
// databaseFileList is assumed to hold full paths, matching what Directory.GetFiles returns
var filesToDelete = new List<string>(Directory.GetFiles(directoryPath));
foreach (var databaseFile in databaseFileList)
{
    filesToDelete.Remove(databaseFile);
}
foreach (var fileToDelete in filesToDelete)
{
    File.Delete(fileToDelete);
}
Explanation: First, get all files contained in the directory. Then remove from that list every file that is in the database. Finally, delete all files remaining in the list filesToDelete.

File/directory search

Any idea how to easily support file search patterns such as **, * and ? in your software?
For example, subfolder/**/?svn would search all levels of subfolder for files or folders whose names end in "svn" and are 4 characters long in total.
full description: http://nant.sourceforge.net/release/latest/help/types/fileset.html
If you load the directory as a DirectoryInfo, e.g.
DirectoryInfo directory = new DirectoryInfo(folder);
you can then search for files like this:
IEnumerable<FileInfo> fileInfo = directory.GetFiles("*.svn", SearchOption.AllDirectories);
This should get you a list of FileInfo objects which you can manipulate.
To get all subdirectories you can do the same, e.g.
IEnumerable<DirectoryInfo> dirInfo = directory.GetDirectories("*svn", SearchOption.AllDirectories);
Anyway, that should give an idea of how I'd do it. Also, because fileInfo and dirInfo are IEnumerable, you can add LINQ Where queries etc. to filter the results.
A mix of regex and recursion should do the trick.
Another trick might be to spawn a thread for every folder or set of folders and have each thread proceed one more level down. This could help speed up the process a bit.
The reason I say this is that checking folders is a highly I/O-bound process, so multiple threads may allow you to submit more disk requests at once and thus improve the speed.
This might sound silly, but have you considered downloading the NAnt source code to see how they did it?
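The regex half of the "regex and recursion" suggestion could be sketched as below. This is not how NAnt implements it; it is a simplified translation with assumed semantics: "**/" matches zero or more directory levels, "*" and "?" never cross a "/", and matching is case-insensitive. GlobPattern and ToRegex are hypothetical names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobPattern
{
    // Translates a pattern such as "subfolder/**/?svn" into a regex that is
    // matched against '/'-separated relative paths.
    public static Regex ToRegex(string pattern)
    {
        string p = pattern.Replace('\\', '/');
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < p.Length)
        {
            if (string.CompareOrdinal(p, i, "**/", 0, 3) == 0)
            {
                sb.Append("(?:[^/]+/)*"); // zero or more whole directory levels
                i += 3;
            }
            else if (string.CompareOrdinal(p, i, "**", 0, 2) == 0)
            {
                sb.Append(".*"); // anything, including separators
                i += 2;
            }
            else if (p[i] == '*')
            {
                sb.Append("[^/]*"); // any run of characters within one level
                i++;
            }
            else if (p[i] == '?')
            {
                sb.Append("[^/]"); // exactly one character within one level
                i++;
            }
            else
            {
                sb.Append(Regex.Escape(p[i].ToString()));
                i++;
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(),
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

The recursion half is then just an enumeration with SearchOption.AllDirectories: compute each entry's path relative to the base folder, replace backslashes with "/", and keep the entries for which IsMatch returns true.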

What is the fastest way of deleting files in a directory? (Except specific file extension)

I have seen questions like "What is the best way to empty a directory?"
But I need to know:
what is the fastest way of deleting all the files found within a directory, except any .zip files?
Smells like LINQ here... or what?
By fastest way, I mean the fastest execution time.
If you are using .NET 4, you can benefit from the way .NET can now parallelize your functions. This should be one of the fastest ways to do it, and it can scale with the number of cores on the processor too.
DirectoryInfo di = new DirectoryInfo(yourDir);
var files = di.GetFiles();
// Extension includes the leading dot; this comparison is case-sensitive
files.AsParallel().Where(f => f.Extension != ".zip").ForAll((f) => f.Delete());
By fastest, are you asking for the fewest lines of code or the quickest execution time? Here is a sample using LINQ with a parallel foreach loop to delete the files quickly.
string[] files = System.IO.Directory.GetFiles("c:\\temp", "*.*", System.IO.SearchOption.TopDirectoryOnly);
List<string> del = (
    from string s in files
    where !(s.EndsWith(".zip"))
    select s).ToList();
Parallel.ForEach(del, (string s) => { System.IO.File.Delete(s); });
At the time of writing this answer, none of the previous answers used Directory.EnumerateFiles(), which allows you to operate on the list of files while the list is still being constructed.
Code:
Parallel.ForEach(Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).AsParallel(), Item =>
{
    if (!string.Equals(Path.GetExtension(Item), ".zip", StringComparison.OrdinalIgnoreCase))
        File.Delete(Item);
});
As far as I know, the performance gain from using AsParallel() shouldn't be significant (if there is any) in this case; however, it did make a difference in my case.
I compared the time it takes to delete all but the .zip files in a list of 4689 files, of which 10 were zip files, using: 1 - foreach; 2 - parallel foreach; 3 - IEnumerable().AsParallel().ForAll; 4 - parallel foreach using IEnumerable().AsParallel(), as illustrated above.
Results:
1-1545
2-1015
3-1103
4-839
The fifth and last case was a normal foreach using Directory.GetFiles():
5-2266
Of course the results weren't conclusive; as far as I know, to carry out proper benchmarking you need to use a RAM drive instead of an HDD.
Note that the performance difference between EnumerateFiles and GetFiles becomes more apparent as the number of files increases.
Here's plain old C#:
foreach (string file in Directory.GetFiles(Server.MapPath("~/yourdirectory")))
{
    if (Path.GetExtension(file) != ".zip")
    {
        File.Delete(file);
    }
}
And here's LINQ:
// An empty path would throw an ArgumentException, so the same directory as above is used
var files = from f in Directory.GetFiles(Server.MapPath("~/yourdirectory"))
            where Path.GetExtension(f) != ".zip"
            select f;
foreach (string file in files)
    File.Delete(file);
