Finding all files in a folder using enumeration - c#

I'm trying to list out all files under a given directory by taking sub directories as well into account.I'm using yield so that I could club this with Take where I call this (note that I'm using .NET 3.5).
Below is my code:
IEnumerable<string> Search(string sDir)
{
foreach (var file in Directory.GetFiles(sDir))
{
yield return file;
}
foreach (var directory in Directory.GetDirectories(sDir))
{
Search(directory);
}
}
I don't know what is going wrong here, but it only returns one file (which is the one under the root directory, and there is only one there as well). Can you please help?

You need to yield the results of the recursive search, otherwise you're just throwing its results away:
IEnumerable<string> Search(string sDir)
{
foreach (var file in Directory.GetFiles(sDir))
{
yield return file;
}
foreach (var directory in Directory.GetDirectories(sDir))
{
foreach(var file in Search(directory))
yield return file;
}
}
Note that if your intent is to simply get a flat list of every file, consider using Directory.GetFiles instead with the option to search all subdirectories. If your intent is to leverage LINQ (or other methods) to apply searching criteria or a limit to the total number of files retrieved, then this is a decent way to go as you'll read directories one at a time and stop once you've fulfilled your criterion.

Related

C#: Renaming files on the result of `Directory.EnumerateFiles` causes infinite loop

I have a folder contains a lot of files(>100000),
and I want to write some code to perform a batch renaming.
I use Directory.EnumerateFiles rather than Directory.GetFiles since the former does not allocate the space for all the files name so I don't need to be worried about the memory.
Here is the code:
var files = Directory.EnumerateFiles(#"C:\many_files");
foreach (var file in files)
{
// Use`File.Move` to do the renaming stuff
}
However I encountered an infinite loop.
That's because EnumerateFiles is executing along with the renaming action. So after renaming a file, the file might be reordered to the end of the "enumeration list" and EnumerateFiles will return it again.
So Is there a solution? Or I have to use the GetFiles which may cause OOM.
The easiest way to do this is to load all the filePaths in memory and rename them.
Since you're worried about memory, You can do this:
Provided there wouldn't be any additions to the directory during this process.
static void Rename(string path, int batchSize = 5)
{
var queryable = Directory.EnumerateFiles(path).AsQueryable(); //Queryable For Getting Files
var totalFiles = queryable.Count(); //Get Total File Count in Path
int count = 0;//counter
while (count <= totalFiles)
{
var filesToProcess = queryable
.Where(f => !Path.GetFileName(f).StartsWith("reviewed_")) //Get Files Without this prefix
.Skip(count)
.Take(batchSize)
.ToList();//Get Batch To Process
foreach (var file in filesToProcess)
{
//Do Renaming Stuff
Console.WriteLine(file);
}
count += batchSize;
}
}

Check for name of a directory from an array

I'm working on a program that is supposed to scan a specific directory looking for any directories within it that have specific names, and if it finds them, tell the user.
Currently, the way I am loading the names its searching for is like this:
static string path = Path.Combine(Directory.GetCurrentDirectory(), #"database.txt");
static string[] database = File.ReadAllLines(datapath);
I am using this as an array of names to look for when looking through a specific directory. I am doing so with a foreach method.
System.IO.DirectoryInfo di = new DirectoryInfo("C:\ExampleDirectory");
foreach (DirectoryInfo dir in di.GetDirectories())
{
}
Is there a way to see if any of the names in the file "database.txt" match any names of directories found within "C:\ExampleDirectory"?
The only way I can think of doing this is:
System.IO.DirectoryInfo di = new DirectoryInfo(versionspath);
foreach (DirectoryInfo dir in di.GetDirectories())
{
if(dir.Name == //Something...) {
Console.WriteLine("Match found!");
break;}
}
But this obviously won't work, and I cannot think of any other way to do this. Any help would be greatly appreciated!
Based on your other questions on stackoverflow, I presume your question is a homework or you are a passionate hobby programmer, am I right? So I'll try to explain the principle here continuing your almost complete solution.
You will need a nested loop here, a loop in a loop. In the outer loop you iterate through the directories. You already got this one. For each directory you need to loop through the names in database to see if any item in it matches the name of the directory:
System.IO.DirectoryInfo di = new DirectoryInfo(versionspath);
foreach (DirectoryInfo dir in di.GetDirectories())
{
foreach (string name in database)
{
if (dir.Name == name)
{
Console.WriteLine("Match found!");
break;
}
}
}
Depending on your goal, you might want to exit at the first matching directory. The sample code above doesn't. The single break; statement only exits the inner loop, not the outer one. So it continues to check the next directory. Try to figure it out yourself how to stop at the first match (by exiting the outer loop).
As usual, LINQ is the way to go. Whenever you have to find matches or not-matches between two lists and both lists containing different types, you'll have to use .Join() or .GroupJoin().
The .Join() comes into play, if you need to find a 1:1 relationship and the .GroupJoin() for any kind of 1-to relationship (1:0, 1:many or also 1:1).
So, if you need the directories that match your list, this sounds for a job to the .Join() operator:
public static void Main(string[] args)
{
// Where ever this comes normally from.
string[] database = new[] { "fOo", "bAr" };
string startDirectory = #"D:\baseFolder";
// A method that returns an IEnumerable<string>
// Using maybe a recursive approach to get all directories and/or files
var candidates = LoadCandidates(startDirectory);
var matches = database.Join(
candidates,
// Simply pick the database entry as is.
dbEntry => dbEntry,
// Only take the last portion of the given path.
fullPath => Path.GetFileName(fullPath),
// Return only the full path from the given matching pair.
(dbEntry, fullPath) => fullPath,
// Ignore case on comparison.
StringComparer.OrdinalIgnoreCase);
foreach (var match in matches)
{
// Shows "D:\baseFolder\foo"
Console.WriteLine(match);
}
Console.ReadKey();
}
private static IEnumerable<string> LoadCandidates(string baseFolder)
{
return new[] { #"D:\baseFolder\foo", #"D:\basefolder\baz" };
//return Directory.EnumerateDirectories(baseFolder, "*", SearchOption.AllDirectories);
}
You can use LINQ to do this
var allDirectoryNames = di.GetDirectories().Select(d => d.Name);
var matches = allDirectoryNames.Intersect(database);
if (matches.Any())
Console.WriteLine("Matches found!");
In the first line we get all the directory names, then we use the Intersect() method to see which ones are present in both allDirectoryNames and database

Counting Files Extensions C#

Hi i'm a c# begginer and i'd like to do a simple program which is going to go through a folder and count how many files are .mp3 files and how many are .flac .
Like I said the program is very basic. It will ask for the music folder path and will then go through it. I know there will be a lot of subfolders in that main music folder so it will have to open them one at the time and go through them too.
E.g
C:/Music/
will be the given directory.
But it doesn't contain any music in itself.
To get to the music files the program would have to open subfolders like
C:/Music/Electronic/deadmau5/RandomAlbumTitle/
Only then he can count the .mp3 files and .flac files and store them in two separated counters.
The program will have to do that for at least 2000 folders.
Do you know a good way or method to go through files and return its name (and extension)?
You can use System.IO.DirectoryInfo. DirectoryInfo provides a GetFiles method, which also has a recursive option, so if you're not worried about speed, you can do this:
DirectoryInfo di = new DirectoryInfo(#"C:\Music");
int numMP3 = di.GetFiles("*.mp3", SearchOption.AllDirectories).Length;
int numFLAC = di.GetFiles("*.flac", SearchOption.AllDirectories).Length;
Use DirectoryInfo and a grouping by the file extension:
var di = new DirectoryInfo(#"C:/Music/");
var extensionCounts = di.EnumerateFiles("*.*", SearchOption.AllDirectories)
.GroupBy(x => x.Extension)
.Select(g => new { Extension = g.Key, Count = g.Count() })
.ToList();
foreach (var group in extensionCounts)
{
Console.WriteLine("There are {0} files with extension {1}", group.Count,
group.Extension);
}
C# has a built in method of searching for files in all sub-directories. Make sure you add a using statement for System.IO
var path = "C:/Music/"
var files = Directory.GetFiles(path, "*.mp3", SearchOption.AllDirectories);
var count = files.Length;
Since you're a beginner you should hold off on the more flexible LINQ method until later.
int fileCount = Directory.GetFiles(_Path, "*.*", SearchOption.TopDirectoryOnly).Length
Duplicate question How to read File names recursively from subfolder using LINQ
Jon Skeet answered there with
You don't need to use LINQ to do this - it's built into the framework:
string[] files = Directory.GetFiles(directory, "*.dll",
SearchOption.AllDirectories);
or if you're using .NET 4:
IEnumerable<string> files = Directory.EnumerateFiles(directory, "*.dll",
SearchOption.AllDirectories);
To be honest, LINQ isn't great in terms of recursion. You'd probably want to write your own general-purpose recursive extension method. Given how often this sort of question is asked, I should really do that myself some time...
Here is MSDN support page, How to recursively search directories by Visual C#
Taken directly from that page:
void DirSearch(string sDir)
{
try
{
foreach (string d in Directory.GetDirectories(sDir))
{
foreach (string f in Directory.GetFiles(d, txtFile.Text))
{
lstFilesFound.Items.Add(f);
}
DirSearch(d);
}
}
catch (System.Exception excpt)
{
Console.WriteLine(excpt.Message);
}
}
You can use this code in addition to creating FileInfo objects. Once you have the file info objects you can check the Extension property to see if it matches the ones you care about.
MSDN has lots of information and examples, for example how you can iterate through a directory: http://msdn.microsoft.com/en-us/library/bb513869.aspx

Iterate through files on a certain level

I'm trying to iterate through all the files on a certain level in the folder hierachy, more specifically, in all the sub-sub folders. Before I do actual operations on the files, I also want to count all the files to be able to show a progress bar. This means the iterating method must be called 2 times. This is the relevant code, I'm using now:
Iterate(bool count)
{
foreach (string dir in Directory.GetDirectories(root))
foreach (string subdir in Directory.GetDirectories(dir))
foreach (string file in Directory.GetFiles(subdir))
{
if (count) progressBar.Maximum++;
else
{
//do operations
}
}
}
I'm wondering if there's a better way of doing this. Surely there must be a better way than adding a foreach for every folder level..?
It'd be easier to me to use LINQ here.
var files =
(from dir in Directory.GetDirectories(root)
from subdir in Directory.GetDirectories(dir)
from f in Directory.GetFiles(subdir)
select f).ToList();
var fileCount = files.Length;
foreach (var f in files) {
...
}
Before your GetFiles foreach, try this:
string [] fileEntries = Directory.GetFiles(subdir);
int intFileCount = fileEntries.length;
Or it can replace it, if the loop only serves to count the files.
The documentation of Directory.GetFiles shows how to recursively iterate a directory tree
you could download the fluent thing I wrote for System.IO (see here: http://blog.staticvoid.co.nz/2011/11/staticvoid-io-extentions-nuget.html) and then use this LINQ statement
var files = from d in di.Directories()
from dir in d.Directories()
from f in dir.Files()
select f;
write this instead your code
string [] files = Directory.GetFile("yourDirectory","*.*",SearchOptions.AllDirectories);
this will return all files in sub-directories instead of using recursion

Writing the F# recursive folder visitor in C# - seq vs IEnumerable

I often use this recursive 'visitor' in F#
let rec visitor dir filter=
seq { yield! Directory.GetFiles(dir, filter)
for subdir in Directory.GetDirectories(dir) do yield! visitor subdir filter}
Recently I've started working on implementing some F# functionality in C#, and I'm trying to reproduce this as IEnumerable, but I'm having difficulty getting any further than this:
static IEnumerable<string> Visitor(string root, string filter)
{
foreach (var file in Directory.GetFiles(root, filter))
yield return file;
foreach (var subdir in Directory.GetDirectories(root))
foreach (var file in Visitor(subdir, filter))
yield return file;
}
What I don't understand is why I have to do a double foreach in the C# version for the recursion, but not in F#... Does the seq {} implicitly do a 'concat'?
yield! does a 'flatten' operation, so it integrates the sequence you passed it into the outer sequence, implicitly performing a foreach over each element of the sequence and yield on each one.
There is no simple way to do this.
You could workaround this by defining a C# type that can store either one value or a sequence of values - using the F# notation it would be:
type EnumerationResult<'a> =
| One of 'a
| Seq of seq<'a>
(translate this to C# in any way you like :-))
Now, you could write something like:
static IEnumerable<EnumerationResult<string>> Visitor
(string root, string filter) {
foreach (var file in Directory.GetFiles(root, filter))
yield return EnumerationResult.One(file);
foreach (var subdir in Directory.GetDirectories(root))
yield return EnumerationResult.Seq(Visitor(subdir, filter))
}
}
To use it, you'd have to write a function that flattens EnumerationResult, which could be an extension method in C# with the following signature:
IEnumerable<T> Flatten(this IEnumerable<EnumerationResult<T>> res);
Now, this is a part where it gets tricky - if you implemented this in a straighforward way, it would still contain "forach" to iterate over the nested "Seq" results. However, I believe that you could write an optimized version that wouldn't have quadratic complexity.
Ok.. I guess this is a topic for a blog post rather than something that could be fully described here :-), but hopefully, it shows an idea that you can try following!
[EDIT: But of course, you can also use naive implementation of "Flatten" that would use "SelectMany" just to make the syntax of your C# iterator code nicer]
In the specific case of retrieving all files under a specific directory, this overload of Directory.GetFiles works best:
static IEnumerable<string> Visitor( string root, string filter ) {
return Directory.GetFiles( root, filter, SearchOption.AllDirectories );
}
In the general case of traversing a tree of enumerable objects, a nested foreach loop or equivalent is required (see also: All About Iterators).
Edit: Added an example of a function to flatten any tree into an enumeration:
static IEnumerable<T> Flatten<T>( T item, Func<T, IEnumerable<T>> next ) {
yield return item;
foreach( T child in next( item ) )
foreach( T flattenedChild in Flatten( child, next ) )
yield return flattenedChild;
}
This can be used to select all nested files, as before:
static IEnumerable<string> Visitor( string root, string filter ) {
return Flatten( root, dir => Directory.GetDirectories( dir ) )
.SelectMany( dir => Directory.GetFiles( dir, filter ) );
}
In C#, I use the following code for this kind of function:
public static IEnumerable<DirectoryInfo> TryGetDirectories(this DirectoryInfo dir) {
return F.Swallow(() => dir.GetDirectories(), () => new DirectoryInfo[] { });
}
public static IEnumerable<DirectoryInfo> DescendantDirs(this DirectoryInfo dir) {
return Enumerable.Repeat(dir, 1).Concat(
from kid in dir.TryGetDirectories()
where (kid.Attributes & FileAttributes.ReparsePoint) == 0
from desc in kid.DescendantDirs()
select desc);
}
This addresses IO errors (which inevitably happen, unfortunately), and avoids infinite loops due to symbolic links (in particular, you'll run into that searching some dirs in windows 7).

Categories

Resources