Writing the F# recursive folder visitor in C# - seq vs IEnumerable

I often use this recursive 'visitor' in F#:
open System.IO

let rec visitor dir filter =
    seq { yield! Directory.GetFiles(dir, filter)
          for subdir in Directory.GetDirectories(dir) do
              yield! visitor subdir filter }
Recently I've started working on implementing some F# functionality in C#, and I'm trying to reproduce this as an IEnumerable, but I'm having difficulty getting any further than this:
static IEnumerable<string> Visitor(string root, string filter)
{
    foreach (var file in Directory.GetFiles(root, filter))
        yield return file;
    foreach (var subdir in Directory.GetDirectories(root))
        foreach (var file in Visitor(subdir, filter))
            yield return file;
}
What I don't understand is why I have to do a double foreach in the C# version for the recursion, but not in F#... Does the seq {} implicitly do a 'concat'?

yield! performs a 'flatten' operation: it splices the sequence you pass it into the outer sequence, implicitly running a foreach over that sequence and yielding each of its elements.
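For comparison, LINQ's SelectMany plays roughly the role of yield! here: it splices each recursive result into the outer sequence, so the C# visitor can be written without the explicit double foreach. A minimal sketch (VisitorSketch is an illustrative name):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class VisitorSketch
{
    // Files in 'root' first, then the flattened results of visiting each subdirectory.
    // SelectMany splices each recursive sequence into the outer one, much like yield!.
    public static IEnumerable<string> Visitor(string root, string filter) =>
        Directory.GetFiles(root, filter)
            .Concat(Directory.GetDirectories(root)
                .SelectMany(subdir => Visitor(subdir, filter)));
}
```

This only tidies the syntax: each level of recursion still adds a layer of iterators, exactly as the nested foreach does.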

C# has no direct equivalent of yield!, so there is no simple way to do this.
You could work around this by defining a C# type that can store either one value or a sequence of values; using the F# notation, it would be:
type EnumerationResult<'a> =
    | One of 'a
    | Seq of seq<'a>
(translate this to C# in any way you like :-))
Now, you could write something like:
static IEnumerable<EnumerationResult<string>> Visitor(string root, string filter)
{
    foreach (var file in Directory.GetFiles(root, filter))
        yield return EnumerationResult.One(file);
    foreach (var subdir in Directory.GetDirectories(root))
        yield return EnumerationResult.Seq(Visitor(subdir, filter));
}
To use it, you'd have to write a function that flattens EnumerationResult, which could be an extension method in C# with the following signature:
public static IEnumerable<T> Flatten<T>(this IEnumerable<EnumerationResult<T>> res);
Now, this is the part where it gets tricky: if you implemented this in a straightforward way, it would still contain a foreach to iterate over the nested Seq results, so every element would pass through one iterator per nesting level. However, I believe that you could write an optimized version that wouldn't have quadratic complexity.
OK... I guess this is a topic for a blog post rather than something that could be fully described here :-), but hopefully it shows an idea that you can try following!
[EDIT: But of course, you can also use a naive implementation of Flatten that uses SelectMany, just to make the syntax of your C# iterator code nicer.]
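One possible C# translation, as a sketch under a few assumptions: all class names below are illustrative; the Seq case holds a nested IEnumerable<EnumerationResult<T>> (which is what the Visitor above actually passes to it) rather than a plain seq<'a>; and a static EnumerationResult class supplies the One/Seq factory methods the Visitor calls. Flatten is the naive SelectMany version from the edit. FlattenWithStack is one way the optimized idea could look: a single iterator walks an explicit stack of enumerators, so each element is yielded once regardless of how deeply it is nested.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical C# version of the F# union: either one value or a nested sequence of results.
public abstract class EnumerationResult<T>
{
    public sealed class One : EnumerationResult<T>
    {
        public T Value { get; }
        public One(T value) => Value = value;
    }

    public sealed class Seq : EnumerationResult<T>
    {
        public IEnumerable<EnumerationResult<T>> Items { get; }
        public Seq(IEnumerable<EnumerationResult<T>> items) => Items = items;
    }
}

// Factory methods so the Visitor can write EnumerationResult.One(file).
public static class EnumerationResult
{
    public static EnumerationResult<T> One<T>(T value) => new EnumerationResult<T>.One(value);

    public static EnumerationResult<T> Seq<T>(IEnumerable<EnumerationResult<T>> items) =>
        new EnumerationResult<T>.Seq(items);
}

public static class EnumerationResultExtensions
{
    // Naive version: SelectMany recurses, so a deeply nested element
    // passes through one SelectMany iterator per nesting level.
    public static IEnumerable<T> Flatten<T>(this IEnumerable<EnumerationResult<T>> res) =>
        res.SelectMany(r => r is EnumerationResult<T>.One one
            ? new[] { one.Value }
            : ((EnumerationResult<T>.Seq)r).Items.Flatten());

    // Stack-based version: one iterator, explicit stack of enumerators.
    public static IEnumerable<T> FlattenWithStack<T>(this IEnumerable<EnumerationResult<T>> res)
    {
        var stack = new Stack<IEnumerator<EnumerationResult<T>>>();
        stack.Push(res.GetEnumerator());
        try
        {
            while (stack.Count > 0)
            {
                IEnumerator<EnumerationResult<T>> top = stack.Peek();
                if (!top.MoveNext())
                {
                    stack.Pop().Dispose();   // this level is exhausted
                    continue;
                }
                switch (top.Current)
                {
                    case EnumerationResult<T>.One one:
                        yield return one.Value;
                        break;
                    case EnumerationResult<T>.Seq seq:
                        stack.Push(seq.Items.GetEnumerator());   // descend lazily
                        break;
                }
            }
        }
        finally
        {
            // Dispose any enumerators left over if the caller stops early.
            while (stack.Count > 0)
                stack.Pop().Dispose();
        }
    }
}
```

With these types in place, Visitor(root, filter).FlattenWithStack() should yield plain file paths.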

In the specific case of retrieving all files under a specific directory, this overload of Directory.GetFiles works best:
static IEnumerable<string> Visitor( string root, string filter ) {
    return Directory.GetFiles( root, filter, SearchOption.AllDirectories );
}
In the general case of traversing a tree of enumerable objects, a nested foreach loop or equivalent is required (see also: All About Iterators).
Edit: Added an example of a function to flatten any tree into an enumeration:
static IEnumerable<T> Flatten<T>( T item, Func<T, IEnumerable<T>> next ) {
    yield return item;
    foreach( T child in next( item ) )
        foreach( T flattenedChild in Flatten( child, next ) )
            yield return flattenedChild;
}
This can be used to select all nested files, as before:
static IEnumerable<string> Visitor( string root, string filter ) {
    return Flatten( root, dir => Directory.GetDirectories( dir ) )
        .SelectMany( dir => Directory.GetFiles( dir, filter ) );
}

In C#, I use the following code for this kind of function:
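public static IEnumerable<DirectoryInfo> TryGetDirectories(this DirectoryInfo dir) {
    // F.Swallow is the answerer's own helper (not shown): it presumably returns
    // the second lambda's result when the first one throws.
    return F.Swallow(() => dir.GetDirectories(), () => new DirectoryInfo[] { });
}
public static IEnumerable<DirectoryInfo> DescendantDirs(this DirectoryInfo dir) {
    // The directory itself, then the descendants of every child that is not a
    // reparse point (junctions/symbolic links are skipped to avoid cycles).
    return Enumerable.Repeat(dir, 1).Concat(
        from kid in dir.TryGetDirectories()
        where (kid.Attributes & FileAttributes.ReparsePoint) == 0
        from desc in kid.DescendantDirs()
        select desc);
}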
This addresses IO errors (which inevitably happen, unfortunately) and avoids infinite loops caused by symbolic links (in particular, you'll run into that when searching some directories in Windows 7).
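F.Swallow is not shown in the answer. A minimal sketch of what it presumably does, assuming it should catch only the IO and permission errors mentioned above; the class name F and the exact exception list are assumptions, not the answerer's code:

```csharp
using System;
using System.IO;
using System.Security;

static class F
{
    // Hypothetical reconstruction: run 'attempt'; if it fails with an IO or
    // permission error, return the value produced by 'fallback' instead.
    // Any other exception still propagates to the caller.
    public static T Swallow<T>(Func<T> attempt, Func<T> fallback)
    {
        try
        {
            return attempt();
        }
        catch (Exception ex) when (ex is IOException
                                   || ex is UnauthorizedAccessException
                                   || ex is SecurityException)
        {
            return fallback();
        }
    }
}
```

Catching only these types, rather than every Exception, keeps genuine bugs visible while still skipping unreadable directories.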

Related

Make EnumerateDirectory stop looking for subfolders if condition is met

I'm trying to find some directories on a network drive.
I use Directory.EnumerateDirectories for this.
The problem is that it takes a very long time because there are many subdirectories.
Is there a way to make the function stop searching further down into subdirectories if a match was found and carry on with the next directory on same level?
static readonly Regex RegexValidDir = new("[0-9]{4,}\\.[0-9]+$");
var dirs = Directory.EnumerateDirectories(startDir, "*.*", SearchOption.AllDirectories)
    .Where(x => RegexValidDir.IsMatch(x));
The directory structure looks like this:
a\b\20220902.1\c\d\
a\b\20220902.2\c\d\e
a\b\x\20220902.3\
a\b\x\20221004.1\c\
a\b\x\20221004.2\c\
a\b\x\20221004.3\d\e\f\
...
a\v\w\x\20221104.1\c\d
a\v\w\x\20221105.1\c\d
a\v\w\x\20221106.1\c\d
a\v\w\x\20221106.2\c\d
a\v\w\x\20221106.3\c\d
a\v\w\x\20221106.4\
I'm only interested in the directories with a date in the name, and I want to stop searching further down into the subdirectories of a matching directory.
Another thing: I don't know if the search pattern I'm supplying (*.*) is correct for my usage scenario.
The directories are found relatively quickly, but it then takes another 11 minutes for the search to complete.
I don't think that it's possible to prune the enumeration efficiently with the built-in Directory.EnumerateDirectories method in SearchOption.AllDirectories mode. My suggestion is to write a custom recursive iterator that lets you select the children of each individual item:
static IEnumerable<T> Traverse<T>(IEnumerable<T> source,
    Func<T, IEnumerable<T>> childrenSelector)
{
    foreach (T item in source)
    {
        IEnumerable<T> children = childrenSelector(item);
        yield return item;
        if (children is null) continue;
        foreach (T child in Traverse(children, childrenSelector))
            yield return child;
    }
}
Then for the directories that match the date pattern, you can just return null children, effectively stopping the recursion for those directories:
IEnumerable<string> query = Traverse(new[] { startDir }, path =>
{
    if (RegexValidDir.IsMatch(path)) return null; // Stop recursion
    return Directory.EnumerateDirectories(path);
}).Where(path => RegexValidDir.IsMatch(path));
This query is slightly inefficient because the RegexValidDir pattern is matched twice for each path (once in the childrenSelector and once in the predicate of the Where). If you want to optimize it, you could modify the Traverse method by replacing the childrenSelector with a more complex lambda that returns both the children and whether the item should be yielded by the iterator: Func<T, (IEnumerable<T>, bool)>. Alternatively, use Traverse as is, with T being (string, bool) instead of string.
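The first option could be sketched like this, assuming the selector returns a (Children, Include) tuple; TraverseFiltered and the tuple element names are illustrative. As before, null children stop the recursion, but now the predicate runs only once per item:

```csharp
using System;
using System.Collections.Generic;

static class TraverseExtensions
{
    // The selector decides, in one call, whether 'item' is yielded and which
    // children (if any) to descend into. Null children stop the recursion.
    public static IEnumerable<T> TraverseFiltered<T>(
        IEnumerable<T> source,
        Func<T, (IEnumerable<T> Children, bool Include)> selector)
    {
        foreach (T item in source)
        {
            var (children, include) = selector(item);
            if (include) yield return item;
            if (children is null) continue;
            foreach (T descendant in TraverseFiltered(children, selector))
                yield return descendant;
        }
    }
}
```

For the directory search, the selector would be something like path => RegexValidDir.IsMatch(path) ? ((IEnumerable<string>)null, true) : (Directory.EnumerateDirectories(path), false), starting from new[] { startDir }, and the trailing Where is no longer needed.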

Finding all files in a folder using enumeration

I'm trying to list all files under a given directory, taking subdirectories into account as well. I'm using yield so that I can combine this with Take where I call it (note that I'm using .NET 3.5).
Below is my code:
IEnumerable<string> Search(string sDir)
{
    foreach (var file in Directory.GetFiles(sDir))
    {
        yield return file;
    }
    foreach (var directory in Directory.GetDirectories(sDir))
    {
        Search(directory);
    }
}
I don't know what is going wrong here, but it only returns one file (which is the one under the root directory, and there is only one there as well). Can you please help?
You need to yield the results of the recursive search, otherwise you're just throwing its results away:
IEnumerable<string> Search(string sDir)
{
    foreach (var file in Directory.GetFiles(sDir))
    {
        yield return file;
    }
    foreach (var directory in Directory.GetDirectories(sDir))
    {
        foreach (var file in Search(directory))
            yield return file;
    }
}
Note that if your intent is to simply get a flat list of every file, consider using Directory.GetFiles instead with the option to search all subdirectories. If your intent is to leverage LINQ (or other methods) to apply searching criteria or a limit to the total number of files retrieved, then this is a decent way to go as you'll read directories one at a time and stop once you've fulfilled your criterion.
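If moving past .NET 3.5 is an option, .NET 4.0 added Directory.EnumerateFiles, which returns paths lazily instead of building the whole array first, so it combines with Take without the hand-written recursion. A minimal sketch (FileSearch and FirstFiles are illustrative names):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FileSearch
{
    // EnumerateFiles yields paths as the directory walk proceeds, so Take(count)
    // can end the enumeration before the whole tree has been read.
    public static List<string> FirstFiles(string root, int count) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Take(count)
            .ToList();
}
```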

Using LINQ with Action to delete old files

I would like to do something like:
Action<FileInfo> deleter = f =>
{
    if (....) // delete condition here
    {
        System.IO.File.Delete(f.FullName);
    }
};
DirectoryInfo di = new DirectoryInfo(_path);
di.GetFiles("*.pdf").Select(deleter); // <= Does not compile!
di.GetFiles("*.txt").Select(deleter); // <= Does not compile!
di.GetFiles("*.dat").Select(deleter); // <= Does not compile!
in order to delete old files from a directory. But I do not know how to apply the delegate directly to the FileInfo[] without an explicit foreach (the idea listed above does not work, of course).
Is it possible?
Select() is used to project items from TSource to TResult. In your case, you do not need Select because you're not projecting. Instead, use List<T>'s ForEach method to delete files:
di.GetFiles("*.pdf").ToList().ForEach(deleter);
As DarkGray suggests, you could (if somewhat unusually) use Select to first act on each file and then return null for it. I would recommend using a ForEach extension method instead, like so:
ForEach extension method (LINQ-style, but not part of LINQ itself)
// Extension methods must be declared in a static class (the class name is illustrative).
public static class EnumerableExtensions
{
    public static void ForEach<TSource>(this IEnumerable<TSource> source, Action<TSource> action)
    {
        foreach (TSource item in source)
        {
            action(item);
        }
    }
}
You should then be able to execute the action on the array of FileInfo, since an array implements IEnumerable<T>, like so:
Execution
Action<FileInfo> deleter = f =>
{
    if (....) // delete condition here
    {
        System.IO.File.Delete(f.FullName);
    }
};
DirectoryInfo di = new DirectoryInfo(_path);
di.GetFiles("*.pdf").ForEach(deleter);
Edit by Richard.
I do want to draw attention to the foreach vs ForEach argument. In my opinion, a ForEach should directly affect the objects being passed in, and in this case it does. So I've contradicted myself. Oops! :)
di.GetFiles("*.pdf").Select(f => { deleter(f); return (object)null; }).ToList(); // Select is lazy: ToList() forces the deleter to run
or
di.GetFiles("*.pdf").ForEach(deleter);
public static class Hlp
{
    public static void ForEach<T>(this IEnumerable<T> items, Action<T> action)
    {
        foreach (var item in items)
            action(item);
    }
}
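For the original goal, a plain foreach over a LINQ query is arguably the clearest option. A sketch, assuming "last written longer ago than a given age" as a stand-in for the elided delete condition; OldFileCleaner and DeleteOlderThan are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;

static class OldFileCleaner
{
    // Deletes files matching any of the patterns whose last write time is older
    // than maxAge, and returns how many files were deleted.
    public static int DeleteOlderThan(string path, TimeSpan maxAge, params string[] patterns)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        var di = new DirectoryInfo(path);
        var oldFiles = patterns
            .SelectMany(pattern => di.GetFiles(pattern))
            .Where(f => f.LastWriteTimeUtc < cutoff)
            .ToList();           // materialize the query before deleting anything
        foreach (FileInfo f in oldFiles)
            f.Delete();          // the side effect stays in an explicit loop
        return oldFiles.Count;
    }
}
```

Usage: OldFileCleaner.DeleteOlderThan(_path, TimeSpan.FromDays(30), "*.pdf", "*.txt", "*.dat").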

Simple IEnumerator use (with example)

I am having trouble remembering how (but not why) to use IEnumerators in C#. I am used to Java with its wonderful documentation that explains everything to beginners quite nicely. So please, bear with me.
I have tried learning from other answers on these boards to no avail. Rather than ask a generic question that has already been asked before, I have a specific example that would clarify things for me.
Suppose I have a method that needs to be passed an IEnumerable<String> object. All the method needs to do is concatenate the letters roxxors to the end of every String in the iterator. It then will return this new iterator (of course the original IEnumerable object is left as it was).
How would I go about this? The answer here should help many with basic questions about these objects in addition to me, of course.
Here is the documentation on IEnumerator. They are used to get the values of lists, where the length is not necessarily known ahead of time (even though it could be). The word comes from enumerate, which means "to count off or name one by one".
IEnumerator and IEnumerator<T> are provided by all IEnumerable and IEnumerable<T> implementations (the latter providing both) in .NET via GetEnumerator(). This is important because the foreach statement is designed to work directly with enumerators through those interface methods.
So for example:
IEnumerator enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
    object item = enumerator.Current;
    // Perform logic on the item
}
Becomes:
foreach(object item in enumerable)
{
    // Perform logic on the item
}
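For the generic IEnumerable<T>, foreach additionally disposes the enumerator when the loop ends, because IEnumerator<T> implements IDisposable. A simplified sketch of that expansion (not the exact compiler output; ForeachExpansion and SumWithEnumerator are illustrative names):

```csharp
using System.Collections.Generic;

static class ForeachExpansion
{
    // Hand-expanded equivalent of: foreach (int item in items) total += item;
    public static int SumWithEnumerator(IEnumerable<int> items)
    {
        int total = 0;
        using (IEnumerator<int> enumerator = items.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                int item = enumerator.Current;
                total += item;
            }
        }   // Dispose() runs here, even if the loop body throws
        return total;
    }
}
```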
As to your specific scenario, almost all collections in .NET implement IEnumerable. Because of that, you can do the following:
public IEnumerator Enumerate(IEnumerable enumerable)
{
    // List implements IEnumerable, but could be any collection.
    List<string> list = new List<string>();
    foreach(string value in enumerable)
    {
        list.Add(value + "roxxors");
    }
    return list.GetEnumerator();
}
public IEnumerable<string> Appender(IEnumerable<string> strings)
{
    List<string> myList = new List<string>();
    foreach(string str in strings)
    {
        myList.Add(str + "roxxors");
    }
    return myList;
}
or
public IEnumerable<string> Appender(IEnumerable<string> strings)
{
    foreach(string str in strings)
    {
        yield return str + "roxxors";
    }
}
using the yield construct, or simply
var newCollection = strings.Select(str => str + "roxxors"); //(*)
or
var newCollection = from str in strings select str + "roxxors"; //(**)
where the two latter use LINQ and (**) is just syntactic sugar for (*).
If I understand you correctly, then in C# the yield return compiler magic is all you need, I think.
e.g.
IEnumerable<string> myMethod(IEnumerable<string> sequence)
{
    foreach(string item in sequence)
    {
        yield return item + "roxxors";
    }
}
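To see what the yield return version saves you from writing, here is a rough hand-written equivalent that implements IEnumerable<string> and IEnumerator<string> directly. It is a sketch, not the exact code the compiler generates; RoxxorsEnumerable is an illustrative name:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

sealed class RoxxorsEnumerable : IEnumerable<string>
{
    private readonly IEnumerable<string> _source;
    public RoxxorsEnumerable(IEnumerable<string> source) => _source = source;

    // Each call hands out a fresh enumerator, so the sequence can be iterated repeatedly.
    public IEnumerator<string> GetEnumerator() => new Enumerator(_source.GetEnumerator());
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<string>
    {
        private readonly IEnumerator<string> _inner;
        public Enumerator(IEnumerator<string> inner) => _inner = inner;

        public string Current { get; private set; }
        object IEnumerator.Current => Current;

        // Advance the source; on success, compute the transformed current value.
        public bool MoveNext()
        {
            if (!_inner.MoveNext()) return false;
            Current = _inner.Current + "roxxors";
            return true;
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() => _inner.Dispose();
    }
}
```

Like the yield version, nothing is computed until the caller starts calling MoveNext, and the original sequence is left untouched.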
I'd do something like:
private IEnumerable<string> DoWork(IEnumerable<string> data)
{
    List<string> newData = new List<string>();
    foreach(string item in data)
    {
        newData.Add(item + "roxxors");
    }
    return newData;
}
Simple stuff :)
Also you can use LINQ's Select Method:
var source = new[] { "Line 1", "Line 2" };
var result = source.Select(s => s + " roxxors");
Read more here Enumerable.Select Method

Handling common recursive functions

I've noticed that in my project, we frequently are writing recursive functions.
My question is: is there any way to create the recursive function as generic function for each hierarchy structure that is using the recursive iteration?
Maybe I can use a delegate that gets the root and the end flag of the recursion?
Any ideas?
Thanks.
Yes. The only thing you need is a delegate function that computes the list of children for each element. The recursion terminates when no children are returned.
delegate IEnumerable<TNode> ChildSelector<TNode>(TNode Root);

// Extension methods must be declared in a static class (the class name is illustrative).
static class TreeExtensions
{
    public static IEnumerable<TNode> Traverse<TNode>(this TNode Root, ChildSelector<TNode> Children) {
        // Visit current node (PreOrder)
        yield return Root;
        // Visit children
        foreach (var Child in Children(Root))
            foreach (var el in Traverse(Child, Children))
                yield return el;
    }
}
Example:
static void Main(string[] args) {
    var Init = args[0]; // some path, e.g. passed on the command line
    var Data = Init.Traverse(Dir => Directory.GetDirectories(Dir, "*", SearchOption.TopDirectoryOnly));
    foreach (var Dir in Data)
        Console.WriteLine(Dir);
    Console.ReadKey();
}
I think what you want is a way to work with hierarchical structures in a generic way ("generic" as defined in English, not necessarily as defined in .Net). For example, this is something I wrote once when I needed to get all the Controls inside a Windows Form:
public static IEnumerable<T> SelectManyRecursive<T>(this IEnumerable<T> items, Func<T, IEnumerable<T>> selector)
{
    if (items == null)
        throw new ArgumentNullException("items");
    if (selector == null)
        throw new ArgumentNullException("selector");
    return SelectManyRecursiveInternal(items, selector);
}
private static IEnumerable<T> SelectManyRecursiveInternal<T>(this IEnumerable<T> items, Func<T, IEnumerable<T>> selector)
{
    foreach (T item in items)
    {
        yield return item;
        IEnumerable<T> subitems = selector(item);
        if (subitems != null)
        {
            foreach (T subitem in subitems.SelectManyRecursive(selector))
                yield return subitem;
        }
    }
}
// sample use, get Text from some TextBoxes in the form
// (ControlCollection is non-generic, hence the Cast<Control>() calls)
var strings = form.Controls.Cast<Control>()
    .SelectManyRecursive(c => c.Controls.Cast<Control>()) // all controls
    .OfType<TextBox>()                                    // filter by type
    .Where(c => c.Text.StartsWith("P"))                   // filter by text
    .Select(c => c.Text);
Another example: a Category class where each Category could have ChildCategories (same way a Control has a Controls collection) and assuming that rootCategory is directly or indirectly the parent of all categories:
// get all categories that are enabled
var categories = from c in new[] { rootCategory }.SelectManyRecursive(cat => cat.ChildCategories)
                 where c.Enabled
                 select c;
I'm not sure what exactly your question is asking for, but a recursive function can be generic. There's no limitation on that. For instance:
int CountLinkedListNodes<T>(MyLinkedList<T> input) {
    if (input == null) return 0;
    return 1 + CountLinkedListNodes<T>(input.Next);
}
An easier and also generic approach might be to cache the results of the function and call the "real" function only when the result is not yet known; the effectiveness of this approach depends on how frequently the same set of parameters is used during your recursion.
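A minimal sketch of that caching idea (memoization) in C#. Memo, Memoize and MemoizedFibonacci are illustrative names, and the cache is a plain Dictionary, so this version is not thread-safe. For a recursive function, the recursive calls must go through the memoized wrapper, which is why fib is declared before it is assigned:

```csharp
using System;
using System.Collections.Generic;

static class Memo
{
    // Wraps f so that each distinct argument is computed only once.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> f)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            if (!cache.TryGetValue(arg, out var result))
            {
                result = f(arg);      // call the "real" function only on a cache miss
                cache[arg] = result;
            }
            return result;
        };
    }

    // Example: the recursive calls go through the memoized delegate,
    // so each n is computed once instead of exponentially many times.
    public static Func<int, long> MemoizedFibonacci()
    {
        Func<int, long> fib = null;
        fib = Memoize<int, long>(n => n < 2 ? n : fib(n - 1) + fib(n - 2));
        return fib;
    }
}
```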
If you know Perl, you should check out the first 4 chapters of Higher-Order Perl, which are available as an e-book; the ideas presented are language-independent.
It sounds like your solution can successfully use the Visitor Pattern.
You can create a specific variation of it by using a hierarchical visitor pattern.
It is a little complex to discuss entirely here, but that should get you started on some research. The basic idea is that you have a class that knows how to traverse the structure, and then you have Visitor classes that know how to process a particular node. This separates the traversal of the tree from the processing of nodes.
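A compact sketch of the hierarchical visitor idea, under assumptions: Node<T>, IHierarchicalVisitor<T> and DepthLimitedCollector<T> are hypothetical types made up for this example. The traversal lives once in Node<T>.Accept, each visitor only decides what to do at a node, and returning false from VisitEnter skips that node's children:

```csharp
using System.Collections.Generic;

interface IHierarchicalVisitor<T>
{
    bool VisitEnter(Node<T> node);   // return false to skip this node's children
    void VisitLeave(Node<T> node);
}

sealed class Node<T>
{
    public T Value { get; }
    public List<Node<T>> Children { get; } = new List<Node<T>>();

    public Node(T value, params Node<T>[] children)
    {
        Value = value;
        Children.AddRange(children);
    }

    // The traversal logic lives here, once, for every kind of visitor.
    public void Accept(IHierarchicalVisitor<T> visitor)
    {
        if (visitor.VisitEnter(this))
        {
            foreach (Node<T> child in Children)
                child.Accept(visitor);
        }
        visitor.VisitLeave(this);
    }
}

// One possible visitor: records values in pre-order with their depth,
// and does not descend below maxDepth.
sealed class DepthLimitedCollector<T> : IHierarchicalVisitor<T>
{
    private readonly int _maxDepth;
    private int _depth;
    public List<(T Value, int Depth)> Visited { get; } = new List<(T Value, int Depth)>();

    public DepthLimitedCollector(int maxDepth) => _maxDepth = maxDepth;

    public bool VisitEnter(Node<T> node)
    {
        Visited.Add((node.Value, _depth));
        _depth++;
        return _depth <= _maxDepth;
    }

    public void VisitLeave(Node<T> node) => _depth--;
}
```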
