Recursive collection search - c#

I have a collection (List<Element>) of objects as described below:
class Element
{
string Name;
string Value;
ICollection<Element> ChildCollection;
IDictionary<string, string> Attributes;
}
I build a List<Element> collection of Element objects based on some XML that I read in, this I am quite happy with. How to implement searching of these elements currently has me, not stumped, but wondering if there is a better solution.
The structure of the collection looks something like this:
- Element (A)
- Element (A1)
- Element (A1.1)
- Element (A2)
- Element (B)
- Element (B1)
- Element (B1.1)
- Element (B1.2)
- Element (C)
- Element (C1)
- Element (C2)
- Element (C3)
Currently I am using recursion to search the Attributes dictionary of each top level (A, B, C) Element for a particular KeyValuePair. If I do not find it in the top level Element I start searching its ChildElement collection (1, 1.1, 2, 2.1, n, etc.) in the same manner.
What I am curious about is if there is a better method of implementing a search on these objects or if recursion is the better answer in this instance, if I should implement the search as I am currently, top -> child -> child -> etc. or if I should search in some other manner such as all top levels first?
Could I, and would it be reasonable to use the TPL to search each top level (A, B, C) in parallel?

Recursion is one way of implementing a tree search where you visit elements in depth-first order. You can implement the same algorithm with a loop instead of recursion by using a stack data structure to store the nodes of your tree that you need to visit.
If you use the same algorithm with a queue instead of a stack, the search would proceed in breath-first order.
In both cases the general algorithm looks like this:
var nodes = ... // some collection of nodes
nodes.Add(root);
while (nodes.Count != 0) {
var current = nodes.Remove ... // Take the current node from the collection.
foreach (var child in current.ChildCollection) {
nodes.Add(child);
}
// Process the current node
if (current.Attributes ...) {
...
}
}
Note that the algorithm is not recursive: it uses an explicit collection of nodes to save the current state of the search, whereas a recursive implementation uses the call stack for the same purpose. If nodes is a Stack<Element>, the search proceeds in depth-first order; if nodes is a Queue<Element>, the search proceeds in breadth-first order.

I grabbed this bit from SO somewhere, Its not mine but I cant provide a link to it. This class Flattens out a treeview for a recursive search, looks like it should do the same for you.
public static class SOExtension
{
public static IEnumerable<TreeNode> FlattenTree(this TreeView tv)
{
return FlattenTree(tv.Nodes);
}
public static IEnumerable<TreeNode> FlattenTree(this TreeNodeCollection coll)
{
return coll.Cast<TreeNode>()
.Concat(coll.Cast<TreeNode>()
.SelectMany(x => FlattenTree(x.Nodes)));
}
}
I found the link I got this from - its very easy to use. have a look. Is there a method for searching for TreeNode.Text field in TreeView.Nodes collection?

You can re-use existing components designed specifically for traversing in different ways, such as NETFx IEnumerable.Traverse Extension Method. It allows you to depth or breadth first. It lets you traverse an enumerable tree, depth or breadth first.
Example to get a flattened enumerable of directories:
IEnumerable<DirectoryInfo> directories = ... ;
IEnumerable<DirectoryInfo> allDirsFlattened = directories.Traverse(TraverseKind.BreadthFirst, dir => dir.EnumerateDirectories());
foreach (DirectoryInfo directoryInfo in allDirsFlattened)
{
...
}
For BreadhFirst it uses Queue<T> internally and for DepthFirst it uses Stack<T> internally.
It is not traversing nodes parallell and unless the traversal is resource demanding it isn't appropriate to use parallellism at this level. But that depends on the context.

Related

Optimization of sorting of subsequent child nodes under the tree like hierarchy

I have a root node which has 3 child nodes. Each of the child nodes has its own child nodes further. These child nodes are maintained as Lists. I want to sort the entire tree from top to bottom. For e.g.: First sort the 3 child nodes(according to their names) and then sort the child nodes under them. I also know the height of the tree here which is 6.
This is not a data structure tree per say but I am calling it a tree because of the kind of structure it has. I have implemented a sorting mechanism and its working fine but I am afraid this is not the most optimized way to achieve the sorting.
//sort the child nodes under root
GlobalProperties.Tree.ChildNodes.Sort((x,y) => x.Name.CompareTo(y.Name));
foreach (var node in GlobalProperties.Tree.ChildNodes)
{
//sorting at second level
node.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name));
foreach (var secNode in node.ChildNodes)
{
//sorting at third level
secNode.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name));
foreach (var terNode in secNode.ChildNodes)
{
//sorting at fourth level
terNode.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name));
foreach (var fourthNode in terNode.ChildNodes)
{
//sorting at fifth level
fourthNode.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name));
}
}
}
}
Is there any better way to achieve the same functionality?
I would go for an extension method in this case -
public static class XmlExtensions{
public static void SortChildren(tihs XmlNode node){
node.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name));
node.ChildNodes.ForEach(x => x.SortChildren());
}
}
Then use it as this -
GlobalProperties.Tree.SortChildren();
Assuming that all your nodes have a common base type.
You could supply your base type with a sort function.
private void Sort()
{
this.ChildNodes.Sort((x, y) => x.Name.CompareTo(y.Name)); // your sort
foreach (var node in this.ChildNodes)
{
node.Sort(); // this calls the Sort function for all children
}
}
node.Sort() is a recursive call. It gets called for all children. And then all children will again call it for all of their children until you are at the bottom of your tree.
I normally don't like to get distracted by optimizing code unless performance is noticeably slow, as long as your code is clean, readable and correct, I wouldn't worry about it. Using a recursive function as indicated in the previous answer would make it easier to read, so I would definitely consider that.

Most appropriate way to construct a File and Directory class in order to easily filter results when placing them on a tree

I am creating a program that cursively finds all the files and directories in the specified path. So one node may have other nodes if that node happens to be a directory.
Here is my Node class:
class Node
{
public List<Node> Children = new List<Node>(); // if node is directory then children will be the files and directories in this direcotry
public FileSystemInfo Value { get; set; } // can eather be a FileInfo or DirectoryInfo
public bool IsDirectory
{
get{ return Value is DirectoryInfo;}
}
public long Size // HERE IS WHERE I AM HAVING PROBLEMS! I NEED TO RETRIEVE THE
{ // SIZE OF DIRECTORIES AS WELL AS FOR FILES.
get
{
long sum = 0;
if (Value is FileInfo)
sum += ((FileInfo)Value).Length;
else
sum += Children.Sum(x => x.Size);
return sum;
}
}
// this is the method I use to filter results in the tree
public Node Search(Func<Node, bool> predicate)
{
// if node is a leaf
if(this.Children.Count==0)
{
if (predicate(this))
return this;
else
return null;
}
else // Otherwise if node is not a leaf
{
var results = Children.Select(i => i.Search(predicate)).Where(i => i != null).ToList();
if (results.Any()) // THIS IS HOW REMOVE AND RECUNSTRUCT THE TREE WITH A FILTER
{
var result = (Node)MemberwiseClone();
result.Children = results;
return result;
}
return null;
}
}
}
and thanks to that node class I am able to display the tree as:
In one column I display the name of the directory or file and on the right the size. The size is formated as currency just because the commas help visualize it more clearly.
So now my problem is The reason why I have this program was to perform some advance searches. So I may only want to search for files that have the ".txt" extension for example. If I perform that filter on my tree I will get:
(note that I compile the text to a function that takes a Node and returns a bool and I pass that method to the Search method on my Node class in order to filter results. More information on how to dynamically compile code can be found at: http://www.codeproject.com/Articles/10324/Compiling-code-during-runtime) Anyways that has nothing to do with this question. The important part was that I removed all the nodes that did not matched that criteria and because I removed those nodes now the sizes of the directories changed!!!
So my question is how will I be able to filter results maintaining the real size of the directory. I guess I will have to remove the property Size and replace it with a field. The problem with that is that every time I add to the tree I will have to update the size of all the parent directories and that gets complex. Before starting coding it that way I will appreciate your opinion on how I should start implementing the class.
Since you're using recursion and your weight is a node-level property you can't expect that will continue to sum even after you remove the node. You either promote it to a upper level (collection) or use an external counter within the recursion (which counts but not depending on filter, you'll need to carry this through the recuersion).
Anyway, why are you implementing a core .NET functionality again? any reason beyond filtering or recursive search? both are pretty well implemented in the BCL.

What collection to store a tree structure?

I want to store an organisation chart in a collection. I think a tree data structure will be best suited to my needs, as I need to add multiple nodes to one node.
LinkedList only provides adding one node to another node, if I understand it correctly.
I have looked at C5 treeset collection, but it doesn't seem to have Add() method to add more than 2 nodes to one node.
I have also looked at Treeview class from Windows Forms library, but I do not want to add Windows forms dll to my project, since I am building a service layer application. (or is it fine?)
I do not want to write my own tree collection class, if there is already one provided by 3rd party?
Any suggestion please?
Thanks
Something like this can be a starting point. By using generics this one can hold a tree of anything
class TreeNode<T>
{
List<TreeNode<T>> Children = new List<TreeNode<T>>();
T Item {get;set;}
public TreeNode (T item)
{
Item = item;
}
public TreeNode<T> AddChild(T item)
{
TreeNode<T> nodeItem = new TreeNode<T>(item);
Children.Add(nodeItem);
return nodeItem;
}
}
A sample which holds a tree of strings
string root = "root";
TreeNode<string> myTreeRoot = new TreeNode<string>(root);
var first = myTreeRoot.AddChild("first child");
var second = myTreeRoot.AddChild("second child");
var grandChild = first.AddChild("first child's child");

C# - XML - Removing the nodes without using recursion

I have the following recursive method which takes the an XHTML document and marks nodes based on certain conditions and It is called like below for a number of HTML contents:-
XmlDocument document = new XmlDocument();
document.LoadXml(xmlAsString);
PrepNodesForDeletion(document.DocumentElement, document.DocumentElement);
The method definition is below
/// <summary>
/// Recursive function to identify and mark all unnecessary nodes so that they can be removed from the document.
/// </summary>
/// <param name="nodeToCompareAgainst">The node that we are recursively comparing all of its descendant nodes against</param>
/// <param name="nodeInQuestion">The node whose children we are comparing against the "nodeToCompareAgainst" node</param>
static void PrepNodesForDeletion(XmlNode nodeToCompareAgainst, XmlNode nodeInQuestion)
{
if (infinityIndex++ > 100000)
{
throw;
}
foreach (XmlNode childNode in nodeInQuestion.ChildNodes)
{
// make sure we compare all of the childNodes descendants to the nodeToCompareAgainst
PrepNodesForDeletion(nodeToCompareAgainst, childNode);
if (AreNamesSame(nodeToCompareAgainst, childNode) && AllAttributesPresent(nodeToCompareAgainst, childNode))
{
// the function AnyAttributesWithDifferingValues assumes that all attributes are present between the two nodes
if (AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode) && InnerTextIsSame(nodeToCompareAgainst, childNode))
{
MarkNodeForDeletion(nodeToCompareAgainst);
}
else if (!AnyAttributesWithDifferingValues(nodeToCompareAgainst, childNode))
{
MarkNodeForDeletion(childNode);
}
}
// make sure we compare all of the childNodes descendants to the childNode
PrepNodesForDeletion(childNode, childNode);
}
}
And then the following method which would delete the marked node:-
static void RemoveMarkedNodes(XmlDocument document)
{
// in order for us to make sure we remove everything we meant to remove, we need to do this in a while loop
// for instance, if the original xml is = <a><a><b><a/></b></a><a/></a>
// this should result in the xml being passed into this function as:
// <a><b><a DeleteNode="TRUE" /></b><a DeleteNode="TRUE"><b><a DeleteNode="TRUE" /></b></a><a DeleteNode="TRUE" /></a>
// then this function (without the while) will not delete the last <a/>, even though it is marked for deletion
// if we incorporate a while loop, then we can insure all nodes marked for deletion are removed
// TODO: understand the reason for this -- see http://groups.google.com/group/microsoft.public.dotnet.xml/browse_thread/thread/25df058a4efb5698/7dd0a8b71739216c?lnk=st&q=xmlnode+removechild+recursive&rnum=2&hl=en#7dd0a8b71739216c
XmlNodeList nodesToDelete = document.SelectNodes("//*[#DeleteNode='TRUE']");
while (nodesToDelete.Count > 0)
{
foreach (XmlNode nodeToDelete in nodesToDelete)
{
nodeToDelete.ParentNode.RemoveChild(nodeToDelete);
}
nodesToDelete = document.SelectNodes("//*[#DeleteNode='TRUE']");
}
}
When I use the PrepNodesForDeletion method without the infinityIndex counter, I get OutOfMemoryException for few HTML contents. However, If I use infinityIndex counter, It may not be deleting nodes for some HTML contents.
Could anybody suggest any way to remove recursion. Also I am not familiar with the HtmlAgility pack. So, If this can be done using that, could somebody provide some code sample.
Well, if I understand your algorithm correctly you want to do this:
For each node in the tree compare it against all its child nodes in a non-recursive fashion, correct?
// walk the tree in DFS
public void XmlTreeWalk(XmlNode root, Action<XmlNode, XmlNode> action)
{
var nodesToCompare = new Stack<XmlNode>();
foreach (XmlNode child in root.ChildNodes)
{
nodesToCompare.Push(child);
}
while (nodesToCompare.Count > 0)
{
var top = nodesToCompare.Pop();
action(root, top);
foreach (XmlNode child in top.ChildNodes)
{
nodesToCompare.Push(child);
}
}
}
// for each node: prepare all its children for deletion
public void PrepareForDeletion(XmlNode root)
{
XmlTreeWalk(root, (r, c) => PrepareSubtreeForDeletion(r, c));
}
// for each node, compare all its children against the toCompare node
private void PrepareSubtreeForDeletion(XmlNode toCompare, XmlNode root)
{
XmlTreeWalk(root, (unused, current) => MarkNodeForDeletion(toCompare, current));
}
// your delete logic
public void MarkNodeForDeletion(XmlNode toCompare, XmlNode toCompareAgains)
{
...
}
What this should do is: Walk the tree top to bottom and for each node walk the subtree of that node comparing all children against this node.
I haven't tested it so it might contain bugs but the idea should be clear. Apparently this algorithm is O(n^2).
To remove recursion, the childs and parents must know about each other.
Then you can traverse say down the right leg from the root parent, until you reach the right most bottom leg.
And then from there, go up one, then down left one, and then down right until bottom. Repeat up one, down left, and then right as far as possible, etc. until you have looped over the entire tree structure.
I'm not sure on what you are attempting to do, to suggest how to use this method on your problem.
Your problem is that you have badly formed XML and as a direct result your DOM is a mess. What I think you are going to have to do is to use a SAX parser (which must exist for .net) and implement the logic to fix the DOM yourself which appears to be what you re trying to do.
This method isn't recursive but is going to require you to do some work that you didn't realize that you needed to do.
also note that you are getting an out of memory exception and not a stack overflow exception which reinforces the idea that too much recursion is not your problem per se.

Enumerating Collections that are not inherently IEnumerable?

When you want to recursively enumerate a hierarchical object, selecting some elements based on some criteria, there are numerous examples of techniques like "flattening" and then filtering using Linq : like those found here :
link text
But, when you are enumerating something like the Controls collection of a Form, or the Nodes collection of a TreeView, I have been unable to use these types of techniques because they seem to require an argument (to the extension method) which is an IEnumerable collection : passing in SomeForm.Controls does not compile.
The most useful thing I found was this :
link text
Which does give you an extension method for Control.ControlCollection with an IEnumerable result you can then use with Linq.
I've modified the above example to parse the Nodes of a TreeView with no problem.
public static IEnumerable<TreeNode> GetNodesRecursively(this TreeNodeCollection nodeCollection)
{
foreach (TreeNode theNode in nodeCollection)
{
yield return theNode;
if (theNode.Nodes.Count > 0)
{
foreach (TreeNode subNode in theNode.Nodes.GetNodesRecursively())
{
yield return subNode;
}
}
}
}
This is the kind of code I'm writing now using the extension method :
var theNodes = treeView1.Nodes.GetNodesRecursively();
var filteredNodes =
(
from n in theNodes
where n.Text.Contains("1")
select n
).ToList();
And I think there may be a more elegant way to do this where the constraint(s) are passed in.
What I want to know if it is possible to define such procedures generically, so that : at run-time I can pass in the type of collection, as well as the actual collection, to a generic parameter, so the code is independent of whether it's a TreeNodeCollection or Controls.Collection.
It would also interest me to know if there's any other way (cheaper ? fastser ?) than that shown in the second link (above) to get a TreeNodeCollection or Control.ControlCollection in a form usable by Linq.
A comment by Leppie about 'SelectMany in the SO post linked to first (above) seems like a clue.
My experiments with SelectMany have been : well, call them "disasters." :)
Appreciate any pointers. I have spent several hours reading every SO post I could find that touched on these areas, and rambling my way into such exotica as the "y-combinator." A "humbling" experience, I might add :)
This code should do the trick
public static class Extensions
{
public static IEnumerable<T> GetRecursively<T>(this IEnumerable collection,
Func<T, IEnumerable> selector)
{
foreach (var item in collection.OfType<T>())
{
yield return item;
IEnumerable<T> children = selector(item).GetRecursively(selector);
foreach (var child in children)
{
yield return child;
}
}
}
}
Here's an example of how to use it
TreeView view = new TreeView();
// ...
IEnumerable<TreeNode> nodes = view.Nodes.
.GetRecursively<TreeNode>(item => item.Nodes);
Update: In response to Eric Lippert's post.
Here's a much improved version using the technique discussed in All About Iterators.
public static class Extensions
{
public static IEnumerable<T> GetItems<T>(this IEnumerable collection,
Func<T, IEnumerable> selector)
{
Stack<IEnumerable<T>> stack = new Stack<IEnumerable<T>>();
stack.Push(collection.OfType<T>());
while (stack.Count > 0)
{
IEnumerable<T> items = stack.Pop();
foreach (var item in items)
{
yield return item;
IEnumerable<T> children = selector(item).OfType<T>();
stack.Push(children);
}
}
}
}
I did a simple performance test using the following benchmarking technique. The results speak for themselves. The depth of the tree has only marginal impact on the performance of the second solution; whereas the performance decreases rapidly for the first solution, eventually leadning to a StackOverflowException when the depth of the tree becomes too great.
You seem to be on the right track and the answers above have some good ideas. But I note that all these recursive solutions have some deep flaws.
Let's suppose the tree in question has a total of n nodes with a max tree depth of d <= n.
First off, they consume system stack space in the depth of the tree. If the tree structure is very deep, then this can blow the stack and crash the program. Tree depth d is O(lg n), depending on the branching factor of the tree. Worse case is no branching at all -- just a linked list -- in which case a tree with only a few hundred nodes will blow the stack.
Second, what you're doing here is building an iterator that calls an iterator that calls an iterator ... so that every MoveNext() on the top iterator actually does a chain of calls that is again O(d) in cost. If you do this on every node, then the total cost in calls is O(nd) which is worst case O(n^2) and best case O(n lg n). You can do better than both; there's no reason why this cannot be linear in time.
The trick is to stop using the small, fragile system stack to keep track of what to do next, and to start using a heap-allocated stack to explicitly keep track.
You should add to your reading list Wes Dyer's article on this:
https://blogs.msdn.microsoft.com/wesdyer/2007/03/23/all-about-iterators/
He gives some good techniques at the end for writing recursive iterators.
I'm not sure about TreeNodes, but you can make the Controls collection of a form IEnumerable by using System.Linq and, for example
var ts = (from t in this.Controls.OfType<TextBox>
where t.Name.Contains("fish")
select t);
//Will get all the textboxes whose Names contain "fish"
Sorry to say I don't know how to make this recursive, off the top of my head.
Based on mrydengren's solution:
public static IEnumerable<T> GetRecursively<T>(this IEnumerable collection,
Func<T, IEnumerable> selector,
Func<T, bool> predicate)
{
foreach (var item in collection.OfType<T>())
{
if(!predicate(item)) continue;
yield return item;
IEnumerable<T> children = selector(item).GetRecursively(selector, predicate);
foreach (var child in children)
{
yield return child;
}
}
}
var theNodes = treeView1.Nodes.GetRecursively<TreeNode>(
x => x.Nodes,
n => n.Text.Contains("1")).ToList();
Edit: for BillW
I guess you are asking for something like this.
public static IEnumerable<T> <T,TCollection> GetNodesRecursively(this TCollection nodeCollection, Func<T, TCollection> getSub)
where T, TCollection: IEnumerable
{
foreach (var theNode in )
{
yield return theNode;
foreach (var subNode in GetNodesRecursively(theNode, getSub))
{
yield return subNode;
}
}
}
var all_control = GetNodesRecursively(control, c=>c.Controls).ToList();

Categories

Resources