Traversing arbitrarily large binary tree inorder - c#

I'm stuck at finding a solution. C#, .NET 4.0, VS2010
I can easily write a recursive one, but can't for the life of me figure out something that won't overflow the stack if the tree is arbitrarily large.
This is a binary tree question, and I am trying to write a
public IEnumerable<T> Values()
method.
Here is the full code in case you are interested: http://pastebin.com/xr2f3y7g
Obviously, the version currently in there doesn't work. I probably should mention that I am a newbie in C#, transitioning from C++.

Here is a method for inorder traversal that uses an explicit stack. The stack is allocated on the heap, so it can grow much larger than the thread's call stack.
public IEnumerable<T> Values()
{
    Stack<Node> stack = new Stack<Node>();
    Node current = this.root;
    while(current != null)
    {
        while(current.leftChild != null)
        {
            stack.Push(current);
            current = current.leftChild;
        }
        yield return current.data;
        while(current.rightChild == null && stack.Count > 0)
        {
            current = stack.Pop();
            yield return current.data;
        }
        current = current.rightChild;
    }
}
If you can't use a stack and your nodes happen to have parent pointers, you can try solutions from this question
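For the parent-pointer case, a minimal sketch might look like the following. The `PNode<T>` type, its `SetLeft`/`SetRight` helpers and the `ParentPointerTraversal` class are hypothetical names made up for illustration (they are not the asker's classes), and the sketch assumes the root's `Parent` is null.

```csharp
using System.Collections.Generic;

// Hypothetical node type with a parent link (an assumption, not the asker's Node class).
public class PNode<T>
{
    public T Data;
    public PNode<T> Left, Right, Parent;

    public PNode(T data) { Data = data; }

    // Helpers that keep the Parent link consistent when attaching children.
    public PNode<T> SetLeft(PNode<T> child)
    {
        Left = child;
        if (child != null) child.Parent = this;
        return this;
    }

    public PNode<T> SetRight(PNode<T> child)
    {
        Right = child;
        if (child != null) child.Parent = this;
        return this;
    }
}

public static class ParentPointerTraversal
{
    // In-order traversal using only parent links: no recursion, no explicit stack.
    // Assumes root.Parent == null.
    public static IEnumerable<T> InOrder<T>(PNode<T> root)
    {
        PNode<T> current = Leftmost(root);
        while (current != null)
        {
            yield return current.Data;
            if (current.Right != null)
            {
                // The successor is the leftmost node of the right subtree.
                current = Leftmost(current.Right);
            }
            else
            {
                // Climb until we arrive at a parent from its left child.
                PNode<T> child = current;
                current = current.Parent;
                while (current != null && child == current.Right)
                {
                    child = current;
                    current = current.Parent;
                }
            }
        }
    }

    private static PNode<T> Leftmost<T>(PNode<T> node)
    {
        while (node != null && node.Left != null) node = node.Left;
        return node;
    }
}
```

Calling `ParentPointerTraversal.InOrder(root)` on a tree built with `SetLeft`/`SetRight` yields the values in sorted position order while using constant extra memory, at the cost of one extra reference per node.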

Assuming the tree is anywhere near balanced, its maximum depth is on the order of log2(n), so you'd need a huge tree to overflow the stack.
You can transform any recursive algorithm into an iterative one, but in this case, it will likely require either backward pointers or an explicit stack, both of which look expensive.
Having said that, recursion is typically not so great in .NET because any local variables in calling instances of a method cannot be GC'ed until the stack gets unwound after the terminating condition. I don't know whether the JIT will automatically optimize tail-end recursion to make it iterative, but that would help.

Related

Will these variables be garbage-collected?

I was practicing my coding chops today by solving the "remove all elements of a certain value from a linked list" problem. The solution I came up with was
public void RemoveAll ( T val )
{
    if(_root == null)
        return;
    if(_root.Value == val)
    {
        _root = _root.Next;
        RemoveAll(val);
    }
    Node last = _root,
         cur = _root.Next;
    while(cur != null)
    {
        if(cur.Value == val)
            last.Next = cur.Next;
        else
            last = cur;
        cur = cur.Next;
    }
}
and here's my question:
When cur.Value == val I'm doing something like changing the list from
A -> B -> C
to
A -> C
Will the compiler or run-time environment see that B is no longer in use and dispose of it? Or should I do that explicitly?
I have a second question which is whether a call stack blows up for recursive void methods. As you see here, there is a chance of the method calling itself. But since it's a method that doesn't return a value, can't the run-time environment just wipe the data about the last call? There is no reason for it to remain in memory (right?).
Will the compiler or run-time environment see that B is no longer in use and dispose of it? Or should I do that explicitly?
The GC, when it runs, will see that there are no active references to that object and clean it up (assuming nobody else holds a reference to it). You can't manually free a single object in .NET; memory is managed and reclaimed by the garbage collector as needed.
I have a second question which is whether a call stack blows up for recursive void methods. As you see here, there is a chance of the method calling itself. But since it's a method that doesn't return a value, can't the run-time environment just wipe the data about the last call? There is no reason for it to remain in memory (right?).
You're describing tail recursion. The C# compiler will not generate tail-recursive calls. Because of that, you may run into a StackOverflowException if your recursion is too deep.
That limitation is not a CLR limitation: the .NET Framework does support tail calls. It's the C# compiler that doesn't emit the tail. IL prefix. You can get tail recursion working in the .NET Framework by generating IL by hand or by using F#, which generates tail calls whenever appropriate.
See https://stackoverflow.com/a/15865150/1163867 for more details.
PS. I think your code has a bug. Looks like you should return early after recursive call into RemoveAll:
if(_root.Value == val)
{
    _root = _root.Next;
    RemoveAll(val);
    return;
}
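Since the C# compiler won't turn that recursive call into a loop for you, one option is to write the loop yourself. The following is only a sketch: `SimpleList<T>`, `AddFirst` and `Items` are hypothetical names added to make the example self-contained, and `EqualityComparer<T>.Default` is swapped in for `==` because `==` does not compile for an unconstrained `T`.

```csharp
using System.Collections.Generic;

// Hypothetical singly linked list, just large enough to show the technique.
public class SimpleList<T>
{
    private class Node
    {
        public T Value;
        public Node Next;
    }

    private Node _root;

    public void AddFirst(T value)
    {
        _root = new Node { Value = value, Next = _root };
    }

    public void RemoveAll(T val)
    {
        EqualityComparer<T> eq = EqualityComparer<T>.Default;

        // Skip leading matches with a loop instead of a recursive call,
        // so the call stack does not grow with the number of matches.
        while (_root != null && eq.Equals(_root.Value, val))
            _root = _root.Next;

        if (_root == null)
            return;

        // Unlink matching nodes from the rest of the list.
        Node last = _root, cur = _root.Next;
        while (cur != null)
        {
            if (eq.Equals(cur.Value, val))
                last.Next = cur.Next;   // the unlinked node becomes garbage
            else
                last = cur;
            cur = cur.Next;
        }
    }

    public IEnumerable<T> Items()
    {
        for (Node n = _root; n != null; n = n.Next)
            yield return n.Value;
    }
}
```

With this shape, a list that starts with a long run of matching values is handled in constant stack space, and the unlinked nodes are left for the garbage collector.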

Recursive Approach versus Stack for Depth First Search

I have a method as below which searches a collection and evaluates a condition recursively:
public static bool Recurse(this INodeViewModel node, Func<INodeViewModel,bool> predicate)
{
    INodeViewModel currentNode = node;
    return predicate(currentNode) || node.Children.Select(x => Recurse(x, predicate)).Any(found => found);
}
Alternatively this can be implemented using a stack to avoid recursion as below:
public static bool UsingStack(this INodeViewModel node, Func<INodeViewModel, bool> predicate)
{
    var stack = new Stack<INodeViewModel>();
    stack.Push(node);
    while(stack.Any())
    {
        var current = stack.Pop();
        if (predicate(current))
            return true;
        foreach (var child in current.Children)
        {
            stack.Push(child);
        }
    }
    return false;
}
My question is, does the stack version offer any performance benefits when the depth of the tree is large compared to the recursive version?
My question is, does the stack version offer any performance benefits when the depth of the tree is large compared to the recursive version?
Yes. The recursive version is infinitely slower than the iterative version when the depth of the tree is large. That's because the recursive version will blow the call stack, cause an unstoppable out-of-stack-space exception, and terminate your program before the bool is returned. The iterative version will not do that until heap space is exhausted, and heap space is potentially thousands of times larger than stack space.
Not giving a result at all is obviously worse performance than giving a result in any finite amount of time.
If however your question really is "does the stack version offer any benefit when the tree is deep, but not so deep that it blows the stack" then the answer is:
You've already written the program both ways. Run it and find out. Don't show random strangers on the internet pictures of two horses and ask which is faster; race them and then you'll know.
Also: I would be inclined to solve your problem by writing methods that do traversals and yield each element. If you can write methods IEnumerable<INode> BreadthFirstTraversal(this INode node) and IEnumerable<INode> DepthFirstTraversal(this INode node) then you don't need to be writing your own search; you can just say node.DepthFirstTraversal().Where(predicate).FirstOrDefault() when you want to search.
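A rough sketch of those two traversal methods follows. `INode` here is a hypothetical minimal interface standing in for the asker's `INodeViewModel`, and `DemoNode` is a made-up concrete type used only so the example is self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal node interface (an assumption, not the asker's INodeViewModel).
public interface INode
{
    IEnumerable<INode> Children { get; }
}

// Made-up concrete node used only to exercise the traversals.
public class DemoNode : INode
{
    public string Name;
    public List<INode> Kids = new List<INode>();

    public DemoNode(string name, params DemoNode[] kids)
    {
        Name = name;
        Kids.AddRange(kids);
    }

    public IEnumerable<INode> Children { get { return Kids; } }
}

public static class TraversalExtensions
{
    // Pre-order depth-first traversal driven by a heap-allocated stack.
    public static IEnumerable<INode> DepthFirstTraversal(this INode node)
    {
        var stack = new Stack<INode>();
        stack.Push(node);
        while (stack.Count > 0)
        {
            INode current = stack.Pop();
            yield return current;
            // Push children in reverse so the first child is popped first.
            foreach (INode child in current.Children.Reverse())
                stack.Push(child);
        }
    }

    // Same loop with a queue instead of a stack gives breadth-first order.
    public static IEnumerable<INode> BreadthFirstTraversal(this INode node)
    {
        var queue = new Queue<INode>();
        queue.Enqueue(node);
        while (queue.Count > 0)
        {
            INode current = queue.Dequeue();
            yield return current;
            foreach (INode child in current.Children)
                queue.Enqueue(child);
        }
    }
}
```

A search is then just `root.DepthFirstTraversal().Where(predicate).FirstOrDefault()`, and because the iterators are lazy, enumeration stops as soon as the first match is found.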
Let's make this clear first: Recursion is not for speed. Anything it does can be done at least as fast, and often faster, with iteration. Recursion's benefits come in the clarity of the code.
With that said, unless you absolutely need the fastest possible code (and frankly, you almost never do), the second (data-recursive) version isn't even worth considering, as it adds complexity for no good reason. It's especially worthless in C#, as each Stack operation involves a method call, and eliminating recursion is mostly about getting rid of the method calls. You're almost certainly adding work, forcing method calls for stuff that the runtime could handle far more efficiently with the built-in stack.
Eric makes a reasonable point about stack overflows, but in order for that to be an issue, you'd need a tree thousands of nodes deep, or you'd have to be searching from an already deep call stack, or the predicate would need to be recursive itself (possibly by triggering other searches). With an even slightly balanced tree and a predicate that doesn't cause more recursion, stack depth should not be an issue; the default stack is already large enough to handle quite a bit of recursion, and can be made bigger if needed.
With all that said, though: I'm guessing, as are you, as is everyone who hasn't actually implemented and tested both versions. If you care that much, time it.
The second version has several advantages:
You can easily switch from DFS to BFS by using a Queue instead of a Stack.
If depth is too large, it will throw an OutOfMemoryException, which can be handled. (A StackOverflowException, by contrast, cannot be caught in .NET 2.0 and later; it terminates the process.)
Performance and memory usage might be better, because the recursive approach saves all local variables (including compiler-generated ones) on the call stack.

Can a stack overflow happen for any other reason than recursion?

I'm getting a stack overflow exception for a segment of code that doesn't seem to be able to produce a stackoverflow... It looks like this:
public String WriteToFile(XmlDocument pDoc, String pPath)
{
    string source = "";
    string seq = "";
    string sourcenet = "";
    XmlNodeList sourceNode = pDoc.GetElementsByTagName(XmlUtils.Nodes.Source);
    source = sourceNode.Item(0).InnerText;
    XmlNodeList sqList = pDoc.GetElementsByTagName(XmlUtils.Nodes.Seq);
    seq = sqList.Item(0).InnerText;
    XmlNodeList sourceNets = pDoc.GetElementsByTagName(XmlUtils.Nodes.SourceNets);
    sourcenet = sourceNets.Item(0).InnerText;
    string fileName = Folders.GetMyFileName(source, seq, sourcenet);
    string fullPath = Path.Combine(pPath, fileName);
    pDoc.Save(fullPath); // <--- Stackoverflow is raised here
    return fullPath;
}
There are no recursive calls. If you examine the call stack, it has a depth of 2 before going to "external code" (which I'm guessing is not that external but part of the framework that starts the thread, which has debugging turned off).
Is there any way the exception can be raised by anything other than a recursive call? It ALWAYS fails in the pDoc.Save method call... and pDoc isn't actually that big... more like 32KB of data...
A stack overflow exception can occur any time the stack exceeds its maximum size. This is most commonly done by ...
Having a deeply nested stack which is not recursive. Think of event storms where event A leads to event B which leads to event C, all of which have handlers that deeply grow the stack.
Having a shallow stack which occurs after some large stack allocations.
Stack overflow simply means you have exhausted the stack, it doesn't need to be caused by recursion. Of course, because recursion utilizes the stack, it is often the cause of a stack overflow exception, but it doesn't need to be.
That being said, with the information you provided, it doesn't sound like there should be anything causing a stack overflow in the code you provided.
Threads in C# have a 1MB stack by default, but you can create a new thread with a smaller stack. Do you create threads yourself in this program, and are you setting the stack size?
Also, have a look at the external code section (right click where it says External Code in the Call Stack window, choose "Show external code"). See if something looks wrong: is the framework for some reason going through a lot of method calls to do the save?
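To make the stack-size point concrete, here is a small sketch of how a thread's stack size is set through the `Thread(ThreadStart, int maxStackSize)` constructor. `BigStackRunner`, `CountDown` and `RunWithStack` are hypothetical names invented for this illustration; the recursion depth a given size allows depends on the frame size, the build configuration and the platform, so no particular limit is claimed.

```csharp
using System.Threading;

public static class BigStackRunner
{
    // Deliberately recursive: depth n means n nested stack frames.
    public static int CountDown(int n)
    {
        return n == 0 ? 0 : 1 + CountDown(n - 1);
    }

    // Runs the recursion on a dedicated thread whose stack size we choose.
    public static int RunWithStack(int depth, int stackBytes)
    {
        int result = -1;
        // The second constructor argument is the maximum stack size in bytes.
        var worker = new Thread(delegate() { result = CountDown(depth); }, stackBytes);
        worker.Start();
        worker.Join();
        return result;
    }
}
```

For example, `BigStackRunner.RunWithStack(100000, 64 * 1024 * 1024)` runs the recursion with a 64MB stack. Passing a small value instead is one way a program can end up overflowing far sooner than the 1MB default would suggest.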
There is indeed a recursive call.
pDoc.Save() calls WriteTo(XmlWriter w) on the document, which calls WriteContentTo(XmlWriter w).
This then calls WriteTo(XmlWriter w) on all the nodes at the root level, which will contain one element node (possibly also some comments, whitespace, processing instructions, a document declaration...).
On that element, this will cause it to write its tag ('<', element name, and then any attributes) followed by calling WriteContentTo(XmlWriter w) which calls WriteTo(XmlWriter w) on every child element, which calls WriteContentTo(XmlWriter w), and so on and so on.
Hence this is indeed recursive in how each element calls the same method on its child elements, and with a sufficiently deep document on sufficiently small stack space (default is 1MB on most applications, but 256KB on ASP.NET), you'll have a stack overflow.
For the record, you can also have a stack overflow without recursion as long as you burn through your stack space one way or another. stackalloc is a great way to find yourself doing this while only a few calls deep.
If you're in trouble due to this recursion, then remember that the implementation of WriteTo is essentially (manually inlining WriteContentTo into it):
w.WriteStartElement(this.Prefix, this.LocalName, this.NamespaceURI);
if (this.HasAttributes)
{
    XmlAttributeCollection attributes = this.Attributes;
    for (int i = 0; i < attributes.Count; i++)
    {
        attributes[i].WriteTo(w);
    }
}
if (this.IsEmpty)
{
    w.WriteEndElement();
}
else
{
    for (XmlNode node = this.FirstChild; node != null; node = node.NextSibling)
    {
        node.WriteTo(w);
    }
    w.WriteFullEndElement();
}
Replace this with an iterative version, and you won't overflow the stack. Of course if you've somehow managed to put the document into a condition where it's got an element that's an ancestor to itself (does XmlDocument protect against that? I don't know off the top of my head), then it'll turn a stack-overflow into an infinite loop, which if anything is worse.
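One possible shape for such an iterative version is sketched below. It is an assumption-laden illustration, not the framework's code: `IterativeXmlWriter` and `WriteElementTree` are made-up names, the walk relies on `FirstChild`, `NextSibling` and `ParentNode` instead of a stack, and every non-element node (text, comments, CDATA and so on) is simply handed to its own `WriteTo`.

```csharp
using System.Xml;

public static class IterativeXmlWriter
{
    // Writes root and all of its descendants without recursing per element.
    public static void WriteElementTree(XmlElement root, XmlWriter w)
    {
        XmlNode node = root;
        while (true)
        {
            bool descended = false;
            XmlElement element = node as XmlElement;
            if (element != null)
            {
                w.WriteStartElement(element.Prefix, element.LocalName, element.NamespaceURI);
                foreach (XmlAttribute attribute in element.Attributes)
                    attribute.WriteTo(w);

                if (element.FirstChild != null)
                {
                    // Go down; the end tag is written later, on the way back up.
                    node = element.FirstChild;
                    descended = true;
                }
                else if (element.IsEmpty)
                {
                    w.WriteEndElement();
                }
                else
                {
                    w.WriteFullEndElement();
                }
            }
            else
            {
                // Leaf-like nodes: their own WriteTo does not recurse deeply.
                node.WriteTo(w);
            }

            if (descended)
                continue;

            // Climb back up, closing each finished parent element.
            while (node != root && node.NextSibling == null)
            {
                node = node.ParentNode;
                w.WriteFullEndElement();
            }
            if (node == root)
                return;
            node = node.NextSibling;
        }
    }
}
```

Usage would be along the lines of `IterativeXmlWriter.WriteElementTree(doc.DocumentElement, writer)` with an `XmlWriter` created by `XmlWriter.Create`. The XML declaration and any nodes outside the document element are left to the caller.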
In some languages/runtimes a stack overflow can happen because of large memory allocations that are unrelated to the call stack itself. It's entirely possible that the 'external code' (I assume the framework) is running either into that situation or has actually a classic recursion overflow problem you can't see because you can't necessarily debug into it.

How can I write this method so that it is eligible for tail recursion optimization?

Does someone know of an algorithm for turning a simple recursion into a tail recursion?
More specifically, how would you apply the algorithm to the following code?
namespace Testing
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(match("?**", "aaa"));
            Console.WriteLine(match("*#*?", "aa1$a1a1"));
            Console.WriteLine(match("*#*", "aa11"));
            Console.WriteLine(match("??*", "0110"));
            Console.WriteLine(match("", "abc"));
            Console.WriteLine(match("???", ""));
            Console.ReadLine();
        }
        public static bool match(string p, string s)
        {
            if (p.Length == 0)
                return true;
            if (p.Length > s.Length)
                return false;
            bool firstLetterMatches = false;
            char nextCharInStr = s[0];
            switch (p[0])
            {
                case '*':
                    firstLetterMatches = 'a'<= nextCharInStr && nextCharInStr <= 'z';
                    break;
                case '#':
                    firstLetterMatches = '0'<= nextCharInStr && nextCharInStr <= '9';
                    break;
                case '?':
                    firstLetterMatches = ('a'<= nextCharInStr && nextCharInStr <= 'z') ||
                                         ('0'<= nextCharInStr && nextCharInStr <= '9');
                    break;
                default:
                    return false;
            }
            return match(p,s.Substring(1)) ||
                   (firstLetterMatches && match(p.Substring(1),s.Substring(1)));
        }
    }
}
Thanks!
I'm assuming that you're asking because you actually have a real-world problem with blowing the stack. It looks like you're doing a string manipulation here recursing on one-smaller substrings. This is potentially extremely inefficient and dangerous. Strings can easily be so long that the recursive algorithm blows the stack, and because strings are immutable but not persistent, you're creating a new string every time you call substring; that's going to create minimum O(n^2) bytes of strings that have to be copied around.
(Also, it looks like you're doing some sort of longest-matching-subsequence pattern; as mquander points out, it is likely that some sort of memoization strategy will help with the time complexity; it often does with this sort of problem.)
To solve the string allocation problem you can pass around instead of a string, a string and the index that is to be treated as the beginning of the string. Now you're merely incrementing an integer, rather than allocating and copying a string.
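As a sketch of that index-passing idea, the asker's `match` could be restated as below. `IndexedMatcher` and the parameter names `pi` and `si` are made up for this illustration; the logic is intended to mirror the original method, and it is still recursive, so it only removes the substring allocations, not the stack usage.

```csharp
public static class IndexedMatcher
{
    public static bool Match(string p, string s)
    {
        return Match(p, 0, s, 0);
    }

    // pi and si mark where the "remaining" pattern and string begin,
    // replacing the p.Substring(1) / s.Substring(1) calls of the original.
    private static bool Match(string p, int pi, string s, int si)
    {
        if (pi == p.Length)
            return true;
        if (p.Length - pi > s.Length - si)
            return false;

        bool firstLetterMatches;
        char nextCharInStr = s[si];
        switch (p[pi])
        {
            case '*':
                firstLetterMatches = 'a' <= nextCharInStr && nextCharInStr <= 'z';
                break;
            case '#':
                firstLetterMatches = '0' <= nextCharInStr && nextCharInStr <= '9';
                break;
            case '?':
                firstLetterMatches = ('a' <= nextCharInStr && nextCharInStr <= 'z') ||
                                     ('0' <= nextCharInStr && nextCharInStr <= '9');
                break;
            default:
                return false;
        }

        // Only the integers change between calls; no strings are copied.
        return Match(p, pi, s, si + 1) ||
               (firstLetterMatches && Match(p, pi + 1, s, si + 1));
    }
}
```

Each call now costs two integer increments instead of allocating and copying a new string.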
In order to solve your recursion problem, there are a number of techniques you can use. I wrote a series of articles about various ways to turn simple tree-recursive algorithms into algorithms that consume heap instead of call stack space. Some of them are in JScript, but the ideas are easily translatable to C#.
Finally, in C# 5 we will be introducing an "await" keyword which causes the compiler to do a continuation passing style transformation on the program. The intention of this keyword is to make asynchronous programming easier, but a side effect of it is that it makes stackless programming much easier too. If you're interested, download the Community Technology Preview that we've released, and you can automatically transform your program into one that consumes no stack.
OK, so, the articles on turning recursive algorithms into algorithms that consume heap, not stack, start here:
http://blogs.msdn.com/b/ericlippert/archive/2005/07/27/recursion-part-one-recursive-data-structures-and-functions.aspx
All my articles on continuation passing style are here: (start from the bottom)
http://blogs.msdn.com/b/ericlippert/archive/tags/continuation+passing+style/
And the ones on the asynchrony are here: (again, start from the bottom)
http://blogs.msdn.com/b/ericlippert/archive/tags/async/
Sort of. You can make any recursive algorithm tail-recursive, awkwardly, by converting it into continuation-passing style. The effect is just to take the call stack and pass it around explicitly. But that won't give you the benefit you're probably thinking of, which is to be able to discard prior state after recursive calls to save space. You're just putting the state somewhere else.
The real question might be: Can you change any recursive algorithm to require only constant space, potentially by way of using tail recursion? The answer, of course, is maybe. Typically, recursive functions that use tree recursion (where recursive calls branch into multiple deeper recursive calls) might be hard to transform this way. Your algorithm fits this description.
(I initially suggested memoizing match or using DP for this problem, which would speed it up, but I guess that wouldn't actually save you space. Oh well.)
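For what it's worth, a bottom-up DP version might look like the sketch below. It trades the call stack for a heap-allocated table of size (|p|+1) x (|s|+1), so it avoids both the exponential branching and the deep recursion, though it does not achieve constant space. `TableMatcher` and the table name `ok` are hypothetical, and the recurrence is meant to mirror the asker's `match` method.

```csharp
public static class TableMatcher
{
    public static bool Match(string p, string s)
    {
        int m = p.Length, n = s.Length;

        // ok[i, j] answers: does the pattern from index i match the string from index j?
        bool[,] ok = new bool[m + 1, n + 1];

        // An exhausted pattern matches whatever is left of the string.
        for (int j = 0; j <= n; j++)
            ok[m, j] = true;

        // Fill from the end so ok[i, j + 1] and ok[i + 1, j + 1] are already known.
        for (int i = m - 1; i >= 0; i--)
        {
            for (int j = n; j >= 0; j--)
            {
                // More pattern left than string: no match (this also covers j == n).
                if (m - i > n - j)
                    continue;

                char c = s[j];
                bool first;
                switch (p[i])
                {
                    case '*':
                        first = 'a' <= c && c <= 'z';
                        break;
                    case '#':
                        first = '0' <= c && c <= '9';
                        break;
                    case '?':
                        first = ('a' <= c && c <= 'z') || ('0' <= c && c <= '9');
                        break;
                    default:
                        continue;   // unknown pattern character: entry stays false
                }

                ok[i, j] = ok[i, j + 1] || (first && ok[i + 1, j + 1]);
            }
        }
        return ok[0, 0];
    }
}
```

Because each entry only reads from row `i` and row `i + 1`, the table could be reduced to two rows if memory matters; that refinement is left out to keep the sketch short.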
The simple way is to use a while(true) loop. Example:
public static bool match(string p, string s)
{
    while (true)
    {
        // normal code
        ...
        // tail call handling
        // instead of return match(x, y)
        var t1 = x; // need to use temps for evaluation of x
        var t2 = y; // same here
        p = t1;
        s = t2;
        continue;
    }
}
This process is commonly known as TCE (or tail call elimination).
Update:
I missed the || at the end. You cannot convert this, as no calls to match are in a tail position. || requires the result to be evaluated before returning. The method could be perhaps rewritten to avoid that.
C# doesn't currently support tail recursion.
See this related question on the matter
Also check out this msdn article on a similar technique called trampolining

How do I avoid changing the Stack Size AND avoid getting a Stack Overflow in C#

I've been trying to find an answer to this question for a few hours now on the web and on this site, and I'm not quite there.
I understand that .NET allocates 1MB to apps, and that it's best to avoid stack overflow by recoding instead of forcing stack size.
I'm working on a "shortest path" app that works great up to about 3000 nodes, at which point it overflows. Here's the method that causes problems:
public void findShortestPath(int current, int end, int currentCost)
{
    if (!weight.ContainsKey(current))
    {
        weight.Add(current, currentCost);
    }
    Node currentNode = graph[current];
    var sortedEdges = (from entry in currentNode.edges orderby entry.Value ascending select entry);
    foreach (KeyValuePair<int, int> nextNode in sortedEdges)
    {
        if (!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key])
        {
            int nextNodeCost = currentCost + nextNode.Value;
            if (!weight.ContainsKey(nextNode.Key))
            {
                weight.Add(nextNode.Key, nextNodeCost);
            }
            else if (weight[nextNode.Key] > nextNodeCost)
            {
                weight[nextNode.Key] = nextNodeCost;
            }
        }
    }
    visited.Add(current, true);
    foreach (KeyValuePair<int, int> nextNode in sortedEdges)
    {
        if(!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key]){
            findShortestPath(nextNode.Key, end, weight[nextNode.Key]);
        }
    }
}//findShortestPath
For reference, the Node class has one member:
public Dictionary<int, int> edges = new Dictionary<int, int>();
graph[] is:
private Dictionary<int, Node> graph = new Dictionary<int, Node>();
I've tried to optimize the code so that it isn't carrying any more baggage than needed from one iteration (recursion?) to the next, but with a 100K-node graph where each node has between 1 and 9 edges, it's going to hit that 1MB limit pretty quickly.
Anyway, I'm new to C# and code optimization, if anyone could give me some pointers (not like this) I would appreciate it.
The classic technique to avoid deep recursive stack dives is to simply avoid recursion by writing the algorithm iteratively and managing your own "stack" with an appropriate list data structure. Most likely you will need this approach here given the sheer size of your input set.
A while back I explored this problem in my blog. Or, rather, I explored a related problem: how do you find the depth of a binary tree without using recursion? A recursive tree depth solution is trivial, but blows the stack if the tree is highly imbalanced.
My recommendation is to study ways of solving this simpler problem, and then decide which of them, if any, could be adapted to your slightly more complex algorithm.
Note that in these articles the examples are given entirely in JScript. However, it should not be difficult to adapt them to C#.
Here we start by defining the problem.
http://blogs.msdn.com/ericlippert/archive/2005/07/27/recursion-part-one-recursive-data-structures-and-functions.aspx
The first attempt at a solution is the classic technique that you'll probably adopt: define an explicit stack; use it rather than relying upon the operating system and compiler implementing the stack for you. This is what most people do when faced with this problem.
http://blogs.msdn.com/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx
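In C#, the explicit-stack approach to the tree-depth problem might be sketched as follows. `BinNode` and `TreeDepth` are hypothetical names, and this is not a translation of the linked JScript articles, just one plausible way to apply the same idea.

```csharp
using System.Collections.Generic;

// Hypothetical binary tree node with no payload; only the shape matters here.
public class BinNode
{
    public BinNode Left, Right;
}

public static class TreeDepth
{
    // Computes the depth of the tree without recursion by keeping
    // (node, depth) pairs on a heap-allocated stack.
    public static int Depth(BinNode root)
    {
        if (root == null)
            return 0;

        int max = 0;
        var pending = new Stack<KeyValuePair<BinNode, int>>();
        pending.Push(new KeyValuePair<BinNode, int>(root, 1));

        while (pending.Count > 0)
        {
            KeyValuePair<BinNode, int> item = pending.Pop();
            if (item.Value > max)
                max = item.Value;

            if (item.Key.Left != null)
                pending.Push(new KeyValuePair<BinNode, int>(item.Key.Left, item.Value + 1));
            if (item.Key.Right != null)
                pending.Push(new KeyValuePair<BinNode, int>(item.Key.Right, item.Value + 1));
        }
        return max;
    }
}
```

A degenerate tree that is effectively a linked list, the case that breaks the recursive solution, is handled without growing the call stack: the pending stack never holds more than a couple of entries for such a chain.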
The problem with that solution is that it's a bit of a mess. We can go even farther than simply making our own stack. We can make our own little domain-specific virtual machine that has its own heap-allocated stack, and then solve the problem by writing a program that targets that machine! This is actually easier than it sounds; the operations of the machine can be extremely high level.
http://blogs.msdn.com/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx
And finally, if you are really a glutton for punishment (or a compiler developer) you can rewrite your program in Continuation Passing Style, thereby eliminating the need for a stack at all:
http://blogs.msdn.com/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx
http://blogs.msdn.com/ericlippert/archive/2005/08/11/recursion-part-five-more-on-cps.aspx
http://blogs.msdn.com/ericlippert/archive/2005/08/15/recursion-part-six-making-cps-work.aspx
CPS is a particularly clever way of moving the implicit stack data structure off the system stack and onto the heap by encoding it in the relationships between a bunch of delegates.
Here are all of my articles on recursion:
http://blogs.msdn.com/ericlippert/archive/tags/Recursion/default.aspx
You could convert the code to use a 'work queue' rather than being recursive. Something along the following pseudocode:
Queue<Task> work;
while( work.Count != 0 )
{
    Task t = work.Dequeue();
    ... whatever
    foreach(Task more in t.MoreTasks)
        work.Enqueue(more);
}
I know that is cryptic, but it's the basic concept of what you'll need to do. Since you're only getting 3000 nodes with your current code, you will at best get to 12~15k without any parameters. So you need to kill the recursion completely.
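To make the work-queue idea less cryptic, here is one way it could look for a shortest-path computation. Note that this swaps in a queue-based label-correcting scheme (in the spirit of Bellman-Ford) rather than reproducing the asker's depth-first method; `WorkQueueShortestPath` and `Costs` are hypothetical names, the graph is represented as a plain dictionary of edge dictionaries instead of the asker's `Node` class, and edge costs are assumed to be non-negative.

```csharp
using System.Collections.Generic;

public static class WorkQueueShortestPath
{
    // graph[node][neighbor] = edge cost, mirroring the question's Dictionary<int, int> edges.
    // Returns the cheapest known cost from start to every reachable node.
    public static Dictionary<int, int> Costs(Dictionary<int, Dictionary<int, int>> graph, int start)
    {
        var weight = new Dictionary<int, int>();
        weight[start] = 0;

        var work = new Queue<int>();
        work.Enqueue(start);

        while (work.Count != 0)
        {
            int current = work.Dequeue();
            int currentCost = weight[current];

            Dictionary<int, int> edges;
            if (!graph.TryGetValue(current, out edges))
                continue;   // node with no outgoing edges

            foreach (KeyValuePair<int, int> edge in edges)
            {
                int nextCost = currentCost + edge.Value;
                int known;
                if (!weight.TryGetValue(edge.Key, out known) || nextCost < known)
                {
                    // Found a cheaper route: record it and revisit the neighbor later.
                    weight[edge.Key] = nextCost;
                    work.Enqueue(edge.Key);
                }
            }
        }
        return weight;
    }
}
```

The queue lives on the heap, so graph size no longer affects the call stack. A node can be enqueued more than once, which is the price of not using a priority queue; Dijkstra's algorithm with a priority queue would avoid that rework.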
Is your Node a struct or a class? If it's the former, make it a class so that it's allocated on the heap instead of on the stack.
I would first verify that you are actually overflowing the stack: you actually see a StackOverflowException get thrown by the runtime.
If this is indeed the case, you have a few options:
Modify your recursive function so that the .NET runtime can convert it into a tail-recursive function.
Modify your recursive function so that it is iterative and uses a custom data structure rather than the managed stack.
Option 1 is not always possible, and assumes that the rules the CLR uses to generate tail recursive calls will remain stable in the future. The primary benefit, is that when possible, tail recursion is actually a convenient way of writing recursive algorithms without sacrificing clarity.
Option 2 is more work, but is not sensitive to the implementation of the CLR and can be implemented for any recursive algorithm (where tail recursion may not always be possible). Generally, you need to capture and pass state information between iterations of some loop, together with information on how to "unroll" the data structure that takes the place of the stack (typically a List<> or Stack<>). One way of unrolling recursion into iteration is through the continuation passing pattern.
More resources on C# tail recursion:
Why doesn't .NET/C# optimize for tail-call recursion?
http://geekswithblogs.net/jwhitehorn/archive/2007/06/06/113060.aspx
I would first make sure I know why I'm getting a stack overflow. Is it actually because of the recursion? The recursive method isn't putting much onto the stack. Maybe it's because of the storage of the nodes?
Also, BTW, I don't see the end parameter ever changing. That suggests it doesn't need to be a parameter, carried on each stack frame.
