Does creating new Processes help me for Traversing a big tree?

Does creating new Processes help me for Traversing a big tree? - c#

Let's think of it as a family tree, a father has kids, those kids have kids, those kids have kids, etc...
So I have a recursive function that gets the father uses Recursion to get the children and for now just print them to debug output window...But at some point ( after one hour of letting it run and printing like 26000 rows) it gives me a StackOverFlowException.
So Am really running out of memory? hmmm? then shouldn't I get an "Out of memory exception"? on other posts I found people were saying if the number of recursive calls are too much, you might still get a SOF exception...
Anyway, my first thought was to break the tree into smaller sub-strees..so I know for a fact that my root father always has these five kids, so Instead of Calling my method one time with root passed to it, I said ok call it five times with Kids of root Passes to it.. It helped I think..but still one of them is so big - 26000 rows when it crashes - and still have this issue.
How about Application Domains and Creating new Processes at run time at some certain level of depth? Does that help?
How about creating my own Stack and using that instead of recursive methods? does that help?
here is also a high-level of my code, please take a look, maybe there is actually something silly wrong with this that causes SOF error:
private void MyLoadMethod(string conceptCKI)
{
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
if (numberofKids == 0)
return;
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
MyLoadMethod(oConcept.ConceptCKI);
}
}

How about creating my own Stack and using that instead of recursive methods? does that help?
Yes!
When you instantiate a Stack<T> this will live on the heap and can grow arbitrarily large (until you run out of addressable memory).
If you use recursion you use the call stack. The call stack is much smaller than the heap. The default is 1 MB of call stack space per thread. Note this can be changed, but it's not advisable.

StackOverflowException is quite different to OutOfMemoryException.
OOME means that there is no memory available to the process at all. This could be upon trying to create a new thread with a new stack, or in trying to create a new object on the heap (and a few other cases).
SOE means that the thread's stack - by default 1M, though it can be set differently in thread creation or if the executable has a different default; hence ASP.NET threads have 256k as a default rather than 1M - was exhausted. This could be upon calling a method, or allocating a local.
When you call a function (method or property), the arguments of the call are placed on the stack, the address the function should return to when it returns are put on the stack, then execution jumps to the function called. Then some locals will be placed on the stack. Some more may be placed on it as the function continues to execute. stackalloc will also explicitly use some stack space where otherwise heap allocation would be used.
Then it calls another function, and the same happens again. Then that function returns, and execution jumps back to the stored return address, and the pointer within the stack moves back up (no need to clean up the values placed on the stack, they're just ignored now) and that space is available again.
If you use up that 1M of space, you get a StackOverflowException. Because 1M (or even 256k) is a large amount of memory for these such use (we don't put really large objects in the stack) the three things that are likely to cause an SOE are:
Someone thought it would be a good idea to optimise by using stackalloc when it wasn't, and they used up that 1M fast.
Someone thought it would be a good idea to optimise by creating a thread with a smaller than usual stack when it wasn't, and they use up that tiny stack.
A recursive (whether directly or through several steps) call falls into an infinite loop.
It wasn't quite infinite, but it was large enough.
You've got case 4. 1 and 2 are quite rare (and you need to be quite deliberate to risk them). Case 3 is by far the most common, and indicates a bug in that the recursion shouldn't be infinite, but a mistake means it is.
Ironically, in this case you should be glad you took the recursive approach rather than iterative - the SOE reveals the bug and where it is, while with an iterative approach you'd probably have an infinite loop bringing everything to a halt, and that can be harder to find.
Now for case 4, we've got two options. In the very very rare cases where we've got just slightly too many calls, we can run it on a thread with a larger stack. This doesn't apply to you.
Instead, you need to change from a recursive approach to an iterative one. Most of the time, this isn't very hard thought it can be fiddly. Instead of calling itself again, the method uses a loop. For example, consider the classic teaching-example of a factorial method:
private static int Fac(int n)
{
return n <= 1 ? 1 : n * Fac(n - 1);
}
Instead of using recursion we loop in the same method:
private static int Fac(int n)
{
int ret = 1;
for(int i = 1; i <= n, ++i)
ret *= i;
return ret;
}
You can see why there's less stack space here. The iterative version will also be faster 99% of the time. Now, imagine we accidentally call Fac(n) in the first, and leave out the ++i in the second - the equivalent bug in each, and it causes an SOE in the first and a program that never stops in the second.
For the sort of code you're talking about, where you keep producing more and more results as you go based on previous results, you can place the results you've got in a data-structure (Queue<T> and Stack<T> both serve well for a lot of cases) so the code becomes something like):
private void MyLoadMethod(string firstConceptCKI)
{
Queue<string> pendingItems = new Queue<string>();
pendingItems.Enqueue(firstConceptCKI);
while(pendingItems.Count != 0)
{
string conceptCKI = pendingItems.Dequeue();
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
pendingItems.Enque(oConcept.ConceptCKI);
}
}
}
(I haven't completely checked this, just added the queuing instead of recursing to the code in your question).
This should then do more or less the same as your code, but iteratively. Hopefully that means it'll work. Note that there is a possible infinite loop in this code if the data you are retrieving has a loop. In that case this code will throw an exception when it fills the queue with far too much stuff to cope. You can either debug the source data, or use a HashSet to avoid enqueuing items that have already been processed.
Edit: Better add how to use a HashSet to catch duplicates. First set up a HashSet, this could just be:
HashSet<string> seen = new HashSet<string>();
Or if the strings are used case-insensitively, you'd be better with:
HashSet<string> seen = new HashSet<string>(StringComparison.InvariantCultureIgnoreCase) // or StringComparison.CurrentCultureIgnoreCase if that's closer to how the string is used in the rest of the code.
Then before you go to use the string (or perhaps before you go to add it to the queue, you have one of the following:
If duplicate strings shouldn't happen:
if(!seen.Add(conceptCKI))
throw new InvalidOperationException("Attempt to use \" + conceptCKI + "\" which was already seen.");
Or if duplicate strings are valid, and we just want to skip performing the second call:
if(!seen.Add(conceptCKI))
continue;//skip rest of loop, and move on to the next one.

I think you have a recursion's ring (infinite recursion), not a really stack overflow error. If you are got more memory for stack - you will get the overflow error too.
For test it:
Declare a global variable for storing a operable objects:
private Dictionary<int,object> _operableIds = new Dictionary<int,object>();
...
private void Start()
{
_operableIds.Clear();
Recurtion(start_id);
}
...
private void Recurtion(int object_id)
{
if(_operableIds.ContainsKey(object_id))
throw new Exception("Have a ring!");
else
_operableIds.Add(object_id, null/*or object*/);
...
Recurtion(other_id)
...
_operableIds.Remove(object_id);
}

Related

Loop or Recursive method? Which one is better to use?

I want to ask user to enter a value less than 10. I am using the following code. Which one is better to use? Loop or Recursive method. Someone said me using Recursive function method may cause Memory Leakage. Is it true?
class Program
{
static void Main(string[] args)
{
int x;
do
{
Console.WriteLine("Please Enther a value less than 10.");
x = int.Parse(Console.ReadLine());
} while (x > 10);
//Uncomment the bellow method and comment previous to test the Recursive method
//Value();
}
static string Value()
{
Console.WriteLine("Please Enther a value less than 10.");
return int.Parse(Console.ReadLine()) > 9 ? Value() : "";
}
}

It would probably be a long time before recursion became an issue in this example.
Recursive methods run the risk of causing stack overflow exceptions if they keep running for a long time without completing. This is because each method call results in data being stored on the stack (which has very limited space) - more info here:
https://en.wikipedia.org/wiki/Stack_overflow#:~:text=The%20most%2Dcommon%20cause%20of%20stack%20overflow%20is%20excessively%20deep%20or%20infinite%20recursion%2C%20in%20which%20a%20function%20calls%20itself%20so%20many%20times%20that%20the%20space%20needed%20to%20store%20the%20variables%20and%20information%20associated%20with%20each%20call%20is%20more%20than%20can%20fit%20on%20the%20stack.
In your case unless they enter a number greater than or equal to 10 loads of times or you have very little memory it should be fine.
Generally it's better to use loops than recursion as they are simpler to understand. Recursion is a useful tool for achieving good performance in certain scenarios but generally loops should be preferred.

Recursive functions are used when you have a base case and a regular case. The base case is vital, since it marks the end of recursion. For example, you can create a function factorial(n) to calculate the factorial of a number n. The base case happens when n reaches 1 and you just return 1, while in the regular case you just multiply n by factorial(n - 1).
In general (there are a few optimization cases in which you can save memory), recursive functions create a new stack frame for each call. So, for factorial(3), there are at least three stack frames being created, the factorial(3) itself, the factorial(2) one, and finally the one that finished recursions which would be factorial(1).
At least, in that case you know when you are going to finish, and how much memory you need, so a compiler can work with that in advance.
All that discussion above means that you are misunderstanding recursion if you think you can use it in order to validate user input. There can be only one call with the correct answer, which would be the base case, or even hundreds or thousands or regular case instances as well. This has the potential of overflowing the stack of your program, without any way to prevent that, on the part of the compiler or on your part.
Another way to see this is that recursion is used as a way of abstraction: you specify what needs to happen, even near to the mathematical problem, and not how it should happen. In your example, that level of abstraction is unneeded.

Recursive function causing an overflow despite it's not infinite

The following function I wrote causing the program to crash due to the stack overflow, although the recursion is finite.
public static void Key(char[] chars, int i, int l, string str) {
string newStr=null;
for(int j=0; j<l; j++)
newStr+=chars[i/(int)Math.Pow(68, j)%68];
if(newStr==str)
return;
Key(chars, ++i, l, newStr);
}
When I call the method with these parameters, all goes fine:
Key(chars, 0, 4, "aaaa");
But when it comes to bigger number of calls, it throws the StackOverflowException. So I assume the problem is that althogh the method is finite the call stack gets filled up before the the methods' job is done. So I have a few questions about that:
Why the functions don't get clear from the stack, they are no longer needed, they don't return any value.
And if so, is there a way i could clear the stack manually? I tried the StackTrace class but it's helpless in this case.

1) The function clears when it has ended execution. Calling Key inside of itself means that every call to it will be on the stack until the last call ends, at which stage they will all end in reverse order.
2) You can't clear the stack and carry on with the call.

The stack is still limited. For standard C# applications it is 1 MB. For ASP it is 256 KB. If you need more stack space than that, you'll see the exception.
If you create the thread yourself, you can adjust the stack size using this constructor.
Alternatively, you can rewrite your algorithm so it keeps track of state without using recursion.

It looks like the exit condition of NewStr == Str will never happen and eventually, you will run out of stack.

So, regardless of whether your code reaches its base case or not, your code should never get a stack overflow exception in production.
For example, this should give us a stack overflow right?
private static void Main(string[] args)
{
RecursorKey(0);
}
public static int RecursorKey(int val)
{
return RecursorKey(val ++);
}
In fact it doesn't, if you use .NET 4 and are not debugging and your binary is compiled as release.
This is because the clr is smart enough to do what is called tail recursion. Tail recursion isn't applicable everywhere, but in your case it is, and I was easily able to reproduce your exact problem. In your case, each time you call the function it pushes another stack frame onto the stack, so you get the overflow, even though the algorithm would theoretically end at some point.
To solve your issue, here is how you can enable optimizations.
However, I should point out that Jon Skeet recommends to not rely on tail call optimizations. Given that Jon's a smart guy, I'd listen to it him. If your code is going to encounter large stack depths, try rewriting it without recursion.

In C#, does copying a member variable to a local stack variable improve performance?

I quite often write code that copies member variables to a local stack variable in the belief that it will improve performance by removing the pointer dereference that has to take place whenever accessing member variables.
Is this valid?
For example
public class Manager {
private readonly Constraint[] mConstraints;
public void DoSomethingPossiblyFaster()
{
var constraints = mConstraints;
for (var i = 0; i < constraints.Length; i++)
{
var constraint = constraints[i];
// Do something with it
}
}
public void DoSomethingPossiblySlower()
{
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
// Do something with it
}
}
}
My thinking is that DoSomethingPossiblyFaster is actually faster than DoSomethingPossiblySlower.
I know this is pretty much a micro optimisation, but it would be useful to have a definitive answer.
Edit
Just to add a little bit of background around this. Our application has to process a lot of data coming from telecom networks, and this method is likely to be called about 1 billion times a day for some of our servers. My view is that every little helps, and sometimes all I am trying to do is give the compiler a few hints.

Which is more readable? That should usually be your primary motivating factor. Do you even need to use a for loop instead of foreach?
As mConstraints is readonly I'd potentially expect the JIT compiler to do this for you - but really, what are you doing in the loop? The chances of this being significant are pretty small. I'd almost always pick the second approach simply for readability - and I'd prefer foreach where possible. Whether the JIT compiler optimizes this case will very much depend on the JIT itself - which may vary between versions, architectures, and even how large the method is or other factors. There can be no "definitive" answer here, as it's always possible that an alternative JIT will optimize differently.
If you think you're in a corner case where this really matters, you should benchmark it - thoroughly, with as realistic data as possible. Only then should you change your code away from the most readable form. If you're "quite often" writing code like this, it seems unlikely that you're doing yourself any favours.
Even if the readability difference is relatively small, I'd say it's still present and significant - whereas I'd certainly expect the performance difference to be negligible.

If the compiler/JIT isn't already doing this or a similar optimization for you (this is a big if), then DoSomethingPossiblyFaster should be faster than DoSomethingPossiblySlower. The best way to explain why is to look at a rough translation of the C# code to straight C.
When a non-static member function is called, a hidden pointer to this is passed into the function. You'd have roughly the following, ignoring virtual function dispatch since it's irrelevant to the question (or equivalently making Manager sealed for simplicity):
struct Manager {
Constraint* mConstraints;
int mLength;
}
void DoSomethingPossiblyFaster(Manager* this) {
Constraint* constraints = this->mConstraints;
int length = this->mLength;
for (int i = 0; i < length; i++)
{
Constraint constraint = constraints[i];
// Do something with it
}
}
void DoSomethingPossiblySlower()
{
for (int i = 0; i < this->mLength; i++)
{
Constraint constraint = (this->mConstraints)[i];
// Do something with it
}
}
The difference is that in DoSomethingPossiblyFaster, mConstraints lives on the stack and access only requires one layer of pointer indirection, since it's at a fixed offset from the stack pointer. In DoSomethingPossiblySlower, if the compiler misses the optimization opportunity, there's an extra pointer indirection. The compiler has to read a fixed offset from the stack pointer to access this and then read a fixed offset from this to get mConstraints.
There are two possible optimizations that could negate this hit:
The compiler could do exactly what you did manually and cache mConstraints on the stack.
The compiler could store this in a register so that it doesn't need to fetch it from the stack on every loop iteration before dereferencing it. This means that fetching mConstraints from this or from the stack is basically the same operation: A single dereference of a fixed offset from a pointer that's already in a register.

You know the response you will get, right? "Time it."
There is probably not a definitive answer. First, the compiler might do the optimization for you. Second, even if it doesn't, indirect addressing at the assembly level may not be significantly slower. Third, it depends on the cost of making the local copy, compared to the number of loop iterations. Then there are caching effects to consider.
I love to optimize, but this is one place I would definitely say wait until you have a problem, then experiment. This is a possible optimization that can be added when needed, not one of those optimizations that needs to be planned up front to avoid a massive ripple effect later.
Edit: (towards a definitive answer)
Compiling both functions in release mode and examining the IL with IL Dasm shows that in both places the "PossiblyFaster" function uses the local variable, it has one less instruction
ldloc.0 vs
ldarg.0; ldfld class Constraint[] Manager::mConstraints
Of course, this is still one level removed from the machine code - you don't know what the JIT compiler will do for you. But it is likely that "PossiblyFaster" is marginally faster.
However, I still don't recommend adding the extra variable until you are sure this function is the most expensive thing in your system.

I've profiled this and came up with a bunch of interesting results that are probably only valid for my specific example, but I thought would be worth while noting here.
The fastest is X86 release mode. That runs one iteration of my test in 7.1 seconds, whereas the equivalent X64 code takes 8.6 seconds. This was running 5 iterations, each iteration processing the loop 19.2 million times.
The fastest approach for the loop was:
foreach (var constraint in mConstraints)
{
... do stuff ...
}
The second fastest approach, which massively surprised me was the following
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
... do stuff ...
}
I guess this was because mConstraints was stored in a register for the loop.
This slowed down when I removed the readonly option for mConstraints.
So, my summary from this is that being readable in this situation does give performance as well.

Simple threading question, locking access to shared resource or entire function?

This is a paraphrasing of a question I had before. It's a simple threading question but I can't seem to understand it.
If i have shared code:
private static object objSync = new object();
private int a = 0;
public void function()
{
lock(objSync)
a += 2;
lock(objSync)
a += 3;
Console.WriteLine(a.ToString());
a=0;
}
I can't expect 'a' to equal 5 in the end because the first thread gains a lock sets 'a' to 2 and then then the next thread can get a lock set it to '4' before the first thread is able to add 3 and you'll end up with 7 in the end.
The solution as I understand it would be to put a lock around the entire thing and then you could always expect 5. Now my question is what if between the two locks there is a a million lines of code. I can't imagine putting a lock around a million lines of code. How would you ensure thread safety but not take the fail to performance by putting a lock around a million and two lines of code?
EDIT:
This is nonsense code that I wrote for the question. The actual application is a line monitoring system. There are two screens that show the current line, one for the clerks and one for the public. The screen for the clerk accepts 'clicks' through the serial port which progresses the line by one and then updates the public screen through a notify event (see observer design pattern). The problem is they aren't synchronized if you spam the clicker. I imagine what's happening is the first screen adds to the line number, displays it and then updates the public screen, but before the public screen has a chance to show the line number from the database, the clerk clicks again and the number goes out of synch. The 'a' above represents a value I retrieve from the db and not just a simple int. There are a few places that I change the value and I need all of it to happen atomically.

It all depends on what you define as "success". If it is a counter, that could still be the right result. If you only want to update it if it is what you thought it was, then take a snapshot inside the first lock, and likewise compare to the snapshot in the final lock. In that scenario you can replace much of the code with some Interlocked.CompareExchange. Likewise in the "counter" scenario, Interlocked.Increment is your friend.
If the code must match, then you have a few options:
if the compare fails, run the entire million lines again; repeat until you win the race (by matching)
if the compare fails, throw an exception
block for the million lines; it might still be a fast million lines...
think of some other resolution strategy that makes sense for your scenario
In the first two you should watch for polluting things by leaving any related data in an intermediate state.
And btw; the final assignment to 0 should probably also be part of the same locking strategy. Locks only work if all access respects them.
But only your scenario can tell us what the "right" behaviour in a conflict is.

It is a matter of design actually. If your method calculates different values for "a", depending on certain circumstances, but you do not want another thread changing "a"'s value while an operation is active, something like this is more appropriate:
private static object objSync = new object();
private int a = 0;
public void function()
{
int b;
lock(objSync)
b = a;
b += 2;
//million lines of code
b +=3;
lock(objSync)
{
a = b;
Console.WriteLine(a.ToString());
a=0;
}
}

How do I avoid changing the Stack Size AND avoid getting a Stack Overflow in C#

I've been trying to find an answer to this question for a few hours now on the web and on this site, and I'm not quite there.
I understand that .NET allocates 1MB to apps, and that it's best to avoid stack overflow by recoding instead of forcing stack size.
I'm working on a "shortest path" app that works great up to about 3000 nodes, at which point it overflows. Here's the method that causes problems:
public void findShortestPath(int current, int end, int currentCost)
{
if (!weight.ContainsKey(current))
{
weight.Add(current, currentCost);
}
Node currentNode = graph[current];
var sortedEdges = (from entry in currentNode.edges orderby entry.Value ascending select entry);
foreach (KeyValuePair<int, int> nextNode in sortedEdges)
{
if (!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key])
{
int nextNodeCost = currentCost + nextNode.Value;
if (!weight.ContainsKey(nextNode.Key))
{
weight.Add(nextNode.Key, nextNodeCost);
}
else if (weight[nextNode.Key] > nextNodeCost)
{
weight[nextNode.Key] = nextNodeCost;
}
}
}
visited.Add(current, true);
foreach (KeyValuePair<int, int> nextNode in sortedEdges)
{
if(!visited.ContainsKey(nextNode.Key) || !visited[nextNode.Key]){
findShortestPath(nextNode.Key, end, weight[nextNode.Key]);
}
}
}//findShortestPath
For reference, the Node class has one member:
public Dictionary<int, int> edges = new Dictionary<int, int>();
graph[] is:
private Dictionary<int, Node> graph = new Dictonary<int, Node>();
I've tried to opimize the code so that it isn't carrying any more baggage than needed from one iteration (recursion?) to the next, but with a 100K-Node graph with each node having between 1-9 edges it's going to hit that 1MB limit pretty quickly.
Anyway, I'm new to C# and code optimization, if anyone could give me some pointers (not like this) I would appreciate it.

The classic technique to avoid deep recursive stack dives is to simply avoid recursion by writing the algorithm iteratively and managing your own "stack" with an appropriate list data structure. Most likely you will need this approach here given the sheer size of your input set.

A while back I explored this problem in my blog. Or, rather, I explored a related problem: how do you find the depth of a binary tree without using recursion? A recursive tree depth solution is trivial, but blows the stack if the tree is highly imbalanced.
My recommendation is to study ways of solving this simpler problem, and then decide which of them, if any, could be adapted to your slightly more complex algorithm.
Note that in these articles the examples are given entirely in JScript. However, it should not be difficult to adapt them to C#.
Here we start by defining the problem.
http://blogs.msdn.com/ericlippert/archive/2005/07/27/recursion-part-one-recursive-data-structures-and-functions.aspx
The first attempt at a solution is the classic technique that you'll probably adopt: define an explicit stack; use it rather than relying upon the operating system and compiler implementing the stack for you. This is what most people do when faced with this problem.
http://blogs.msdn.com/ericlippert/archive/2005/08/01/recursion-part-two-unrolling-a-recursive-function-with-an-explicit-stack.aspx
The problem with that solution is that it's a bit of a mess. We can go even farther than simply making our own stack. We can make our own little domain-specific virtual machine that has its own heap-allocated stack, and then solve the problem by writing a program that targets that machine! This is actually easier than it sounds; the operations of the machine can be extremely high level.
http://blogs.msdn.com/ericlippert/archive/2005/08/04/recursion-part-three-building-a-dispatch-engine.aspx
And finally, if you are really a glutton for punishment (or a compiler developer) you can rewrite your program in Continuation Passing Style, thereby eliminating the need for a stack at all:
http://blogs.msdn.com/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx
http://blogs.msdn.com/ericlippert/archive/2005/08/11/recursion-part-five-more-on-cps.aspx
http://blogs.msdn.com/ericlippert/archive/2005/08/15/recursion-part-six-making-cps-work.aspx
CPS is a particularly clever way of moving the implicit stack data structure off the system stack and onto the heap by encoding it in the relationships between a bunch of delegates.
Here are all of my articles on recursion:
http://blogs.msdn.com/ericlippert/archive/tags/Recursion/default.aspx

You could convert the code to use a 'work queue' rather than being recursive. Something along the following pseudocode:
Queue<Task> work;
while( work.Count != 0 )
{
Task t = work.Dequeue();
... whatever
foreach(Task more in t.MoreTasks)
work.Enqueue(more);
}
I know that is cryptic but it's the basic concept of what you'll need to do. Since your only getting 3000 nodes with your current code, you will at best get to 12~15k without any parameters. So you need to kill the recursion completely.

Is your Node a struct or a class? If it's the former, make it a class so that it's allocated on the heap instead of on the stack.

I would first verify that you are actually overflowing the stack: you actually see a StackOverflowException get thrown by the runtime.
If this is indeed the case, you have a few options:
Modify your recursive function so that the .NET runtime can convert it into a tail-recursive function.
Modify your recursive function so that it is iterative and uses a custom data structure rather than the managed stack.
Option 1 is not always possible, and assumes that the rules the CLR uses to generate tail recursive calls will remain stable in the future. The primary benefit, is that when possible, tail recursion is actually a convenient way of writing recursive algorithms without sacrificing clarity.
Option 2 is a more work, but is not sensitive to the implementation of the CLR and can be implemented for any recursive algorithm (where tail recursion may not always be possible). Generally, you need to capture and pass state information between iterations of some loop, together with information on how to "unroll" the data structure that takes the places of the stack (typically a List<> or Stack<>). One way of unrolling recursion into iteration is through continuation passing pattern.
More resources on C# tail recursion:
Why doesn't .NET/C# optimize for tail-call recursion?
http://geekswithblogs.net/jwhitehorn/archive/2007/06/06/113060.aspx

I would first make sure I know why I'm getting a stack overflow. Is it actually because of the recursion? The recursive method isn't putting much onto the stack. Maybe it's because of the storage of the nodes?
Also, BTW, I don't see the end parameter ever changing. That suggests it doesn't need to be a parameter, carried on each stack frame.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.