Will these variables be garbage-collected? - c#

I was practicing my coding chops today by solving the "remove all elements of a certain value from a linked list" problem. The solution I came up with was
public void RemoveAll ( T val )
{
if(_root == null)
return;
if(_root.Value == val)
{
_root = _root.Next;
RemoveAll(val);
}
Node last = _root,
cur = _root.Next;
while(cur != null)
{
if(cur.Value == val)
last.Next = cur.Next;
else
last = cur;
cur = cur.Next;
}
}
and here's my question:
When cur.Value == val I'm doing something like changing the list from
A -> B -> C
to
A -> C
Will the compiler or run-time environment see that B is no longer in use and dispose of it? Or should I do that explicitly?
I have a second question which is whether a call stack blows up for recursive void methods. As you see here, there is a chance of the method calling itself. But since it's a method that doesn't return a value, can't the run-time environment just wipe the data about the last call? There is no reason for it to remain in memory (right?).

Will the compiler or run-time environment see that B is no longer in use and dispose of it? Or should I do that explicitly?
The GC, when it runs, will realize there are no active references to that object and clean it up (assuming nobody else holds a reference to that object). You can't manually clean up a single object in .NET. In .NET, memory is managed and cleaned up by the garbage collector as needed.
I have a second question which is whether a call stack blows up for recursive void methods. As you see here, there is a chance of the method calling itself. But since it's a method that doesn't return a value, can't the run-time environment just wipe the data about the last call? There is no reason for it to remain in memory (right?).
You're describing tail recursion. The C# compiler will not generate tail-recursive calls. Because of that, you can run into a StackOverflowException if your recursion is too deep.
That limitation is not a CLR limitation: the .NET Framework does support tail calls. It's the C# compiler that doesn't emit the tail. IL prefix. You can get tail recursion working in the .NET Framework by generating IL by hand or by using F#, which generates tail calls whenever appropriate.
See https://stackoverflow.com/a/15865150/1163867 for more details.
PS. I think your code has a bug. Looks like you should return early after recursive call into RemoveAll:
if(_root.Value == val)
{
_root = _root.Next;
RemoveAll(val);
return;
}
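An alternative to the early return is to drop the recursive call altogether, which also removes any risk of a deep call stack when the list starts with many matching values. The following is only a sketch under assumptions: SimpleList<T>, AddFirst and ToList are hypothetical names that are not part of the question's class, and values are compared with EqualityComparer<T>.Default because == is not defined for an unconstrained T.

```csharp
using System.Collections.Generic;

// Hypothetical minimal singly linked list, for illustration only.
public class SimpleList<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Next;
    }

    private Node _root;

    // Helper assumed for the example: prepend a value.
    public void AddFirst(T val)
    {
        _root = new Node { Value = val, Next = _root };
    }

    public void RemoveAll(T val)
    {
        var comparer = EqualityComparer<T>.Default;

        // Drop matching nodes at the head with a loop instead of a recursive call.
        while (_root != null && comparer.Equals(_root.Value, val))
            _root = _root.Next;

        if (_root == null)
            return;

        Node last = _root, cur = _root.Next;
        while (cur != null)
        {
            if (comparer.Equals(cur.Value, val))
                last.Next = cur.Next; // skipped node is unreachable from the list now
            else
                last = cur;
            cur = cur.Next;
        }
    }

    // Helper assumed for the example: copy the values out for inspection.
    public List<T> ToList()
    {
        var result = new List<T>();
        for (Node n = _root; n != null; n = n.Next)
            result.Add(n.Value);
        return result;
    }
}
```

Nodes that are skipped over are never freed explicitly; once nothing references them, they are simply left for the garbage collector.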

Related

Does creating new Processes help me for Traversing a big tree?

Let's think of it as a family tree, a father has kids, those kids have kids, those kids have kids, etc...
So I have a recursive function that gets the father and uses recursion to get the children, and for now just prints them to the debug output window... But at some point (after one hour of letting it run and printing like 26000 rows) it gives me a StackOverflowException.
So am I really running out of memory? Hmm, then shouldn't I get an "Out of memory" exception? In other posts I found people saying that if the number of recursive calls is too high, you might still get a SOF exception...
Anyway, my first thought was to break the tree into smaller subtrees. I know for a fact that my root father always has these five kids, so instead of calling my method one time with the root passed to it, I call it five times with the kids of the root passed to it. It helped, I think, but one of them is still so big (26000 rows when it crashes) that I still have this issue.
How about Application Domains and Creating new Processes at run time at some certain level of depth? Does that help?
How about creating my own Stack and using that instead of recursive methods? does that help?
Here is also a high-level view of my code, please take a look, maybe there is actually something silly wrong with it that causes the SOF error:
private void MyLoadMethod(string conceptCKI)
{
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
if (numberofKids == 0)
return;
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
MyLoadMethod(oConcept.ConceptCKI);
}
}
How about creating my own Stack and using that instead of recursive methods? does that help?
Yes!
When you instantiate a Stack<T> this will live on the heap and can grow arbitrarily large (until you run out of addressable memory).
If you use recursion you use the call stack. The call stack is much smaller than the heap. The default is 1 MB of call stack space per thread. Note this can be changed, but it's not advisable.
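For completeness, here is a minimal sketch of how the stack size can be changed for a single thread, using the Thread constructor overload that takes a maximum stack size in bytes. BigStackDemo, Depth and RunWithStack are hypothetical names made up for this illustration, and the numbers in the usage note are arbitrary.

```csharp
using System;
using System.Threading;

public static class BigStackDemo
{
    // Deliberately recursive helper: one stack frame per level.
    private static int Depth(int n)
    {
        return n == 0 ? 0 : 1 + Depth(n - 1);
    }

    // Runs Depth(levels) on a worker thread created with the requested stack size.
    public static int RunWithStack(int levels, int stackBytes)
    {
        int result = -1;
        var worker = new Thread(() => result = Depth(levels), stackBytes);
        worker.Start();
        worker.Join();
        return result;
    }
}
```

A call such as BigStackDemo.RunWithStack(50000, 64 * 1024 * 1024) runs the recursion on a thread with a 64 MB stack. This only moves the limit further away; it does not remove it, which is why an explicit Stack<T> is usually the better fix.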
StackOverflowException is quite different to OutOfMemoryException.
OOME means that there is no memory available to the process at all. This could be upon trying to create a new thread with a new stack, or in trying to create a new object on the heap (and a few other cases).
SOE means that the thread's stack - by default 1M, though it can be set differently in thread creation or if the executable has a different default; hence ASP.NET threads have 256k as a default rather than 1M - was exhausted. This could be upon calling a method, or allocating a local.
When you call a function (method or property), the arguments of the call are placed on the stack, the address the function should return to when it returns is put on the stack, then execution jumps to the function called. Then some locals will be placed on the stack. Some more may be placed on it as the function continues to execute. stackalloc will also explicitly use some stack space where otherwise heap allocation would be used.
Then it calls another function, and the same happens again. Then that function returns, and execution jumps back to the stored return address, and the pointer within the stack moves back up (no need to clean up the values placed on the stack, they're just ignored now) and that space is available again.
If you use up that 1M of space, you get a StackOverflowException. Because 1M (or even 256k) is a large amount of memory for this sort of use (we don't put really large objects on the stack), the four things that are likely to cause an SOE are:
1. Someone thought it would be a good idea to optimise by using stackalloc when it wasn't, and they used up that 1M fast.
2. Someone thought it would be a good idea to optimise by creating a thread with a smaller than usual stack when it wasn't, and they use up that tiny stack.
3. A recursive (whether directly or through several steps) call falls into an infinite loop.
4. It wasn't quite infinite, but it was large enough.
You've got case 4. 1 and 2 are quite rare (and you need to be quite deliberate to risk them). Case 3 is by far the most common, and indicates a bug in that the recursion shouldn't be infinite, but a mistake means it is.
Ironically, in this case you should be glad you took the recursive approach rather than iterative - the SOE reveals the bug and where it is, while with an iterative approach you'd probably have an infinite loop bringing everything to a halt, and that can be harder to find.
Now for case 4, we've got two options. In the very very rare cases where we've got just slightly too many calls, we can run it on a thread with a larger stack. This doesn't apply to you.
Instead, you need to change from a recursive approach to an iterative one. Most of the time, this isn't very hard, though it can be fiddly. Instead of calling itself again, the method uses a loop. For example, consider the classic teaching-example of a factorial method:
private static int Fac(int n)
{
return n <= 1 ? 1 : n * Fac(n - 1);
}
Instead of using recursion we loop in the same method:
private static int Fac(int n)
{
int ret = 1;
for(int i = 1; i <= n; ++i)
ret *= i;
return ret;
}
You can see why there's less stack space here. The iterative version will also be faster 99% of the time. Now, imagine we accidentally call Fac(n) in the first, and leave out the ++i in the second - the equivalent bug in each, and it causes an SOE in the first and a program that never stops in the second.
For the sort of code you're talking about, where you keep producing more and more results as you go based on previous results, you can place the results you've got in a data structure (Queue<T> and Stack<T> both serve well for a lot of cases) so the code becomes something like:
private void MyLoadMethod(string firstConceptCKI)
{
Queue<string> pendingItems = new Queue<string>();
pendingItems.Enqueue(firstConceptCKI);
while(pendingItems.Count != 0)
{
string conceptCKI = pendingItems.Dequeue();
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
pendingItems.Enqueue(oConcept.ConceptCKI);
}
}
}
(I haven't completely checked this, just added the queuing instead of recursing to the code in your question).
This should then do more or less the same as your code, but iteratively. Hopefully that means it'll work. Note that there is a possible infinite loop in this code if the data you are retrieving has a loop. In that case this code will throw an exception when it fills the queue with far too much stuff to cope. You can either debug the source data, or use a HashSet to avoid enqueuing items that have already been processed.
Edit: Better add how to use a HashSet to catch duplicates. First set up a HashSet, this could just be:
HashSet<string> seen = new HashSet<string>();
Or if the strings are used case-insensitively, you'd be better with:
HashSet<string> seen = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase); // or StringComparer.CurrentCultureIgnoreCase if that's closer to how the string is used in the rest of the code.
Then before you go to use the string (or perhaps before you go to add it to the queue), you have one of the following:
If duplicate strings shouldn't happen:
if(!seen.Add(conceptCKI))
throw new InvalidOperationException("Attempt to use \"" + conceptCKI + "\" which was already seen.");
Or if duplicate strings are valid, and we just want to skip performing the second call:
if(!seen.Add(conceptCKI))
continue;//skip rest of loop, and move on to the next one.
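Putting the queue and the HashSet together, a self-contained sketch could look like the following. It is not the asker's code: GraphWalk and Visit are hypothetical names, and the children dictionary is an assumed in-memory stand-in for the database lookups. This variant checks for duplicates before enqueuing, which is one of the two placements mentioned above.

```csharp
using System.Collections.Generic;

public static class GraphWalk
{
    // children is a hypothetical in-memory stand-in for the database lookups.
    public static List<string> Visit(string start, IDictionary<string, List<string>> children)
    {
        var visited = new List<string>();
        var seen = new HashSet<string>();
        var pending = new Queue<string>();

        seen.Add(start);
        pending.Enqueue(start);

        while (pending.Count != 0)
        {
            string current = pending.Dequeue();
            visited.Add(current);

            List<string> kids;
            if (!children.TryGetValue(current, out kids))
                continue; // no entry means this node is a leaf

            foreach (string kid in kids)
            {
                // Add returns false for a key that was already seen, so loops in the data end here.
                if (seen.Add(kid))
                    pending.Enqueue(kid);
            }
        }
        return visited;
    }
}
```

Because every key enters the queue at most once, the loop terminates even if the data contains a cycle, and the memory used is bounded by the number of distinct nodes.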
I think you have a ring in your recursion (infinite recursion), not a genuinely deep tree. If you gave the stack more memory, you would still get the overflow error.
To test it:
Declare a field that stores the objects currently being processed:
private Dictionary<int,object> _operableIds = new Dictionary<int,object>();
...
private void Start()
{
_operableIds.Clear();
Recurtion(start_id);
}
...
private void Recurtion(int object_id)
{
if(_operableIds.ContainsKey(object_id))
throw new Exception("Have a ring!");
else
_operableIds.Add(object_id, null/*or object*/);
...
Recurtion(other_id);
...
_operableIds.Remove(object_id);
}

C# Collect garbage of object with memory leak

I am using a 3rd-party object I didn't create that over time consumes a lot of resources. This object shouldn't in any way contain a state, it simply performs a calculation. Despite this fact, every time I call a specific function of this object a little more memory is consumed. A few hours later, and my program is sitting at gigabytes of allocated memory.
The object was originally initialized as a static member of my Program class in my command-line application. I have found that if I wrap my entire program in a class, and reinitialize it every now and again, the older (and bloated) object is deallocated by the GC and a new smaller object replaces it.
My issue is this method is quite clumsy and ruins the flow of my Program.
Is there any other way you can dispose of an object? I am led to believe GC.Collect() will only dispose of unreachable objects. Is there any way I can make an object 'unreachable'?
Edit: As requested, the code:
static ILexicon lexicon = new Lexicon();
...
lexicon.LoadDataFromFile(@"lexicon.dat", null);
...
byte similarityScore(string w1, string w2, PartOfSpeech pos, SimilarityMeasure measure)
{
if (w1 == w2)
return 255;
if (pos != PartOfSpeech.Noun && pos != PartOfSpeech.Verb)
return 0;
IList<ILemma> w1_lemmas = lexicon.FindSenses(w1, pos);
IList<ILemma> w2_lemmas = lexicon.FindSenses(w2, pos);
byte result;
byte score = 0;
foreach (ILemma w1_lemma in w1_lemmas)
{
foreach (ILemma w2_lemma in w2_lemmas)
{
result = (byte) (w1_lemma.GetSimilarity(w2_lemma, measure) * 255);
if (result > score)
score = result;
}
}
return score;
}
As similarityScore is called, more memory is allocated to a private member of lexicon. It does not implement IDisposable and there are no obvious functions to clear the memory. The library is based on WordNet, and uses an algorithm to find path lengths in the hypernym tree to calculate the similarity of two words. Unless there is caching, I can't see why it would need to store any memory. What is for sure, is I can't change it. I'm almost certain there is nothing wrong with my code. I just need to dispose of lexicon when it gets too large (N.B. it takes a second or two to load the lexicon from file to memory)
If the object doesn't implement IDisposable and you want to push it out of scope, you can set all references to it to null and then force garbage collection with GC.Collect().
GC.Collect() is very expensive. If you're going to have to do this frequently, you might want to consider contacting the vendor.
Find out:
If you are using their library correctly, or is there something you're doing wrong that's causing the memory leak.
If their library is leaking memory even when used as intended, can they fix the leak?
Additional note: If the 3rd party library is a COM component that you're using through interop, you can use Marshal.ReleaseComObject to release the underlying COM object and its unmanaged memory.
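If replacing the object periodically turns out to be the only workable option, the swap can be hidden behind a small wrapper so the rest of the program is not restructured around it. This is a sketch under assumptions: Recycling<T> is a hypothetical helper, the use-count threshold is an arbitrary trigger, and the class is not thread-safe.

```csharp
using System;

// Hypothetical helper: hands out one instance and replaces it after maxUses accesses.
public sealed class Recycling<T> where T : class
{
    private readonly Func<T> _factory;
    private readonly int _maxUses;
    private T _current;
    private int _uses;

    public Recycling(Func<T> factory, int maxUses)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        if (maxUses < 1) throw new ArgumentOutOfRangeException("maxUses");
        _factory = factory;
        _maxUses = maxUses;
    }

    public T Instance
    {
        get
        {
            if (_current == null || _uses >= _maxUses)
            {
                // Overwriting the only reference makes the old instance unreachable;
                // the GC can reclaim it on a later collection.
                _current = _factory();
                _uses = 0;
            }
            _uses++;
            return _current;
        }
    }
}
```

With the code from the question, the factory would create a Lexicon and call LoadDataFromFile on it, and similarityScore would read the wrapper's Instance instead of the static field. Callers must not hold on to an old instance, otherwise it stays reachable and is never collected.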
You could try calling the Dispose() method. This would make the object unusable, so you would have to instantiate another one. I assume your program is in a loop, so it can be a loop variable with the call to dispose at the bottom.
I would suggest that if you can get your hands on a memory profiler, you use it. A memory profiler will let you pause your program, click on a class, and see a list of objects of that class. One can then click on an object and see how it was created, and the "path" to that object from a root (e.g. there's a static class foo, which holds a reference to a bar, which holds a reference to a boz, which holds a reference to a reallybigthing). Often, seeing that will make it clear what needs to be done to break the chain.
You might be able to download the source from the WordNet repository and modify the code, since it is open source.

Traversing arbitrarily large binary tree inorder

I'm stuck at finding a solution. C#, .NET 4.0, VS2010
I can easily write a recursive one, but can't for the life of me figure out something that won't overflow the stack if the tree is arbitrarily large.
This is a binary tree question, and i am trying to write a
public IEnumerable<T> Values()
method.
Here is the full code in case you are interested: http://pastebin.com/xr2f3y7g
Obviously, the version currently in there doesn't work. I probably should mention that I am a newbie in C#, transitioning from C++.
Here is a method for inorder traversal that uses an explicit stack. The stack is created on the heap, so it can be much larger than the call stack the processor uses.
public IEnumerable<T> Values()
{
Stack<Node> stack = new Stack<Node>();
Node current = this.root;
while(current != null)
{
while(current.leftChild != null)
{
stack.Push(current);
current = current.leftChild;
}
yield return current.data;
while(current.rightChild == null && stack.Count > 0)
{
current = stack.Pop();
yield return current.data;
}
current = current.rightChild;
}
}
If you can't use a stack and your nodes happen to have parent pointers, you can try solutions from this question
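As a rough sketch of the parent-pointer idea (not taken from the linked question), the traversal can repeatedly step to the in-order successor, which needs no stack at all. PNode<T> and ParentPointerTraversal are hypothetical names, and the code assumes every node's Parent is set correctly and that the root's Parent is null.

```csharp
using System.Collections.Generic;

// Hypothetical node type with a parent reference.
public class PNode<T>
{
    public T Data;
    public PNode<T> Left, Right, Parent;
}

public static class ParentPointerTraversal
{
    public static IEnumerable<T> InOrder<T>(PNode<T> root)
    {
        PNode<T> current = root;

        // Start at the leftmost node, which comes first in order.
        while (current != null && current.Left != null)
            current = current.Left;

        while (current != null)
        {
            yield return current.Data;

            if (current.Right != null)
            {
                // Successor is the leftmost node of the right subtree.
                current = current.Right;
                while (current.Left != null)
                    current = current.Left;
            }
            else
            {
                // Climb while we are a right child; the parent reached after that is the successor.
                while (current.Parent != null && current == current.Parent.Right)
                    current = current.Parent;
                current = current.Parent; // null once the last node has been returned
            }
        }
    }
}
```

The trade-off is one extra reference per node in exchange for constant additional memory during the traversal.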
Assuming the tree is anywhere near balanced, its maximum depth is log2(n), so you'd need a huge tree to overflow the stack.
You can transform any recursive algorithm into an iterative one, but in this case, it will likely require either backward pointers or an explicit stack, both of which look expensive.
Having said that, recursion is typically not so great in .NET because any local variables in calling instances of a method cannot be GC'ed until the stack gets unwound after the terminating condition. I don't know whether the JIT will automatically optimize tail-end recursion to make it iterative, but that would help.

Where did variable = null as "object destroying" come from?

Working on a number of legacy systems written in various versions of .NET, across many different companies, I keep finding examples of the following pattern:
public void FooBar()
{
object foo = null;
object bar = null;
try
{
foo = new object();
bar = new object();
// Code which throws exception.
}
finally
{
// Destroying objects
foo = null;
bar = null;
}
}
To anybody who knows how memory management works in .NET, this kind of code is painfully unnecessary; the garbage collector does not need you to manually assign null to tell that the old object can be collected, nor does assigning null instruct the GC to immediately collect the object.
This pattern is just noise, making it harder to understand what the code is trying to achieve.
Why, then, do I keep finding this pattern? Is there a school that teaches this practice? Is there a language in which assigning null values to locally scoped variables is required to correctly manage memory? Is there some additional value in explicitly assigning null that I haven't perceived?
It's cargo cult programming (thanks to Daniel Earwicker) by developers who are used to "free" resources, bad GC implementations and bad APIs.
Some GCs didn't cope well with circular references. To get rid of them, you had to break the cycle "somewhere". Where? Well, if in doubt, then everywhere. Do that for a year and it's moved into your fingertips.
Also setting the field to null gives you the idea of "doing something" because as developers, we always fear "to forget something".
Lastly, we have APIs which must be closed explicitly because there is no real language support to say "close this when I'm done with it" and let the computer figure it out just like with GC. So you have an API where you have to call cleanup code and API where you don't. This sucks and encourages patterns like the above.
It is possible that it came from VB which used a reference counting strategy for memory management and object lifetime. Setting a reference to Nothing (equivalent to null) would decrement the reference count. Once that count became zero then the object was destroyed synchronously. The count would be decremented automatically upon leaving the scope of a method so even in VB this technique was mostly useless, however there were special situations where you would want to greedily destroy an object as illustrated by the following code.
Public Sub Main()
Dim big As Variant
Set big = GetReallyBigObject()
Call big.DoSomething
Set big = Nothing
Call TimeConsumingOperation
Call ConsumeMoreMemory
End Sub
In the above code the object referenced by big would have lingered until the end without the call to Set big = Nothing. That may be undesirable if the other stuff in the method was a time consuming operation or generated more memory pressure.
It comes from C/C++, where explicitly setting your pointers to null was the norm (to eliminate dangling pointers).
After calling free():
#include <stdlib.h>
{
char *dp = malloc ( A_CONST );
// Now that we're freeing dp, it is a dangling pointer because it's pointing
// to freed memory
free ( dp );
// Set dp to NULL so it is no longer dangling
dp = NULL;
}
Classic VB developers also did the same thing when writing their COM components to prevent memory leaks.
It is more common in languages with deterministic garbage collection and without RAII, such as the old Visual Basic, but even there it's usually unnecessary, although it was often needed to break cyclic references. So possibly it really stems from bad C++ programmers who use dumb pointers all over the place. In C++, it makes sense to set dumb pointers to 0 after deleting them to prevent double deletion.
I've seen this a lot in VBScript code (classic ASP) and I think it comes from there.
I think it used to be a common misunderstanding among former C/C++ developers. They knew that the GC will free their memory, but they didn't really understand when and how. Just clean it and carry on :)
I suspect that this pattern comes from translating C++ code to C# without pausing to understand the differences between C# finalization and C++ finalization. In C++ I often null things out in the destructor, either for debugging purposes (so that you can see in the debugger that the reference is no longer valid) or, rarely, because I want a smart object to be released. (If that's the meaning I'd rather call Release on it and make the meaning of the code crystal-clear to the maintainers.) As you note, this is pretty much senseless in C#.
You see this pattern in VB/VBScript all the time too, for different reasons. I mused a bit about what might cause that here:
Link
Maybe the convention of assigning null originated from the fact that, had foo been an instance variable instead of a local variable, you would have to remove the reference before the GC could collect it. Someone slept during the first sentence and started nullifying all their variables; the crowd followed.
It comes from C/C++ where doing a free()/delete on an already released pointer could result in a crash while releasing a NULL-pointer simply did nothing.
This means that this construct (C++) will cause problems
void foo()
{
myclass *mc = new myclass(); // lets assume you really need new here
if (foo == bar)
{
delete mc;
}
delete mc;
}
while this will work
void foo()
{
myclass *mc = new myclass(); // lets assume you really need new here
if (foo == bar)
{
delete mc;
mc = NULL;
}
delete mc;
}
Conclusion: It's totally unnecessary in C#, Java and just about any other garbage-collecting language.
Consider a slight modification:
public void FooBar()
{
object foo = null;
object bar = null;
try
{
foo = new object();
bar = new object();
// Code which throws exception.
}
finally
{
// Destroying objects
foo = null;
bar = null;
}
vavoom(foo,bar);
}
The author(s) may have wanted to ensure that the great Vavoom (*) did not get pointers to malformed objects if an exception was previously thrown and caught. Paranoia, resulting in defensive coding, is not necessarily a bad thing in this business.
(*) If you know who he is, you know.
VB developers had to dispose all of their objects, to try and mitigate the chance of a memory leak. I can imagine this is where it has come from as VB devs migrated over to .NET / C#.
I can see it coming from either a misunderstanding of how the garbage collection works, or an effort to force the GC to kick in immediately - perhaps because the objects foo and bar are quite large.
I've seen this in some Java code before. It was used on a static variable to signal that the object should be destroyed.
It probably didn't originate from Java though, as using it for anything other than a static variable would also not make sense in Java.
It comes from C++ code, especially smart pointers. In that case it's roughly equivalent to a .Dispose() in C#.
It's not a good practice, at most a developer's instinct. There is no real value in assigning null in C#, except maybe helping the GC to break a circular reference.

C# Property Access Optimization

In C# (or VB .NET), does the compiler make attempts to optimize property accesses? For example,
public ViewClass View
{
get
{
...
Something is computed here
....
}
}
if (View != null)
View.Something = SomethingElse;
I would imagine that if the compiler could somehow detect that View remains constant between the two accesses, it can refrain from computing the value twice. Are these kinds of optimizations performed?
I understand that if View has some intensive computations, it should probably be refactored into a function (GetView()). In my particular case, View involves climbing the visual tree looking for an element of a particular type.
Related: Any references on the workings of the (Microsoft) C# compiler?
Not in general, no. As Steven mentioned, there are numerous factors to consider regarding multithreading. If you truly are computing something that might change, you're correct that it should be refactored away from a property. If it won't change, you should lazy-load it (check if the private member is null, if so then calculate, then return the value).
If it won't change and depends on a parameter, you can use a Dictionary or Hashtable as a cache - given the parameter (key) you will store the value. You could have each entry as a WeakReference to the value too, so when the value isn't referenced anywhere and garbage collection happens, the memory will be freed.
Hope that helps.
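A minimal sketch of such a cache, assuming the generic WeakReference<T> type (available since .NET 4.5) rather than the older non-generic WeakReference. WeakCache and GetOrAdd are hypothetical names, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache whose entries do not keep their values alive.
public sealed class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference<TValue>> _entries =
        new Dictionary<TKey, WeakReference<TValue>>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> compute)
    {
        WeakReference<TValue> weak;
        TValue value;
        if (_entries.TryGetValue(key, out weak) && weak.TryGetTarget(out value))
            return value; // the value is still alive, so reuse it

        // Either never computed or already collected: compute and store it again.
        value = compute(key);
        _entries[key] = new WeakReference<TValue>(value);
        return value;
    }
}
```

Entries whose targets have been collected stay in the dictionary as empty shells until they are overwritten, so a long-lived cache with many distinct keys would need an occasional cleanup pass.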
The question is very unclear, it isn't obvious to me how the getter and the snippet below it are related. But yes, property accessors are normally heavily optimized. Not by the C# compiler, by the JIT compiler. For one, they are often inlined so you don't pay for the cost of a method call.
That will only happen if the getter doesn't contain too much code and doesn't monkey with locks and exception handling. You can help the JIT compiler to optimize the common case with code like this:
get
{
if (_something == null) {
_something = createSomething();
}
return _something;
}
This will inline the common case and allow the creation method to remain un-inlined. This gets typically compiled to three machine code instructions in the Release build (load + test + jump), about a nano-second of execution time. It is a micro-optimization, seeing an actual perf improvement would be quite rare.
Do note that the given sample code is not thread-safe. Always write correct code rather than fast code first.
No, which is why you should use Lazy<T> to implement a JIT calculation.
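A minimal sketch of that approach follows. ViewHolder, FindView and the Computations counter are hypothetical names used for illustration, with FindView standing in for the expensive lookup. Note that Lazy<T> caches the value for the lifetime of the object, so this only fits if the result does not change.

```csharp
using System;

public class ViewHolder
{
    private readonly Lazy<string> _view;

    // Counts how often the expensive lookup ran (for the example only).
    public int Computations { get; private set; }

    public ViewHolder()
    {
        // The delegate runs at most once, on the first access to _view.Value.
        _view = new Lazy<string>(FindView);
    }

    // Hypothetical stand-in for an expensive computation such as climbing the visual tree.
    private string FindView()
    {
        Computations++;
        return "view";
    }

    public string View
    {
        get { return _view.Value; }
    }
}
```

With this constructor, Lazy<T> uses its default thread-safety mode, so concurrent first accesses still run the delegate only once.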
From my understanding there is no implicit caching - you have to cache the value of a given property yourself the first time it is calculated
For example:
object mCachedValue = null;
readonly object mCacheLock = new object();
public Object MyProperty
{
get
{
if (mCachedValue == null)
{
//lock on a dedicated object: lock(mCachedValue) would throw while the value is still null
lock(mCacheLock)
{
//after acquiring the lock check if the property has not been initialized in the mean time - only calculate once
if (mCachedValue == null)
{
//calculate value the first time
}
}
}
return mCachedValue;
}
}
