I just delivered my first C# WebAPI application to its first customer. Under normal load, performance was even better than I expected. Initially, that is.
Everything worked fine until, at some point, memory usage shot up and garbage collection started running riot (as in "it collects objects that are not yet garbage"). At that point, there were multiple W3WP worker processes using some ten gigabytes of RAM altogether, with single-digit gigabytes per worker. After an IIS restart, everything was back to normal, but of course memory usage started rising again.
Please correct me if I am wrong, but
Shouldn't C# have automatic garbage collection?
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
And please help me out:
How can I explicitly state what GC should collect, thus preventing memory leaks? Is someBigList = null; the way to go?
How can I detect where the memory leaks are?
EDIT: Let me clarify some things.
My .NET WebAPI application is mostly a bunch of
public class MyApiController : ApiController
{
    [HttpGet]
    public MyObjectClass[] MyApi(string someParam)
    {
        List<MyObjectClass> list = new List<MyObjectClass>();
        ...
        for/while/foreach
        {
            MyObjectClass obj = new MyObjectClass();
            obj.firstStringAttribute = xyz;
            ...
            list.Add(obj);
        }
        return list.ToArray();
    }
}
Under such conditions, GC should be easy: after "return", all local variables should be garbage. Yet with every single call the used memory increases.
I initially thought that C# WebAPI programs behave similarly to (pre-compiled) PHP: IIS calls the program, it is executed, returns the value, and is then completely disposed of.
But this is not the case. For instance, I found that static variables keep their data between calls, so I have now gotten rid of all static variables.
I did this because I found static variables to be a problem for the GC:
internal class Helper
{
    private static List<string> someVar = new List<string>();

    internal Helper()
    {
        someVar = new List<string>();
    }

    internal void someFunc(string str)
    {
        someVar.Add(str);
    }

    internal string[] someOtherFunc(string str)
    {
        string[] s = someVar.ToArray();
        someVar = new List<string>();
        return s;
    }
}
Here, under low-memory conditions, someVar threw a NullReferenceException, which in my opinion can only have been caused by the GC, since I did not find any code of mine that actively nullifies someVar.
I think the memory increase slowed down once I actively set the biggest array variables in the most-used controllers to null, but this is only a gut feeling and not nearly a complete solution.
I will now do some profiling using the link you provided, and get back with some results.
Shouldn't C# have automatic garbage collection?
C# is a programming language for the .NET runtime, and .NET brings the automatic garbage collection to the table. So, yes, although technically C# isn't the piece that brings it.
Shouldn't it be easy for GC to collect the garbage of a WebAPI application?
Sure, it should be just as easy as for any other type of .NET application.
The common theme here is garbage. How does .NET determine that something is garbage? By verifying that there are no more live references to the object. To be honest, I think it is far more likely that one of your assumptions is wrong than that there is a serious bug in the garbage collector such that "it collects objects that are not yet garbage".
To find leaks, you need to figure out what objects are currently held in memory, determine whether that is correct or not, and if not, figure out what is holding them there. A memory profiler will help with that; there are numerous options available, such as the Red Gate ANTS Memory Profiler.
For your other questions, how to make something eligible for garbage collection? By turning it into garbage (see definition above). Note that setting a local variable to null may not necessarily help or be needed. Setting a static variable to null, however, might. But the correct way to determine that is to use a profiler.
Here are some shot-in-the-dark type of tips you might look into:
Look at static classes, static fields, and static properties. Are you storing data there that is accumulating?
How about static events? Do you have this? Do you remember to unsubscribe the event when you no longer need it?
And by "static fields, properties, and events", I also mean normal instance fields, properties and events that are held in objects that directly or indirectly are stored in static fields or properties. Basically, anything that will keep the objects in memory.
Are you remembering to Dispose of all your IDisposable objects? If not, then the memory being used could be unmanaged. Typically, when the garbage collector collects a managed object, that object's finalizer should clean up its unmanaged memory as well. However, you might be allocating unmanaged memory that the GC isn't aware of, so it thinks there is no pressure to collect yet; see the GC.AddMemoryPressure method for more on this.
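To make the first two tips concrete, here is a minimal sketch (the class and member names are made up) of how a static field and a static event can silently root objects for the lifetime of the application:

using System;
using System.Collections.Generic;

// Static members are GC roots: they stay reachable as long as the
// application domain lives, and so does everything they reference.
public static class RequestCache
{
    // Entries are added on every request but never removed,
    // so this list grows without bound.
    public static readonly List<string> RecentRequests = new List<string>();

    // Every subscriber is kept alive by this event until it explicitly
    // unsubscribes (-=), even if it is otherwise garbage.
    public static event EventHandler SomethingHappened;
}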
Related
Using the System.DirectoryServices.Protocols library:
I have a class LdapItemOperator that takes a SearchResultEntry object from an LDAP query (not Active Directory related) and stores the attributes for the object in a field: readonly SearchResultAttributeCollection LdapAttributes.
The problem I am experiencing is that during a large operation, the garbage collector never seems to collect these objects after they ought to have been disposed; I think the LdapAttributes field in my objects is the cause. What ways can I try to dispose of the objects when they are no longer required? I can't seem to find a way to incorporate a using statement, although I have only a little experience with it.
As an example, let's say I have the following logic:
List<LdapItemOperator> itemList = GetList(ldapFilter);
List<bool> resultList = new List<bool>();

foreach (LdapItemOperator item in itemList)
{
    bool result = doStuff(item);
    resultList.Add(result);
}

// Even though we are out of the loop now, the objects are still stored
// in memory, how come? Same goes for the previous objects in the loop,
// they seem to remain in memory.
Logic.WriteResultToLog(resultList);
After a good while of running the logic on large filesets, this process starts taking up enormous amounts of memory, of course...
I think you might be a little confused about how GC works. You can never know exactly when GC will run. And objects you are still holding a reference to will not be collected (unless it's a weak reference...).
Also "disposing" is yet another different concept, that hasn't much to do with GC.
Basically, all objects will be in memory already after the call to GetList. And memory consumption will not change much after that, the foreach loop shouldn't affect it at all.
Without knowing your implementation, maybe try returning an enumerable instead of a single list, or make multiple batched calls.
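As a sketch of the enumerable idea (RunQuery here is a stand-in for whatever code executes your LDAP search), rewriting the method to stream items means each one becomes eligible for collection as soon as the consuming loop moves past it:

IEnumerable<LdapItemOperator> GetItems(string ldapFilter)
{
    // RunQuery is hypothetical: it represents the code that performs
    // the LDAP search and yields SearchResultEntry objects.
    foreach (SearchResultEntry entry in RunQuery(ldapFilter))
    {
        // Hand one item at a time to the caller; once the caller's
        // foreach advances, no live reference to this instance remains.
        yield return new LdapItemOperator(entry);
    }
}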
I am using a 3rd-party object I didn't create that consumes a lot of resources over time. This object shouldn't contain any state; it simply performs a calculation. Despite this fact, every time I call a specific function of this object, a little more memory is consumed. A few hours later, and my program is sitting at gigabytes of allocated memory.
The object was originally initialized as a static member of my Program class in my command-line application. I have found that if I wrap my entire program in a class and reinitialize it every now and again, the older (and bloated) object is collected by the GC and a new, smaller object replaces it.
My issue is that this method is quite clumsy and ruins the flow of my program.
Is there any other way you can dispose of an object? I am led to believe GC.Collect() will only collect unreachable objects. Is there any way I can make an object 'unreachable'?
Edit: As requested, the code:
static ILexicon lexicon = new Lexicon();
...
lexicon.LoadDataFromFile(@"lexicon.dat", null);
...
byte similarityScore(string w1, string w2, PartOfSpeech pos, SimilarityMeasure measure)
{
    if (w1 == w2)
        return 255;

    if (pos != PartOfSpeech.Noun && pos != PartOfSpeech.Verb)
        return 0;

    IList<ILemma> w1_lemmas = lexicon.FindSenses(w1, pos);
    IList<ILemma> w2_lemmas = lexicon.FindSenses(w2, pos);

    byte result;
    byte score = 0;

    foreach (ILemma w1_lemma in w1_lemmas)
    {
        foreach (ILemma w2_lemma in w2_lemmas)
        {
            result = (byte) (w1_lemma.GetSimilarity(w2_lemma, measure) * 255);
            if (result > score)
                score = result;
        }
    }

    return score;
}
As similarityScore is called, more memory is allocated to a private member of lexicon. It does not implement IDisposable and there are no obvious functions to clear the memory. The library is based on WordNet, and uses an algorithm to find path lengths in the hypernym tree to calculate the similarity of two words. Unless there is caching, I can't see why it would need to hold on to any memory. What is for sure is that I can't change it. I just need to dispose of lexicon when it gets too large (N.B. it takes a second or two to load the lexicon from file into memory).
If the object doesn't implement IDisposable and you want to push it out of scope, you can set all references to it to null and then force garbage collection with GC.Collect().
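Applied to the lexicon example above, that would look something like this (a sketch; whether the memory actually comes back depends on nothing else referencing the object):

lexicon = null;                  // drop the only reference
GC.Collect();                    // force a full collection (expensive!)
GC.WaitForPendingFinalizers();   // let finalizers release their resources
lexicon = new Lexicon();         // rebuild a fresh, small instance
lexicon.LoadDataFromFile(@"lexicon.dat", null);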
GC.Collect() is very expensive. If you're going to have to do this frequently, you might want to consider contacting the vendor.
Find out:
If you are using their library correctly, or is there something you're doing wrong that's causing the memory leak.
If their library is leaking memory even when used as intended, can they fix the leak?
Additional note: if the 3rd-party library wraps a COM component that you're using through interop, you can use Marshal.ReleaseComObject to release the underlying unmanaged object.
You could try calling the Dispose() method. This would make the object unusable, so you would have to instantiate another one. I assume your program is in a loop, so it can be a loop variable with the call to Dispose() at the bottom.
I would suggest that if you can get your hands on a memory profiler, you use it. A memory profiler will let you pause your program, click on a class, and see a list of objects of that class. One can then click on an object and see how it was created, and the "path" to that object from a root (e.g. there's a static class foo, which holds a reference to a bar, which holds a reference to a boz, which holds a reference to a reallybigthing). Often, seeing that will make it clear what needs to be done to break the chain.
You might be able to download the source from the WordNet repository and modify the code, since it is open source.
Here's an interesting article that I found on the web.
It talks about how this firm is able to parse a huge amount of financial data in a managed environment, essentially by reusing objects and avoiding immutables such as string. They then go on to show that their program doesn't do any GC during the continuous operation phase.
This is pretty impressive, and I'd like to know if anyone else here has some more detailed guidelines as to how to do this. For one, I'm wondering how the heck you can avoid using string, when blatantly some of the data inside the messages are strings, and whatever client application is looking at the messages will want to be passed those strings. Also, what do you allocate in the startup phase? How will you know it's enough? Is it simply a matter of claiming a big chunk of memory and keeping a reference to it so that GC doesn't kick in? What about whatever client application is using the messages? Does it also need to be written according to these stringent standards?
Also, would I need a special tool to look at the memory? I've been using SciTech memory profiler thus far.
I found the paper you linked to rather deficient:
It assumes, and wants you to assume, that garbage collection is the ultimate latency killer. They have not explained why they think so, nor have they explained in what way their system is not basically a custom-made garbage collector in disguise.
It talks about the amount of memory cleaned up in garbage collection, which is irrelevant: the time taken to garbage collect depends more on the number of objects, irrespective of their size.
The table of “results” at the bottom provides no comparison to a system that uses .NET’s garbage collector.
Of course, this doesn’t mean they’re lying and it’s nothing to do with garbage collection, but it basically means that the paper is just trying to sound impressive without actually divulging anything useful that you could use to build your own.
One thing to note from the beginning is where they say "Conventional wisdom has been developing low latency messaging technology required the use of unmanaged C++ or assembly language". In particular, they are talking about a sort of case where people would often dismiss a .NET (or Java) solution out of hand. For that matter, a relatively naïve C++ solution probably wouldn't make the grade either.
Another thing to consider here is that they haven't so much gotten rid of the GC as replaced it - there's code there managing object lifetime, but it's their own code.
There are several different ways one could do this instead. Here's one. Say I need to create and destroy several Foo objects as my application runs. Foo creation is parameterised by an int, so the normal code would be:
public class Foo
{
    private readonly int _bar;

    public Foo(int bar)
    {
        _bar = bar;
    }

    /* other code that makes this class actually interesting. */
}
public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = new Foo(param);
        // Do something here
        // baz falls out of scope and is liable to GC collection
    }
}
A much different approach is:
public class Foo
{
    // Illustrative upper bound on the number of Foos needed at once.
    private const int MOST_POSSIBLY_NEEDED = 1024;

    private static readonly Foo[] FOO_STORE = new Foo[MOST_POSSIBLY_NEEDED];
    private static Foo FREE;

    static Foo()
    {
        // Pre-allocate every Foo and chain them into a free list.
        Foo last = FOO_STORE[MOST_POSSIBLY_NEEDED - 1] = new Foo();
        int idx = MOST_POSSIBLY_NEEDED - 1;
        while (idx != 0)
        {
            Foo newFoo = FOO_STORE[--idx] = new Foo();
            newFoo._next = FOO_STORE[idx + 1];
        }
        FREE = last._next = FOO_STORE[0];
    }

    private Foo _next;

    // Note _bar is no longer readonly. We lose the advantages
    // as a cost of reusing objects. Even if Foo acts immutable
    // it isn't really.
    private int _bar;

    public static Foo GetFoo(int bar)
    {
        Foo ret = FREE;
        FREE = ret._next;
        ret._bar = bar; // (re)initialise the recycled object
        return ret;
    }

    public void Release()
    {
        _next = FREE;
        FREE = this;
    }

    /* other code that makes this class actually interesting. */
}
public class UsesFoo
{
    public void FooUsedHere(int param)
    {
        Foo baz = Foo.GetFoo(param);
        // Do something here
        baz.Release();
    }
}
Further complication can be added if you are multithreaded (though for really high performance in a non-interactive environment, you may want to have either one thread, or separate stores of Foo classes per thread), and if you cannot predict MOST_POSSIBLY_NEEDED in advance (the simplest is to create new Foo() as needed, but not release them for GC which can be easily done in the above code by creating a new Foo if FREE._next is null).
If we allow for unsafe code, we can have even greater advantages in making Foo a struct (and hence the array holding a contiguous area of memory), _next being a pointer to Foo, and GetFoo() returning a pointer.
Whether this is what these people are actually doing, I of course cannot say, but the above does prevent GC from activating. It will only be faster in very high-throughput conditions; if not, then letting GC do its stuff is probably better (GC really does help you, despite 90% of questions about it treating it as a Big Bad).
There are other approaches that similarly avoid GC. In C++ the new and delete operators can be overridden, which allows for the default creation and destruction behaviour to change, and discussions of how and why one might do so might interest you.
A practical take-away from this is for when objects either hold resources other than memory that are expensive (e.g. connections to databases) or "learn" as they continue to be used (e.g. XmlNameTables). In those cases, pooling objects is useful (ADO.NET connections do so behind the scenes by default), and a simple Queue<T> is the way to go, as the extra overhead in terms of memory doesn't matter. You can also abandon objects on lock contention (you're looking to gain performance, and lock contention will hurt it more than abandoning the object would), which I doubt would work in their case.
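A minimal sketch of such a queue-based pool (the type and names are made up, not from the paper):

using System;
using System.Collections.Generic;

public class SimplePool<T> where T : class
{
    private readonly Queue<T> _pool = new Queue<T>();
    private readonly Func<T> _create;

    public SimplePool(Func<T> create)
    {
        _create = create;
    }

    // Hand out a pooled instance if one is available, else create one.
    public T Get()
    {
        return _pool.Count > 0 ? _pool.Dequeue() : _create();
    }

    // Return the instance for reuse instead of letting it become garbage.
    public void Return(T item)
    {
        _pool.Enqueue(item);
    }
}

Unlike the free-list version above, there is no fixed capacity: the pool simply grows to the peak number of objects in simultaneous use.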
From what I understood, the article doesn't say they don't use strings. They don't use immutable strings. The problem with immutable strings is that when you're doing parsing, most of the strings generated are just throw-away strings.
I'm guessing they're using some sort of pre-allocation combined with free lists of mutable strings.
I worked for a while with a CEP product called StreamBase. One of their engineers told me that they were migrating their C++ code to Java because they were getting better performance, fewer bugs and better portability on the JVM by pretty much avoiding GC altogether. I imagine the arguments apply to the CLR as well.
It seemed counter-intuitive, but their product was blazingly fast.
Here's some information from their site:
StreamBase avoids garbage collection in two ways: Not using objects, and only using the minimum set of objects we need.
First, we avoid using objects by using Java primitive types (Boolean, byte, int, double, and long) to represent our data for processing. Each StreamBase data type is represented by one or more primitive type. By only manipulating the primitive types, we can store data efficiently in stack or array allocated regions of memory. We can then use techniques like parallel arrays or method calling to pass data around efficiently.
Second, when we do use objects, we are careful about their creation and destruction. We tend to pool objects rather than releasing them for garbage collection. We try to manage object lifecycle such that objects are either caught by the garbage collector in the young generation, or kept around forever.
Finally, we test this internally using a benchmarking harness that measures per-tuple garbage collection. In order to achieve our high speeds, we try to eliminate all per-tuple garbage collection, generally with good success.
99% of the time, you will be wasting your boss's money by trying to achieve this. The article describes an extreme scenario where they need the last drop of performance. As you can read in the article, there are great parts of the .NET framework that can't be used when trying to be GC-free. Some of the most basic parts of the BCL use memory allocations (or 'produce garbage', as the paper calls it); you will need to find a way around those methods. And even when you need an absolutely blazingly fast application, you'd better first try to build an application/architecture that can scale out (use multiple machines) before trying to walk the no-GC route. The sole reason for them to use the no-GC route is that they need absolutely low latency. IMO, when you need absolute speed but don't care about the absolute minimum response time, it will be hard to justify a no-GC architecture. Besides this, if you try to build a GC-free client application (such as a Windows Forms or WPF app), forget it: those presentation frameworks create new objects constantly.
But if you really want this, it is actually quite simple. Here is a simple how-to:
Find out which parts of the .NET API can't be used (you can write a tool that analyzes the .NET assemblies using an introspection engine).
Write a program that verifies the code you or your developers write to ensure they don't allocate directly or use 'forbidden' .NET methods, using the safe list created in the previous point (FxCop is a great tool for this).
Create object pools that you initialize at startup time. The rest of the program can then reuse existing objects so that it won't have to do any new operations.
If you need to manipulate strings, use byte arrays for this and store the byte arrays in a pool (WCF uses this technique also). You will have to create an API that allows manipulating those byte arrays; a sketch follows this list.
And last but not least, profile, profile, profile.
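As a sketch of points 3 and 4 combined (the buffer size and names are assumptions), preallocate all buffers during startup and rent/return them, so that steady-state operation performs no allocations at all:

using System.Collections.Generic;

public static class BufferPool
{
    private const int BufferSize = 4096; // assumed maximum message size
    private static readonly Stack<byte[]> Free = new Stack<byte[]>();

    // Called once during the startup phase; the rest of the program
    // only ever reuses these arrays.
    public static void Warmup(int count)
    {
        for (int i = 0; i < count; i++)
            Free.Push(new byte[BufferSize]);
    }

    public static byte[] Rent()
    {
        return Free.Pop(); // no allocation in steady state
    }

    public static void Return(byte[] buffer)
    {
        Free.Push(buffer);
    }
}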
Good luck
Working on a number of legacy systems written in various versions of .NET, across many different companies, I keep finding examples of the following pattern:
public void FooBar()
{
    object foo = null;
    object bar = null;

    try
    {
        foo = new object();
        bar = new object();

        // Code which throws exception.
    }
    finally
    {
        // Destroying objects
        foo = null;
        bar = null;
    }
}
To anybody who knows how memory management works in .NET, this kind of code is painfully unnecessary; the garbage collector does not need you to manually assign null to tell it that the old object can be collected, nor does assigning null instruct the GC to immediately collect the object.
This pattern is just noise, making it harder to understand what the code is trying to achieve.
Why, then, do I keep finding this pattern? Is there a school that teaches this practice? Is there a language in which assigning null values to locally scoped variables is required to correctly manage memory? Is there some additional value in explicitly assigning null that I haven't perceived?
It's cargo cult programming (thanks to Daniel Earwicker) by developers who are used to "freeing" resources, bad GC implementations, and bad APIs.
Some GCs didn't cope well with circular references. To get rid of them, you had to break the cycle "somewhere". Where? Well, if in doubt, then everywhere. Do that for a year and it's moved into your fingertips.
Also, setting the field to null gives you the feeling of "doing something", because as developers we always fear "forgetting something".
Lastly, we have APIs which must be closed explicitly because there is no real language support to say "close this when I'm done with it" and let the computer figure it out, just as with GC. So you have APIs where you have to call cleanup code and APIs where you don't. This sucks, and it encourages patterns like the above.
It is possible that it came from VB which used a reference counting strategy for memory management and object lifetime. Setting a reference to Nothing (equivalent to null) would decrement the reference count. Once that count became zero then the object was destroyed synchronously. The count would be decremented automatically upon leaving the scope of a method so even in VB this technique was mostly useless, however there were special situations where you would want to greedily destroy an object as illustrated by the following code.
Public Sub Main()
    Dim big As Variant
    Set big = GetReallyBigObject()
    Call big.DoSomething
    Set big = Nothing
    Call TimeConsumingOperation
    Call ConsumeMoreMemory
End Sub
In the above code the object referenced by big would have lingered until the end without the call to Set big = Nothing. That may be undesirable if the other stuff in the method was a time consuming operation or generated more memory pressure.
It comes from C/C++, where explicitly setting your pointers to NULL after freeing them was the norm (to eliminate dangling pointers).
After calling free():
#include <stdlib.h>

#define A_CONST 64   /* illustrative allocation size */

int main(void)
{
    char *dp = malloc(A_CONST);

    /* After freeing dp it becomes a dangling pointer,
       because it still points to the freed memory. */
    free(dp);

    /* Set dp to NULL so it is no longer dangling. */
    dp = NULL;

    return 0;
}
Classic VB developers also did the same thing when writing their COM components to prevent memory leaks.
It is more common in languages with deterministic garbage collection and without RAII, such as the old Visual Basic, but even there it's unnecessary except when breaking cyclic references. So possibly it really stems from bad C++ programmers who use dumb pointers all over the place. In C++, it makes sense to set dumb pointers to 0 after deleting them to prevent double deletion.
I've seen this a lot in VBScript code (classic ASP) and I think it comes from there.
I think it used to be a common misunderstanding among former C/C++ developers. They knew that the GC will free their memory, but they didn't really understand when and how. Just clean it and carry on :)
I suspect that this pattern comes from translating C++ code to C# without pausing to understand the differences between C# finalization and C++ finalization. In C++ I often null things out in the destructor, either for debugging purposes (so that you can see in the debugger that the reference is no longer valid) or, rarely, because I want a smart object to be released. (If that's the meaning I'd rather call Release on it and make the meaning of the code crystal-clear to the maintainers.) As you note, this is pretty much senseless in C#.
You see this pattern in VB/VBScript all the time too, for different reasons. I mused a bit about what might cause that here:
Link
Maybe the convention of assigning null originated from the fact that, had foo been an instance variable instead of a local variable, you would need to remove the reference before the GC could collect it. Someone slept during the first sentence and started nullifying all their variables; the crowd followed.
It comes from C/C++, where doing a free()/delete on an already released pointer could result in a crash, while freeing a NULL pointer simply did nothing.
This means that this construct (C++) will cause problems
void foo()
{
    myclass *mc = new myclass(); // let's assume you really need new here

    if (foo == bar)
    {
        delete mc;
    }

    delete mc; // double delete if the branch was taken
}
while this will work
void foo()
{
    myclass *mc = new myclass(); // let's assume you really need new here

    if (foo == bar)
    {
        delete mc;
        mc = NULL;
    }

    delete mc; // deleting a NULL pointer is a no-op, so this is safe
}
Conclusion: It's totally unnecessary in C#, Java, and just about any other garbage-collecting language.
Consider a slight modification:
public void FooBar()
{
    object foo = null;
    object bar = null;

    try
    {
        foo = new object();
        bar = new object();

        // Code which throws exception.
    }
    finally
    {
        // Destroying objects
        foo = null;
        bar = null;
    }

    vavoom(foo, bar);
}
The author(s) may have wanted to ensure that the great Vavoom (*) did not get pointers to malformed objects if an exception was previously thrown and caught. Paranoia, resulting in defensive coding, is not necessarily a bad thing in this business.
(*) If you know who he is, you know.
VB developers had to dispose of all of their objects to try and mitigate the chance of a memory leak. I can imagine this is where it has come from, as VB devs migrated over to .NET / C#.
I can see it coming from either a misunderstanding of how the garbage collection works, or an effort to force the GC to kick in immediately - perhaps because the objects foo and bar are quite large.
I've seen this in some Java code before. It was used on a static variable to signal that the object should be destroyed.
It probably didn't originate from Java though, as using it for anything other than a static variable would also not make sense in Java.
It comes from C++ code, especially smart pointers. In that case it's roughly equivalent to a .Dispose() in C#.
It's not a good practice, at most a developer's instinct. There is no real value in assigning null in C#, except maybe helping the GC break a circular reference.
Should you set all the objects to null (Nothing in VB.NET) once you have finished with them?
I understand that in .NET it is essential to dispose of any object instances that implement the IDisposable interface to release some resources, although the object can still be around after it is disposed (hence the IsDisposed property on forms), so I assume it can still reside in memory, or at least in part?
I also know that when an object goes out of scope it is then marked for collection ready for the next pass of the garbage collector (although this may take time).
So with this in mind, will setting it to null speed up the system's releasing of the memory, since it does not have to work out that it is no longer in scope, and are there any bad side effects?
MSDN articles never do this in examples, and currently I do this as I cannot see the harm. However, I have come across a mixture of opinions, so any comments are useful.
Karl is absolutely correct: there is no need to set objects to null after use. If an object implements IDisposable, just make sure you call IDisposable.Dispose() when you're done with that object (wrapped in a try..finally or a using() block). But even if you don't remember to call Dispose(), the finaliser method on the object should call Dispose() for you.
I thought this was a good treatment:
Digging into IDisposable
and this
Understanding IDisposable
There isn't any point in trying to second-guess the GC and its management strategies, because it's self-tuning and opaque. There was a good discussion about the inner workings with Jeffrey Richter on Dot Net Rocks: Jeffrey Richter on the Windows Memory Model. And Richter's book CLR via C#, chapter 20, has a great treatment.
Another reason to avoid setting objects to null when you are done with them is that it can actually keep them alive for longer.
e.g.
void foo()
{
    var someType = new SomeType();
    someType.DoSomething();
    // someType is now eligible for garbage collection
    // ... rest of method not using 'someType' ...
}
will allow the object referred to by someType to be GC'd after the call to "DoSomething", but
void foo()
{
    var someType = new SomeType();
    someType.DoSomething();
    // someType is NOT eligible for garbage collection yet,
    // because the variable is used at the end of the method
    // ... rest of method not using 'someType' ...
    someType = null;
}
may sometimes keep the object alive until the end of the method. The JIT will usually optimize away the assignment to null, so both bits of code end up being the same.
No, don't null objects. You can check out https://web.archive.org/web/20160325050833/http://codebetter.com/karlseguin/2008/04/28/foundations-of-programming-pt-7-back-to-basics-memory/ for more information, but setting things to null won't do anything except dirty your code.
Also:
using (SomeObject someObject = new SomeObject())
{
    // do stuff with the object
}
// the object will be disposed of
In general, there's no need to null objects after use, but in some cases I find it's a good practice.
If an object implements IDisposable and is stored in a field, I think it's good to null it, just to avoid using the disposed object. The bugs of the following sort can be painful:
this.myField.Dispose();
// ... at some later time
this.myField.DoSomething();
It's good to null the field after disposing it, and get a NullReferenceException right at the line where the field is used again. Otherwise, you might run into some cryptic bug down the line (depending on exactly what DoSomething does).
Chances are that your code is not structured tightly enough if you feel the need to null variables.
There are a number of ways to limit the scope of a variable:
As mentioned by Steve Tranby
using (SomeObject someObject = new SomeObject())
{
    // do stuff with the object
}
// the object will be disposed of
Similarly, you can simply use curly brackets:
{
    // Declare the variable and use it
    SomeObject someObject = new SomeObject();
}
// The variable is no longer available
I find that using curly brackets without any "heading" really cleans up the code and helps make it more understandable.
In general, there's no need to set to null. But suppose you have Reset functionality in your class.
Then you might, because you do not want to call Dispose twice, since some implementations of Dispose may not handle being called twice correctly and will throw a System.ObjectDisposedException.
private void Reset()
{
    if (_dataset != null)
    {
        _dataset.Dispose();
        _dataset = null;
    }
    // ..More such member variables, like an oracle connection etc. (_oraConnection)
}
The only time you should set a variable to null is when the variable does not go out of scope and you no longer need the data associated with it. Otherwise there is no need.
this kind of "there is no need to set objects to null after use" is not entirely accurate. There are times you need to NULL the variable after disposing it.
Yes, you should ALWAYS call .Dispose() or .Close() on anything that has it when you are done. Be it file handles, database connections or disposable objects.
Separate from that is the very practical pattern of LazyLoad.
Say I have an instantiated ObjA of class A. Class A has a public property called PropB of class B.
Internally, PropB uses the private variable _PropB and defaults to null. When PropB.Get() is used, it checks whether _PropB is null, and if it is, opens the resources needed to instantiate a B into _PropB. It then returns _PropB.
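A minimal sketch of that property, following the naming in the description (CreateB stands in for the expensive construction work):

public class B { /* the expensive child object */ }

public class A
{
    private B _PropB; // defaults to null

    public B PropB
    {
        get
        {
            if (_PropB == null)
            {
                // First access: open the needed resources and cache the result.
                _PropB = CreateB();
            }
            return _PropB;
        }
    }

    // Hypothetical factory doing the expensive instantiation.
    private B CreateB()
    {
        return new B();
    }
}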
To my experience, this is a really useful trick.
Where the need to null comes in is when you reset or change A in such a way that the contents of _PropB were derived from the previous values of A. You will need to Dispose AND null out _PropB so the LazyLoad can reset and fetch the right value IF the code requires it.
If you only do _PropB.Dispose() and shortly after expect the null check for LazyLoad to succeed, it won't be null, and you'll be looking at stale data. In effect, you must null it after Dispose() just to be sure.
I sure wish it were otherwise, but I've got code right now exhibiting this behavior: after a Dispose() on _PropB, and outside of the calling function that did the Dispose (and thus almost out of scope), the private prop still isn't null, and the stale data is still there.
Eventually, the disposed property will null out, but that's been non-deterministic from my perspective.
The core reason, as dbkk alludes is that the parent container (ObjA with PropB) is keeping the instance of _PropB in scope, despite the Dispose().
Stephen Cleary explains very well in this post: Should I Set Variables to Null to Assist Garbage Collection?
Says:
The Short Answer, for the Impatient
Yes, if the variable is a static field, or if you are writing an enumerable method (using yield return) or an asynchronous method (using async and await). Otherwise, no.
This means that in regular methods (non-enumerable and non-asynchronous), you do not set local variables, method parameters, or instance fields to null.
(Even if you’re implementing IDisposable.Dispose, you still should not set variables to null).
The important thing that we should consider is Static Fields.
Static fields are always root objects, so they are always considered “alive” by the garbage collector. If a static field references an object that is no longer needed, it should be set to null so that the garbage collector will treat it as eligible for collection.
Setting static fields to null is meaningless if the entire process is shutting down. The entire heap is about to be garbage collected at that point, including all the root objects.
Conclusion:
Static fields; that’s about it. Anything else is a waste of time.
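To illustrate the static-field case with a sketch (hypothetical names):

public static class ReportCache
{
    private static byte[] _cachedReport; // a static field is a GC root

    public static void Build()
    {
        _cachedReport = new byte[10000000]; // ~10 MB
        // ... fill and serve the report ...
    }

    public static void Done()
    {
        // Without this line, the array stays reachable
        // for the rest of the process lifetime.
        _cachedReport = null;
    }
}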
There are some cases where it makes sense to null references. For instance, when you're writing a collection--like a priority queue--and by your contract, you shouldn't be keeping those objects alive for the client after the client has removed them from the queue.
But this sort of thing only matters in long lived collections. If the queue's not going to survive the end of the function it was created in, then it matters a whole lot less.
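A sketch of what that looks like in an array-backed collection (List<T>.RemoveAt clears the vacated slot internally for the same reason):

using System;

public class SimpleStack<T>
{
    private T[] _items = new T[16];
    private int _count;

    public void Push(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_count++] = item;
    }

    public T Pop()
    {
        T item = _items[--_count];
        // Null the vacated slot: the array outlives this call, and without
        // this line it would keep the popped object alive for the client.
        _items[_count] = default(T);
        return item;
    }
}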
On a whole, you really shouldn't bother. Let the compiler and GC do their jobs so you can do yours.
Take a look at this article as well: http://www.codeproject.com/KB/cs/idisposable.aspx
For the most part, setting an object to null has no effect. The only time you should be sure to do so is if you are working with a "large object", which is one larger than 85,000 bytes in size (such as bitmaps).
I believe by design of the GC implementors, you can't speed up GC with nullification. I'm sure they'd prefer you not worry yourself with how/when GC runs -- treat it like this ubiquitous Being protecting and watching over and out for you...(bows head down, raises fist to the sky)...
Personally, I often explicitly set variables to null when I'm done with them as a form of self documentation. I don't declare, use, then set to null later -- I null immediately after they're no longer needed. I'm saying, explicitly, "I'm officially done with you...be gone..."
Is nullifying necessary in a GC'd language? No. Is it helpful for the GC? Maybe yes, maybe no, don't know for certain, by design I really can't control it, and regardless of today's answer with this version or that, future GC implementations could change the answer beyond my control. Plus if/when nulling is optimized out it's little more than a fancy comment if you will.
I figure if it makes my intent clearer to the next poor fool who follows in my footsteps, and if it "might" potentially help GC sometimes, then it's worth it to me. Mostly it makes me feel tidy and clear, and Mongo likes to feel tidy and clear. :)
I look at it like this: Programming languages exist to let people give other people an idea of intent and a compiler a job request of what to do -- the compiler converts that request into a different language (sometimes several) for a CPU -- the CPU(s) could give a hoot what language you used, your tab settings, comments, stylistic emphases, variable names, etc. -- a CPU's all about the bit stream that tells it what registers and opcodes and memory locations to twiddle. Many things written in code don't convert into what's consumed by the CPU in the sequence we specified. Our C, C++, C#, Lisp, Babel, assembler or whatever is theory rather than reality, written as a statement of work. What you see is not what you get, yes, even in assembler language.
I do understand the mindset of "unnecessary things" (like blank lines) "are nothing but noise and clutter up code." That was me earlier in my career; I totally get that. At this juncture I lean toward that which makes code clearer. It's not like I'm adding even 50 lines of "noise" to my programs -- it's a few lines here or there.
There are exceptions to any rule. In scenarios with volatile memory, static memory, race conditions, singletons, usage of "stale" data and all that kind of rot, that's different: you NEED to manage your own memory, locking and nullifying as apropos because the memory is not part of the GC'd Universe -- hopefully everyone understands that. The rest of the time with GC'd languages it's a matter of style rather than necessity or a guaranteed performance boost.
At the end of the day make sure you understand what is eligible for GC and what's not; lock, dispose, and nullify appropriately; wax on, wax off; breathe in, breathe out; and for everything else I say: If it feels good, do it. Your mileage may vary...as it should...
I think setting something back to null is messy. Imagine a scenario where the item being set to null is exposed, say, via a property. Now if some piece of code accidentally uses this property after the item is disposed, you will get a NullReferenceException, which requires some investigation to figure out exactly what is going on.
I believe framework disposables will generally throw ObjectDisposedException instead, which is more meaningful. Not setting these back to null would be better, then, for that reason.
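A guarded implementation in that style might look like this (a sketch):

using System;

public class Connection : IDisposable
{
    private bool _disposed;

    public void Send(string message)
    {
        // Fail fast with a meaningful error rather than a
        // NullReferenceException from a nulled-out field.
        if (_disposed)
            throw new ObjectDisposedException(nameof(Connection));
        // ... actual work ...
    }

    public void Dispose()
    {
        _disposed = true;
    }
}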
Some objects support the .Dispose() method, which forces the resource to be removed from memory.