Can someone please explain to me how garbage collection works?
(I'm using C# and Java).
It's too complex a topic to be covered in one simple answer.
Here is a list of recommended reading:
Wikipedia: Garbage collection (computer science)
Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework
Garbage Collection: Part 2: Automatic Memory Management in the Microsoft .NET Framework
Java theory and practice: A brief history of garbage collection
The basic idea behind garbage collection is that you don't have to take care of memory management yourself. The garbage collector periodically checks object references, finds the objects that are no longer used (no longer referenced), reclaims their memory, and compacts the remaining objects.
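The check-and-reclaim cycle described above can be sketched as a toy mark-and-sweep collector. This is a simplified model, not how the JVM or the CLR actually manage memory: ToyObject, ToyHeap, and the explicit roots set are hypothetical names standing in for real objects, references, and roots (statics and stack slots), and the sketch reclaims unreachable objects without compacting the survivors.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy object: a name plus outgoing references to other toy objects.
class ToyObject {
    final String name;
    final List<ToyObject> refs = new ArrayList<>();

    ToyObject(String name) { this.name = name; }
}

// Toy heap: tracks every allocation plus a set of roots (standing in for statics and stack slots).
class ToyHeap {
    final List<ToyObject> objects = new ArrayList<>();
    final Set<ToyObject> roots = new HashSet<>();

    ToyObject allocate(String name) {
        ToyObject o = new ToyObject(name);
        objects.add(o);
        return o;
    }

    // Mark everything reachable from the roots, then sweep (drop) everything unmarked.
    // Returns how many objects were reclaimed. Survivors are not compacted.
    int collect() {
        Set<ToyObject> marked = new HashSet<>();
        Deque<ToyObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            ToyObject o = pending.pop();
            if (marked.add(o)) {          // false if already marked, so cycles terminate
                pending.addAll(o.refs);
            }
        }
        int before = objects.size();
        objects.removeIf(o -> !marked.contains(o));
        return before - objects.size();
    }
}
```

For example, with a root a pointing to b and an unrooted cycle c <-> d, collect() reclaims c and d even though they reference each other, because neither is reachable from a root.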
Garbage collectors use various algorithms to do their work, and some details differ from one language to another.
Wikipedia gives you a good starting point.
If you're looking for more in-depth information about the actual implementations of various garbage collectors (Java, .NET, ...), you can check here and here, or search Google for more.
Try the book Garbage Collection: Algorithms for Automatic Dynamic Memory Management. It won't have the most recent material in it, but it'll get you on your way.
Perfmon provides a number of counters for GC-related performance...
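Perfmon is Windows-specific. As a rough Java-side analogue (an assumption on my part, not something Perfmon itself exposes), the standard java.lang.management API reports per-collector collection counts and accumulated times. A minimal sketch; GcCounters is a hypothetical helper name, and the collector names you see vary by JVM version and GC choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCounters {
    // Total collections reported by all collectors; -1 (undefined) values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // One line per collector: name, number of collections, accumulated collection time.
    static void print() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```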
Here's a nice webcast that discusses simple mark-and-sweep (non-generational) garbage collection, complete with nice animations to help clarify the concept.
I think you need to know that the garbage collector is a thread that runs alongside your program and frees the memory occupied by objects that have become unreachable. You also need to know that the moment at which the GC runs can't be predicted. You can call System.gc() to suggest that the GC run, but not to make it run; it is the JVM that makes that decision.
If you have:
Object objectReference = new Object();
objectReference = null;
then the object that objectReference used to point to is GC bait: nothing references it any more, so it is eligible for collection. The subjects of "islands of isolation" and how the finalize() method works are interesting topics to read about. I suggest a quick Google search on both.
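To see in practice that System.gc() is only a suggestion, here is a small Java sketch using java.lang.ref.WeakReference, which does not keep its referent alive. GcHint is a hypothetical name. Whether requestAndCheck() returns true depends on the JVM, its flags (for example -XX:+DisableExplicitGC), and timing, so treat its result as informative rather than guaranteed; the strong-reference case, by contrast, is guaranteed.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class GcHint {
    // Drops the only strong reference, asks for a collection, and reports whether the
    // object was reclaimed. May return false: System.gc() is a request the JVM can ignore.
    static boolean requestAndCheck() {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null;   // now only weakly reachable, i.e. eligible for collection
        System.gc();      // a suggestion, not a command
        return ref.get() == null;
    }

    // While a strong reference exists, the object is never reclaimed, however often you ask.
    static boolean strongReferenceSurvives() {
        Object payload = new byte[1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        System.gc();
        boolean alive = ref.get() != null;
        Reference.reachabilityFence(payload); // keeps payload strongly reachable up to this point
        return alive;
    }
}
```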
I need to dispose of an object so it can release everything it owns, but it doesn't implement IDisposable, so I can't use it in a using block. How can I make the garbage collector collect it?
You can force a collection with GC.Collect(). Be very careful using this, since a full collection can take some time. Best practice is to just let the GC determine the best time to collect.
Does the object contain unmanaged resources but does not implement IDisposable? If so, it's a bug.
If it doesn't, it shouldn't matter whether it gets released right away; the garbage collector should do the right thing.
If it "owns" anything other than memory, you need to fix the object to use IDisposable. If it's not an object you control, this is something worth picking a different vendor over, because it speaks to the core of how well your vendor really understands .NET.
If it does just own memory, even a lot of it, all you have to do is make sure the object goes out of scope. Don't call GC.Collect(); it's one of those things where, if you have to ask, you shouldn't do it.
You can't perform garbage collection on a single object. You could request a garbage collection by calling GC.Collect(), but this will affect all objects subject to cleanup. It is also highly discouraged, as it can have a negative effect on the performance of later collections.
Also, calling Dispose on an object does not clean up its memory. It only allows the object to release its unmanaged resources. For example, calling Dispose on a StreamWriter closes the stream and releases the Windows file handle. The memory for the object on the managed heap does not get reclaimed until a subsequent garbage collection.
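Java has the same split between releasing resources and reclaiming memory: AutoCloseable plays roughly the role of IDisposable, and try-with-resources that of a using block. A minimal sketch with a hypothetical HandleOwner class whose static counter stands in for an OS handle: close() gives the "handle" back, while the object and its buffer stay in memory until the GC later finds them unreachable.

```java
// Rough Java analogue of IDisposable: close() releases the (simulated) handle,
// but the object's own memory is reclaimed later by the GC, as with Dispose.
class HandleOwner implements AutoCloseable {
    private static int openHandles = 0;      // hypothetical stand-in for an OS handle count
    private boolean closed = false;
    final byte[] buffer = new byte[4096];    // ordinary heap memory, untouched by close()

    HandleOwner() {
        openHandles++;                       // "acquire" the handle
    }

    @Override
    public void close() {
        if (!closed) {                       // idempotent, like a well-behaved Dispose()
            closed = true;
            openHandles--;                   // "release" the handle
        }
    }

    boolean isClosed() { return closed; }

    static int openHandles() { return openHandles; }
}
```

After the try block the handle count is back to zero, yet the HandleOwner instance and its buffer are still ordinary objects; only a later collection reclaims them once nothing references them.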
Chris Sells also discussed this on .NET Rocks. I think it was during his first appearance but the subject might have been revisited in later interviews.
http://www.dotnetrocks.com/default.aspx?showNum=10
This article by Francesco Balena is also a good reference:
When and How to Use Dispose and Finalize in C#
http://www.devx.com/dotnet/Article/33167/0/page/1
Garbage collection in .NET is non-deterministic, meaning you can't really control when it happens. You can suggest a collection, but that doesn't mean the GC will listen.
Tell us a little bit more about the object and why you want to do this, and we can make some suggestions based on that. Code always helps. Depending on the object, there might be a Close method or something similar, and perhaps the intended usage is to call that. If there is no Close or Dispose-type method, you probably don't want to rely on that object, as you will probably get memory leaks if it does in fact contain resources that need to be released.
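If the object only exposes a Close-style method, one option is to wrap it in an adapter so that the language's cleanup construct calls that method for you. Below is a Java sketch of the idea (the C# equivalent would implement IDisposable and be used in a using block); VendorDevice and its shutdown() method are hypothetical stand-ins for the third-party object.

```java
// Hypothetical third-party class: it owns a resource but offers only shutdown(), not close().
class VendorDevice {
    private boolean open = true;

    void shutdown() { open = false; }

    boolean isOpen() { return open; }
}

// Adapter that lets try-with-resources (Java's counterpart of a using block) call shutdown().
final class VendorDeviceHandle implements AutoCloseable {
    final VendorDevice device;

    VendorDeviceHandle(VendorDevice device) { this.device = device; }

    @Override
    public void close() {        // declared without 'throws', so callers need no catch block
        device.shutdown();
    }
}
```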
If the object goes out of scope and it has no external references, it will be collected fairly quickly (likely on the next collection).
BEWARE of fragmentation: in many cases GC.Collect() or implementing IDisposable is not very helpful, especially for large objects. The LOH (large object heap) holds objects of roughly 85,000 bytes and larger, performs no compaction, and is subject to high levels of fragmentation in many common use cases. This can lead to out-of-memory (OOM) errors even with potentially hundreds of MB free. As time marches on, data gets bigger (though not every object reaches LOH size), and high degrees of parallelism exacerbate the issue simply because more objects, likely varying in size, are being instantiated and released in less time.
Arrays are the usual suspects for this problem. It is also often hard to identify, due to non-specific exceptions and assertions from the runtime (something like "high % of large object heap fragmentation" would be swell). The remedy for code suffering from this problem is to implement an aggressive reuse strategy.
The ObjectPool class in System.Collections.Concurrent from the Parallel Extensions Beta 1 samples helps (unfortunately there is no simple, ubiquitous pattern that I have seen, like maybe some attached properties or extension methods). It is simple enough to drop in or re-implement for most projects: you assign a generator Func<> and use the Get/Put helper methods to reuse your previous objects and forgo the usual garbage collection. It is usually sufficient to focus on arrays rather than on the individual array elements.
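A minimal Java sketch of the same generator plus Get/Put idea, assuming a lock-free queue is an acceptable backing store. SimpleObjectPool is a hypothetical name; this is not a port of the Parallel Extensions sample, and it neither caps the number of pooled items nor clears returned objects.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Minimal pool: a generator creates instances on demand, get() reuses returned ones.
final class SimpleObjectPool<T> {
    private final ConcurrentLinkedQueue<T> items = new ConcurrentLinkedQueue<>();
    private final Supplier<T> generator;

    SimpleObjectPool(Supplier<T> generator) {
        this.generator = generator;
    }

    // Reuse a pooled instance if one is available, otherwise create a new one.
    T get() {
        T item = items.poll();            // null when the pool is empty
        return item != null ? item : generator.get();
    }

    // Return an instance for reuse; the caller must not touch it afterwards.
    void put(T item) {
        items.offer(item);
    }
}
```

Pooling a handful of large arrays this way means each is allocated once and then reused, instead of being repeatedly allocated and abandoned.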
It would be nice if .NET 4 updated all of the .ToArray() methods everywhere to include .ToArray(T target).
Getting the hang of using SOS/WinDbg (.loadby sos mscoreei for CLRv4) to analyze this class of issue can help. Thinking about it, the current garbage collection system is more like garbage recycling (using the same physical memory again), whereas ObjectPool is analogous to garbage reuse. If anybody remembers the three R's, reducing your memory use is a good idea too, for performance's sake ;)
We have an application in C# that controls one of our devices and reacts to signals this device gives us.
Basically, the application creates threads, handles operations (database access, etc.), and communicates with this device.
Over the life of the application, it creates objects and releases them, and so far we've let the garbage collector take care of our memory. I've read that it is highly recommended to let the GC do its work without interfering.
The problem we're facing now is that our application's process keeps growing ad vitam aeternam, in steps.
It seems to come in "waves": the application grows, then all of a sudden it releases some memory, but it seems to leave leaked memory behind at the same time.
We're investigating the application with a memory profiler, but we would like to understand in depth how the garbage collector works.
I've found an excellent article here: The Danger of Large Objects
I've also found the official documentation here: MSDN
Do you know of any other really in-depth documentation on the GC?
Edit:
Here is a screenshot that illustrates the behavior of the application:
You can clearly see the "wave" effect we're having on a very regular pattern.
Subsidiary question:
I've seen that my Gen 2 heap is quite big and follows the same pattern as the total bytes used by my application. I guess that's perfectly normal, because most of our objects will survive at least two garbage collections (for example, singleton classes)... What do you think?
The behavior you describe is typical of problems with objects created on the Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so double-check whether it is really a LOH issue.
You are obviously aware of that, but what is not quite obvious is that there is an exception to the size rule for objects on the LOH.
As described in the documentation, objects of 85,000 bytes or more end up on the LOH. However, for some reason (an 'optimization', probably) arrays of doubles that are longer than 1000 elements also end up there:
double[] smallArray = new double[999];  // ends up on the 'normal' gen 0 heap
double[] bigArray = new double[1001];   // ends up on the LOH
These arrays can fragment the LOH, which then requires more and more memory, until you get an OutOfMemoryException.
I was bitten by this: we had an app that received sensor readings as arrays of doubles, which resulted in LOH fragmentation because every array differed slightly in length (these were readings of real-time data at various frequencies, sampled by a non-real-time process). We solved the issue by implementing our own buffer pool.
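One way such a buffer pool could be structured (a sketch only; the pool described above isn't shown, so this is not it) is to round every request up to a power-of-two size class, so that readings of slightly different lengths share buffers instead of each allocating a differently sized array. DoubleBufferPool and its rent/giveBack methods are hypothetical names, and the pool is not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Pool of double[] buffers bucketed by power-of-two size class. Not thread-safe.
final class DoubleBufferPool {
    private final Map<Integer, ArrayDeque<double[]>> free = new HashMap<>();

    // Smallest power of two >= length (minimum 1).
    static int sizeClass(int length) {
        if (length > (1 << 30)) {
            throw new IllegalArgumentException("length too large: " + length);
        }
        int size = 1;
        while (size < length) {
            size <<= 1;
        }
        return size;
    }

    // May return an array longer than requested, and reused arrays are not cleared;
    // callers must track how many elements they actually use.
    double[] rent(int length) {
        int size = sizeClass(length);
        ArrayDeque<double[]> bucket = free.get(size);
        double[] buffer = (bucket == null) ? null : bucket.poll();
        return (buffer != null) ? buffer : new double[size];
    }

    // Only buffers with power-of-two lengths (i.e. ones this pool hands out) are kept.
    void giveBack(double[] buffer) {
        if (Integer.bitCount(buffer.length) != 1) {
            return;
        }
        free.computeIfAbsent(buffer.length, k -> new ArrayDeque<>()).push(buffer);
    }
}
```

The trade-off is some wasted space (less than half of each buffer) in exchange for far fewer distinct allocation sizes, so a 1,001-sample and a 1,017-sample reading can share the same 1,024-element buffer.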
I did some research for a class I was teaching a couple of years back. I don't think the references contain any information regarding the LOH, but I thought it was worthwhile to share them either way (see below). Further, I suggest performing a second search for unreleased object references before blaming the garbage collector. For example, simply implement a counter in the class finalizer to check that these large objects are being dropped as you believe.
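In Java, finalize() has been deprecated since Java 9, so a comparable counter is better built on java.lang.ref.Cleaner. A minimal sketch, with TrackedBuffer as a hypothetical name: created counts allocations, released counts cleanups, and live() is the difference. The cleanup action runs either when close() is called or at some point after the GC finds the object unreachable, so a live() count that keeps climbing suggests something still references these objects.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Counts allocations and cleanups of a large object so leaks show up as a growing live() count.
final class TrackedBuffer implements AutoCloseable {
    static final AtomicInteger created = new AtomicInteger();
    static final AtomicInteger released = new AtomicInteger();
    private static final Cleaner CLEANER = Cleaner.create();

    final double[] data;
    private final Cleaner.Cleanable cleanable;

    TrackedBuffer(int length) {
        data = new double[length];
        created.incrementAndGet();
        // The action must not capture 'this', or the object could never become unreachable.
        cleanable = CLEANER.register(this, released::incrementAndGet);
    }

    // Runs the cleanup action now; it runs at most once, whether via close() or via the GC.
    @Override
    public void close() {
        cleanable.clean();
    }

    // Objects created but not yet cleaned up (explicitly or after becoming unreachable).
    static int live() {
        return created.get() - released.get();
    }
}
```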
A different solution to this problem is simply to never deallocate your large objects, but instead to reuse them with a pooling strategy. In my hubris I have many times ended up prematurely blaming the GC for the memory requirements of my application growing over time; however, this is more often than not a symptom of a faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting when it comes to understanding anything about C# in detail!
Here is an update with some findings from our investigation:
In our application, we use a lot of threads to perform different tasks. Some of these threads have higher priority.
1) We were using the concurrent GC, and we tried switching back to the non-concurrent GC.
We've seen a dramatic improvement:
The garbage collector is being called much more often, and it seems that, when it is called more often, it releases our memory much better.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on the MSDN. We also found an interesting question on SO.
With the upcoming .NET Framework 4.5, four GC configurations will be available:
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try switching to "server - non-concurrent" and "server - concurrent" to check whether they give us better performance.
I'll keep this thread updated with our findings.
I have a large C# server application. I'm interested in learning how the GC class works, and in particular what actions I should take to determine the source of a possible memory leak.
Are there any books on the subject, or is it not really that elaborate?
There are plenty of sources you can study.
I hope you haven't missed the basics:
CLR via C# 3rd Edition by Jeffrey Richter
Before you go into the details of the GC, I think you should try to understand how IDisposable and resource management are handled:
Dispose, Finalization, and Resource Management. It's pretty old but still awesome.
GC specific:
Garbage Collection / Fundamentals of Garbage Collection
Maoni's WebLog (Maoni Stephens is a software developer who spends her time implementing .NET's GC. In fact, she's been working on the GC since the early days of .NET.)
Video: Maoni Stephens and Andrew Pardoe: CLR 4 Garbage Collector - Inside Background GC
Video: Erik Meijer and Patrick Dussud - Inside Garbage Collection
Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects
Identify And Prevent Memory Leaks In Managed Code
Hope this helps you get started.
Not a book, but our team has used the ANTS Memory Profiler with pretty good success for tracking down managed memory leaks. Its support section and included help walk you through the process of tracking down different types of memory issues. This doesn't cover specifics of the GC class itself, just how to track down common mistakes (event handler deregistration, static variables, etc.).
Also not a book, but decent article.
Memory Leak Detection in .NET
There is an excellent article by Rico Mariani: Tracking down managed memory leaks (how to find a GC leak). I have used this technique often; it is easy and efficient. And getting yourself familiar with a true debugger like WinDbg is a bonus side benefit!
There is also the SciTech .NET Memory Profiler, our team has been using that successfully.
To complement the answers above, there are more recent videos on Channel9 with Maoni Stephens (Principal developer for the GC on the CLR team at Microsoft) that walk you through basics of GC, what developers should look out for, how they should troubleshoot, and some of the tools you can use. I found the explanation of how the GC works and the concept of generations and roots really useful.
Here is the first part of a three-episode series:
http://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-33-CLR-GC-Part-1
I am hoping that someone can shed some light on how .NET handles garbage collection in the following case.
I have a program where I need to do a very specific kind of "Find in Files" functionality like you would see in Visual Studio. I have to search potentially thousands of files, and I collect the results in a List<Pair> object, where Pair is a simple class I created for storing a pair of items (obviously).
When I am through using what I need, I call Clear() on the list in order to get rid of the old information. This does not seem to help free memory because I can see on my Task Manager that the memory consumed does not decrease.
For a really large search, I am potentially dealing with 5,000,000 lines of information (approx. 500MB of memory usage on my machine) that need to be handled. When my search is through, the memory consumed level stays the same. I made my Pair class implement IDisposable, and that didn't help.
Any idea what I might be missing? Thanks!
The garbage collector will reclaim memory when needed: not when you "clear" the list, but when it finds that none of the items that were referenced in it are referenced any more, and when the process or computer is running low on memory.
There is no need to micromanage memory in C#.
The .NET garbage collector is surprisingly good. In general you shouldn't worry about the memory consumption you see in Task Manager because, as you are observing, the garbage collector doesn't reclaim memory as soon as you might think. The reason is that reclaiming memory is an expensive operation: if the memory isn't needed at that moment, why go messing around in there? The inner workings of when it does reclaim space are pretty involved. The GC uses different levels of collection (called generations) so that it can reclaim memory quickly.
There are lots of articles which can explain this in more detail better than I can. Here is a starting point.
http://msdn.microsoft.com/en-us/library/ms973837.aspx
For now you should see at what point you end up getting out of memory exceptions, if at all, and go from there.
When you call Clear(), all of the list's references to the Pair objects are removed. This will cause those objects to be garbage-collected eventually, unless another object holds references to them, but you cannot count on when that will happen; it also depends on memory pressure.
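A Java analogue of the behavior described above (ResultPair and ClearDemo are hypothetical names): ArrayList.clear() drops the list's references, but an element still referenced elsewhere stays alive, and the list's backing array keeps its capacity until trimToSize() is called. Only the survivor check is guaranteed; whether the other pairs are actually reclaimed after the System.gc() hint is up to the JVM.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;

// Hypothetical result type standing in for the Pair class from the question.
final class ResultPair {
    final String file;
    final String line;

    ResultPair(String file, String line) {
        this.file = file;
        this.line = line;
    }
}

final class ClearDemo {
    // Returns true when the pair that is still referenced elsewhere survives clear() plus a GC request.
    static boolean survivorStillReachable() {
        ArrayList<ResultPair> results = new ArrayList<>();
        ResultPair kept = new ResultPair("a.cs", "line 1");   // also referenced outside the list
        results.add(kept);
        for (int i = 0; i < 100_000; i++) {
            results.add(new ResultPair("b.cs", "line " + i));
        }
        WeakReference<ResultPair> ref = new WeakReference<>(kept);

        results.clear();       // drops the list's references; capacity is unchanged
        results.trimToSize();  // optionally shrink the backing array as well
        System.gc();           // only a hint: the other pairs are now collectable, not necessarily collected

        boolean alive = ref.get() != null;
        Reference.reachabilityFence(kept);   // 'kept' is strongly reachable until here
        return alive && results.isEmpty();
    }
}
```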
As a side note you can use Tuple in C# 4 instead of Pair.
I understand that in a managed language like Java or C# there is this thing called a garbage collector that every once in a while checks whether there are any object instances that are no longer referenced, and thus are completely orphaned, and clears them out of memory. But if two objects are not referenced by any variable in a program, but reference each other (like an event subscription), the garbage collector will see this reference and not clear the objects from memory.
Why is this? Why can't the garbage collector figure out that neither of the objects can be referenced by any active part of the running program, and dispose of them?
Your presumption is incorrect. If the GC 'sees' (see below) a cyclic reference between two or more objects (during the 'mark' phase), but those objects are not referenced by any other objects or permanent GC handles (strong references), they will be collected (during the 'sweep' phase).
An in-depth look at the CLR garbage collector can be found in this MSDN article and in this blog post.
Note: In fact, the GC doesn't even 'see' these types of objects during the mark phase since they are unreachable, and hence get collected during a sweep.
Most GCs don't work with reference counting any more. They usually (and this is the case in both Java and .NET) work with reachability from the root set of objects. The root set consists of globals and stack-referenced instances. Anything reachable from that set, directly or indirectly, is alive. The rest of the memory is unreachable and thus eligible to be collected.
This is a major drawback of traditional reference-counting garbage collection. A garbage collector with this behavior is called an incomplete collector. Other collectors largely fall into a category called tracing garbage collectors, which include traditional mark-sweep, semi-space/compacting, and generational hybrids, and don't suffer from this drawback (but face several others).
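The cycle problem is easy to reproduce with a toy reference-counting model (the RcObject class is hypothetical, and real reference-counting systems such as C++'s shared_ptr<T> differ in many details): once the only outside reference is released, two objects that point at each other keep each other's counts above zero and are never freed, while an acyclic chain is freed completely.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting: an object is freed when its count reaches zero,
// and freeing it releases the references it holds.
final class RcObject {
    final String name;
    final List<RcObject> refs = new ArrayList<>();
    int count = 0;
    boolean freed = false;

    RcObject(String name) { this.name = name; }

    void retain() { count++; }

    // Store a reference to target, which bumps target's count.
    void addRef(RcObject target) {
        refs.add(target);
        target.retain();
    }

    void release() {
        if (freed) {
            return;                  // ignore releases of already-freed objects in this toy
        }
        if (--count == 0) {
            freed = true;
            for (RcObject r : refs) {
                r.release();         // freeing cascades along outgoing references
            }
            refs.clear();
        }
    }
}
```

In the cyclic case, a ends with a count of 1 (held by b) and b with a count of 1 (held by a), so neither ever reaches zero; a tracing collector like the ones discussed here would reclaim both.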
All of the JVM and CLI implementations I'm aware of use complete collectors, which means they don't suffer from the specific problem you are asking about here. To my knowledge, of those Jikes RVM is the only one supplying a reference counting collector (one of its many).
Another interesting thing to note is that there are solutions to the completeness problem in reference-counting garbage collection, and the resulting collectors demonstrate some interesting performance properties that are tough to get out of tracing collectors. Unfortunately, the highest-performing reference-counting algorithms and most completeness modifications rely on compiler assistance, so bringing them to C++'s shared_ptr<T> is difficult, or simply not happening. Instead, we have weak_ptr<T> and documented rules (sorry about the sub-optimal link; apparently the documentation eludes me) about simply avoiding the problems. This isn't the first time (another mediocre link) we've seen this approach, and the hope is that the extra work to prevent memory problems is less than the amount of work required to maintain code that doesn't use shared_ptr<T>, etc.
The mediocre links are because much of my reference material is scattered in notes from last semester's memory management class.
I would like to add that issues surrounding event subscription usually revolve around the fact that the subscriber & publisher have very different lifecycles.
Attach yourself e.g. to the App.Idle event in Windows Forms and your object will be kept alive for the remaining application lifetime. Why? That static App will hold a reference (albeit indirectly through a delegate) to your registered observer. Even though you may have disposed of your observer, it is still attached to App.Idle. You can construct many of those examples.
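The same trap can be reproduced in Java with a long-lived static publisher. IdleEvents and IdleObserver are hypothetical names modeled on the App.Idle example: the static handler list keeps each observer reachable until it explicitly unsubscribes, which this sketch does in close().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Long-lived publisher: the static handler list lives as long as the program.
final class IdleEvents {
    private static final List<Runnable> handlers = new CopyOnWriteArrayList<>();

    static void subscribe(Runnable handler) { handlers.add(handler); }

    static void unsubscribe(Runnable handler) { handlers.remove(handler); }

    static void raise() {
        for (Runnable handler : handlers) {
            handler.run();
        }
    }

    static int subscriberCount() { return handlers.size(); }
}

// Short-lived observer: its handler captures 'this', so while subscribed the
// static list keeps the whole observer reachable. close() removes it again.
final class IdleObserver implements AutoCloseable {
    int idleCalls = 0;
    private final Runnable handler = () -> idleCalls++;

    IdleObserver() {
        IdleEvents.subscribe(handler);
    }

    @Override
    public void close() {
        IdleEvents.unsubscribe(handler);
    }
}
```

Forgetting close() here has the same effect as disposing an observer that is still attached to App.Idle: the object is "finished" from your point of view, but the publisher still reaches it.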
The other answers here are certainly correct; .NET does its garbage collection based on the reachability of an object.
What I wanted to add: I can recommend reading Understanding Garbage Collection in .NET (simple-talk article by Andrew Hunter) if you want a little more in-depth info.