In answering another question* on SO, and the subsequent comment discussion, I ran into a wall on a point that I'm not clear on.
Correct me on any point where I'm astray...
When the Garbage Collector collects an object, it calls that object's finalizer, on a separate thread (unless the finalizer has been suppressed, e.g. through a Dispose() method). While collecting, the GC suspends all threads except the thread that triggered the collection (background collection aside).
What isn't clear:
Does the Garbage Collector wait for the finalizer to execute on that object before collecting it?
If not, does it un-suspend threads while the finalizer is still executing?
If it does wait, what happens if the finalizer runs into a lock being held by one of the suspended threads? Does the finalizer thread deadlock? (In my answer, I argue that this is bad design, but I could possibly see cases where this could happen)
* Link to the original question:
.NET GC Accessing a synchronised object from a finalizer
Does the Garbage Collector wait for the finalizer to execute on that object before collecting it?
Your question is a bit ambiguous.
When the GC encounters a "dead" object that needs finalization, it abandons its attempt to reclaim the dead object's storage. Instead, it puts the object on a queue of "objects that I know need finalization" and treats that object as alive until the finalizer thread is done with it.
So, yes, the GC does "wait" until the finalizer is executed before reclaiming the storage. But it does not wait synchronously. It sounds like you're asking "does the GC synchronously call the finalizer right there?" No, it queues up the object to be finalized later and keeps on truckin'. The GC wants to quickly get through the task of releasing garbage and compacting memory so that the program proper can resume running ASAP. It's not going to stop to deal with some whiny object that is demanding attention before it gets cleaned up. It puts that object on a queue and says "be quiet and the finalizer thread will deal with you later."
Later on the GC will check the object again and say "are you still dead? And has your finalizer run?" If the answer is "yes" then the object gets reclaimed. (Remember, a finalizer might make a dead object back into a live one; try to never do that. Nothing pleasant happens as a result.)
Does it un-suspend threads while the finalizer is still executing?
I believe that the GC thaws out the threads that it froze, and signals the finalizer thread "hey, you've got work to do". So when the finalizer thread starts running, the threads that were frozen by the GC are starting up again.
There might have to be unfrozen threads because the finalizer might require a call to be marshalled to a user thread in order to release a thread-affinitized resource. Of course some of those user threads might be blocked or frozen; threads can always be blocked by something.
what happens if the finalizer runs into a lock being held by one of the suspended threads? Does the finalizer thread deadlock?
You betcha. There's nothing magic about the finalizer thread that prevents it from deadlocking. If a user thread is waiting on a lock taken out by the finalizer thread, and the finalizer thread is waiting on a lock taken out by the user thread, then you've got a deadlock.
Examples of finalizer thread deadlocks abound. Here's a good article on one such scenario, with a bunch of links to other scenarios:
http://blogs.microsoft.co.il/blogs/sasha/archive/2010/06/30/sta-objects-and-the-finalizer-thread-tale-of-a-deadlock.aspx
As the article states: finalizers are an extremely complex and dangerous cleanup mechanism and you should avoid them if you possibly can. It is incredibly easy to get a finalizer wrong and very hard to get it right.
Objects that contain a finalizer tend to live longer. When, during a collect, the GC marks an object with a finalizer as being garbage, it will not collect that object (yet). The GC will add that object to the finalizer queue that will run after the GC has finished. Consequence of this is that, because this object is not collected, it moves to the next generation (and with that, all objects it refers to).
The GC suspends all running threads. The finalizer thread on the other hand will run in the background while the application keeps running. The finalizer calls all finalize methods on all objects that are registered for finalization. After the finalizer method on an object has ran, the object will be removed from the queue, and from that point on the object (and possibly all objects it still references) is garbage. The next collection that cleans objects of the generation of that object will (at last) remove that object. Since objects that live in generation 2 are collected about 10 times as less as objects that live in generation 1, and gen 1 ten times as less as gen 0, it can take some time for such object is finally garbage collected.
Because the finalizer thread is just a simple thread that runs managed code (it calls the finalizers), it can block and even dead lock. Because of this it is important to do as little as possible in finalize methods. Because the finalizer is a background thread, a failing finalize method could even bring down the complete AppDomain (yuck!).
You could say that this design is unfortunate, but if you think about it, other designs where the framework cleans our mess effectively, are hard to imagine.
So, to answer your questions:
Yes, only after the object is removed from the finalizer queue, the object will be garbage and the GC will collect it.
The GC suspends all threads, even the finalizer queue.
The finalizer queue can deadlock. Lock as little as possible inside finalize methods.
It's simplest to think of the garbage collector as dividing objects into four groups:
Those which aren't reachable by any rooted object;
Those which are reachable from a list of live finalizable objects, but not from any other rooted object;
Those which are on the list of live finalizable objects, but are also reachable through some rooted object other than that list.
Those which are not on the list of live finalizable objects, but are reachable via some rooted object other than that list.
When the garbage-collector runs, objects of type #1 disappear. Objects of #2 get added to a list of objects needing imminent finalization and removed from the "live finalizable objects" list (thus becoming objects of category #4). Note that the list of objects needing finalization is a normal rooted reference, so objects on this list cannot be collected while they are on it, but if no other rooted reference is created by the time the finalizer is complete the object will move to category #1.
Related
I am reading the "Disposal and Garbage Collection" chapter of book C# 8.0 in a Nutshell. When it comes to finalizers, it says:
The GC identifies the unused objects for deletion, those without
finalizers are deleted immediately, those with pending finalizers are
kept alive and are put onto a special queue. When the garbage
collection is complete and your program continues executing, the
finalizer thread then starts running in parallel to the program,
picking objects off that special queue and running their finalization
methods.
Does this paragraph mean that an object waiting for finalization need to be collected by the GC again? I assumed it already been detected as garbage by GC, why does it need to be collected after finalization again?
Well, the objects were not 'collected' the first time. They were seen to need additional processing (finalizer code needs to run) and put on the finalization queue so they could be processed separately. This ends up putting them on the 'freachable' queue, which has now resurrected the object: it is now referenced by the freachable queue and is no longer eligible for collection. It will be unreachable after the finalizer actually executes and the object is removed from the freachable queue.
(This is how it used to work, not sure if things have changed in newer .NET versions, but I'm not aware of any.)
So the object is not really 'collected' more than once, if by 'collected' we understand that the memory was reclaimed. It does, however, need additional processing and will be re-evaluated by the GC again at a later point in time.
The GC works by traversing object graphs from GC roots. When the GC does a collection it checks for objects that have no references to it (and are therefore safe to free up).
A finalizer delays garbage collection of objects.
Why? Well the GC sees that an object is safe to be free'd up (not connected to a GC root). However, it can't free the memory if there's a finalizer that hasn't run yet.
So the GC marks the object as having a pending finalizer and does not free up that space on first pass. Nor does the GC run the finalizer at that instant (it puts it in a "pending finalizer" queue).
This is exactly why it's bad practice to use finalizers unless necessary. It delays collection. Some have a misconception that the GC runs the finalizer upon a collection pass. It does not.
When is it necessary? A good rule of thumb is if the objects references unmanaged memory (which is not handled by the GC) then you absolutely should use a finalizer to avoid memory leaks. If you're only referencing managed objects then don't.
If you do implement a finalizer I would also implement IDisposable, release any unmanaged resources on Dispose and stop the finalizer ever from running with GC.SuppressFinalize(this).
I am reading Jeffrey Richter's book "CLR via c#". It is quote from there:
Finalize methods are called at the completion of a garbage collection on objects that the GC
has determined to be garbage. This means that the memory for these objects cannot be reclaimed
right away because the Finalize method might execute code that accesses a field.
Because a finalizable object must survive the collection, it gets promoted to another generation, forcing the object
to live much longer than it should
It misled me a little bit. Why cannot finalizable object be reclaimed right away? I cannot understand argument that finalize method might execute code that accesses a field. What is problem? Moreover, I cannot understand why finalizable object should be moved to older generation and stored in separated queue (to be processed in other finalizer thread).
In my opinion the simplest way is to finalize object before removing at all without these additional actions.
Why cannot finalizable object be reclaimed right away? I cannot understand argument that finalize method might execute code that accesses a field. What is problem?
Because Finalize() is just a normal method of the object, so code in it might access any fields of the object.
When garbage collection happens, all threads are frozen.
The two points add up together to the fact that when gc is happening, it cannot execute the Finalize() method right away (All threads are paused during gc!!), while Finalize is expected to be invoked before object being collected.
All these above leads to the fact that garbage collection cannot kill the object immediately before its Finalize() method is invoked. So gc takes the object out from the "death list" (the object is now said to be resurrected), and put it to a queue called "Freachable" ("F" stands for finalization, "reachable" means all objects in it cannot be garbage collected now since gc only collects objects unreachable from roots).
After the gc finished, a special dedicated thread with high priority will take out each entry from the "Freachable" queue and invoke Finalize() method on it, which makes that object finally "garbage collectable", but of course, since the first gc has already ended before this Finalize() calling process, all the objects poped out from "Freachable" can now only be scheduled to next garbage collection.
Moreover, I cannot understand why finalizable object should be moved to older generation and stored in separated queue (to be processed in other finalizer thread).
To understand this, you need to first know the concept of the generation gc model. After objects are popped out from the "Freachable" queue and are again ready for garbage collection, they have been moved to older generation owing to the fact that they survive the previous one.
I think this quote says "finish your unit of work stuff and kill your instance. To kill your instance you should clean up your garbage collection because of your memory."
Ok, it's known that GC implicitly calls Finalize methods on objects when it identifies that object as garbage. But what happens if I do a GC.Collect()? Are the finalizers still executed? Someone asked me this and I answered a "Yes" and then I thought: "Was that fully correct?"
Ok, it's known that GC implicitly calls Finalize methods on objects when it identifies that object as garbage.
No no no. That is not known because in order to be knowledge a statement must be true. That statement is false. The garbage collector does not run finalizers as it traces, whether it runs itself or whether you call Collect. The finalizer thread runs finalizers after the tracing collector has found the garbage and that happens asynchronously with respect to a call to Collect. (If it happens at all, which it might not, as another answer points out.) That is, you cannot rely on the finalizer thread executing before control returns from Collect.
Here's an oversimplified sketch of how it works:
When a collection happens the garbage collector tracing thread traces the roots -- the objects known to be alive, and every object they refer to, and so on -- to determine the dead objects.
"Dead" objects that have pending finalizers are moved onto the finalizer queue. The finalizer queue is a root. Therefore those "dead" objects are actually still alive.
The finalizer thread, which is typically a different thread than the GC tracing thread, eventually runs and empties out the finalizer queue. Those objects then become truly dead, and are collected in the next collection on the tracing thread. (Of course, since they just survived the first collection, they might be in a higher generation.)
As I said, that's oversimplified; the exact details of how the finalizer queue works are a bit more complicated than that. But it gets enough of the idea across. The practical upshot here is that you cannot assume that calling Collect also runs finalizers, because it doesn't. Let me repeat that one more time: the tracing portion of the garbage collector does not run finalizers, and Collect only runs the tracing part of the collection mechanism.
Call the aptly named WaitForPendingFinalizers after calling Collect if you want to guarantee that all finalizers have run. That will pause the current thread until the finalizer thread gets around to emptying the queue. And if you want to ensure that those finalized objects have their memory reclaimed then you're going to have to call Collect a second time.
And of course, it goes without saying that you should only be doing this for debugging and testing purposes. Never do this nonsense in production code without a really, really good reason.
Actually the answer "It depends". Actually there is a dedicated thread that executes all finalizers. That means that call to GC.Collect only triggered this process and execution of all finalizers would be called asynchronously.
If you want to wait till all finalizers would be called you can use following trick:
GC.Collect();
// Waiting till finilizer thread will call all finalizers
GC.WaitForPendingFinalizers();
Yes, but not straight away. This excerpt is from Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework (MSDN Magazine) (*)
"When an application creates a new object, the new operator allocates
the memory from the heap. If the object's type contains a Finalize
method, then a pointer to the object is placed on the finalization
queue. The finalization queue is an internal data structure controlled
by the garbage collector. Each entry in the queue points to an object
that should have its Finalize method called before the object's memory
can be reclaimed.
When a GC occurs ... the garbage collector scans the finalization
queue looking for pointers to these objects. When a pointer is found,
the pointer is removed from the finalization queue and appended to the
freachable queue (pronounced "F-reachable"). The freachable queue is
another internal data structure controlled by the garbage collector.
Each pointer in the freachable queue identifies an object that is
ready to have its Finalize method called.
There is a special runtime thread dedicated to calling Finalize
methods. When the freachable queue is empty (which is usually the
case), this thread sleeps. But when entries appear, this thread wakes,
removes each entry from the queue, and calls each object's Finalize
method. Because of this, you should not execute any code in a Finalize
method that makes any assumption about the thread that's executing the
code. For example, avoid accessing thread local storage in the
Finalize method."
(*) From November 2000, so things might have changed since.
When the garbage is collected (whether in response to memory pressure or GC.Collect()), the objects requiring finalization are put to finalization queue.
Unless you call GC.WaitForPendingFinalizers(), the finalizers may continue to execute in the background long after garbage collection has finished.
BTW, there is no guarantee finalizers will be called at all. From MSDN...
The Finalize method might not run to completion or might not run at
all in the following exceptional circumstances:
Another finalizer blocks indefinitely (goes into an infinite loop, tries to obtain a lock it can never obtain and so on). Because the
runtime attempts to run finalizers to completion, other finalizers
might not be called if a finalizer blocks indefinitely.
The process terminates without giving the runtime a chance to clean up. In this case, the runtime's first notification of process
termination is a DLL_PROCESS_DETACH notification.
The runtime continues to Finalize objects during shutdown only while
the number of finalizable objects continues to decrease.
Couple of more points are worth to state here.
Finalizer is the last point where .net objects can release unmanaged resources.
Finalizers are to be executed only if you don’t dispose your instances correctly. Ideally, finalizers should never be executed in many cases. Because proper dispose implementation should suppress the finalization.
Here is an example for correct IDispoable Implementation.
If you call the Dispose method of any disposable objects, it should clear all references and Supress the finalization. If there is any not so good developer who forget to call the Dispose method, Finalizer is the life saver.
This article says
If an object has a finalizer, it is not immediately removed when the
garbage collector decides it is no longer ‘live’. Instead, it becomes
a special kind of root until .NET has called the finalizer method.
This means that these objects usually require more than one garbage
collection to be removed from memory, as they will survive the first
time they are found to be unused.
My question is why GC don't call finalizer when it finds that object can't be referenced anymore and collect the object right away? why does it need more than on garbage collection?
Two points to consider:
The finalizer may take some time to complete. For example, it may end up closing a resource or something similar. You wouldn't want that to be part of the garbage collection time, which may be blocking threads from doing work (when they just want to get some memory). By running finalization separately, the GC itself can complete very quickly, and the finalization work can be done in parallel with other work later.
The finalizer may resurrect the object by making it visible again - but detecting that would (I suspect) require another sweep of memory anyway... so why not just wait until the next time it was going to happen?
Because (depending on the GC mode selected) when it is performing GC it has to pause key parts of the runtime. Hence you want this to be as quick as is possible. This creates two issues:
it doesn't know how long the finalizer will take to run (although it has a hard limit), and doesn't want to delay resuming the runtime
the runtime needs to be running for the finalizer to work reliably (even if a GC thread is used, the code you write could conceivably care about other threads)
To address both issues, those with pending finalizers are queued, and then executed after the GC has finished (when the runtime is working).
As a side-note, it is a good practice to combine finalizers with IDisposable and have the Dispose() cancel the finalization; that way it doesn't need finalization later, and is cleaned up in one step.
When the .net garbage-collector runs, objects are divided into three categories: objects which are reachable from a "normal" rooted reference, objects which are not reachable by any rooted reference, and objects which are not reachable by any "normal" rooted reference, but have either requested to receive notification when they are abandoned, or are reachable from other objects that have done so. The garbage collector makes a list of objects in that third category; that list is stored as a rooted reference, making all objects in it 'live'. The system goes through items in that list, though, cancels their 'notification' requests, runs their Finalize() method, and removes them from the list. If no reference to the object exists anywhere once all that is said and done, then the object will be declared "dead" on the next GC cycle.
As I understand, garbage collector in c# will put all objects of a class into finalization queue, as soon as I implement destructor of the class. When I was reading documentation for GC.Suppresfinalize, it mentions that object header already has a bit set for calling finalize.
I am wondering that why the implementers of GC had to put all objects in a queue, and delay the freeup of memory by 1-2 cycles. Could not they just look at the bit flag while releasing memory, then call finalize of the object and then release memory?
No doubt I am an idiot and I not able to understand the working of GC. I am posing this question just to improve my understanding or fill the missing gap in my knowledge
EDIT : If the bit flag is for suppressfinalize, GC implementers could have added another flag in object header for this purpose, no?
So it can run in a different thread and thus keep from blocking the main GC thread.
You can learn a lot about the GC from this MSDN article.
There is a great explanation here
What are the Finalizer Queue and Control+ThreadMethodEntry?
Essentially the reasoning is that it may not always be ideal for the GC to have to wait on finalizer code to execute, so queuing finalizers allows finalization to be deferred until a time when it's more convenient.
It's desirable for garbage collection pauses to be as short as possible. To that end, running finalizers is usually deferred to a later time, when the frantic work of garbage collection is done. It is instead done in the background on a separate thread.
#Jason: this is true for the f-reachable queue. But IMHO it does not explain why there is the finalization-queue itself.
My guess is that the finalization-queue is there to add another information that helps the GC to distinguish between all the possible states of an object life-cycle.
The finalization flag in the object's header says "the object needs to be finalized" or "the object does not need to be finalized" but it does not say if the finalization has already occurred.
But to be honest I don't grasp why it's needed in the current finalization process implementation.
Indeed, here is the naive workflow I imagine possible without the finalization-queue:
when creating the object, if it has a finalizer, the GC sets the finalization flag;
if later SupressFinalize is called then the flag is zeroed;
now let's jump to when the GC collects the object, which is not referenced from anywhere: if the finalization flag is set then the GC puts a reference to the object into the f-reachable queue and lets the finalization thread operates;
later the finalization thread dequeues the reference, resets the finalization flag and runs the finalizer;
if the object wants to be refinalized later it could ReRegisterForFinalize to set the finalization flag again;
later the GC collects the object again: if the finalization flag is not set it knows there is nothing to do and then frees the object memory;
if the finalization flag is set the GC enqueues again a reference to the object into the f-reachable queue and there we go again for another round;
at some point in time the object is happy, completes the finalization and is collected; or the app-domain or process is shutdown and memory is freed anyway.
So seems like in these scenarios there is no need for a finalization-queue, only the finalization flag is useful.
One possible reason would be that from a conceptual point of view there might be a rule like: "an object is collected if and only if it is not referenced from any root".
So not having a finalization queue, and basing the decision to collect an object on the object state itself, checking the finalization flag, is not compatible with this rule.
But really I don't think the GC implementation is based on the dogmatic application of such theoretical rules but only on pragmatic choices; so it's obvious I'm missing some key scenarios where the GC needs the finalization queue to know what to do when collecting an object, but which ones?
The garbage-collector does not identify and examine garbage, except perhaps when processing the Large Object Heap. Instead, its behavior is like a that of a bowling-alley pinsetter removing deadwood between throws: the pinsetter grabs all the pins that are still standing, lifts them off the surface of the lane, and then runs the sweeper bar across the lane without regard for how many pins are on that surface. Sweeping out memory wholesale is much faster than identifying individual objects to be deleted. If 1% of objects have finalizers (the real number's probably even less), then it would be necessary to examine 100 object headers to find each finalizable object. Having a separate list of objects which have finalizers makes it unnecessary for the GC to even look at any garbage objects that don't.