Why do classes with finalizers need more than one garbage collection cycle? - c#

This article says
If an object has a finalizer, it is not immediately removed when the
garbage collector decides it is no longer ‘live’. Instead, it becomes
a special kind of root until .NET has called the finalizer method.
This means that these objects usually require more than one garbage
collection to be removed from memory, as they will survive the first
time they are found to be unused.
My question is: why doesn't the GC call the finalizer as soon as it finds that the object can no longer be referenced, and then collect the object right away? Why does it need more than one garbage collection?

Two points to consider:
The finalizer may take some time to complete. For example, it may end up closing a resource or something similar. You wouldn't want that to be part of the garbage collection time, which may be blocking threads from doing work (when they just want to get some memory). By running finalization separately, the GC itself can complete very quickly, and the finalization work can be done in parallel with other work later.
The finalizer may resurrect the object by making it visible again - but detecting that would (I suspect) require another sweep of memory anyway... so why not just wait until the next time it was going to happen?
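The resurrection case in the second point can be sketched as follows. This is a minimal illustration, not production code: the Zombie class and ResurrectionDemo helper are hypothetical names, and the result depends on the runtime actually running the finalizer before WaitForPendingFinalizers returns, which is typical but not something to rely on outside experiments.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type whose finalizer makes the instance reachable again.
class Zombie
{
    // A static field is a GC root, so assigning to it resurrects the object.
    public static Zombie Resurrected;

    ~Zombie()
    {
        Resurrected = this;
    }
}

static class ResurrectionDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate()
    {
        new Zombie();
    }

    public static bool Run()
    {
        Zombie.Resurrected = null;
        Allocate();
        GC.Collect();                  // object is found unreachable and queued for finalization
        GC.WaitForPendingFinalizers(); // finalizer runs on the finalizer thread
        return Zombie.Resurrected != null;
    }

    static void Main()
    {
        Console.WriteLine($"Resurrected by its finalizer: {Run()}");
    }
}
```

Had the GC freed the memory in the same pass that discovered the object, the static field would now point at reclaimed memory. Deferring reclamation to a later collection avoids that.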

Because (depending on the GC mode selected) when it is performing GC it has to pause key parts of the runtime. Hence you want this to be as quick as is possible. This creates two issues:
it doesn't know how long the finalizer will take to run (although it has a hard limit), and doesn't want to delay resuming the runtime
the runtime needs to be running for the finalizer to work reliably (even if a GC thread is used, the code you write could conceivably care about other threads)
To address both issues, those with pending finalizers are queued, and then executed after the GC has finished (when the runtime is working).
As a side-note, it is a good practice to combine finalizers with IDisposable and have the Dispose() cancel the finalization; that way it doesn't need finalization later, and is cleaned up in one step.

When the .net garbage-collector runs, objects are divided into three categories: objects which are reachable from a "normal" rooted reference, objects which are not reachable by any rooted reference, and objects which are not reachable by any "normal" rooted reference, but have either requested to receive notification when they are abandoned, or are reachable from other objects that have done so. The garbage collector makes a list of objects in that third category; that list is stored as a rooted reference, making all objects in it 'live'. The system goes through items in that list, though, cancels their 'notification' requests, runs their Finalize() method, and removes them from the list. If no reference to the object exists anywhere once all that is said and done, then the object will be declared "dead" on the next GC cycle.

Related

Does an object with a pending finalizer need to be collected by the GC more than one time?

I am reading the "Disposal and Garbage Collection" chapter of book C# 8.0 in a Nutshell. When it comes to finalizers, it says:
The GC identifies the unused objects for deletion, those without
finalizers are deleted immediately, those with pending finalizers are
kept alive and are put onto a special queue. When the garbage
collection is complete and your program continues executing, the
finalizer thread then starts running in parallel to the program,
picking objects off that special queue and running their finalization
methods.
Does this paragraph mean that an object waiting for finalization needs to be collected by the GC again? I assumed it had already been detected as garbage by the GC, so why does it need to be collected again after finalization?
Well, the objects were not 'collected' the first time. They were seen to need additional processing (finalizer code needs to run) and put on the finalization queue so they could be processed separately. This ends up putting them on the 'freachable' queue, which has now resurrected the object: it is now referenced by the freachable queue and is no longer eligible for collection. It will be unreachable after the finalizer actually executes and the object is removed from the freachable queue.
(This is how it used to work, not sure if things have changed in newer .NET versions, but I'm not aware of any.)
So the object is not really 'collected' more than once, if by 'collected' we understand that the memory was reclaimed. It does, however, need additional processing and will be re-evaluated by the GC again at a later point in time.
The GC works by traversing object graphs from GC roots. When the GC does a collection it looks for objects that are not reachable from any root (and are therefore safe to free up).
A finalizer delays garbage collection of objects.
Why? Well, the GC sees that an object is safe to be freed up (not connected to a GC root). However, it can't free the memory if there's a finalizer that hasn't run yet.
So the GC marks the object as having a pending finalizer and does not free up that space on first pass. Nor does the GC run the finalizer at that instant (it puts it in a "pending finalizer" queue).
This is exactly why it's bad practice to use finalizers unless necessary. It delays collection. Some have a misconception that the GC runs the finalizer upon a collection pass. It does not.
When is it necessary? A good rule of thumb is that if the object references unmanaged memory (which is not handled by the GC) then you absolutely should use a finalizer to avoid memory leaks. If you're only referencing managed objects then don't.
If you do implement a finalizer I would also implement IDisposable, release any unmanaged resources in Dispose, and stop the finalizer from ever running with GC.SuppressFinalize(this).

Why does garbage collector process finalizable objects separately?

I am reading Jeffrey Richter's book "CLR via C#". This is a quote from it:
Finalize methods are called at the completion of a garbage collection on objects that the GC
has determined to be garbage. This means that the memory for these objects cannot be reclaimed
right away because the Finalize method might execute code that accesses a field.
Because a finalizable object must survive the collection, it gets promoted to another generation, forcing the object
to live much longer than it should
It confused me a little. Why can't a finalizable object be reclaimed right away? I cannot understand the argument that the Finalize method might execute code that accesses a field. What is the problem? Moreover, I cannot understand why a finalizable object should be moved to an older generation and stored in a separate queue (to be processed on a separate finalizer thread).
In my opinion the simplest way is to finalize the object just before removing it, without these additional actions.
Why can't a finalizable object be reclaimed right away? I cannot understand the argument that the Finalize method might execute code that accesses a field. What is the problem?
Because Finalize() is just a normal method of the object, so the code in it might access any field of the object.
When garbage collection happens, all threads are frozen.
Taken together, these two points mean that while a GC is in progress it cannot execute the Finalize() method right away (all threads are paused during the GC!), yet Finalize() is expected to be invoked before the object is collected.
All of this means that the garbage collector cannot kill the object before its Finalize() method has been invoked. So the GC takes the object off the "death list" (the object is now said to be resurrected) and puts it on a queue called "Freachable" ("F" stands for finalization; "reachable" means that the objects in it cannot be garbage collected for now, since the GC only collects objects that are unreachable from roots).
After the GC has finished, a dedicated high-priority thread takes each entry out of the "Freachable" queue and invokes the Finalize() method on it, which finally makes that object "garbage collectable". But since the first GC has already ended before Finalize() is called, the objects popped from "Freachable" can only be reclaimed by the next garbage collection.
Moreover, I cannot understand why a finalizable object should be moved to an older generation and stored in a separate queue (to be processed on a separate finalizer thread).
To understand this, you first need to know the concept of the generational GC model. By the time objects have been popped from the "Freachable" queue and are ready for garbage collection again, they have been moved to an older generation, because they survived the previous collection.
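The promotion described above can be observed with GC.GetGeneration. The following is a rough sketch only: Tracked and GenerationDemo are hypothetical names, and the exact generation numbers depend on the runtime and GC configuration, so no particular output is promised here.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type that records which generation it is in when finalized.
class Tracked
{
    public static int GenerationAtFinalize = -1;

    ~Tracked()
    {
        // By the time the finalizer runs, the object has already survived
        // the collection that found it unreachable.
        GenerationAtFinalize = GC.GetGeneration(this);
    }
}

static class GenerationDemo
{
    // Non-inlined so the caller holds no reference to the new object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int AllocateAndReportGeneration()
    {
        var t = new Tracked();
        return GC.GetGeneration(t);
    }

    public static (int atAllocation, int atFinalize) Run()
    {
        Tracked.GenerationAtFinalize = -1;
        int atAllocation = AllocateAndReportGeneration();
        GC.Collect();                  // object survives because it must be finalized
        GC.WaitForPendingFinalizers(); // finalizer records the current generation
        return (atAllocation, Tracked.GenerationAtFinalize);
    }

    static void Main()
    {
        var (atAllocation, atFinalize) = Run();
        Console.WriteLine($"Generation at allocation: {atAllocation}");
        Console.WriteLine($"Generation in finalizer:  {atFinalize}");
    }
}
```

If the second number is higher than the first, the object was promoted simply because it had a finalizer, which is the cost the book is pointing at.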
I think this quote says "finish your unit of work stuff and kill your instance. To kill your instance you should clean up your garbage collection because of your memory."

GC.Collect() and Finalize

Ok, it's known that GC implicitly calls Finalize methods on objects when it identifies that object as garbage. But what happens if I do a GC.Collect()? Are the finalizers still executed? Someone asked me this and I answered a "Yes" and then I thought: "Was that fully correct?"
Ok, it's known that GC implicitly calls Finalize methods on objects when it identifies that object as garbage.
No no no. That is not known because in order to be knowledge a statement must be true. That statement is false. The garbage collector does not run finalizers as it traces, whether it runs itself or whether you call Collect. The finalizer thread runs finalizers after the tracing collector has found the garbage and that happens asynchronously with respect to a call to Collect. (If it happens at all, which it might not, as another answer points out.) That is, you cannot rely on the finalizer thread executing before control returns from Collect.
Here's an oversimplified sketch of how it works:
When a collection happens the garbage collector tracing thread traces the roots -- the objects known to be alive, and every object they refer to, and so on -- to determine the dead objects.
"Dead" objects that have pending finalizers are moved onto the finalizer queue. The finalizer queue is a root. Therefore those "dead" objects are actually still alive.
The finalizer thread, which is typically a different thread than the GC tracing thread, eventually runs and empties out the finalizer queue. Those objects then become truly dead, and are collected in the next collection on the tracing thread. (Of course, since they just survived the first collection, they might be in a higher generation.)
As I said, that's oversimplified; the exact details of how the finalizer queue works are a bit more complicated than that. But it gets enough of the idea across. The practical upshot here is that you cannot assume that calling Collect also runs finalizers, because it doesn't. Let me repeat that one more time: the tracing portion of the garbage collector does not run finalizers, and Collect only runs the tracing part of the collection mechanism.
Call the aptly named WaitForPendingFinalizers after calling Collect if you want to guarantee that all finalizers have run. That will pause the current thread until the finalizer thread gets around to emptying the queue. And if you want to ensure that those finalized objects have their memory reclaimed then you're going to have to call Collect a second time.
And of course, it goes without saying that you should only be doing this for debugging and testing purposes. Never do this nonsense in production code without a really, really good reason.
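For experiments of that kind, the Collect / WaitForPendingFinalizers / Collect sequence can be watched with a "long" weak reference, which keeps tracking an object after it has been queued for finalization. This is a sketch for testing only: Finalizable and TwoCollectionsDemo are hypothetical names, and the observed values can differ in Debug builds or under a debugger, where object lifetimes may be extended.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type with an (empty) finalizer.
class Finalizable
{
    ~Finalizable() { /* pretend to release something */ }
}

static class TwoCollectionsDemo
{
    // Non-inlined so the caller never holds a strong reference.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference AllocateTracked()
    {
        // trackResurrection: true creates a long weak reference, which stays
        // valid until the object's memory is actually reclaimed.
        return new WeakReference(new Finalizable(), trackResurrection: true);
    }

    public static (bool afterFirst, bool afterSecond) Run()
    {
        WeakReference weak = AllocateTracked();

        GC.Collect();                  // 1st collection: object is only queued for finalization
        GC.WaitForPendingFinalizers(); // finalizer thread drains the queue
        bool afterFirst = weak.IsAlive;

        GC.Collect();                  // 2nd collection: memory can now be reclaimed
        bool afterSecond = weak.IsAlive;

        return (afterFirst, afterSecond);
    }

    static void Main()
    {
        var (afterFirst, afterSecond) = Run();
        Console.WriteLine($"Alive after first Collect + finalizers: {afterFirst}");
        Console.WriteLine($"Alive after second Collect:             {afterSecond}");
    }
}
```

The first value is typically true and the second false, which matches the description above: the first collection only discovers the object, and the second one frees it.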
Actually, the answer is "it depends". There is a dedicated thread that executes all finalizers. That means a call to GC.Collect only triggers this process, and the finalizers themselves are executed asynchronously.
If you want to wait until all finalizers have been called, you can use the following trick:
GC.Collect();
// Wait until the finalizer thread has run all pending finalizers
GC.WaitForPendingFinalizers();
Yes, but not straight away. This excerpt is from Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework (MSDN Magazine) (*)
"When an application creates a new object, the new operator allocates
the memory from the heap. If the object's type contains a Finalize
method, then a pointer to the object is placed on the finalization
queue. The finalization queue is an internal data structure controlled
by the garbage collector. Each entry in the queue points to an object
that should have its Finalize method called before the object's memory
can be reclaimed.
When a GC occurs ... the garbage collector scans the finalization
queue looking for pointers to these objects. When a pointer is found,
the pointer is removed from the finalization queue and appended to the
freachable queue (pronounced "F-reachable"). The freachable queue is
another internal data structure controlled by the garbage collector.
Each pointer in the freachable queue identifies an object that is
ready to have its Finalize method called.
There is a special runtime thread dedicated to calling Finalize
methods. When the freachable queue is empty (which is usually the
case), this thread sleeps. But when entries appear, this thread wakes,
removes each entry from the queue, and calls each object's Finalize
method. Because of this, you should not execute any code in a Finalize
method that makes any assumption about the thread that's executing the
code. For example, avoid accessing thread local storage in the
Finalize method."
(*) From November 2000, so things might have changed since.
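The two queues in the excerpt can be modelled with a small simulation. This is a toy model for illustration only: ToyObject, ToyHeap and every member on them are invented here and do not correspond to CLR internals, and the "finalizer thread" is just a method call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: names and structure are illustrative, not CLR internals.
class ToyObject
{
    public string Name;
    public List<ToyObject> References = new List<ToyObject>();
    public ToyObject(string name) { Name = name; }
}

class ToyHeap
{
    public readonly HashSet<ToyObject> Roots = new HashSet<ToyObject>();
    public readonly List<ToyObject> Objects = new List<ToyObject>();
    public readonly List<string> Log = new List<string>();
    readonly List<ToyObject> finalizationQueue = new List<ToyObject>();
    readonly Queue<ToyObject> freachable = new Queue<ToyObject>();

    public ToyObject Allocate(string name, bool hasFinalizer)
    {
        var obj = new ToyObject(name);
        Objects.Add(obj);
        if (hasFinalizer) finalizationQueue.Add(obj); // registered at allocation time
        return obj;
    }

    static void Mark(ToyObject obj, HashSet<ToyObject> live)
    {
        if (!live.Add(obj)) return;
        foreach (var child in obj.References) Mark(child, live);
    }

    public void Collect()
    {
        var live = new HashSet<ToyObject>();
        foreach (var root in Roots) Mark(root, live);
        foreach (var pending in freachable) Mark(pending, live); // the freachable queue is a root

        // Unreachable objects still registered for finalization move to the
        // freachable queue and therefore survive this collection.
        var dying = finalizationQueue.Where(x => !live.Contains(x)).ToList();
        foreach (var obj in dying)
        {
            finalizationQueue.Remove(obj);
            freachable.Enqueue(obj);
            Mark(obj, live);
            Log.Add($"queued {obj.Name}");
        }

        // Everything still unmarked is reclaimed.
        var dead = Objects.Where(x => !live.Contains(x)).ToList();
        foreach (var obj in dead)
        {
            Objects.Remove(obj);
            Log.Add($"freed {obj.Name}");
        }
    }

    // Stands in for the dedicated finalizer thread.
    public void RunFinalizers()
    {
        while (freachable.Count > 0) Log.Add($"finalized {freachable.Dequeue().Name}");
    }
}

static class ToyDemo
{
    public static List<string> Run()
    {
        var heap = new ToyHeap();
        heap.Allocate("plain", hasFinalizer: false);
        heap.Allocate("withFinalizer", hasFinalizer: true);
        heap.Collect();       // frees "plain", queues "withFinalizer"
        heap.RunFinalizers(); // finalizes "withFinalizer"
        heap.Collect();       // frees "withFinalizer"
        return heap.Log;
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```

Running it prints "queued withFinalizer, freed plain, finalized withFinalizer, freed withFinalizer": the object without a finalizer is gone after one collection, while the finalizable one needs two.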
When garbage is collected (whether in response to memory pressure or GC.Collect()), the objects requiring finalization are moved to the queue of objects awaiting finalization (the freachable queue).
Unless you call GC.WaitForPendingFinalizers(), the finalizers may continue to execute in the background long after garbage collection has finished.
BTW, there is no guarantee finalizers will be called at all. From MSDN...
The Finalize method might not run to completion or might not run at
all in the following exceptional circumstances:
Another finalizer blocks indefinitely (goes into an infinite loop, tries to obtain a lock it can never obtain and so on). Because the
runtime attempts to run finalizers to completion, other finalizers
might not be called if a finalizer blocks indefinitely.
The process terminates without giving the runtime a chance to clean up. In this case, the runtime's first notification of process
termination is a DLL_PROCESS_DETACH notification.
The runtime continues to Finalize objects during shutdown only while
the number of finalizable objects continues to decrease.
A couple more points are worth stating here.
The finalizer is the last point at which .NET objects can release unmanaged resources.
Finalizers should only need to run if you don't dispose of your instances correctly. Ideally, in most cases, finalizers should never be executed, because a proper Dispose implementation suppresses finalization.
Here is an example of a correct IDisposable implementation.
If you call the Dispose method of a disposable object, it should clear all references and suppress finalization. If a less careful developer forgets to call the Dispose method, the finalizer is the lifesaver.
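A minimal sketch of the pattern described above, assuming a hypothetical NativeBuffer class that owns a block of unmanaged memory allocated with Marshal.AllocHGlobal. The class name and its members are invented for illustration. Real code would more commonly wrap the resource in a SafeHandle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
class NativeBuffer : IDisposable
{
    IntPtr handle;
    bool disposed;

    public NativeBuffer(int size)
    {
        handle = Marshal.AllocHGlobal(size);
    }

    public bool IsDisposed => disposed;

    public void Dispose()
    {
        Dispose(disposing: true);
        // The resource is already released, so the finalizer no longer needs
        // to run and the object can be reclaimed in a single collection.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return; // makes repeated Dispose calls harmless

        // disposing == true:  called from Dispose(); managed members may be used.
        // disposing == false: called from the finalizer; touch only unmanaged state.
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
        disposed = true;
    }

    // Safety net in case Dispose() was never called.
    ~NativeBuffer()
    {
        Dispose(disposing: false);
    }
}

static class DisposeDemo
{
    static void Main()
    {
        var buffer = new NativeBuffer(1024);
        using (buffer)
        {
            Console.WriteLine($"Disposed inside using: {buffer.IsDisposed}");
        }
        Console.WriteLine($"Disposed after using:  {buffer.IsDisposed}");
    }
}
```

The using statement calls Dispose deterministically. The finalizer only does any work for instances that were never disposed.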

Why does GC put objects in finalization queue?

As I understand it, the garbage collector in C# puts all objects of a class into the finalization queue as soon as I implement a destructor for the class. When I was reading the documentation for GC.SuppressFinalize, it mentioned that the object header already has a bit set for calling Finalize.
I am wondering why the implementers of the GC had to put all these objects in a queue and delay freeing the memory by 1-2 cycles. Couldn't they just look at the bit flag while releasing memory, call the object's Finalize method, and then release the memory?
No doubt I am an idiot and am not able to understand how the GC works. I am asking this question just to improve my understanding and fill the gap in my knowledge.
EDIT : If the bit flag is for suppressfinalize, GC implementers could have added another flag in object header for this purpose, no?
So it can run in a different thread and thus keep from blocking the main GC thread.
You can learn a lot about the GC from this MSDN article.
There is a great explanation here
What are the Finalizer Queue and Control+ThreadMethodEntry?
Essentially the reasoning is that it may not always be ideal for the GC to have to wait on finalizer code to execute, so queuing finalizers allows finalization to be deferred until a time when it's more convenient.
It's desirable for garbage collection pauses to be as short as possible. To that end, running finalizers is usually deferred to a later time, when the frantic work of garbage collection is done. It is instead done in the background on a separate thread.
@Jason: this is true for the f-reachable queue. But IMHO it does not explain why there is the finalization-queue itself.
My guess is that the finalization-queue is there to add another information that helps the GC to distinguish between all the possible states of an object life-cycle.
The finalization flag in the object's header says "the object needs to be finalized" or "the object does not need to be finalized" but it does not say if the finalization has already occurred.
But to be honest I don't grasp why it's needed in the current finalization process implementation.
Indeed, here is the naive workflow I imagine possible without the finalization-queue:
when creating the object, if it has a finalizer, the GC sets the finalization flag;
if later SuppressFinalize is called then the flag is zeroed;
now let's jump to when the GC collects the object, which is not referenced from anywhere: if the finalization flag is set then the GC puts a reference to the object into the f-reachable queue and lets the finalization thread operate;
later the finalization thread dequeues the reference, resets the finalization flag and runs the finalizer;
if the object wants to be refinalized later it could ReRegisterForFinalize to set the finalization flag again;
later the GC collects the object again: if the finalization flag is not set it knows there is nothing to do and then frees the object memory;
if the finalization flag is set the GC enqueues again a reference to the object into the f-reachable queue and there we go again for another round;
at some point in time the object is happy, completes the finalization and is collected; or the app-domain or process is shutdown and memory is freed anyway.
So seems like in these scenarios there is no need for a finalization-queue, only the finalization flag is useful.
One possible reason would be that from a conceptual point of view there might be a rule like: "an object is collected if and only if it is not referenced from any root".
So not having a finalization queue, and basing the decision to collect an object on the object state itself, checking the finalization flag, is not compatible with this rule.
But really I don't think the GC implementation is based on the dogmatic application of such theoretical rules but only on pragmatic choices; so it's obvious I'm missing some key scenarios where the GC needs the finalization queue to know what to do when collecting an object, but which ones?
The garbage-collector does not identify and examine garbage, except perhaps when processing the Large Object Heap. Instead, its behavior is like that of a bowling-alley pinsetter removing deadwood between throws: the pinsetter grabs all the pins that are still standing, lifts them off the surface of the lane, and then runs the sweeper bar across the lane without regard for how many pins are on that surface. Sweeping out memory wholesale is much faster than identifying individual objects to be deleted. If 1% of objects have finalizers (the real number's probably even less), then it would be necessary to examine 100 object headers to find each finalizable object. Having a separate list of objects which have finalizers makes it unnecessary for the GC to even look at any garbage objects that don't.

Garbage collection Guarantees

What guarantees are there for the garbage collector?
From my research I have managed to find:
If there is still a reference to the memory it will not be garbage collected
If there is no reference:
When it is garbage collected is non-deterministic
When the GC kicks in the finalizer will be run before memory is released.
There is no guarantee about the order of Finalizers (so do not assume parent will be run before child).
But what I really want to know is:
Is there a guarantee that all memory will eventually be garbage collected and the finalizer (destructor) run on the object (assuming the program exited nicely). For example an application with no memory pressure when it eventually exits will it force the GC to go find all objects and make sure the finalizer (destructor) is called (including static member variables)?
I did find a quote on this page:
http://www.c-sharpcorner.com/UploadFile/tkagarwal/MemoryManagementInNet11232005064832AM/MemoryManagementInNet.aspx
In addition, by default, Finalize methods are not called for unreachable objects when an application exits so that the application may terminate quickly.
But I am not sure how authoritative this quote is.
I also found documentation on:
CriticalFinalizerObject
Is there a guarantee that all memory
will eventually be garbage collected
and the finalizer (destructor) run on
the object (assuming the program
exited nicely).
No. From the Object.Finalize documentation it is clear that finalizers may not be invoked if
Some other finalizers don't finish properly:
Another finalizer blocks indefinitely
(goes into an infinite loop, tries to
obtain a lock it can never obtain and
so on). Because the runtime attempts
to run finalizers to completion, other
finalizers might not be called if a
finalizer blocks indefinitely.
Some other finalizers create more
finalizable objects, making it
impossible to finalize all
finalizable objects:
The runtime continues to Finalize
objects during shutdown only while the
number of finalizable objects
continues to decrease.
Some other finalizers throw exceptions:
If Finalize or an override of Finalize
throws an exception, and the runtime
is not hosted by an application that
overrides the default policy, the
runtime terminates the process and no
active try-finally blocks or
finalizers are executed. This behavior
ensures process integrity if the
finalizer cannot free or destroy
resources.
That being said, there are more reasons why you wouldn't want to use finalizers unless strictly necessary.
They slow down the garbage collector
(even making it possible to slow it
down so much that memory is not
reclaimed as fast as it is used up).
They run on another thread, bringing
multi-threading issues into play.
They're not executed in a
deterministic order.
They can resurrect objects which were
already finalized (and which won't be
finalized again unless explicitly
re-registered for finalization).
The only time you should write a finalizer is when you are building a type to handle a new kind of unmanaged resource. For example, a data access layer that uses Sql Server in a business app doesn't need a finalizer anywhere, even though there are unmanaged database connections involved, because the basic SqlConnection class will already finalize those connections if needed. But if you're building a brand new database engine from scratch that has connection limits similar to sql server's and are implementing the ado.net provider for it, that connection type should implement a finalizer to be as sure as possible that your connections are released.
But you don't get any guarantees beyond what happens when a process ends.
Update:
Given this context:
I am having a discussion with a colleague over a code review I did of his code. He insists that the destructor is guaranteed to be called on an object. I disagree (but am not sure) and would prefer the use of IDisposable.
You are right to criticize the use of a destructor/finalizer. As I said above, you should only use them when working with an unmanaged resource that is genuinely new. Not just that instance of the resource, but the kind of resource you are working with.
For code that wraps "normal" unmanaged resources (things like SqlConnection, TcpClient, etc), IDisposable is a better choice. Then you know the resource will be cleaned up as soon as Dispose() is called rather than needing to wait for the type to be collected. If no one calls Dispose() (which is likely your colleague's concern), by the time your new type can be collected the instance of the original type for the unmanaged resource you are wrapping should be able to be collected as well, and its finalizer will release the resource.
The main thing you need to bring to the table is that the finalizer cannot be called until the object is collected. You have to wait on the garbage collector, meaning you may be holding the resource open even longer. IDisposable allows you to release it right away. Of course you could do both, but that doesn't sound like what's going on here, and if you do have both you have to be careful not to conflict with the original type's finalizer or you could cause unwanted and harmful exceptions. And really, your own finalizer implementation is just redundant here and adds needless complexity.
Finally, I have to take issue with this statement:
If there is still a reference to the memory it will not be garbage collected
There can be references to an object and it will still be collected. What matters is if the object is reachable: are any of the references to the object rooted. For example, you may have a list with several objects in it. The list goes out of scope. Obviously there is still a reference to all of the objects in the list, but they can still all be collected in the first pass of the GC because the reference is no longer rooted.
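The list example can be tried out with a short weak reference, which does not keep its target alive. This is a sketch for experimentation only: ReachabilityDemo and its methods are hypothetical names, and the result may differ in Debug builds or under a debugger, where local variables can be kept alive longer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

static class ReachabilityDemo
{
    // Non-inlined so that 'list' and 'item' are truly gone once this returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference BuildListAndDropIt()
    {
        var item = new object();
        var list = new List<object> { item }; // the list holds a reference to item
        // When this method returns, the list still references item,
        // but nothing rooted references the list.
        return new WeakReference(item);       // short weak reference: does not keep item alive
    }

    public static bool Run()
    {
        WeakReference weak = BuildListAndDropIt();
        GC.Collect(); // neither object has a finalizer, so one pass is enough
        return weak.IsAlive;
    }

    static void Main()
    {
        Console.WriteLine($"Item alive after one collection: {Run()}");
    }
}
```

The item is referenced by the list the whole time, yet it is normally reported as no longer alive after a single collection, because the reference that matters is the rooted one.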
1.6.7.6 of the Spec says:
1.6.7.6 Destructors
A destructor is a member that implements the actions
required to destruct an instance of a
class. Destructors cannot have
parameters, they cannot have
accessibility modifiers, and they
cannot be invoked explicitly. The
destructor for an instance is invoked
automatically during garbage
collection.
The garbage collector is
allowed wide latitude in deciding when
to collect objects and run
destructors. Specifically, the timing
of destructor invocations is not
deterministic, and destructors may be
executed on any thread. For these and
other reasons, classes should
implement destructors only when no
other solutions are feasible.
The
using statement provides a better
approach to object destruction.
So no, it's not guaranteed they are called.
The only time that a finalizer won't be invoked at all is if an AppDomain is forcibly unloaded.
In general, you don't need to worry about it.
There is no guarantee.
There might be a guarantee if your process terminates nicely for some definition of nicely. But there are so many things not nice that can happen:
power failure
process terminated in a 'hard' or 'forced' way
unmanaged thread calling the OS exit() function or throwing an exception
call to System.Environment.FailFast, which does:
MSDN: "Terminates a process but does not execute any active try-finally blocks or finalizers."
