How to properly clean up interop objects in C#

How to properly clean up interop objects in C# - c#

This is a follow on question to
How to properly clean up excel interop objects in c#.
The gyst is that using a chaining calls together (eg. ExcelObject.Foo.Bar() ) within the Excel namespace prevents garbage collection for COM objects. Instead, one should explicitly create a reference to each COM object used, and explicitly release them using Marhsal.ReleaseComObject().
Is the behaviour of not releasing COM objects after a chained call specific to Excel COM objects only? Is it overkill to apply this kind of pattern whenever a COM object is in use?

It's definitely more important to handle releases properly when dealing with Office applications than many other COM libraries, for two reasons.
The Office apps run as out of process servers, not in-proc libraries. If you fail to clean up properly, you leave a process running.
Office apps (especially Excel IIRC) don't terminate properly even if you call Application.Quit, if there are outstanding references to its COM objects.
For regular, in-proc COM libraries, the consequences of failing to clean up properly are not so dramatic. When your process exits, all in-proc libraries goes away with it. And if you forget to call ReleaseComObject on an object when you no longer need it, it will still be taken care of eventually when the object gets finalized.
That said, this is no excuse to write sloppy code.

COM objects are essentially unmanaged code - and as soon as you start calling unmanaged code from a managed application, it becomes your responsibility to clean up after that unmanaged code.
In short, the pattern linked in the above post is necessary for all COM objects.

Related

As of today, what is the right way to work with COM objects? [duplicate]

This question already has answers here:
Clean up Excel Interop Objects with IDisposable
(2 answers)
Closed 3 years ago.
This is a very common question and I decided to ask it because this question may have a different answer as of today. Hopefully, the answers will help to understand what is the right way to work with COM objects.
Personally, I feel very confuse after getting different opinions on this subject.
The last 5 years, I used to work with COM objects and the rules were very clear for me:
Use a single period in lines of code. Using more than one period create temporary objects behind the scene that cannot be explictly released.
Do not use foreach, use a for loop instead and release each item on each iteration
Do not call FInalReleaseComObject, use ReleaseComObject instead.
Do not use GC for releasing COM objects. GC intent is mainly for debugging usage.
Release objects in reverse order of their creation.
Some of you may be frustrated after reading those last lines, this is what I knew about how to properly create/release Com Object, I hope getting answers that will make it clearer and uncontested.
Following, are some links I found on this topic. Some of them telling that it is needed to call ReleaseComObject and some of them not.
How to properly release Excel COM objects (Nov. 2013)
Proper Way of Releasing COM Objects in .NET (Aug. 2011)
Marshal.ReleaseComObject Considered Dangerous (Mar. 2010)
ReleaseCOMObject (Apr. 2004)
"... In VSTO scenarios, you typically don’t ever have to use ReleaseCOMObject. ..."
MSDN - Marshal.ReleaseComObject Method (current .NET Framework version):
"...You should use this method to free the underlying COM object that holds references..."
UPDATE:
This question has been marked as too broad. As requested, I will try to simplify and ask simpler questions.
Does ReleaseComObject is required when working with COM Objects or calling GC is the correct way?
Does VSTO approach change the way we used to work with COM Objects?
Which of the above rules I wrote are required and which are wrong? Is there any others?

The .NET / COM interop is well designed, and works correctly. In particular, the .NET Garbage Collector correctly tracks COM references, and will correctly release COM objects when they have no remaining runtime references. Interfering with the reference counts of COM object by calling Marshal.ReleaseComObject(...) or Marshal.FinalReleaseComObject(...) is a dangerous but common anti-pattern. Unfortunately, some of the bad advice came out of Microsoft.
Your .NET code can correctly interact with COM while ignoring all 5 of your rules.
If you do need to trigger deterministic clean-up of COM objects that are no longer referenced from the runtime, you can safely force a GC (and possibly wait for finalizers to complete). Otherwise, you don't have to do anything special in your code, to deal with COM objects.
There is one important caveat, that might have contributed to confusion about role of the garbage collector. When debugging .NET code, local variables artificially have their lifetime extended to the end of the method, in order to support watching the variabled under the debugger. That means you might still have managed references to a COM object (and hence the GC won't clean up) later than expect form just looking at the code. A good workaround for this issue (which only occurs under the debugger) is to split the scope of COM calls from the GC cleanup calls.
As an example, here is some C# code that interacts with Excel, and cleans up properly. You can paste into a Console application (just add a reference to Microsoft.Office.Interop.Excel):
using System;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Excel;
namespace TestCsCom
{
class Program
{
static void Main(string[] args)
{
// NOTE: Don't call Excel objects in here...
// Debugger would keep alive until end, preventing GC cleanup
// Call a separate function that talks to Excel
DoTheWork();
// Now let the GC clean up (repeat, until no more)
do
{
GC.Collect();
GC.WaitForPendingFinalizers();
}
while (Marshal.AreComObjectsAvailableForCleanup());
}
static void DoTheWork()
{
Application app = new Application();
Workbook book = app.Workbooks.Add();
Worksheet worksheet = book.Worksheets["Sheet1"];
app.Visible = true;
for (int i = 1; i <= 10; i++) {
worksheet.Cells.Range["A" + i].Value = "Hello";
}
book.Save();
book.Close();
app.Quit();
// NOTE: No calls the Marshal.ReleaseComObject() are ever needed
}
}
}
You'll see that the Excel process properly shuts down, indicating that all the COM objects were properly cleaned up.
VSTO does not change any of these issues - it is just a .NET library that wraps and extends the native Office COM object model.
There is a lot of false information and confusion about this issue, including many posts on MSDN and on StackOverflow.
What finally convinced me to have a closer look and figure out the right advice was this post https://blogs.msdn.microsoft.com/visualstudio/2010/03/01/marshal-releasecomobject-considered-dangerous/ together with finding the issue with references kept alive under the debugger on a StackOverflow answer.
One exception to this general guidance is when the COM object model requires interfaces to be released in a particular order. The GC approach described here does not give you control over the order in which the COM objects are released by the GC.
I don't have any reference to indicate whether this would violate the COM contract. In general, I would expect COM hierarchies to use internal references to ensure any dependencies on the sequence are properly managed. E.g. in the case of Excel, one would expect a Range object to keep an internal reference to the parent Worksheet object, so that a user of the object model need not explicitly keep both alive.
There may be cases where even the Office applications are sensitive to the sequence in which COM objects are released. One case seems to be when the OLE embedding is used - see https://blogs.msdn.microsoft.com/vsofficedeveloper/2008/04/11/excel-ole-embedding-errors-if-you-have-managed-add-in-sinking-application-events-in-excel-2/
So it would be possible to create a COM object model that fails if objects are released in the wrong sequence, and the interop with such a COM model would then require some more care, and might need manual interference with the references.
But for general interop with the Office COM object models, I agree with the VS blog post calling "Marshal.ReleaseComObject – a problem disguised as a solution".

C# - Fundamental Misunderstanding about Garbage Collection and Managed and Unmanaged Resources [duplicate]

I want to know about unmanaged resources.
Can anyone please give me a basic idea?

Managed resources basically means "managed memory" that is managed by the garbage collector. When you no longer have any references to a managed object (which uses managed memory), the garbage collector will (eventually) release that memory for you.
Unmanaged resources are then everything that the garbage collector does not know about. For example:
Open files
Open network connections
Unmanaged memory
In XNA: vertex buffers, index buffers, textures, etc.
Normally you want to release those unmanaged resources before you lose all the references you have to the object managing them. You do this by calling Dispose on that object, or (in C#) using the using statement which will handle calling Dispose for you.
If you neglect to Dispose of your unmanaged resources correctly, the garbage collector will eventually handle it for you when the object containing that resource is garbage collected (this is "finalization"). But because the garbage collector doesn't know about the unmanaged resources, it can't tell how badly it needs to release them - so it's possible for your program to perform poorly or run out of resources entirely.
If you implement a class yourself that handles unmanaged resources, it is up to you to implement Dispose and Finalize correctly.

Some users rank open files, db connections, allocated memory, bitmaps, file streams etc. among managed resources, others among unmanaged. So are they managed or unmanaged?
My opinion is, that the response is more complex: When you open file in .NET, you probably use some built-in .NET class System.IO.File, FileStream or something else. Because it is a normal .NET class, it is managed. But it is a wrapper, which inside does the "dirty work" (communicates with the operating system using Win32 dlls, calling low level functions or even assembler instructions) which really open the file. And this is, what .NET doesn't know about, unmanaged.
But you perhaps can open the file by yourself using assembler instructions and bypass .NET file functions. Then the handle and the open file are unmanaged resources.
The same with the DB: If you use some DB assembly, you have classes like DbConnection etc., they are known to .NET and managed. But they wrap the "dirty work", which is unmanaged (allocate memory on server, establish connection with it, ...).
If you don't use this wrapper class and open some network socket by yourself and communicate with your own strange database using some commands, it is unmanaged.
These wrapper classes (File, DbConnection etc.) are managed, but they inside use unmanaged resources the same way like you, if you don't use the wrappers and do the "dirty work" by yourself. And therefore these wrappers DO implement Dispose/Finalize patterns. It is their responsibility to allow programmer to release unmanaged resources when the wrapper is not needed anymore, and to release them when the wrapper is garbage collected. The wrapper will be correctly garbage collected by garbage collector, but the unmanaged resources inside will be collected by using the Dispose/Finalize pattern.
If you don't use built-in .NET or 3rd party wrapper classes and open files by some assembler instructions etc. in your class, these open files are unmanaged and you MUST implement dispose/finalise pattern. If you don't, there will be memory leak, forever locked resource etc. even when you don't use it anymore (file operation complete) or even after you application terminates.
But your responsibility is also when using these wrappers. For those, which implement dispose/finalise (you recognize them, that they implement IDisposable), implement also your dispose/finalise pattern and Dispose even these wrappers or give them signal to release their unmanaged resources. If you don't, the resources will be after some indefinite time released, but it is clean to release it immediately (close the file immediately and not leaving it open and blocked for random several minutes/hours). So in your class's Dispose method you call Dispose methods of all your used wrappers.

Unmanaged resources are those that run outside the .NET runtime (CLR)(aka non-.NET code.) For example, a call to a DLL in the Win32 API, or a call to a .dll written in C++.

An "unmanaged resource" is not a thing, but a responsibility. If an object owns an unmanaged resource, that means that (1) some entity outside it has been manipulated in a way that may cause problems if not cleaned up, and (2) the object has the information necessary to perform such cleanup and is responsible for doing it.
Although many types of unmanaged resources are very strongly associated with various type of operating-system entities (files, GDI handles, allocated memory blocks, etc.) there is no single type of entity which is shared by all of them other than the responsibility of cleanup. Typically, if an object either has a responsibility to perform cleanup, it will have a Dispose method which instructs it to carry out all cleanup for which it is responsible.
In some cases, objects will make allowances for the possibility that they might be abandoned without anyone having called Dispose first. The GC allows objects to request notification that they've been abandoned (by calling a routine called Finalize), and objects may use this notification to perform cleanup themselves.
Terms like "managed resource" and "unmanaged resource" are, unfortunately, used by different people to mean different things; frankly think it's more useful to think in terms of objects as either not having any cleanup responsibility, having cleanup responsibility that will only be taken care of if Dispose is called, or having cleanup responsibility which should be taken care of via Dispose, but which can also be taken care of by Finalize.

The basic difference between a managed and unmanaged resource is that the
garbage collector knows about all managed resources, at some point in time
the GC will come along and clean up all the memory and resources associated
with a managed object. The GC does not know about unmanaged resources, such
as files, stream and handles, so if you do not clean them up explicitly in
your code then you will end up with memory leaks and locked resources.
Stolen from here, feel free to read the entire post.

Any resource for which memory is allocated in the .NET managed heap is a Managed resource. CLR is completly aware of this sort of memory and will do everything to make sure that it doesn't go orphaned. Anything else is unmanaged. For example interoping with COM, might create objects in the proces memory space, but CLR will not take care of it. In this case the managed object that makes calls across the managed boundry should own the responsibility for anything beyond it.

Let us first understand how VB6 or C++ programs (Non Dotnet applications) used to execute.
We know that computers only understand machine level code. Machine level code is also called as native or binary code. So, when we execute a VB6 or C++ program, the respective language compiler, compiles the respective language source code into native code, which can then be understood by the underlying operating system and hardware.
Native code (Unmanaged Code) is specific (native) to the operating system on which it is generated. If you take this compiled native code and try to run on another operating system it will fail. So the problem with this style of program execution is that, it is not portable from one platform to another platform.
Let us now understand, how a .Net program executes. Using dotnet we can create different types of applications. A few of the common types of .NET applications include Web, Windows, Console and Mobile Applications. Irrespective of the type of the application, when you execute any .NET application the following happens
The .NET application gets compiled into Intermediate language (IL). IL is also referred as Common Intermediate language (CIL) and Microsoft Intermediate language (MSIL). Both .NET and non .NET applications generate an assembly. Assemblies have an extension of .DLL or .EXE. For example if you compile a windows or Console application, you get a .EXE, where as when we compile a web or Class library project we get a .DLL. The difference between a .NET and NON .NET assembly is that, DOTNET Assembly is in intermediate language format where as NON DOTNET assembly is in native code format.
NON DOTNET applications can run directly on top of the operating system, where as DOTNET applications run on top of a virtual environment called as Common Language Runtime (CLR). CLR contains a component called Just In-Time Compiler (JIT), which will convert the Intermediate language into native code which the underlying operating system can understand.
So, in .NET the application execution consists of 2 steps
1. Language compiler, compiles the Source Code into Intermediate Language (IL)
2. JIT compiler in CLR converts, the IL into native code which can then be run on the underlying operating system.
Since, a .NET assembly is in Intermedaite Language format and not native code, .NET assemblies are portable to any platform, as long as the target platform has the Common Language Runtime (CLR). The target platform's CLR converts the Intermedaite Language into native code that the underlying operating system can understand. Intermediate Languge is also called as managed code. This is because CLR manages the code that runs inside it. For example, in a VB6 program, the developer is responsible for de-allocating the memory consumed by an object. If a programmer forgets to de-allocate memory, we may run into hard to detecct out of memory exceptions. On the other hand a .NET programmer need not worry about de-allocating the memory consumed by an object. Automatic memory management, also known as grabage collection is provided by CLR. Apart, from garbage collection, there are several other benefits provided by the CLR, which we will discuss in a later session. Since, CLR is managing and executing the Intermediate Language, it (IL) is also called as managed code.
.NET supports different programming languages like C#, VB, J#, and C++. C#, VB, and J# can only generate managed code (IL), where as C++ can generate both managed code (IL) and un-managed code (Native code).
The native code is not stored permanently anywhere, after we close the program the native code is thrown awaya. When we execute the program again, the native code gets generated again.
.NET program is similar to java program execution. In java we have byte codes and JVM (Java Virtual Machine), where as in .NET we Intermediate Language and CLR (Common Language Runtime)
This is provided from this link - He is a great tutor.
http://csharp-video-tutorials.blogspot.in/2012/07/net-program-execution-part-1.html

Unmanaged and managed resources are based on application domain.
From my understanding, unmanaged resource is everything that is used to make connection to outside of your application domain.
It could be HttpClient class that you resort to fetch data outside of your domain or a FileStream that helps you to read/write from/to a file.
we use Using block to dispose these kind of classes objects immediately after our work is done because GC in the first place care about inside the process resources not outside ones, although it will be disposed by the GC at the end.

Can I get some clarity on ReleaseComObject? Can I ignore it?

I have been developing office solutions in VBA for a while now and have fairly complete knowledge regarding office development in VBA. I have decided it is time to learn some real programming with .Net and am having some teething problems.
Having looked through a bunch of articles and forums (here and elsewhere), there seems to be some mixed information regarding memory management in .Net when using COM objects.
Some people say I should always deterministically release COM objects and others say I should almost never do it.
People saying I should do it:
The book 'Professional Excel Development' on page 861.
This stack exchange question has been answered by saying "every reference you make to a COM object must be released. If you don't, the process will stay in memory"
This blog suggests using it solved his problems.
People saying I should not do it:
This MSDN blog by Eric Carter states "In VSTO scenarios, you typically don't ever have to use ReleaseCOMObject."
The book 'VSTO for Office 2007' which is co-authored by Eric Carter seems to make no mention whatsoever of memory management or ReleaseComObject.
This MSDN blog by Paul Harrington says don't do it.
Someone with mixed advice:
Jake Ginnivan says I should always do it on COM objects that do not leave the method scope. If a COM object leaves the method scope then forget about it. Why can't I just forget about it all the time then?
The blog by Paul Harrington seems to suggest that the advice from MS has changed sometime in the past. Is it the case that calling ReleaseCOMObject used to be best practice but is not anymore? Can I leave the finer details of memory management to MS and assume that everything will be mostly fine?

I try to adhere to the following rule in my interop development regarding ReleaseComObject.
If my managed object implements some kind of shutdown protocol similar to IDisposable, I call ReleaseComObject on any child COM objects I hold references to. Some examples of the shutdown protocols I'm talking about:
IObjectWithSite.SetSite(null)
IOleObject.SetClientSite(null)
IOleObject.Close()
IDTExtensibility2.OnDisconnection
IDTExtensibility2.OnBeginShutdown
IDisposable.Dispose itself
This helps breaking potential circular references between .NET and native COM objects, so the managed garbage collector can do its job unobstructively.
Perhaps, there's something similar which can be used in your VSTO interop scenario (AFAIR, IDTExtensibility2 is relevant there).
If the interop scenario involves IPC COM calls (e.g., when you pass a managed event sink object to an out-of-proc COM server like Excel), there's another option to track external references to the managed object: IExternalConnection interface. IExternalConnection::AddConnection/ReleaseConnection are very similar to IUnknown::AddRef/Release, but they get called when a reference is added from another COM appartment (including apartments residing in separate processes).
IExternalConnection provides a way to implement an almost universal shutdown mechanism for out-of-proc scenarios. When the external reference count reaches zero, you should call ReleaseComObject on any external Excel objects you may be holding references to, effectively breaking any potential circular COM references between your process and Excel process. Perhaps, something like this has already been implemented by VSTO runtime (I don't have much experience with VSTO).
That said, if there is no clear shutdown mechanism, I don't call ReleaseComObject. Also, I never use FinalReleaseComObject.

You should not ignore it, if you are working with the Office GUI! Like your second link states:
Every reference you make to a COM object must be released. If you don't, the process will stay in memory.
This means, that your objects will remain in memory if you do not explicitly release them. Since they are COM objects, the garbage collector is responsible for releasing them. However, Excel and the other fancy tools are implemented with no knowledge of the garbage collector in .NET. They relate on deterministic release of memory. If you are requesting an object from excel and do not properly release it, it might be that your application does not close correctly, because it wait's until your resources are released. If your objects live long enought to get into gen1 or gen2, then this can take hours or even days!
All considerations about not releasing COM objects are targeting multithreaded scenarios or scenarios where you are forced to push around many com objects over multiple instances. As a good advice, allways create your com objects as late as possible and release them as soon as possible in the opposite order than created. Also you should think about keeping your interop instances private wherever possible. This reduces the possibility that other threads or instances are accessing the object while you have already released it.

In what order should one release COM objects and garbage collect?

There are lots of questions on SO regarding the releasing COM objects and garbage collection but nothing I could find that address this question specifically.
When releasing COM objects (specifically Excel Interop in this case), in what order should I be releasing the reference and calling garbage collection?
In some places (such as here) I have seen this:
Marshall.FinalReleaseComObject(obj);
GC.Collect();
GC.WaitForPendingFinalizers();
And in others (such as here) this:
GC.Collect();
GC.WaitForPendingFinalizers();
Marshall.FinalReleaseComObject(obj);
Or doesn't it matter and I'm worrying about nothing?

Marshal.FinalReleaseComObject() releases the underlying COM interface pointer.
GC.Collect() and GC.WaitForPendingFinalizers() causes the finalizer for a COM wrapper to be called, which calls FinalReleaseComObject().
So what makes no sense is to do it both ways. Pick one or the other.
The trouble with explicitly calling FinalReleaseComObject() is that it will only work when you call it for all the interface pointers. The Office program will keep running if you miss just one of them. That's very easy to do, especially the syntax sugar allowed in C# version 4 makes it likely. An expression like range = sheet.Cells[1, 1], very common in Excel interop code. There's a hidden Range interface reference there that you never explicitly store anywhere. So you can't release it either.
That's not a problem with GC.Collect(), it can see them. It is however not entirely without trouble either, it will only collect and run the finalizer when your program has no reference to the interface anymore. Which is definitely what's wrong with your second snippet. And which tends to go wrong when you debug your program, the debugger extends the lifetime of local object references to the end of the method. Also the time you look at Taskmgr and yell "die dammit!"
The usual advice for GC.Collect() applies here as well. Keep your program running and perform work. The normal thing happens, you'll trigger a garbage collection and that releases the COM wrappers as well. And the Office program will exit. It just doesn't happen instantly, it will happen eventually.

Reference counting mechanism that is used by COM is another way of automatic memory management but with slightly different impact on memory and behavior.
Any reference counting implementation provide deterministic behavior for resource cleanup. This means that right after call to Marshal.FinalReleaseComObject() all resources (memory and other resources) related to the COM object would be reclaimed.
This means that if we have additional managed objects and you want to reclaim them as quickly as possible, you should release COM object first and only after that call GC.Collect method.

"COM object that has been separated from its underlying RCW cannot be used" with .NET 4.0

I've a class in my .NET 3.5 C# WinForms application which has five methods.
Each method uses different sets of C++ COM interfaces.
Am using Marshal.FinalReleaseCOMObject for cleaning up these COM objects. This code works fine on this .NET platform without any issues.
But when I move this application to .NET 4.0, I start getting this error in one of these methods at a line where I cast a variable from ICOMInterface1 to ICOMInterface2, i.e.:
ICOMInterface1 myVar= obj as ICOMInterface2;
COM object that has been separated from its underlying RCW cannot be
used.
And if I remove the line where am using Marshal.FinalReleaseCOMObject, I don't get this error.
What am I missing here? And how do I clean up these unmanaged COM objects from the memory on .NET 4.0 platform?

The simple answer is to never use Marshal.FinalReleaseComObject unless you absolutely must. And if you do, there are some additional rules you must follow.
When a COM object is used in .NET, the runtime creates what's known as a "RCW" or "runtime callable wrapper" for that object. This RCW is just a normal object that holds a COM reference on the object. When this object is garbage collected, it will call IUnknown::Release() on the COM object, as you'd expect. What this means is unless your COM object requires that the last Release() is done at a very certain moment in time, just let the garbage collector take care of it. Many COM objects fall into this case, so absolutely verify that you must manage the call to Release() carefully.
So when you're calling FinalReleaseComObject, that is essentially decrementing the reference the RCW has on the COM object until it hits zero and the RCW then releases the COM object. At this point, this RCW is now zombied, and any use of it will give the exception you've seen. The CLR (by default) only creates a single RCW for any underlying COM object, so this means if the COM API you're using returned the same object twice, it's just going to have a single RCW. Calling FinalReleaseComObjectwould mean that suddenly all uses of that RCW are toast.
The only way to guarantee you have a unique Marshal.GetUniqueObjectForIUnknown, which prevents any RCW sharing. But like I said earlier, in most COM APIs this isn't neccessary to do in the first place, so just don't do it.
Paul Harrington wrote up a good blog post about [Final]ReleaseComObject and it's evils. It's a dangerous weapon that unless needed will only hurt you. Since you're asking this question, I'd suspect you don't actually need to be calling it at all. :-)

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.