Proper disposal of COM interop objects in C# particularly MS Office applications

Proper disposal of COM interop objects in C# particularly MS Office applications - c#

I am developing an application that relies heavily on multiple Microsoft Office products including Access, Excel, Word, PowerPoint and Outlook among others. While doing research on interop I found out that starting with VS2010 and .NET 4, we thankfully no longer have to go through the nightmares of PIAs.
Furthermore, I have been reading a lot of articles on proper disposal of objects, the most sensible one seemed to be this one.
However, the article is 5 years old and there are not many authoritative publications on the subject AFAIK. Here is a sample of code from the above link:
' Cleanup:
GC.Collect()
GC.WaitForPendingFinalizers()
GC.Collect()
GC.WaitForPendingFinalizers()
Marshal.FinalReleaseComObject(worksheet)
oWB.Close(SaveChanges:=False)
Marshal.FinalReleaseComObject(workbook)
oApp.Quit()
Marshal.FinalReleaseComObject(application)
What I want to know is by today's standards, how accurate is this and what should I look out for if I expect to support my application for the coming few years?
UPDATE: A link to some credible articles would be highly appreciated. By the way, this is not a server side application. This will be running in computer labs where we have users interact with office products that we instantiate for them.
FOUND IT: This three-part article is probably the closest to an authoritative account I would expect to find.

Objects should be disposed automatically by the GC after the object goes out of scope. If you need to release them sooner, you can use Marshal.ReleaseComObject http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.marshal.releasecomobject.aspx

Depending on the version of Office you are controlling through interop, and the circumstances that occur on the server, yes, you may need to do some serious convolutions to get rid of a running instance of one of the Office applications. There are unpredictable (and unnecessary) dialogs which, for example, Word will open when it is not interactive (None of the Office applications are particularly well designed for unattended execution, thus the recommendation that even with the latest versions that they not be installed for server-side use).
For server cases, consider using Aspose's stack rather than Office, or another of the alternatives. The interactive, local use cases, you may need that "extra vigorous killing" on occasion since the Office automation servers are particularly badly behaved unmanaged objects.

If you want to release COM objects fully especially in MS Office COM Objects, its very highly recommended that you release sub objects that you must have used which are inside the parent objects.
In your example, I would say release all Cell, Range any other objects that you may have used before releasing the worksheet that the cell, range or any other object belongs to.

Related

As of today, what is the right way to work with COM objects? [duplicate]

This question already has answers here:
Clean up Excel Interop Objects with IDisposable
(2 answers)
Closed 3 years ago.
This is a very common question and I decided to ask it because this question may have a different answer as of today. Hopefully, the answers will help to understand what is the right way to work with COM objects.
Personally, I feel very confuse after getting different opinions on this subject.
The last 5 years, I used to work with COM objects and the rules were very clear for me:
Use a single period in lines of code. Using more than one period create temporary objects behind the scene that cannot be explictly released.
Do not use foreach, use a for loop instead and release each item on each iteration
Do not call FInalReleaseComObject, use ReleaseComObject instead.
Do not use GC for releasing COM objects. GC intent is mainly for debugging usage.
Release objects in reverse order of their creation.
Some of you may be frustrated after reading those last lines, this is what I knew about how to properly create/release Com Object, I hope getting answers that will make it clearer and uncontested.
Following, are some links I found on this topic. Some of them telling that it is needed to call ReleaseComObject and some of them not.
How to properly release Excel COM objects (Nov. 2013)
Proper Way of Releasing COM Objects in .NET (Aug. 2011)
Marshal.ReleaseComObject Considered Dangerous (Mar. 2010)
ReleaseCOMObject (Apr. 2004)
"... In VSTO scenarios, you typically don’t ever have to use ReleaseCOMObject. ..."
MSDN - Marshal.ReleaseComObject Method (current .NET Framework version):
"...You should use this method to free the underlying COM object that holds references..."
UPDATE:
This question has been marked as too broad. As requested, I will try to simplify and ask simpler questions.
Does ReleaseComObject is required when working with COM Objects or calling GC is the correct way?
Does VSTO approach change the way we used to work with COM Objects?
Which of the above rules I wrote are required and which are wrong? Is there any others?

The .NET / COM interop is well designed, and works correctly. In particular, the .NET Garbage Collector correctly tracks COM references, and will correctly release COM objects when they have no remaining runtime references. Interfering with the reference counts of COM object by calling Marshal.ReleaseComObject(...) or Marshal.FinalReleaseComObject(...) is a dangerous but common anti-pattern. Unfortunately, some of the bad advice came out of Microsoft.
Your .NET code can correctly interact with COM while ignoring all 5 of your rules.
If you do need to trigger deterministic clean-up of COM objects that are no longer referenced from the runtime, you can safely force a GC (and possibly wait for finalizers to complete). Otherwise, you don't have to do anything special in your code, to deal with COM objects.
There is one important caveat, that might have contributed to confusion about role of the garbage collector. When debugging .NET code, local variables artificially have their lifetime extended to the end of the method, in order to support watching the variabled under the debugger. That means you might still have managed references to a COM object (and hence the GC won't clean up) later than expect form just looking at the code. A good workaround for this issue (which only occurs under the debugger) is to split the scope of COM calls from the GC cleanup calls.
As an example, here is some C# code that interacts with Excel, and cleans up properly. You can paste into a Console application (just add a reference to Microsoft.Office.Interop.Excel):
using System;
using System.Runtime.InteropServices;
using Microsoft.Office.Interop.Excel;
namespace TestCsCom
{
class Program
{
static void Main(string[] args)
{
// NOTE: Don't call Excel objects in here...
// Debugger would keep alive until end, preventing GC cleanup
// Call a separate function that talks to Excel
DoTheWork();
// Now let the GC clean up (repeat, until no more)
do
{
GC.Collect();
GC.WaitForPendingFinalizers();
}
while (Marshal.AreComObjectsAvailableForCleanup());
}
static void DoTheWork()
{
Application app = new Application();
Workbook book = app.Workbooks.Add();
Worksheet worksheet = book.Worksheets["Sheet1"];
app.Visible = true;
for (int i = 1; i <= 10; i++) {
worksheet.Cells.Range["A" + i].Value = "Hello";
}
book.Save();
book.Close();
app.Quit();
// NOTE: No calls the Marshal.ReleaseComObject() are ever needed
}
}
}
You'll see that the Excel process properly shuts down, indicating that all the COM objects were properly cleaned up.
VSTO does not change any of these issues - it is just a .NET library that wraps and extends the native Office COM object model.
There is a lot of false information and confusion about this issue, including many posts on MSDN and on StackOverflow.
What finally convinced me to have a closer look and figure out the right advice was this post https://blogs.msdn.microsoft.com/visualstudio/2010/03/01/marshal-releasecomobject-considered-dangerous/ together with finding the issue with references kept alive under the debugger on a StackOverflow answer.
One exception to this general guidance is when the COM object model requires interfaces to be released in a particular order. The GC approach described here does not give you control over the order in which the COM objects are released by the GC.
I don't have any reference to indicate whether this would violate the COM contract. In general, I would expect COM hierarchies to use internal references to ensure any dependencies on the sequence are properly managed. E.g. in the case of Excel, one would expect a Range object to keep an internal reference to the parent Worksheet object, so that a user of the object model need not explicitly keep both alive.
There may be cases where even the Office applications are sensitive to the sequence in which COM objects are released. One case seems to be when the OLE embedding is used - see https://blogs.msdn.microsoft.com/vsofficedeveloper/2008/04/11/excel-ole-embedding-errors-if-you-have-managed-add-in-sinking-application-events-in-excel-2/
So it would be possible to create a COM object model that fails if objects are released in the wrong sequence, and the interop with such a COM model would then require some more care, and might need manual interference with the references.
But for general interop with the Office COM object models, I agree with the VS blog post calling "Marshal.ReleaseComObject – a problem disguised as a solution".

Can I get some clarity on ReleaseComObject? Can I ignore it?

I have been developing office solutions in VBA for a while now and have fairly complete knowledge regarding office development in VBA. I have decided it is time to learn some real programming with .Net and am having some teething problems.
Having looked through a bunch of articles and forums (here and elsewhere), there seems to be some mixed information regarding memory management in .Net when using COM objects.
Some people say I should always deterministically release COM objects and others say I should almost never do it.
People saying I should do it:
The book 'Professional Excel Development' on page 861.
This stack exchange question has been answered by saying "every reference you make to a COM object must be released. If you don't, the process will stay in memory"
This blog suggests using it solved his problems.
People saying I should not do it:
This MSDN blog by Eric Carter states "In VSTO scenarios, you typically don't ever have to use ReleaseCOMObject."
The book 'VSTO for Office 2007' which is co-authored by Eric Carter seems to make no mention whatsoever of memory management or ReleaseComObject.
This MSDN blog by Paul Harrington says don't do it.
Someone with mixed advice:
Jake Ginnivan says I should always do it on COM objects that do not leave the method scope. If a COM object leaves the method scope then forget about it. Why can't I just forget about it all the time then?
The blog by Paul Harrington seems to suggest that the advice from MS has changed sometime in the past. Is it the case that calling ReleaseCOMObject used to be best practice but is not anymore? Can I leave the finer details of memory management to MS and assume that everything will be mostly fine?

I try to adhere to the following rule in my interop development regarding ReleaseComObject.
If my managed object implements some kind of shutdown protocol similar to IDisposable, I call ReleaseComObject on any child COM objects I hold references to. Some examples of the shutdown protocols I'm talking about:
IObjectWithSite.SetSite(null)
IOleObject.SetClientSite(null)
IOleObject.Close()
IDTExtensibility2.OnDisconnection
IDTExtensibility2.OnBeginShutdown
IDisposable.Dispose itself
This helps breaking potential circular references between .NET and native COM objects, so the managed garbage collector can do its job unobstructively.
Perhaps, there's something similar which can be used in your VSTO interop scenario (AFAIR, IDTExtensibility2 is relevant there).
If the interop scenario involves IPC COM calls (e.g., when you pass a managed event sink object to an out-of-proc COM server like Excel), there's another option to track external references to the managed object: IExternalConnection interface. IExternalConnection::AddConnection/ReleaseConnection are very similar to IUnknown::AddRef/Release, but they get called when a reference is added from another COM appartment (including apartments residing in separate processes).
IExternalConnection provides a way to implement an almost universal shutdown mechanism for out-of-proc scenarios. When the external reference count reaches zero, you should call ReleaseComObject on any external Excel objects you may be holding references to, effectively breaking any potential circular COM references between your process and Excel process. Perhaps, something like this has already been implemented by VSTO runtime (I don't have much experience with VSTO).
That said, if there is no clear shutdown mechanism, I don't call ReleaseComObject. Also, I never use FinalReleaseComObject.

You should not ignore it, if you are working with the Office GUI! Like your second link states:
Every reference you make to a COM object must be released. If you don't, the process will stay in memory.
This means, that your objects will remain in memory if you do not explicitly release them. Since they are COM objects, the garbage collector is responsible for releasing them. However, Excel and the other fancy tools are implemented with no knowledge of the garbage collector in .NET. They relate on deterministic release of memory. If you are requesting an object from excel and do not properly release it, it might be that your application does not close correctly, because it wait's until your resources are released. If your objects live long enought to get into gen1 or gen2, then this can take hours or even days!
All considerations about not releasing COM objects are targeting multithreaded scenarios or scenarios where you are forced to push around many com objects over multiple instances. As a good advice, allways create your com objects as late as possible and release them as soon as possible in the opposite order than created. Also you should think about keeping your interop instances private wherever possible. This reduces the possibility that other threads or instances are accessing the object while you have already released it.

I've found a bug in the JIT/CLR - now how do I debug or reproduce it?

I have a computationally-expensive multi-threaded C# app that seems to crash consistently after 30-90 minutes of running. The error it gives is
The runtime has encountered a fatal error. The address of the error was at 0xec37ebae, on thread 0xbcc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
(0xc0000005 is the error-code for Access Violation)
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint. In fact, the line of code that the debugger says caused the crash is
overallLength += distanceTravelled;
Where both values are of type double
Given all this, I believe the crash must be due to a bug in the compiler or CLR or JIT. I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin. I've never had to view the CIL-binary, or the compiled JIT output, or the native stacktrace (there is no managed stacktrace at the time of the crash), so I'm not sure how. I can't even figure out how to view the state of all the variables at the time of the crash (VS unfortunately won't tell me like it does after managed-exceptions, and outputting them to console/a file would slow down the app 1000-fold, which is obviously not an option).
So, how do I go about debugging this?
[Edit] Compiled under VS 2010 SP1, running latest version of .Net 4.0 Client Profile. Apparently it's ".Net 4.0C/.Net 4.0E, .Net CLR 1.1.4322"

I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin.
"Smaller reproduction" definitely sounds like a great idea here... even if "smaller" won't mean "quicker to reproduce".
Before you even start, try to reproduce the error on another machine. If you can't reproduce it on another machine, that suggests a whole different set of tests to do - hardware, installation etc.
Also, check you're on the latest version of everything. It would be annoying to spend days debugging this (which is likely, I'm afraid) and then end up with a response of "Yes, we know about this - it was a bug in .NET 4 which was fixed in .NET 4.5" for example. If you can reproduce it on a variety of framework versions, that would be even better :)
Next, cut out everything you can in the program:
Does it have a user interface at all? If possible, remove that.
Does it use a database? See if you can remove all database access: definitely any output which isn't used later, and ideally input too. If you can hard code the input within the app, that would be ideal - but if not, files are simpler for reproductions than database access.
Is it data-sensitive? Again, without knowing much about the app it's hard to know whether this is useful, but assuming it's processing a lot of data, can you use a binary search to find a relatively small amount of data which causes the problem?
Does it have to be multi-threaded? If you can remove all the threading, obviously that may well then take much longer to reproduce the problem - but does it still happen at all?
Try removing bits of business logic: if your app is componentized appropriately, you can probably fake out whole significant components by first creating a stub implementation, and then simply removing the calls.
All of this will gradually reduce the size of the app until it's more manageable. At each step, you'll need to run the app again until it either crashes or you're convinced it won't crash. If you have a lot of machines available to you, that should help...

tl;dr Make sure you're compiling to .Net 4.5
This sounds suspiciously like the same error found here. From the MSDN page:
This bug can be encountered when the Garbage Collector is freeing and compacting memory. The error can happen when the Concurrent Garbage Collection is enabled and a certain combination of foreground Garbage Collection and background Garbage Collection occurs. When this situation happens you will see the same call stack over and over. On the heap you will see one free object and before it ends you will see another free object corrupting the heap.
The fix is to compile to .Net 4.5. If for some reason you can't do this, you can also disable concurrent garbage collection by disabling gcConcurrent in the app.config file:
<configuration>
<runtime>
<gcConcurrent enabled="false"/>
</runtime>
</configuration>
Or just compile to x86.

WinDbg is your friend:
http://blogs.msdn.com/b/tess/archive/2006/02/09/net-crash-managed-heap-corruption-calling-unmanaged-code.aspx
http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net
http://www.codeproject.com/Articles/22245/Quick-start-to-using-WinDbg

Download Debug Diagnostic Tool v1.2
Run program
Add Rule "Crash"
Select "Specific Process"
on page Advanced Configuration set your exception if you know on which exception it fails or just leave this page as is
Set userdump location
Now wait for process to crash, log file is created by DebugDiag. Now activate tab Advanced Analysis, select Crash/Hang Analyzers in top list and dump file in lower list and hit Start Analysis. This will generate html report for you. Hopes you found usefull info in that report. If you have problem with analyze, upload html report somewhere and place url here so we can focus on it.

My app does not invoke any native code, or use any unsafe blocks, or
even any non-CLS compliant types like uint
You may think this, but threading, synchronization via semaphore, mutex it any handles all are native. .net is a layer over operating system, .net itself does not support pure clr code for multithreading apps, this is because OS already does it.
Most likely this is thread synchronization error. Probably multiple threads are trying to access shared resource like file etc that is outside clr boundary.
You may think you aren't accessing com etc, but when you call certain API like get desktop folder path etc it is called through shell com API.
You have following two options,
Publish your code so that we can review the bottleneck
Redesign your app using .net parallel threading framework, which includes variety of algorithms requiring CPU intensive operations.
Most likely programs fail after certain period of time as collections grow up and operations fail to execute before other thread interfere. For example, producer consumer problem, you will not notice any problem till producer will become slower or fail to finish its operation before consumer kicks in.
Bug in clr is rare, because clr is very stable. But poorly written code may lead error to appear as bug in clr. Clr can not and will never detect whether the bug is in your code or in clr itself.

Did you run a memory test for your machine as the one time I had comparable symptoms one of my dimms turned out to be faulty (a very good memorytester is included in Win7; http://www.tomstricks.com/how-to-test-your-ram-or-memory-with-windows-memory-diagnostic-tool-in-windows-7/)
It might also be a heating/throttling issue if your CPU gets too hot after this period of time. Although that would happen sooner imho.
There should be a dumpfile that you can analyze. If you never did this find someone who did, or send that to microsoft

I will suggest you open a support case via http://support.microsoft.com immediately, as the support guys can show you how to collect the necessary information.
Generally speaking, like #paulsm4 and #psulek said, you can utilize WinDbg or Debug Diag to capture crash dumps of the process, and within it, all necessary information is embedded. However, if this is the very first time you use those tools, you might be puzzled. Microsoft support team can provide you step by step guidance on them, or they can even set up a Live Meeting session with you to capture the data, as the program crashes so often.
Once you are familiar with the tools, in the future you can perform similar troubleshooting more easily,
http://blogs.msdn.com/b/lexli/archive/2009/08/23/when-the-application-program-crashes-on-windows.aspx
BTW, it is too early to say "I've found a bug". Though you cannot obviously find in your program a dependency on native code, it might still have a dependency on native code. We should not draw a conclusion before debugging further into the issue.

.NET : Monitoring Objects Lifetime (Birth/Death/Memory)

Issue : What is the best way to have a comprehensive view of birth/death/mem usage of ALL objects created during an application lifetime? (graphic report would be better)
Why such a question :
Among many others, the idea behind that, is to reveal long-living objects that may never be collected by the garbage collector or cause memory trouble (such as heap/stack issues or so), and provide valuable information to efficiently manage object lifecycles (I actually just spent a whole night debugging a multithreaded application to finally notice that a "believed to be disposed/renewed" object was in fact still alive and smashing up the server memory.)
VS2010 Performance Wizard & Profiler might be a good starter...
I stumbled upon few ways to do this programmatically but it involved wrapping up objects individually (painstaking and not code-seamless)
I'm looking for something that would look like this :
Application START[-----------------------------------------------------------]END
Object 1 [---------------------------]
Object 2 [---------------------------]
Object 3 [-----------------------------------------------------]

Mika,
As you noted you can use the VS2010 Profiler (if you have Visual Studio Premium or Ultimate). For more information see the MSDN page about collecting 'Object Lifetime' information.
Beware that this profiling mode is pretty heavyweight compared to other profiling modes and you may find the collected VSP file is quite large unless you have a fairly focussed scenario that you are profiling.
The profiling report will show the information in tabular form, but you can copy the data into Excel or another program of your choice for further analysis/charting.
Disclaimer: I work on the Visual Studio profiler.

There are some tools which can do that but not as easily as graph. You will need to learn a bit on those tools.
Free:
CLR Profiler
http://msdn.microsoft.com/en-us/library/ff650691.aspx
WinDbg:
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
Use the SOS or SOSEX extension with Windbg to profile .NET code.
Commercial :
Red Gate Ants Profiler:
http://www.red-gate.com/products/dotnet-development/ants-performance-profiler/

How to properly clean up interop objects in C#

This is a follow on question to
How to properly clean up excel interop objects in c#.
The gyst is that using a chaining calls together (eg. ExcelObject.Foo.Bar() ) within the Excel namespace prevents garbage collection for COM objects. Instead, one should explicitly create a reference to each COM object used, and explicitly release them using Marhsal.ReleaseComObject().
Is the behaviour of not releasing COM objects after a chained call specific to Excel COM objects only? Is it overkill to apply this kind of pattern whenever a COM object is in use?

It's definitely more important to handle releases properly when dealing with Office applications than many other COM libraries, for two reasons.
The Office apps run as out of process servers, not in-proc libraries. If you fail to clean up properly, you leave a process running.
Office apps (especially Excel IIRC) don't terminate properly even if you call Application.Quit, if there are outstanding references to its COM objects.
For regular, in-proc COM libraries, the consequences of failing to clean up properly are not so dramatic. When your process exits, all in-proc libraries goes away with it. And if you forget to call ReleaseComObject on an object when you no longer need it, it will still be taken care of eventually when the object gets finalized.
That said, this is no excuse to write sloppy code.

COM objects are essentially unmanaged code - and as soon as you start calling unmanaged code from a managed application, it becomes your responsibility to clean up after that unmanaged code.
In short, the pattern linked in the above post is necessary for all COM objects.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.