How to properly clean up Excel interop objects in C#, 2012 edition

I am in the process of writing an application in C# which will open an Excel spreadsheet (2007, for now) via interop, do some magic, then close. The "magic" part is non-trivial, so this application will contain many references to many COM objects spawned by Excel.
I have written this kind of application before (too many times, in fact) but I've never found a comfortable, "good smell" approach to interacting with COM objects. The problem is partly that, despite significant study, I still don't perfectly understand COM and partly that the interop wrappers hide much that probably shouldn't be hidden. The fact that there are so many different, conflicting suggestions from the community only makes matters worse.
In case you can't tell from the title, I've done my research. The title alludes to this post:
How do I properly clean up Excel interop objects?
That question was first asked in 2008, and the advice was really helpful and solid at the time (especially the "Never use 2 dots with com objects" bit) but now seems out of date. In March of 2010, the Visual Studio team posted a blog article warning fellow programmers that Marshal.ReleaseComObject [is] Considered Dangerous. The article referred to two articles, cbrumme's WebLog > ReleaseComObject and The mapping between interface pointers and runtime callable wrappers (RCWs), suggesting that people have been using ReleaseComObject incorrectly all along (cbrumme: "If you are a client application using a modest number of COM objects that are passed around freely in your managed code, you should not use ReleaseComObject").
Does anyone have an example of a moderately complex application, preferably using multiple threads, that is able to successfully navigate between memory leaks (Excel continues running in the background after the application has closed) and InvalidComObjectExceptions? I'm looking for something which will allow a COM object to be used outside of the context in which it was created but can still be cleaned up once the application is finished with it: a hybrid of memory management strategies which can effectively straddle the managed/unmanaged divide.
A reference to an article or tutorial that discusses a correct approach to this problem would be a much appreciated alternative. My best Google-fu efforts have returned only the apparently incorrect ReleaseComObject approach.
UPDATE:
(This is not an answer)
I discovered this article not long after posting:
VSTO and COM Interop by Jake Ginnivan
I've been able to implement his strategy of wrapping COM objects in "AutoCleanup" classes via an extension method, and I'm pretty happy with the result. Though it does not provide a solution to allow COM objects to cross the boundaries of the context in which they were created and still makes use of the ReleaseComObject function, it does at least provide a neat and easy-to-read solution.
Here's my implementation:
using System;
using System.Runtime.InteropServices;
class AutoCleanup<T> : IDisposable {
    public T Resource {
        get;
        private set;
    }
    public AutoCleanup( T resource ) {
        this.Resource = resource;
    }
    ~AutoCleanup() {
        this.Dispose();
    }
    private bool _disposed = false;
    public void Dispose() {
        if ( !_disposed ) {
            _disposed = true;
            if ( this.Resource != null &&
                 Marshal.IsComObject( this.Resource ) ) {
                // Drops the RCW's reference count to zero, whoever else may still use it.
                Marshal.FinalReleaseComObject( this.Resource );
            } else if ( this.Resource is IDisposable ) {
                ( (IDisposable) this.Resource ).Dispose();
            }
            // default( T ) rather than null: T is unconstrained, so null does not compile.
            this.Resource = default( T );
            // Cleanup already happened, so the finalizer no longer needs to run.
            GC.SuppressFinalize( this );
        }
    }
}
static class ExtensionMethods {
    public static AutoCleanup<T> WithComCleanup<T>( this T target ) {
        return new AutoCleanup<T>( target );
    }
}

Do you know the NetOffice concept for COM proxy management?
NetOffice uses wrapper classes for COM proxies and the IDisposable pattern.
NetOffice keeps the parent->child relationship for proxies: dispose a worksheet, and all child proxies created from that instance (cells, styles, etc.) are disposed as well. You can also use a special event or static property to observe the count of open proxies in your application.
Take a look at this documentation snippet:
http://netoffice.codeplex.com/wikipage?title=Tec_Documentation_English_Management
You will find some showcase projects for COM proxy management in the tutorials folder.

Related

Releasing COM Object in Constructor Parameter

I am developing a VSTO add-in that I am now looking to optimize. In my code, I do something like this:
public class ExtraOrdinaryClass
{
    public ExtraOrdinaryClass(Excel.Worksheet someGoodSheet)
    {
        tSheetName = someGoodSheet.Name;
        tDesignSheet = someGoodSheet;
    }
}
I just learned that I should release all COM objects, but I am searching for a proper way to release the someGoodSheet object. I suspect that something like the following might work:
public class ExtraOrdinaryClass
{
    public ExtraOrdinaryClass(Excel.Worksheet someGoodSheet)
    {
        tSheetName = someGoodSheet.Name;
        tDesignSheet = someGoodSheet;
        Marshal.ReleaseComObject(someGoodSheet);
        someGoodSheet = null;
    }
}
Can anyone tell me whether I am doing this correctly, and when parameter objects are collected by the garbage collector?
I just got to know I should release all the COM Objects, but I am searching for a proper way to release the someGoodSheet object in a proper way
It is not necessary to worry about releasing COM objects because you are writing a VSTO add-in. VSTO add-ins are in-process COM libraries loaded by the COM application, in this case Excel. The COM object represented by the Excel worksheet was created by Excel, so Excel has ownership. Attempting to manually release it (via Marshal.ReleaseComObject), or otherwise reducing the reference count while you still hold a managed reference to it (as in your second example), may crash Excel and/or your application.
Had you been writing a stand-alone process which say launched Excel via COM and fiddled with a few COM objects, then yes, you would need to ensure COM objects are released appropriately.
i.e. do not use this in your constructor:
Marshal.ReleaseComObject(someGoodSheet);
Because you are essentially saying to COM "I'm finished with it", and yet you are not, because you still have a managed reference to it via tDesignSheet. COM/Excel may choose to get rid of the worksheet from under you. The next time you go to access tDesignSheet you may encounter an exception.
Calling Marshal.ReleaseComObject on a parameter that is copied to a field is sort of like using a Font after you called Dispose() - both will lead to the undesirable situation where an obliterated object is accessed. (in the case of COM I assume the reference count reached 0)
Also it is not necessary to call this (because you have a reference to the same object anyway):
someGoodSheet = null;
Your original code (shown below) is fine:
public class ExtraOrdinaryClass
{
    public ExtraOrdinaryClass(Excel.Worksheet someGoodSheet)
    {
        tSheetName = someGoodSheet.Name; // you arguably don't need this (just read tDesignSheet.Name)
        tDesignSheet = someGoodSheet;
    }
}
and tell me when parameter objects are collected by garbage collector?
I would say that in this case the .NET object passed as the parameter would not be collected, because you now have an additional reference to it in tDesignSheet inside ExtraOrdinaryClass. You would need to set tDesignSheet to null and then wait for the next time the GC runs, whenever that is.
Even if the .NET object is collected, .NET I suspect is smart enough to know that the COM object may still be used by other native COM clients such as Excel itself.
So, just because your .NET now-disposed objects no longer refer to the COM object, you may find the COM object could well be still active. e.g. the worksheet is still open.

A nasty COM interop problem in VSIX

For some time now, I've observed an intermittent COM problem in my VSIX package for Visual Studio 2010. Attempting to subscribe to one of the IDE's COM-based event sinks randomly throws the following error:
"COM object that has been separated from its underlying RCW cannot be used"
A repro case boils down to this code (which must be used in VSIX, obviously):
using System;
using EnvDTE;
using EnvDTE80;
class Test
{
    private readonly Events _events;
    private readonly Events2 _events2;
    private readonly BuildEvents _buildEvents;
    private readonly ProjectItemsEvents _projectItemsEvents;
    public Test(IServiceProvider provider)
    {
        var dte = (DTE)provider.GetService(typeof(DTE));
        var dte2 = (DTE2)dte;
        // Store all references in fields as a GC precaution.
        _events = dte.Events;
        _events2 = (Events2)dte2.Events;
        _buildEvents = _events.BuildEvents;
        _projectItemsEvents = _events2.ProjectItemsEvents;
        // Proceed to subscribe to event sinks.
        _buildEvents.OnBuildBegin += BuildBeginHandler; // BOOM!
        _projectItemsEvents.ItemAdded += ItemAddedHandler;
    }
    private void ItemAddedHandler(ProjectItem projectItem) { }
    private void BuildBeginHandler(vsBuildScope scope, vsBuildAction action) { }
}
I've learned about a possible cause from numerous descriptions of similar problems that can be found on the net. It's basically a side effect of the way Runtime Callable Wrappers and GC interact during COM interop. Here's a link to a similar problem complete with explanation.
I'm fine with that explanation, especially because it suggests an easy workaround - storing the event sink reference in a field in order to prevent it from being prematurely GC'ed. Indeed, many people seem to have solved their problem this way.
What bothers me is that it doesn't work in my case. I'm really stumped as to why. As you can plainly see, I already store all object references in fields as a precaution. Yet the error still occurs. I tried being even more explicit using GC.KeepAlive() calls at the end of the ctor, but to no avail. Is there anything else left to do?
Without a solution, my VSIX randomly fails to load, leaving the user with a single option: to restart Visual Studio and hope it doesn't happen the next time.
Any help will truly be appreciated!
Well, I gave up and simply did the only thing that crossed my mind. I figured that since this is obviously a race condition I can't affect in a predictable manner, I might as well reenter the race if I lose.
So I moved the subscription lines into a while loop that try..catch-es them and retries after a bit of Thread.Sleep(). The loop exits either when both subscriptions succeed or when I've been continuously losing the race for more than 2 seconds.
The kicker is, I haven't lost the race once since I've implemented the change. A true Heisenbug, if I ever saw one.
Anyway, I'm going to stick with this until a proper solution occurs to me or the bug reappears.
I suspect that your problem is really that you are attempting to wire up your event handlers too soon. You normally need to do these sorts of things in the Initialize method of your package / toolwindow / whatever. Generally speaking, if you need to use a service you need to do it after the Initialize method has been called; definitely don't do this in the constructor of your Package.
(This is just a hunch - your Test class doesn't implement any VSX interfaces and so I can't see from your sample when the constructor is being called)

Name for this pattern? (Answer: lazy initialization with double-checked locking)

Consider the following code:
public class Foo
{
    private static object _lock = new object();
    public void NameDoesNotMatter()
    {
        if( SomeDataDoesNotExist() )
        {
            lock(_lock)
            {
                if( SomeDataDoesNotExist() )
                {
                    CreateSomeData();
                }
                else
                {
                    // someone else also noticed the lack of data. We
                    // both contended for the lock. The other guy won
                    // and created the data, so we no longer need to.
                    // But once he got out of the lock, we got in.
                    // There's nothing left to do.
                }
            }
        }
    }
    private bool SomeDataDoesNotExist()
    {
        // Note - this method must be thread-safe.
        throw new NotImplementedException();
    }
    private bool CreateSomeData()
    {
        // Note - This shouldn't need to be thread-safe
        throw new NotImplementedException();
    }
}
First, there are some assumptions I need to state:
There is a good reason I couldn't just do this once at app startup. Maybe the data wasn't available yet, etc.
Foo may be instantiated and used concurrently from two or more threads. I want one of them to end up creating some data (but not both of them) then I'll allow both to access that same data (ignore thread safety of accessing the data)
The cost to SomeDataDoesNotExist() is not huge.
Now, this doesn't necessarily have to be confined to some data creation situation, but this was an example I could think of.
The part that I'm especially interested in identifying as a pattern is the check -> lock -> check. I've had to explain this pattern to developers on a few occasions who didn't get the algorithm at first glance but could then appreciate it.
Anyway, other people must do similarly. Is this a standardized pattern? What's it called?
Though I can see how you might think this looks like double-checked locking, what it actually looks like is dangerously broken and incorrect double-checked locking. Without an actual implementation of SomeDataDoesNotExist and CreateSomeData to critique we have no guarantee whatsoever that this thing is actually threadsafe on every processor.
For an example of an analysis of how double-checked locking can go wrong, check out this broken and incorrect version of double-checked locking:
C# manual lock/unlock
My advice: don't use any low-lock technique without a compelling reason and a code review from an expert on the memory model; you'll probably get it wrong. Most people do.
In particular, don't use double-checked locking unless you can describe exactly what memory access reorderings the processors can do on your behalf and provide a convincing argument that your solution is correct given any possible memory access reordering. The moment you step away even slightly from a known-to-be-correct implementation, you need to start the analysis over from scratch. You can't assume that just because one implementation of double-checked locking is correct, that they all are; almost none of them are correct.
Lazy initialization with double-checked locking?
The part that I'm especially interested in identifying as a pattern is the check -> lock -> check.
That is called double-checked locking.
Beware that in older Java versions (before Java 5) it is not safe because of how Java's memory model was defined. In Java 5 and newer changes were made to the specification of Java's memory model so that it is now safe.
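As a rough illustration of the check -> lock -> check sequence in its Java 5+ form, here is a minimal sketch. The class LazyData, its method names, and the creation counter are hypothetical and not taken from the question. The sketch assumes the lazily created value is held in a volatile field, which is what the Java 5+ memory model relies on to make the unlocked first check safe.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder illustrating double-checked locking (Java 5+ memory model).
final class LazyData {
    // volatile makes the unlocked first check safe to rely on.
    private static volatile Object data;
    private static final Object LOCK = new Object();
    // Counts how often the data was actually created (for demonstration only).
    static final AtomicInteger creationCount = new AtomicInteger();

    static Object get() {
        Object local = data;              // first check, without the lock
        if (local == null) {
            synchronized (LOCK) {
                local = data;             // second check, under the lock
                if (local == null) {
                    creationCount.incrementAndGet();
                    local = new Object();
                    data = local;         // volatile write publishes the object
                }
            }
        }
        return local;
    }

    // Starts the given number of threads that all call get() at once and
    // returns how many times the data ended up being created.
    static int createdAfterConcurrentAccess(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return creationCount.get();
    }

    public static void main(String[] args) {
        System.out.println("created " + createdAfterConcurrentAccess(8) + " time(s)");
    }
}
```

Copying the field into a local variable keeps the number of volatile reads down; dropping the volatile modifier would turn this back into the unsafe pre-Java 5 variant.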
The only name that comes to mind for this kind of thing is "Faulting". This name is used in the Core Data framework on iOS to similar effect.
Basically, your method NameDoesNotMatter is a fault, and whenever someone invokes it, it causes the object to be populated or initialized.
See http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/Articles/cdFaultingUniquing.html for more details on how this design pattern is used.

Using Wrapper objects to Properly clean up excel interop objects

All of these questions:
Excel 2007 Hangs When Closing via .NET
How to properly clean up Excel interop objects in C#
How to properly clean up interop objects in C#
struggle with the problem that C# does not release the Excel COM objects properly after using them. There are mainly two directions of working around this issue:
Kill the Excel process when Excel is not used anymore.
Take care to explicitly assign each COM object used to a variable first and to guarantee that eventually, Marshal.ReleaseComObject is executed on each.
Some have stated that 2 is too tedious and there is always some uncertainty whether you forget to stick to this rule at some places in the code. Still 1 seems dirty and error-prone to me, also I guess that in a restricted environment trying to kill a process could raise a security error.
So I've been thinking about solving 2 by creating another proxy object model which mimics the Excel object model (for me, it would suffice to implement the objects I actually need). The principle would look as follows:
Each Excel Interop class has its proxy which wraps an object of that class.
The proxy releases the COM object in its finalizer.
The proxy mimics the interface of the Interop class.
Any methods that originally returned a COM object are changed to return a proxy instead. The other methods simply delegate the implementation to the inner COM object.
Example:
public class Application
{
    private Microsoft.Office.Interop.Excel.Application innerApplication
        = new Microsoft.Office.Interop.Excel.Application();
    ~Application()
    {
        Marshal.ReleaseComObject(innerApplication);
        innerApplication = null;
    }
    public Workbooks Workbooks
    {
        get { return new Workbooks(innerApplication.Workbooks); }
    }
}
public class Workbooks
{
    private Microsoft.Office.Interop.Excel.Workbooks innerWorkbooks;
    // internal so that the Application proxy above can construct it
    internal Workbooks(Microsoft.Office.Interop.Excel.Workbooks innerWorkbooks)
    {
        this.innerWorkbooks = innerWorkbooks;
    }
    ~Workbooks()
    {
        Marshal.ReleaseComObject(innerWorkbooks);
        innerWorkbooks = null;
    }
}
My questions to you are in particular:
Who finds this a bad idea and why?
Who finds this a great idea? If so, why hasn't anybody implemented/published such a model yet? Is it only due to the effort, or am I missing a killing problem with that idea?
Is it impossible/bad/error-prone to do the ReleaseComObject in the finalizer? (I've only seen proposals to put it in a Dispose() rather than in a finalizer - why?)
If the approach makes sense, any suggestions to improve it?
Is it impossible/bad/dangerous to do the ReleaseComObject in the destructor? (I've only seen proposals to put it in a Dispose() rather than in a destructor - why?)
It is recommended not to put your clean up code in the finalizer because unlike the destructor in C++ it is not called deterministically. It might be called shortly after the object goes out of scope. It might take an hour. It might never be called. In general if you want to dispose unmanaged objects you should use the IDisposable pattern and not the finalizer.
This solution that you linked to attempts to work around that problem by explicitly calling the garbage collector and waiting for the finalizers to complete. This is really not recommended in general but for this particular situation some people consider it to be an acceptable solution due to the difficulty of keeping track of all the temporary unmanaged objects that get created. But explicitly cleaning up is the proper way of doing it. However given the difficulty of doing so, this "hack" may be acceptable. Note that this solution is probably better than the idea you proposed.
If instead you want to try to explicitly clean up, the "don't use two dots with COM objects" guideline will help you to remember to keep a reference to every object you create so that you can clean them up when you're done.
We use the LifetimeScope class that was described in the MSDN magazine. Using it properly cleans up objects and has worked great with our Excel exports. The code can be downloaded here and also contains the magazine article:
http://lifetimescope.codeplex.com/SourceControl/changeset/changes/1266
Look at my project MS Office for .NET. It solves the problem of referencing wrapper objects and native objects by using VB.NET's native late-binding ability.
What I'd do:
class ScopedCleanup<T> : IDisposable where T : class
{
    readonly Action<T> cleanup;
    public ScopedCleanup(T o, Action<T> cleanup)
    {
        this.Object = o;
        this.cleanup = cleanup;
    }
    public T Object { get; private set; }
    #region IDisposable Members
    public void Dispose()
    {
        if (Object != null)
        {
            if(cleanup != null)
                cleanup(Object);
            Object = null;
            GC.SuppressFinalize(this);
        }
    }
    #endregion
    ~ScopedCleanup() { Dispose(); }
}
static ScopedCleanup<T> CleanupObject<T>(T o, Action<T> cleanup) where T : class
{
    return new ScopedCleanup<T>(o, cleanup);
}
static ScopedCleanup<ComType> CleanupComObject<ComType>(ComType comObject, Action<ComType> actionBeforeRelease) where ComType : class
{
    return
        CleanupObject(
            comObject,
            o =>
            {
                if(actionBeforeRelease != null)
                    actionBeforeRelease(o);
                Marshal.ReleaseComObject(o);
            }
        );
}
static ScopedCleanup<ComType> CleanupComObject<ComType>(ComType comObject) where ComType : class
{
    return CleanupComObject(comObject, null);
}
Usage case. Note the call to Quit, which seems to be necessary to make the process end:
using (var excel = CleanupComObject(new Excel.Application(), o => o.Quit()))
using (var workbooks = CleanupComObject(excel.Object.Workbooks))
{
    ...
}
For what it's worth, the Excel Refresh Service on codeplex uses this logic:
public static void UsingCOM<T>(T reference, Action<T> doThis) where T : class
{
    if (reference == null) return;
    try
    {
        doThis(reference);
    }
    finally
    {
        Marshal.ReleaseComObject(reference);
    }
}

Allowing a method to lock its parent Object in Java

Is there a way in Java to get a method to lock (mutex) the object which it is in?
I know this sounds confusing, but basically I want an equivalent to this snippet of C#, but in Java.
lock(this)
{
// Some code here...
}
I've been tasked with reimplementing an API written in .Net into Java, and I've been asked to keep the Java version as similar to the .Net version as humanly possible. This isn't helped by the fact that the .Net version looked like it was transcribed from a C++ version which I don't have access to.
Anyway the above line appears in the C# version and I need something that does the same in Java.
The equivalent of that is:
synchronized (this)
{
}
(And no, you shouldn't generally do it in either C# or Java. Prefer locking on private references which nothing else has access to. You may be aware of that already, of course - but I didn't want to leave an answer without the warning :)
Assuming that the C++ code is a simple mutex, replace "lock" with "synchronized"
synchronized (this)
{
// ...
}
Here's the Java Concurrency tutorial for more info
I'd recommend Brian Goetz's "Java Concurrency In Practice." It's an excellent book.
It can be a good thing to keep the synchronized block as small as possible. Using the synchronized modifier on the method is coarse-grained and sometimes necessary, but otherwise you can use another object to do it that keeps the block smaller.
Like this:
public class PrivateLock {
    private final Object myLock = new Object();
    @GuardedBy("myLock") Widget widget;
    void someMethod() {
        synchronized (myLock) {
            // Access or modify the state of widget
        }
    }
}
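To make the difference between the synchronized modifier, synchronized (this), and a private lock object a little more concrete, here is a small sketch. The class MonitorDemo and its method names are hypothetical. It uses Thread.holdsLock to show which monitor is actually held in each case.

```java
// Hypothetical class showing that a synchronized instance method and a
// synchronized (this) block acquire the same monitor: the object itself.
final class MonitorDemo {
    private final Object privateLock = new Object();

    // The synchronized modifier locks on this for the whole method body.
    synchronized boolean methodHoldsThis() {
        return Thread.holdsLock(this);
    }

    // Same monitor as above, but the locked region can be made smaller.
    boolean blockHoldsThis() {
        synchronized (this) {
            return Thread.holdsLock(this);
        }
    }

    // Only privateLock is held here, so outside code that synchronizes on
    // this object cannot contend with this critical section.
    boolean privateLockHoldsThis() {
        synchronized (privateLock) {
            return Thread.holdsLock(this);
        }
    }

    public static void main(String[] args) {
        MonitorDemo demo = new MonitorDemo();
        System.out.println(demo.methodHoldsThis());
        System.out.println(demo.blockHoldsThis());
        System.out.println(demo.privateLockHoldsThis());
    }
}
```

The first two methods report that the calling thread holds the monitor of the object itself, which any other code with a reference to the object could also lock on. The third does not, which is the reason for preferring a private lock.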
You should also look into the java.util.concurrent package of the API (JDK 5.0+) for additional concurrency management objects such as semaphore, exchanger, etc
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html
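As one example of what that package offers, the sketch below uses java.util.concurrent.Semaphore to let at most a fixed number of threads into a section at once. The class BoundedSection, its methods, and the short sleep that simulates work are all made up for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: a section that at most maxConcurrent threads may occupy.
final class BoundedSection {
    private final Semaphore permits;
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    BoundedSection(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    void enter() throws InterruptedException {
        permits.acquire();                  // blocks while no permit is free
        try {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max); // record the peak occupancy
            Thread.sleep(5);                // stand-in for real work
        } finally {
            inside.decrementAndGet();
            permits.release();              // always hand the permit back
        }
    }

    // Runs the given number of threads through the section and returns the
    // highest number of threads observed inside it at the same time.
    static int maxObserved(int maxConcurrent, int threads) {
        BoundedSection section = new BoundedSection(maxConcurrent);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    section.enter();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return section.maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("peak occupancy: " + maxObserved(2, 6));
    }
}
```

With a single permit a Semaphore behaves much like a mutex, although unlike synchronized it is not tied to the thread that acquired it.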
