Getting ReportAvOnComRelease Exception when using 3rd party COM - c#

I am a new C# programmer and have created an application which uses a 3rd party COM object to track telephone call recordings from a call recording server. The creator of the COM software is also the vendor who makes the call recording software, so you would think it should work. I have been on many phone calls and code reviews with their staff and they have come up with very little to help.
The application responds to events from the COM object like OnCallStart and OnCallEnd, AgentLogon, AgentLogoff, ServerDown, etc. I do nothing more than monitor what the events return and write it to a file. The application compiles without a problem and runs for a few minutes and then it gives me the following error (I had to open up the Exception in the Debug>Exceptions menu to finally see it):
ReportAvOnComRelease was detected
Message: An exception was caught but handled while releasing a COM interface pointer through Marshal.Release or Marshal.ReleaseComObject or implicitly after the corresponding RuntimeCallableWrapper was garbage collected. This is the result of a user refcount error or other problem with a COM object's Release. Make sure refcounts are managed properly. The COM interface pointer's original vtable pointer was 0x45ecbac. While these types of exceptions are caught by the CLR, they can still lead to corruption and data loss so if possible the issue causing the exception should be addressed.
It gives me no more than that: no vtable details, no refcounts, nothing else. I added a GC.Collect() call, let the app run for a minute, then fired the GC.Collect() and got the error. I can reproduce it that way with some consistency. I have read article after article about this error and the need to marshal correctly, but I am not the one marshaling. Visual Studio creates an RCW for the COM object and I just use that, so I have no control there, or do I? None of the articles gave me any code examples or anything other than theoretical chit-chat.
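For reference, here is a sketch of how the collection can be forced deterministically; GC.WaitForPendingFinalizers is what actually runs the RCW finalizers that perform the Release, so this only makes the timing of the error predictable, it doesn't fix anything:

GC.Collect();                    // push unreachable RCWs onto the finalizer queue
GC.WaitForPendingFinalizers();   // run the RCW finalizers, which call IUnknown::Release
GC.Collect();                    // reclaim the now-finalized wrappers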
Is there a better way to do this? How can I find exactly what is causing the error? There seems to be no way to isolate this thing. I found one article from a guy from Microsoft that called this the "Silent Assassin", but he gave no solutions and essentially admitted that MS didn't have any either. Read Here
I am at my wits end. Any help is appreciated.

Well, it is a serious defect in the COM server you are using. Yes, it is indirectly triggered by the CLR: the last reference to a COM object is released when the finalizer thread runs the RCW finalizer. Marshal.ReleaseComObject counts the RCW's reference count down to 0, at which point the COM server's IUnknown::Release() implementation cleans up the object.
That's always a vulnerable time for a COM server. If it corrupted the heap earlier, a common moment for that corruption to trigger a CPU hardware fault (AV = Access Violation) is when it releases memory. Microsoft put a catcher for this hardware exception in place to help diagnose the problem. Without it, you'd have very little chance of figuring out what happened, because the finalizer runs at an unpredictable time, without any of your own code actively running.
The fault is quite serious: you're left with a corrupted heap that's only partly cleaned up. If you keep going, you'll typically just get more AVs and/or leak memory. The worst possible outcome, quite likely by the way, is that it doesn't die afterwards but just starts generating bad data or misbehaving unpredictably, causing you to think that it is your code that is buggy.
There's only one party that can fix this problem: the supplier of this COM server. Carefully specify the machine you are running on (the operating system version is especially important) and give them a small piece of code, source included, that reproduces the exception. Keeping it small and highly visible matters, or they'll claim it was your code that corrupted the heap. They are likely to do so anyway; heap corruption is very difficult to debug. If you cannot get them to be responsive, you'd be wise to shop for another vendor.
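Something along these lines is usually enough as a starting point (a hypothetical sketch; RecorderLib.ServerClass stands in for whatever class the vendor's interop assembly exposes, and the exact calls depend on which operations actually trigger the corruption in your case):

using System;
using System.Runtime.InteropServices;

class Repro
{
    static void Main()
    {
        // Create and release the vendor's COM object in a tight, highly visible loop.
        // If the AV fires here, there is no surrounding application code to blame.
        for (int i = 0; i < 100; i++)
        {
            var server = new RecorderLib.ServerClass();   // hypothetical interop type
            // ... optionally hook and unhook the event handlers here ...
            Marshal.ReleaseComObject(server);             // drop the ref count deterministically

            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}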

Related

Can I get some clarity on ReleaseComObject? Can I ignore it?

I have been developing Office solutions in VBA for a while now and have fairly complete knowledge regarding Office development in VBA. I have decided it is time to learn some real programming with .NET and am having some teething problems.
Having looked through a bunch of articles and forums (here and elsewhere), there seems to be some mixed information regarding memory management in .Net when using COM objects.
Some people say I should always deterministically release COM objects and others say I should almost never do it.
People saying I should do it:
The book 'Professional Excel Development' on page 861.
This stack exchange question has been answered by saying "every reference you make to a COM object must be released. If you don't, the process will stay in memory"
This blog suggests that using it solved the author's problems.
People saying I should not do it:
This MSDN blog by Eric Carter states "In VSTO scenarios, you typically don't ever have to use ReleaseCOMObject."
The book 'VSTO for Office 2007' which is co-authored by Eric Carter seems to make no mention whatsoever of memory management or ReleaseComObject.
This MSDN blog by Paul Harrington says don't do it.
Someone with mixed advice:
Jake Ginnivan says I should always do it on COM objects that do not leave the method scope. If a COM object leaves the method scope then forget about it. Why can't I just forget about it all the time then?
The blog by Paul Harrington seems to suggest that the advice from MS has changed sometime in the past. Is it the case that calling ReleaseCOMObject used to be best practice but is not anymore? Can I leave the finer details of memory management to MS and assume that everything will be mostly fine?
I try to adhere to the following rule in my interop development regarding ReleaseComObject.
If my managed object implements some kind of shutdown protocol similar to IDisposable, I call ReleaseComObject on any child COM objects I hold references to. Some examples of the shutdown protocols I'm talking about:
IObjectWithSite.SetSite(null)
IOleObject.SetClientSite(null)
IOleObject.Close()
IDTExtensibility2.OnDisconnection
IDTExtensibility2.OnBeginShutdown
IDisposable.Dispose itself
This helps break potential circular references between .NET and native COM objects, so the managed garbage collector can do its job unobstructed.
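As a rough illustration of that rule, here is a minimal sketch (not specific to any host; _childComObject stands for whatever child reference you actually hold, and IDisposable plays the role of the shutdown protocol):

using System;
using System.Runtime.InteropServices;

class MyComponent : IDisposable
{
    private object _childComObject;   // hypothetical RCW handed to us by the host

    public MyComponent(object childComObject)
    {
        _childComObject = childComObject;
    }

    // The shutdown protocol; in other scenarios this would be SetSite(null),
    // OnDisconnection, etc.
    public void Dispose()
    {
        if (_childComObject != null && Marshal.IsComObject(_childComObject))
        {
            // Break the managed->native link so a native->managed back-reference
            // cannot keep the pair alive as a circular reference.
            Marshal.ReleaseComObject(_childComObject);
        }
        _childComObject = null;
    }
}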
Perhaps there's something similar that can be used in your VSTO interop scenario (AFAIR, IDTExtensibility2 is relevant there).
If the interop scenario involves IPC COM calls (e.g., when you pass a managed event sink object to an out-of-proc COM server like Excel), there's another option for tracking external references to the managed object: the IExternalConnection interface. IExternalConnection::AddConnection/ReleaseConnection are very similar to IUnknown::AddRef/Release, but they get called when a reference is added from another COM apartment (including apartments residing in separate processes).
IExternalConnection provides a way to implement an almost universal shutdown mechanism for out-of-proc scenarios. When the external reference count reaches zero, you should call ReleaseComObject on any external Excel objects you may be holding references to, effectively breaking any potential circular COM references between your process and the Excel process. Perhaps something like this has already been implemented by the VSTO runtime (I don't have much experience with VSTO).
That said, if there is no clear shutdown mechanism, I don't call ReleaseComObject. Also, I never use FinalReleaseComObject.
You should not ignore it if you are working with the Office GUI! As your second link states:
Every reference you make to a COM object must be released. If you don't, the process will stay in memory.
This means that your objects will remain in memory if you do not explicitly release them. Since they are COM objects, the garbage collector is responsible for releasing them. However, Excel and the other fancy tools are implemented with no knowledge of the .NET garbage collector; they rely on deterministic release of memory. If you request an object from Excel and do not properly release it, your application may not close correctly, because Excel waits until your resources are released. If your objects live long enough to get into gen1 or gen2, this can take hours or even days!
All the advice about not releasing COM objects targets multithreaded scenarios, or scenarios where you are forced to push many COM objects around between multiple instances. As a rule of thumb, always create your COM objects as late as possible and release them as soon as possible, in the opposite order to which they were created. You should also keep your interop instances private wherever possible. This reduces the chance that another thread or instance accesses an object you have already released.
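A sketch of that ordering using the Excel interop assembly (this assumes a reference to Microsoft.Office.Interop.Excel and a hypothetical file path; error handling is trimmed for brevity):

using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

Excel.Application app = null;
Excel.Workbooks books = null;
Excel.Workbook book = null;
try
{
    app = new Excel.Application();               // created first...
    books = app.Workbooks;                       // keep every intermediate RCW in a variable
    book = books.Open(@"C:\data\report.xlsx");   // hypothetical path
    // ... work with the workbook ...
}
finally
{
    // ...released last, in the opposite order of creation.
    if (book != null) { book.Close(false); Marshal.ReleaseComObject(book); }
    if (books != null) Marshal.ReleaseComObject(books);
    if (app != null) { app.Quit(); Marshal.ReleaseComObject(app); }
}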

Can there be a scenario when garbage collector fails to run due to an exception?

Just out of curiosity, I was wondering whether there is a scenario in which the garbage collector fails to run, or doesn't run at all (possibly due to an exception)?
If so, would it most likely surface as an OutOfMemoryException or a StackOverflowException? And in that case, could we identify the core issue of the GC failing to run just by looking at the exception message, stack trace, etc.?
As others have mentioned, numerous things can prevent the GC from running. FailFast fails fast; it doesn't stop to take out the trash before the building is demolished. But you asked specifically about exceptions.
An uncaught exception produces implementation-defined behaviour, so it is implementation-defined whether finally blocks run, whether garbage collection runs, and whether the finalizer queue objects are finalized when there is an uncaught exception. An implementation of the CLR is permitted to do anything when that happens, and "anything" includes both "run the GC" and "do not run the GC". And in fact implementations of the CLR have changed their behaviour over time; in v1.0 of the CLR an uncaught exception on the finalizer thread took out the process, in v2.0 an uncaught exception on the finalizer thread is caught, the error is logged, and finalizers keep on running.
There are four questions of interest:
Can something cause the program to die entirely, without the garbage-collector getting a chance to run
Can something prevent the garbage-collector from running without causing the system to die entirely
Can something prevent objects' finalizers from running without causing the system to die entirely
Can an exception make an object uncollectable for an arbitrary period of time
With regard to the first one, the answer is "definitely". There are so many ways that could potentially happen, that there's no need to list them here.
With regard to the second question, the answer is "generally no", since failure of the garbage collector would cripple a program; there may be some cases, however, in which portions of a program which do not use GC-managed memory may be able to keep running even though the portions that use managed objects could be blocked indefinitely.
With regard to the third question, it used to be in .net that an exception in a finalizer could interfere with the action of other finalizers without killing the entire application; such behavior has been changed since .net 2.0 so that uncaught exceptions thrown from finalizers will usually kill the whole program. It is possible, however, that an exception which is thrown and caught within a poorly-written finalizer might result in its failing to clean up everything it was supposed to, leading to question #4.
With regard to the fourth question, it is quite common for objects to establish long-lived (possibly static) references to themselves when they are created, and for them to destroy such references as part of clean-up code. If an exception prevents that clean-up code from running as expected, it may cause the objects to become uncollectable even if they are no longer useful.
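A contrived sketch of that last case (all names are made up): if the exception skips the deregistration step, the static list keeps the instance reachable forever.

using System;
using System.Collections.Generic;

class Session
{
    // Long-lived reference established at creation time.
    private static readonly List<Session> ActiveSessions = new List<Session>();

    public Session()
    {
        ActiveSessions.Add(this);
    }

    public void Close()
    {
        DoRiskyCleanup();              // if this throws...
        ActiveSessions.Remove(this);   // ...this never runs and the object is uncollectable
    }

    private void DoRiskyCleanup()
    {
        throw new InvalidOperationException("cleanup failed");
    }
}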
Yes; in Java there used to be situations where the program could stop without the GC being run for the last time. In most cases this is OK, as all the memory is cleaned up when the program's heap is destroyed, but you can have the problem of objects not having their finalisers run. This may or may not be a problem for you, depending on what those finalisers would do.
I doubt you'll be able to determine the GC failure, as the program will be as dead as a parrot, in a non-clean manner, so you probably won't even get a stacktrace. You might be able to post-mortem debug it (if you've turned on the right dbg settings, .NET is sh*te when it comes to working nicely with the excellent Windows debugging tools).
There are certain edge cases where a finally block will not execute - calling FailFast is one case, and see the question here for others.
Given this, I would imagine there are cases (especially with using statements / IDisposable objects) where the resource cleanup/garbage collection occurring in a finally block is not executed.
More explicitly, something like this:
try
{
    // new up an expensive object, maybe one that uses native resources
    Environment.FailFast(string.Empty);
}
finally
{
    Console.WriteLine("never executed");
}

I've found a bug in the JIT/CLR - now how do I debug or reproduce it?

I have a computationally-expensive multi-threaded C# app that seems to crash consistently after 30-90 minutes of running. The error it gives is
The runtime has encountered a fatal error. The address of the error was at 0xec37ebae, on thread 0xbcc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
(0xc0000005 is the error-code for Access Violation)
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint. In fact, the line of code that the debugger says caused the crash is
overallLength += distanceTravelled;
Where both values are of type double
Given all this, I believe the crash must be due to a bug in the compiler or CLR or JIT. I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin. I've never had to view the CIL-binary, or the compiled JIT output, or the native stacktrace (there is no managed stacktrace at the time of the crash), so I'm not sure how. I can't even figure out how to view the state of all the variables at the time of the crash (VS unfortunately won't tell me like it does after managed-exceptions, and outputting them to console/a file would slow down the app 1000-fold, which is obviously not an option).
So, how do I go about debugging this?
[Edit] Compiled under VS 2010 SP1, running latest version of .Net 4.0 Client Profile. Apparently it's ".Net 4.0C/.Net 4.0E, .Net CLR 1.1.4322"
I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin.
"Smaller reproduction" definitely sounds like a great idea here... even if "smaller" won't mean "quicker to reproduce".
Before you even start, try to reproduce the error on another machine. If you can't reproduce it on another machine, that suggests a whole different set of tests to do - hardware, installation etc.
Also, check you're on the latest version of everything. It would be annoying to spend days debugging this (which is likely, I'm afraid) and then end up with a response of "Yes, we know about this - it was a bug in .NET 4 which was fixed in .NET 4.5" for example. If you can reproduce it on a variety of framework versions, that would be even better :)
Next, cut out everything you can in the program:
Does it have a user interface at all? If possible, remove that.
Does it use a database? See if you can remove all database access: definitely any output which isn't used later, and ideally input too. If you can hard code the input within the app, that would be ideal - but if not, files are simpler for reproductions than database access.
Is it data-sensitive? Again, without knowing much about the app it's hard to know whether this is useful, but assuming it's processing a lot of data, can you use a binary search to find a relatively small amount of data which causes the problem?
Does it have to be multi-threaded? If you can remove all the threading, obviously that may well then take much longer to reproduce the problem - but does it still happen at all?
Try removing bits of business logic: if your app is componentized appropriately, you can probably fake out whole significant components by first creating a stub implementation, and then simply removing the calls.
All of this will gradually reduce the size of the app until it's more manageable. At each step, you'll need to run the app again until it either crashes or you're convinced it won't crash. If you have a lot of machines available to you, that should help...
tl;dr Make sure you're compiling to .Net 4.5
This sounds suspiciously like the same error found here. From the MSDN page:
This bug can be encountered when the Garbage Collector is freeing and compacting memory. The error can happen when the Concurrent Garbage Collection is enabled and a certain combination of foreground Garbage Collection and background Garbage Collection occurs. When this situation happens you will see the same call stack over and over. On the heap you will see one free object and before it ends you will see another free object corrupting the heap.
The fix is to compile to .Net 4.5. If for some reason you can't do this, you can also disable concurrent garbage collection by disabling gcConcurrent in the app.config file:
<configuration>
  <runtime>
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
Or just compile to x86.
WinDbg is your friend:
http://blogs.msdn.com/b/tess/archive/2006/02/09/net-crash-managed-heap-corruption-calling-unmanaged-code.aspx
http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net
http://www.codeproject.com/Articles/22245/Quick-start-to-using-WinDbg
Download Debug Diagnostic Tool v1.2
Run program
Add Rule "Crash"
Select "Specific Process"
On the Advanced Configuration page, set your exception if you know which exception it fails on, or just leave the page as it is
Set userdump location
Now wait for the process to crash; the log file is created by DebugDiag. Then activate the Advanced Analysis tab, select Crash/Hang Analyzers in the top list and the dump file in the lower list, and hit Start Analysis. This will generate an HTML report for you. Hopefully you'll find useful information in that report. If you have trouble with the analysis, upload the HTML report somewhere and post the URL here so we can look at it.
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint
You may think this, but threading, and synchronization via semaphores, mutexes or any other handles, are all native. .NET is a layer over the operating system; .NET itself does not implement multithreading in pure CLR code, because the OS already does it.
Most likely this is a thread synchronization error. Probably multiple threads are trying to access a shared resource, such as a file, that lives outside the CLR boundary.
You may think you aren't using COM, but when you call certain APIs, such as getting the desktop folder path, the call goes through the shell's COM API.
You have the following two options:
Publish your code so that we can review the bottleneck
Redesign your app using the .NET parallel threading framework, which includes a variety of algorithms for CPU-intensive operations
Programs like this most likely fail after a certain period of time, as collections grow and operations no longer finish before another thread interferes. Take the producer-consumer problem, for example: you will not notice anything wrong until the producer becomes slower, or fails to finish its operation before the consumer kicks in.
Bugs in the CLR are rare, because the CLR is very stable. But poorly written code may make an error appear to be a bug in the CLR, and the CLR cannot, and will never, detect whether the bug is in your code or in the CLR itself.
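To illustrate the kind of synchronization error meant here, a minimal sketch (List<T> is not thread-safe, so unsynchronized concurrent writers can throw or quietly corrupt its internal state):

using System.Collections.Generic;
using System.Threading.Tasks;

class RaceDemo
{
    static void Main()
    {
        var shared = new List<int>();

        // Many unsynchronized writers racing on the same collection. Depending on
        // timing this can throw, lose items, or corrupt the list's internal array;
        // the symptom often only shows up after the program has run for a while
        // and the list has grown enough to resize frequently.
        Parallel.For(0, 1000000, i =>
        {
            shared.Add(i);   // should be protected by a lock or a concurrent collection
        });
    }
}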
Did you run a memory test on your machine? The one time I had comparable symptoms, one of my DIMMs turned out to be faulty. (A very good memory tester is included in Win7: http://www.tomstricks.com/how-to-test-your-ram-or-memory-with-windows-memory-diagnostic-tool-in-windows-7/)
It might also be a heating/throttling issue if your CPU gets too hot after this period of time, although that would happen sooner, imho.
There should be a dump file that you can analyze. If you have never done this, find someone who has, or send it to Microsoft.
I suggest you open a support case via http://support.microsoft.com immediately; the support engineers can show you how to collect the necessary information.
Generally speaking, as #paulsm4 and #psulek said, you can use WinDbg or Debug Diag to capture crash dumps of the process; all the necessary information is embedded in the dump. However, if this is the very first time you have used those tools, you might be puzzled. The Microsoft support team can provide you with step-by-step guidance, or they can even set up a Live Meeting session with you to capture the data, since the program crashes so often.
Once you are familiar with the tools, you can perform similar troubleshooting more easily in the future:
http://blogs.msdn.com/b/lexli/archive/2009/08/23/when-the-application-program-crashes-on-windows.aspx
BTW, it is too early to say "I've found a bug". Even though you cannot see an obvious dependency on native code in your program, it might still have one. We should not draw a conclusion before debugging the issue further.

Corrupted state exceptions (CSE) across AppDomain

For some background info, .NET 4.0 no longer catches CSEs by default: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx
I'm working on an app that executes code in a new AppDomain. If that code throws a CSE, the exception bubbles up to the main code if it's not handled. My question is, can I safely assume that a CSE on the second AppDomain won't corrupt the state in the main AppDomain, and thus exit the second AppDomain and continue running the main AppDomain?
In the context of a corrupted state exception, in general, you cannot assume anything to be true anymore. The point of these exceptions is that something has happened, usually due to buggy unmanaged code, that has violated some core assumption that Windows or the CLR makes about the structure of memory. That means that, in theory, the very structures that the CLR uses to track which app domains exist in memory could be corrupted. The kinds of things that cause CSEs are generally indicative that things have gone catastrophically wrong.
Having said all that, off-the-record, in some cases, you may be able to make a determination that it is safe to continue from a particular exception. An EXCEPTION_STACK_OVERFLOW, for example, is probably recoverable, and an EXCEPTION_ACCESS_VIOLATION usually indicates that Windows caught a potential bug before it had a chance to screw anything up. It's up to you if you're willing to risk it, depending on how much you know about the code that is throwing the CSEs in the first place.
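For completeness, .NET 4.0 still lets you opt back in and catch a CSE on a per-method basis via an attribute (a sketch; DoWorkInOtherAppDomain is a placeholder for whatever you marshal into the second AppDomain):

using System;
using System.Runtime.ExceptionServices;
using System.Security;

class Host
{
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    static bool TryRunIsolatedWork()
    {
        try
        {
            DoWorkInOtherAppDomain();   // placeholder for the cross-AppDomain call
            return true;
        }
        catch (AccessViolationException ex)
        {
            // Only reachable because of the attribute above; per the answer,
            // treat the whole process as suspect from this point on.
            Console.Error.WriteLine("CSE caught: " + ex.Message);
            return false;
        }
    }

    static void DoWorkInOtherAppDomain() { /* ... */ }
}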

When is it OK to catch an OutOfMemoryException and how to handle it?

Yesterday I took part in a discussion on SO devoted to OutOfMemoryException and the pros and cons of handling it (C# try {} catch {}).
My pros for handling it were:
The fact that OutOfMemoryException was thrown doesn't generally mean that the state of a program was corrupted;
According to the documentation, "the following Microsoft intermediate language (MSIL) instructions throw OutOfMemoryException: box, newarr, newobj", which just (usually) means that the CLR attempted to find a block of memory of a given size and was unable to; it does not mean that not a single byte is left at our disposal;
But not everyone agreed with that; they speculated about the unknown state of the program after this exception, and about the inability to do anything useful since that would require even more memory.
Therefore my question is: what are the serious reasons not to handle OutOfMemoryException and immediately give up when it occurs?
Edited: Do you think that OOME is as fatal as ExecutionEngineException?
IMO, since you can't predict what you can/can't do after an OOM (so you can't reliably process the error), or what else did/didn't happen when unrolling the stack to where you are (so the BCL hasn't reliably processed the error), your app must now be assumed to be in a corrupt state. If you "fix" your code by handling this exception you are burying your head in the sand.
I could be wrong here, but to me this message says BIG TROUBLE. The correct fix is to figure out why you have chomped through memory, and address that (for example, have you got a leak? Could you switch to a streaming API?). Even switching to x64 isn't a magic bullet here; arrays (and hence lists) are still size-limited, and the increased reference size means you can fit numerically fewer references within the 2GB object cap.
If you need to chance processing some data, and are happy for it to fail: launch a second process (an AppDomain isn't good enough). If it blows up, tear down the process. Problem solved, and your original process/AppDomain is safe.
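A sketch of the second-process approach (worker.exe is a hypothetical helper executable that does the risky work and reports back through its exit code):

using System.Diagnostics;

var psi = new ProcessStartInfo("worker.exe", "input.dat") { UseShellExecute = false };
using (var worker = Process.Start(psi))
{
    worker.WaitForExit();
    if (worker.ExitCode != 0)
    {
        // The worker blew up (perhaps with OOM); this process and its heap are untouched.
        // Log it, skip the item, or retry with different settings.
    }
}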
We all write different applications. In a WinForms or ASP.NET app I would probably just log the exception, notify the user, try to save state, and shut down/restart. But as Igor mentioned in the comments, this could very well be an image editing application where loading the 100th 20MB RAW image pushes the app over the edge. Do you really want the user to lose all of their work when you could handle it as simply as saying, "Sorry, unable to load more images at this time"?
Another common case where it can be useful to catch out-of-memory exceptions is back-end batch processing. You could have a standard model of loading multi-megabyte files into memory for processing, and then one day, out of the blue, a multi-gigabyte file is loaded. When the out-of-memory occurs you could log the message to a user notification queue and then move on to the next file.
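In that back-end scenario the handler can be kept deliberately tiny, something like this sketch (inputFiles, LoadAndProcess and notificationQueue are placeholders):

foreach (var file in inputFiles)
{
    try
    {
        LoadAndProcess(file);           // may try to load a multi-gigabyte file
    }
    catch (OutOfMemoryException)
    {
        // Allocate as little as possible here: reuse the existing file string
        // for the notification and move on to the next file.
        notificationQueue.Enqueue(file);
        GC.Collect();
    }
}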
Yes, it is possible that something else could blow up at the same time, but that too would be logged and reported if possible. If the GC is finally unable to free up any more memory, the application is going to go down hard anyway. (The GC runs on an unprotected thread.)
Don't forget we all develop different types of applications. And unless you are on older, constrained machines you will probably never get an OutOfMemoryException for typical business apps... but then again not all of us are business tool developers.
To your edit...
Out-of-memory may be caused by unmanaged memory fragmentation and pinning. It can also be caused by large allocation requests. If we were to put up a white flag and draw a line in the sand over such simple issues, nothing would ever get done in large data processing projects. Now comparing that to a fatal Engine exception, well there is nothing you can do at the point the runtime falls over dead under your code. Hopefully you are able to log (but probably not) why your code fell on its face so you can prevent it in the future. But, more importantly, hopefully your code is written in a manner that could allow for safe recovery of as much data as you can. Maybe even recover the last known good state in your application and possibly skip the offending corrupt data and allow it to be manually processed and recovered.
Yet at the same time it is just as possible to have data corruption caused by SQL injection, out-of-sync versions of software, pointer manipulation, buffer overruns, and many other problems. Avoiding an issue just because you think you may not recover from it is a great way to give users error messages as constructive as "Please contact your system administrator."
Some commenters have noted that there are situations where an OOM could be the immediate result of attempting to allocate a large number of bytes (a graphics application, allocating a large array, etc.). Note that for that purpose you could use the MemoryFailPoint class, which raises an InsufficientMemoryException (itself derived from OutOfMemoryException). That can be caught safely, as it is raised before the actual attempt to allocate the memory has been made. However, this can only reduce the likelihood of an OOM, never fully prevent it.
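A minimal sketch of MemoryFailPoint (ProcessLargeFile is a placeholder; the size is an estimate in megabytes of what the upcoming operation needs):

using System;
using System.Runtime;

try
{
    // Ask the runtime up front whether roughly 256 MB is likely to be available.
    using (new MemoryFailPoint(256))
    {
        ProcessLargeFile();   // placeholder for the memory-hungry operation
    }
}
catch (InsufficientMemoryException ex)
{
    // Raised before any allocation is attempted, so it is safe to handle.
    Console.WriteLine("Not enough memory to start the operation: " + ex.Message);
}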
It all depends on the situation.
Quite a few years ago now I was working on a real-time 3D rendering engine. At the time we loaded all the geometry for the model into memory on start-up, but only loaded the texture images when we needed to display them. This meant that when the day came that our customers were loading huge (2GB) models, we were able to cope. The geometry occupied less than 2GB, but when all the textures were added it would be > 2GB. By trapping the out-of-memory error that was raised when we tried to load the texture, we were able to carry on displaying the model, but just as the plain geometry.
We still had a problem if the geometry was > 2GB, but that was a different story.
Obviously, if you get an out of memory error with something fundamental to your application then you've got no choice but to shut down - but do that as gracefully as you can.
I suggest Christopher Brumme's comment in "Framework Design Guidelines", p. 238 (7.3.7 OutOfMemoryException):
At one end of the spectrum, an OutOfMemoryException could be the result of a failure to obtain 12 bytes for implicitly autoboxing, or a failure to JIT some code that is required for critical backout. These cases are catastrophic failures and ideally would result in termination of the process. At the other end of the spectrum, an OutOfMemoryException could be the result of a thread asking for a 1 GB byte array. The fact that we failed this allocation attempt has no impact on the consistency and viability of the rest of the process.
The sad fact is that CLR 2.0 cannot distinguish among any points on this spectrum. In most managed processes, all OutOfMemoryExceptions are considered equivalent and they all result in a managed exception being propagated up the thread. However, you cannot depend on your backout code being executed, because we might fail to JIT some of your backout methods, or we might fail to execute static constructors required for backout.
Also, keep in mind that all other exceptions can get folded into an OutOfMemoryException if there isn't enough memory to instantiate those other exception objects. Also, we will give you a unique OutOfMemoryException with its own stack trace if we can. But if we are tight enough on memory, you will share an uninteresting global instance with everyone else in the process.
My best recommendation is that you treat OutOfMemoryException like any other application exception. You make your best attempts to handle it and remain consistent. In the future, I hope the CLR can do a better job of distinguishing catastrophic OOM from the 1 GB byte array case. If so, we might provoke termination of the process for the catastrophic cases, leaving the application to deal with the less risky ones. By treating all OOM cases as the less risky ones, you are preparing for that day.
Marc Gravell has already provided an excellent answer; seeing as how I partly "inspired" this question, I would like to add one thing:
One of the core principles of exception handling is never to throw an exception inside an exception handler. (Note - re-throwing a domain-specific and/or wrapped exception is OK; I am talking about an unexpected exception here.)
There are all sorts of reasons why you need to prevent this from happening:
At best, you mask the original exception; it becomes impossible to know for sure where the program originally failed.
In some cases, the runtime may simply be unable to handle an unhandled exception in an exception handler (say that 5 times fast). In ASP.NET, for example, installing an exception handler at certain stages of the pipeline and failing in that handler will simply kill the request - or crash the worker process, I forget which.
In other cases, you may open yourself up to the possibility of an infinite loop in the exception handler. This may sound like a silly thing to do, but I have seen cases where somebody tries to handle an exception by logging it, and when the logging fails... they try to log the failure. Most of us probably wouldn't deliberately write code like this, but depending on how you structure your program's exception handling, you can end up doing it by accident.
So what does this have to do with OutOfMemoryException specifically?
An OutOfMemoryException doesn't tell you anything about why the memory allocation failed. You might assume that it was because you tried to allocate a huge buffer, but maybe it wasn't. Maybe some other rogue process on the system has literally consumed all of the available address space and you don't have a single byte left. Maybe some other thread in your own program went awry and went into an infinite loop, allocating new memory on each iteration, and that thread has long since failed by the time the OutOfMemoryException ends up on your current stack frame. The point is that you don't actually know just how bad the memory situation is, even if you think you do.
So start thinking about this situation now. Some operation just failed at an unspecified point deep in the bowels of the .NET framework and propagated up an OutOfMemoryException. What meaningful work can you perform in your exception handler that does not involve allocating more memory? Write to a log file? That takes memory. Display an error message? That takes even more memory. Send an alert e-mail? Don't even think about it.
If you try to do these things - and fail - then you'll end up with non-deterministic behaviour. You'll possibly mask the out-of-memory error and get mysterious bug reports with mysterious error messages bubbling up from all kinds of low-level components you wrote that aren't supposed to be able to fail. Fundamentally, you've violated your own program's invariants, and this is going to be a nightmare to debug if your program ever does end up running under low-memory conditions.
One of the arguments presented to me before was that you might catch an OutOfMemoryException and then switch to lower-memory code, like a smaller buffer or a streaming model. However, this "Expection Handling" is a well-known anti-pattern. If you know you're about to chew up a huge amount of memory and aren't sure whether or not the system can handle it, then check the available memory, or better yet, just refactor your code so that it doesn't need so much memory all at once. Don't rely on the OutOfMemoryException to do it for you, because - who knows - maybe the allocation will just barely succeed and trigger a bunch of out-of-memory errors immediately after your exception handler (possibly in some completely different component).
So my simple answer to this question is: Never.
My weasel-answer to this question is: It's OK in a global exception handler, if you're really really careful. Not in a try-catch block.
One practical reason for catching this exception is to attempt a graceful shutdown, with a friendly error message instead of an exception trace.
The problem is larger than .NET. Almost any application written from the fifties to now has big problems if no memory is available.
With virtual address spaces the problem has been sort-of salvaged but NOT solved because even address spaces of 2GB or 4GB may become too small. There are no commonly available patterns to handle out-of-memory. There could be an out-of-memory warning method, a panic method etc. that is guaranteed to still have memory available.
If you receive an OutOfMemoryException from .NET almost anything may be the case. 2 MB still available, just 100 bytes, whatever. I wouldn't want to catch this exception (except to shutdown without a failure dialog). We need better concepts. Then you may get a MemoryLowException where you CAN react to all sorts of situations.
The problem is that - in contrast to other Exceptions - you usually have a low memory situation when the exception occurs (except when the memory to be allocated was huge, but you don't really know when you catch the exception).
Therefore, you must be very careful not to allocate memory when handling this exception. And while this sounds easy it's not, actually it's very hard to avoid any memory allocation and do something useful. Therefore, catching it is usually not a good idea IMHO.
Write code; don't fight the VM. When the VM is humbly telling you that a memory allocation request failed, your best bet is to discard the application's state to avoid corrupting application data. Even if you decide to catch OOM, you should only try to gather diagnostic information, such as dumping a log, a stack trace, etc. Please do not try to initiate a backout procedure, as you are not sure whether it will get a chance to execute or not.
Real-world analogy: you are travelling in a plane and all the engines fail. What would you do after catching an AllEngineFailureException? Your best bet is to grab the mask and prepare for a crash.
When in OOM, dump!!
