application level global exception handler didn't get hit - c#

My .NET application has a global exception handler, registered by subscribing to the AppDomain.CurrentDomain.UnhandledException event. On a few occasions I have seen my application crash without this global exception handler ever being hit. I'm not sure if it helps, but the application does some COM interop.
My understanding is that as long as I don't have any local catch blocks swallowing the exception, this global exception handler should always be hit. Any ideas on what I might be missing that causes this handler never to be invoked?

Is this the cause of your problem?
AppDomain.CurrentDomain.UnhandledException not firing without debugging

The CLR is not all-powerful to catch every exception that unmanaged code can cause. Typically an AccessViolationException btw. It can only catch them when the unmanaged code is called from managed code. The scenario that's not supported is the unmanaged code spinning up its own thread and this thread causing a crash. Not terribly unlikely when you work with a COM component.
Since .NET 4.0, a Fatal Execution Engine exception no longer causes the UnhandledException event to fire. The exception was deemed too nasty to allow any more managed code to run. It is. And traditionally, a StackOverflowException causes an immediate abort.
You can diagnose this somewhat from the ExitCode of the process. It contains the exception code of the exception that terminated the process. 0x8013yyyy is an exception caused by managed code. 0xc0000005 is an access violation. Etcetera. You can use adplus, available from the Debugging Tools For Windows download to capture a minidump of the process. Since this is likely to be caused by the COM component, working with the vendor is likely to be important to get this resolved.
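The exit-code check described above can be sketched as a small lookup. This is only an illustration: the class and method names are hypothetical, 0xC0000005 and the 0x8013yyyy range come from the answer, and the stack overflow (0xC00000FD) and CLR exception (0xE0434352) codes are additions that I believe are the usual Windows values.

```csharp
using System;

static class ExitCodeHints
{
    // Maps a process exit code (as returned by Process.ExitCode) to a rough crash category.
    public static string Describe(int exitCode)
    {
        // Exit codes are reported as signed ints; compare them as unsigned hex values.
        uint code = unchecked((uint)exitCode);
        if (code == 0xC0000005u) return "access violation";
        if (code == 0xC00000FDu) return "stack overflow";
        if (code == 0xE0434352u) return "unhandled CLR exception";
        if ((code & 0xFFFF0000u) == 0x80130000u) return "managed (CLR) error HRESULT";
        return code == 0 ? "clean exit" : "unrecognized";
    }
}
```

A watchdog that launched the process could call ExitCodeHints.Describe(process.ExitCode) after WaitForExit() and write the result to its own log.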

Since you are doing COM interop, I strongly suspect that some unmanaged code running on another thread caused an unhandled exception. This leads to the application exiting without a call to your unhandled exception handler.
Besides this, .NET 4.0 tightened the policy on when the application is shut down without further notice.
Under the following conditions your application is shut down without further notice (Environment.FailFast).
Pre .NET 4:
StackOverflowException
.NET 4:
StackOverflowException
AccessViolationException
You can override the behaviour in .NET 4 by decorating a method with the HandleProcessCorruptedStateExceptionsAttribute or you can add the legacyCorruptedStateExceptionsPolicy tag to your App.config.
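A minimal sketch of the attribute route, assuming .NET Framework 4.x (newer runtimes ignore the attribute). The wrapper class and method names are hypothetical; only the two attributes are real framework types.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Security;

static class CorruptedStateGuard
{
    // With this attribute, the catch block below is also offered corrupted-state
    // exceptions such as AccessViolationException on .NET Framework 4.x.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    public static Exception TryRun(Action interopCall)
    {
        try
        {
            interopCall();
            return null; // no exception observed
        }
        catch (Exception ex)
        {
            // Log and get out: process state may already be damaged at this point.
            return ex;
        }
    }
}
```

Catching such an exception is best treated as a chance to log before exiting, not as a way to keep running, since the process memory may be corrupted.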
If your problem is an uncaught exception in unmanaged code, you can either run your application under a debugger or let it crash and collect a memory dump for post-mortem debugging. Debugging crash dumps is usually done with WinDbg.
After you have downloaded WinDbg you have adplus (a vbs script located under Program Files\Debugging Tools for Windows), which you can attach to your running process to trigger a crash dump when the process terminates due to an exception.
adplus -crash -p yourprocessid
Then you have a much better chance of finding out what was going on when your process terminated. Windows can also be configured to take a crash dump for you, via DrWatson on older Windows versions or via Windows Error Reporting.
Crash Dump Generation
Hard-core programmers will insist on creating their own dump generation tool, which basically uses the AeDebug registry key. When this key has a value that points to an existing executable, that executable is called when an application crashes; it can e.g. show the Visual Studio Debugger Chooser dialog or trigger dump generation for your process.
Suspend Threads
An often overlooked point: when you create a crash dump with an external tool, you should suspend all threads of the crashed process before you take the dump. (It is better to rely on external tools, since you do not know how badly your process is corrupted, and if it is out of memory you are already in a bad situation.)
When you take a big full memory dump it can take several minutes, depending on the allocated memory of the faulted process. During this time the application threads can continue to wreak havoc on your application state, leaving you with a dump that contains an inconsistent process state that changed during dump generation.

This would happen if your handler throws an exception.
It would also happen if you call Environment.FailFast or if you Abort the UI thread.

Related

Application shutting down without notice

I am currently managing a complicated application. It's written in C# and .Net 4.7.2.
Sometimes this program shuts down without notice. No error message even with a try/catch block and MessageBox.Show() in the Main method (I know it's probably not the best way but should work).
There are several threads running at different points, calling external DLLs and sometimes even drivers. So in order to log whether it's another thread that crashes the whole thing, I do this at the beginning :
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;
Application.ThreadException += Application_ThreadException;
Because I'm not sure which one is the correct one. In the methods, I log the Exception (after performing null checks) into a file (using File.AppendText and a timestamped based file).
Still nothing. The application keeps crashing after some random amount of time (between 2 and 6 hours) and I have no log information, no error message and I'm getting kind of lost here.
The app is running in Release mode and I cannot use Visual Studio to run the debugger into it (because that would make it easy). Maybe there's another way to run an external debugger ?
Can someone give me a hint on how to catch up for an exception that would cause an application to crash silently ?
Based on your explanation, the only thing that comes to mind is that you have some fire-and-forget threads in your application that sometimes throw exceptions, but your application can't keep track of them to log or catch their exceptions.
Make sure all your tasks are awaited correctly and that you don't have any async void methods.
If you really need some fire-and-forget actions in your app, at least keep a reference to them with something like private Task fireAndForgetTaskAliver in your classes.
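One way to keep fire-and-forget work from failing silently is to route it through a helper that always observes the exception. The sketch below is an assumption about how that might look; FireAndForget, Run and the onError callback are hypothetical names, not part of the original application.

```csharp
using System;
using System.Threading.Tasks;

static class FireAndForget
{
    // Starts the work and returns a task that never faults: any exception from the
    // work (thrown synchronously or asynchronously) is handed to onError instead.
    public static async Task Run(Func<Task> work, Action<Exception> onError)
    {
        try
        {
            await work().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            onError(ex); // e.g. append to the same log file the global handlers use
        }
    }
}
```

The caller can store the returned Task in a field (as the fireAndForgetTaskAliver suggestion does) so that it stays reachable and can be awaited during shutdown.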
Another possibility could be a memory leak in your app, or unbounded recursion that causes a stack overflow exception.
The only way to find an exception that is not caught anywhere in the code is indeed to look in the Windows Event Log, under Applications.
Thanks to Pavel Anikhouski for his comment.

Catching fatal exceptions thrown from unmanaged code

Currently, there is no way (at least I did not find a way) to catch fatal exceptions (such as Stack Overflow, Segfault, ..) with try-catch block.
I already opened an issue in the .NET Core repository, so for more details you can read there (https://github.com/dotnet/core/issues/4228).
What I'm trying to do is to make the application not crash when there is any segfault/stack overflow/any fatal exception in loaded unmanaged code. What happens now is that the .NET CLR kills my application if any fatal error occurs.
Example:
Managed C# code loads an external C++ DLL via the kernel LoadLibrary function.
Assume the DLL is intentionally created for robustness testing, so when a specific function is called it triggers a segfault (e.g. by trying to read data outside of array bounds).
When this error happens, it gets caught by the .NET CLR, which immediately kills the calling managed C# code (the application).
What I would like is to just report that this happened instead of dying silently.
I did some research and found out there is the reasoning behind that which is described in the issue above.

Keep an Application Running even if an unhandled Exception occurs

What ?
I am developing a Console Application that needs to keep running 24/7, No Matter What. Is there any way to stop a multi-threaded application from getting blown up by some unhandled exception happening in "some thread somewhere"?
Why ?
Please refrain from giving lessons like "you should manage all your exceptions", "this should never happen" etc. I have my reasons: we are in test deployment and we need to keep this running, log exceptions, and restart all threads again. If anything unplanned happens and causes an unhandled exception to be thrown, it needs to be detected and some method called to restart all threads (atomicity is impossible due to the tier design).
That being said, I am aware it might not be possible to restart an application from "within" if it has blown up because of an unhandled exception (a handler for which I have already implemented).
So far, I have used Quartz.net's FileScan Job and a flag file to detect stuff like that and restart the application from outside. But this sounds hacky to me. I would like to find something cleaner and less quick and dirty.
DownVoting / Sniping Warning: I KNOW this might NOT be possible "as is". Please be creative/helpful rather than abruptly critical, and think of this more as an "open question".
If it is going to run 24/7 anyway, why not just write it as a Windows service and take advantage of the recovery options built right into windows?
This approach has the additional advantage of being able to survive a machine reboot, and it will log failures/restarts in the system event logs.
You need to attach an event handler to UnhandledException event on the Current AppDomain:
AppDomain.CurrentDomain.UnhandledException += UnhandledExceptionHandler;
Inside the handler, you will have to somehow save enough state (to a file, database, etc.) for the restarted application to pass along to the new threads. Then instantiate a new instance of your Console application with a call to System.Diagnostics.Process.Start("MyConsoleApp.exe").
Be very careful to introduce logic to avoid a continuous loop of crash/restart/crash/restart.
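The loop-avoidance logic could be as simple as a sliding-window counter. The sketch below is an illustration with hypothetical names (RestartGuard, TryRegisterRestart); it keeps its state in memory, so it suits a supervising process, and a self-restarting application would have to persist the timestamps (for example in the saved state file) and reload them at startup.

```csharp
using System;
using System.Collections.Generic;

sealed class RestartGuard
{
    private readonly int _maxRestarts;
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _recent = new Queue<DateTime>();

    public RestartGuard(int maxRestarts, TimeSpan window)
    {
        _maxRestarts = maxRestarts;
        _window = window;
    }

    // Returns true if another restart is allowed at 'nowUtc', and records it.
    // Returns false once maxRestarts restarts have happened within the window.
    public bool TryRegisterRestart(DateTime nowUtc)
    {
        // Forget restarts that have fallen out of the sliding window.
        while (_recent.Count > 0 && nowUtc - _recent.Peek() > _window)
            _recent.Dequeue();

        if (_recent.Count >= _maxRestarts)
            return false;

        _recent.Enqueue(nowUtc);
        return true;
    }
}
```

For example, new RestartGuard(3, TimeSpan.FromMinutes(10)) would allow at most three restarts in any ten-minute span before giving up and leaving the process down for a human to inspect.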
You can't keep a process running "no matter what". What if the process is killed?
You don't want to keep a process running "no matter what". What if the process state is corrupted in such a way that "bad things" happen if it keeps running?
Well, I can think of a few drawbacks in the following solution, but it is good enough for me for the moment:
static void Main()
{
    AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(OnUnhandledException);
    CS = new ConsoleServer();
    CS.Run();
}
public static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
{
    // ExceptionObject is not guaranteed to be an Exception, so log it as an object.
    Logger.Log("UNHANDLED EXCEPTION : " + e.ExceptionObject.ToString());
    // Verbatim string literals use the '@' prefix.
    Process.Start(@"C:\xxxx\bin\x86\Release\MySelf.exe");
}
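A slightly more defensive variant of such a handler is sketched below. It is an assumption-laden illustration: CrashLog, Format, Install and the write callback are hypothetical names. It avoids casting ExceptionObject, never lets the handler itself throw, and also hooks TaskScheduler.UnobservedTaskException for faulted tasks nobody awaited.

```csharp
using System;
using System.Threading.Tasks;

static class CrashLog
{
    // Builds a log line without assuming the payload is an Exception.
    public static string Format(object exceptionObject, bool isTerminating)
    {
        var ex = exceptionObject as Exception;
        string detail = ex != null
            ? ex.ToString()
            : "Non-Exception object: " + exceptionObject;
        return DateTime.UtcNow.ToString("o") + " terminating=" + isTerminating + " " + detail;
    }

    // Registers both handlers; 'write' is whatever sink the application uses
    // (for example a method that appends to a log file).
    public static void Install(Action<string> write)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            try { write(Format(e.ExceptionObject, e.IsTerminating)); }
            catch { /* a handler that throws would hide the original failure */ }
        };
        TaskScheduler.UnobservedTaskException += (sender, e) =>
        {
            try { write(Format(e.Exception, false)); }
            catch { /* same reasoning as above */ }
        };
    }
}
```

Calling CrashLog.Install(line => File.AppendAllText(path, line + Environment.NewLine)) as the first statement of Main would cover the managed cases; crashes on native threads still bypass it, as the answers above explain.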
If you're using .NET 2.0 or above, the answer is that you can't.
In the .NET Framework versions 1.0 and 1.1, an unhandled exception
that occurs in a thread other than the main application thread is
caught by the runtime and therefore does not cause the application to
terminate. Thus, it is possible for the UnhandledException event to be
raised without the application terminating. Starting with the .NET
Framework version 2.0, this backstop for unhandled exceptions in child
threads was removed, because the cumulative effect of such silent
failures included performance degradation, corrupted data, and
lockups, all of which were difficult to debug. For more information,
including a list of cases in which the runtime does not terminate, see
Exceptions in Managed Threads.
Taken from here:
http://msdn.microsoft.com/en-us/library/system.appdomain.unhandledexception.aspx
If you want your application to survive, then you will need very aggressive try/catch around your methods so nothing escapes.
I would advise using a Windows service, as mentioned by others. It's the same as a console application, but with an extra bit of service layer code on top. You could take your console app and convert it to a service application easily. You just need to override the service start/pause/stop methods.

Diagnose/Debug potential stack corruption .NET application

I think I have a curly one here... I have a WinForms application that crashes fairly regularly, every hour or so, when running as an x64 process. I suspect this is due to stack corruption and would like to know if anyone has seen a similar issue or has some advice for diagnosing and detecting the issue.
The program in question has no visible UI. It's just a message window that sits in the background and acts as a sort of 'middleware' between our other client programs and a server.
It dies in different ways on different machines. Sometimes it's an 'APPCRASH' dialog that reports a fault in ntdll.dll. Sometimes it's an 'APPCRASH' that reports our own dll as the culprit. Sometimes it's just a silent death. Sometimes our unhandled exception hook logs the error, sometimes it doesn't.
In the cases where Windows Error Reporting kicks in, I've examined memory dumps from several different crash scenarios and found the same managed exception in memory each time. This is the same exception I see reported as an unhandled exception in the cases where it logs before it dies.
I've also been lucky (?) enough to have the application crash while I was actively debugging with Visual Studio - and saw that same exception take down the program.
Now here's the kicker. This particular exception was thrown, caught and swallowed in the first few seconds of the program's life. I have verified this with additional trace logging and I have taken memory dumps of the application a couple of minutes after application startup and verified that exception is still sitting there in the heap somewhere. I've also run a memory profiler over the application and used that to verify that no other .NET object had a reference to it.
The code in question looks a bit like this (vastly simplified, but maintains the key points of flow control)
public class AClass
{
    public object FindAThing(string key)
    {
        object retVal = null;
        Collection<Place> places = GetPlaces();
        foreach (Place place in places)
        {
            try
            {
                retVal = place.FindThing(key);
                break;
            }
            catch {} // Guaranteed to only be a 'NotFound' exception
        }
        return retVal;
    }
}
public class Place
{
    public object FindThing(string key)
    {
        bool found = InternalContains(key); // <snip> some complex if/else logic
        if (code == success)
            return InternalFetch(key);
        throw new NotFoundException(/*UsefulInfo*/);
    }
}
The stack trace I see, both in the event log and when looking at the heap with windbg looks a bit like this.
Company.NotFoundException:
Place.FindThing()
AClass.FindAThing()
Now... to me that reeks of something like stack corruption. The exception is thrown and caught while the application is starting up. But the pointer to it survives on the stack for an hour or more, like a bullet in the brain, and then suddenly breaches a crucial artery, and the application dies in a puddle.
Extra clues:
The code within 'InternalFetch' uses some Marshal.[Alloc/Free]CoTask and pinvoke code. I have run FxCop over it looking for portability issues, and found nothing.
This particular manifestation of the issue is only affecting x64 code built in release mode (with code optimization on). The code I listed for the 'Place.Find' method reflects the optimized .NET code. The unoptimized code returns the found object as the last statement, not 'throw exception'.
We make some COM calls during startup before the above code is run... and in a scenario where the above problem is going to manifest, the very first COM call fails. (Exception is caught and swallowed). I have commented out that particular COM call, and it does not stop the exception sticking around on the heap.
The problem might also affect 32 bit systems, but if it does - then the problem does not manifest in the same spot. I was only sent (typical users!) a few pixels worth of a screen shot of an 'APP CRASH' dialog, but the one thing I could make out was 'StackHash_2264' in the faulting module field.
EDIT:
Breakthrough!
I have narrowed down the problem to a particular call to SetTimer.
The pInvoke looks like this:
[DllImport("user32")]
internal static extern IntPtr SetTimer(IntPtr hwnd, IntPtr nIDEvent, int uElapse, TimerProc CB);
internal delegate void TimerProc(IntPtr hWnd, uint nMsg, IntPtr nIDEvent, int dwTime);
There is a particular class that starts a timer in its constructor. Any timers set before that object is constructed work. Any timers set after that object is constructed work. Any timer set during that constructor causes the application to crash, more often than not. (I have a laptop that crashes maybe 95% of the time, but my desktop only crashes 10% of the time).
Whether the interval is set to 1 hour or 1 second seems to make no difference. The application dies when the timer is due, usually by throwing some previously handled exception as described above. The callback does not actually get executed. If I set the same timer on the very next line of managed code after the constructor returns, all is fine and happy.
I have had a debugger attached when the bad timer was about to fire, and it caused an access violation in 'DispatchMessage'. The timer callback was never called. I have enabled the MDAs that relate to managed callbacks being garbage collected, and it isn't triggering. I have examined the objects with sos and verified that the callback still existed in memory, and that the address it pointed to was the correct callback function.
If I run '!analyze -v' at this point, it usually (but not always) reports something along the lines of 'ERROR_SXS_CORRUPT_ACTIVATION_STACK'
Replacing the call to SetTimer with Microsoft's 'System.Windows.Forms.Timer' class also stops the crash. I've used Reflector on the class and can see that internally it still calls SetTimer, but does not register a procedure. Instead it has a native window that receives the callback. Its P/Invoke definition actually looks wrong... it uses 'ints' for the eventId, where the MSDN documentation says it should be a UIntPtr.
Our own code originally also used 'int' for nIDEvent rather than IntPtr - I changed it during the course of this investigation - but the crash continued both before and after this declaration change. So the only real difference that I can see is that we are registering a callback, and the Windows class is not.
So... at this stage I can 'fix' the problem by shuffling one particular call to SetTimer to a slightly different spot. But I am still no closer to actually understanding what is so special about starting the timer within that constructor that causes this error. And I dearly would like to understand the root cause of this issue.
Just briefly thinking about it, it sounds like an x64 interop issue (i.e., calling 32-bit native functions from x64 managed code is fraught with danger). Does the problem go away if you force your application to compile for the x86 platform from within the project properties?
You can read suggestions on forcing x86 compilation during x86/x64 development on Dotnetrocks. Richard Campbell's suggestion is that Visual Studio should default to the x86 platform and not AnyCPU.
http://www.dotnetrocks.com/default.aspx?showNum=341 (transcript).
With regard to advanced debugging, I have not had a chance to debug x64 interop code, but I hear that this book is a great resource: Advanced .NET Debugging.
Finally, one thing you might try is force Visual Studio to break when an exception is thrown.
Use something like DebugDiag for x64 or WinDbg to write a dump on Kernel32!TerminateProcess and on second-chance .NET exceptions, which should give you the actual .ecxr context frame of the exception that occurred.
This should help you in identifying the call-stack for the process terminate.
IMO it could be mostly because of PInvoke calls. You could use Managed Debugging Assistants to debug these issues.
If MDA is used along with Windbg it would give out messages that would be helpful in debugging
Also I have found tools from the http://clrinterop.codeplex.com/ team are extremely handy when dealing with interop
EDIT
This should give an answer as to why it is not working in 64 bit: Issue with callback method in SetTimer Windows API called from C# code.
This does sound like a corruption issue. I would go through all of your interop calls and ensure that all of the parameters to the DllImport'ed functions are the correct types. For example, using an int in place of an IntPtr will work in 32-bit code but can crash in 64-bit code.
I would use a site like PInvoke.net to verify all of the signatures.
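As an illustration of that advice, here is a sketch of SetTimer declared with pointer-sized types, following the Win32 prototype (UINT_PTR SetTimer(HWND, UINT_PTR, UINT, TIMERPROC)). The NativeTimer class and the TruncateToInt helper are hypothetical; the helper only shows what declaring a pointer-sized parameter as int does to a value on x64. This is not presented as the fix for the crash described in the question.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeTimer
{
    // Pointer-sized parameters (HWND, UINT_PTR) map to IntPtr/UIntPtr, not int.
    [DllImport("user32.dll", SetLastError = true)]
    internal static extern UIntPtr SetTimer(IntPtr hWnd, UIntPtr nIDEvent, uint uElapse, TimerProc lpTimerFunc);

    // Matches TIMERPROC: void (HWND, UINT, UINT_PTR, DWORD).
    // Keep the delegate instance in a field for as long as the timer is active,
    // otherwise the garbage collector may reclaim it while native code still holds it.
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    internal delegate void TimerProc(IntPtr hWnd, uint uMsg, UIntPtr nIDEvent, uint dwTime);

    // Demonstrates the data loss when a 64-bit pointer-sized value is squeezed into an int.
    public static int TruncateToInt(long pointerSizedValue)
    {
        return unchecked((int)pointerSizedValue);
    }
}
```

A value such as 0x100000001 survives intact in an IntPtr on x64, but passing it through an int parameter keeps only the low 32 bits.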

Finally Block Not Running?

Ok this is kind of a weird issue and I am hoping someone can shed some light. I have the following code:
static void Main(string[] args)
{
    try
    {
        Console.WriteLine("in try");
        throw new EncoderFallbackException();
    }
    catch (Exception)
    {
        Console.WriteLine("in Catch");
        throw new AbandonedMutexException();
    }
    finally
    {
        Console.WriteLine("in Finally");
        Console.ReadLine();
    }
}
Now, when I compile this to target 3.5 (2.0 CLR), it pops up a window saying "XXX has stopped working". If I click the Cancel button it runs the finally, AND if I wait until it is done looking and click the Close Program button it also runs the finally.
What is interesting and confusing is that if I do the same thing compiled against 4.0, clicking the Cancel button runs the finally block and clicking the Close Program button does not.
My question is: why does the finally run on 2.0 and not on 4.0 when hitting the Close Program button? What are the repercussions of this?
EDIT: I am running this from a command prompt in release mode(built in release mode) on windows 7 32 bit. Error Message: First Result below is running on 3.5 hitting close after windows looks for issue, second is when I run it on 4.0 and do the same thing.
I am able to reproduce the behavior now (I didn't get the exact steps from your question when I was reading it the first time).
One difference I can observe is in the way that the .NET runtime handles the unhandled exception. The CLR 2.0 runs a helper called Microsoft .NET Error Reporting Shim (dw20.exe) whereas the CLR 4.0 starts Windows Error Reporting (WerFault.exe).
I assume that the two have different behavior with respect to terminating the crashing process. WerFault.exe obviously kills the .NET process immediately whereas the .NET Error Reporting Shim somehow closes the application so that the finally block still is executed.
Also have a look at the Event Viewer: WerFault logs an application error notifying that the crashed process was terminated:
Application: ConsoleApplication1.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Threading.AbandonedMutexException
Stack:
at Program.Main(System.String[])
dw20.exe however only logs an information item with event id 1001 to the Event Log and does not terminate the process.
Think about how awful that situation is: something unexpected has happened that no one ever wrote code to handle. Is the right thing to do in that situation to run even more code, that was probably also not built to handle this situation? Possibly not. Often the right thing to do here is to not attempt to run the finally blocks because doing so will make a bad situation even worse. You already know the process is going down; put it out of its misery immediately.
In a scenario where an unhandled exception is going to take down the process, anything can happen. It is implementation-defined what happens in this case: whether the error is reported to Windows error reporting, whether a debugger starts up, and so on. The CLR is perfectly within its rights to attempt to run finally blocks, and is also perfectly within its rights to fail fast. In this scenario all bets are off; different implementations can choose to do different things.
All my knowledge on this subject is taken from this article here: http://msdn.microsoft.com/en-us/magazine/cc793966.aspx - please note it is written for .NET 2.0 but I have a feeling it makes sense for what we were experiencing in this case (more than "because it decided to" anyways)
Quick "I don't have time to read that article" answer (although you should, it's a really good one):
The solution to the problem (if you absolutely HAVE to have your finally blocks run) would be to a) put in a global error handler or b) force .NET to always run finally blocks and do things the way it did (arguably the wrong way) in .NET 1.1. Place the following in the <runtime> section of your app.config:
<legacyUnhandledExceptionPolicy enabled="1" />
The reason for it:
When an exception is thrown in .NET, the runtime starts walking back through the stack looking for exception handlers. When it finds one, it does a second walk back through the stack, running finally blocks before running the content of the catch. If it does not find a catch, this second walk never happens, so the finally blocks are never run. This is why a global exception handler will always run finally clauses: the CLR runs them when it finds the catch, NOT when it runs it (which I believe means that even if you do a catch/throw, your finally blocks will still get run).
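The two-pass behaviour can be observed from C# with an exception filter, which runs during the first (search) pass. The sketch below uses hypothetical names (TwoPassDemo, Record) and simply records the order in which the filter, the inner finally and the catch body execute.

```csharp
using System;
using System.Collections.Generic;

static class TwoPassDemo
{
    public static List<string> Run()
    {
        var order = new List<string>();
        try
        {
            try
            {
                throw new InvalidOperationException("demo");
            }
            finally
            {
                // Runs in the second pass, once a handler has been found.
                order.Add("inner finally");
            }
        }
        catch (Exception) when (Record(order, "filter"))
        {
            // Runs last, after the stack below it has been unwound.
            order.Add("catch");
        }
        return order;
    }

    // Records the step and accepts the exception so the catch block is chosen.
    private static bool Record(List<string> order, string step)
    {
        order.Add(step);
        return true;
    }
}
```

The recorded order is filter, inner finally, catch: the handler is located first, and only then are the finally blocks between the throw site and the handler executed.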
The reason the app.config fix works is because for .NET 1.0 and 1.1 the CLR had a global catch in it which would swallow Exceptions before they went unmanaged which would, being a catch of course, trigger the finally blocks to run. Of course there is no way the framework can know enough about said Exception to handle it, take for example a stack overflow, so this is probably the wrong way of doing it.
The next bit is where it gets a bit sticky, and I am making assumptions based off of what the article says here.
If you are in .NET 2.0+ without the legacy exception handling on, then your Exception falls out into the Windows structured exception handling system (SEH), which seems pretty darn similar to the CLR one: it walks back through frames, and if it fails to find a catch it calls a chain of callbacks called the Unhandled Exception Filter (UEF). You can subscribe to this, but only ONE thing can be subscribed at a time, so when something subscribes, Windows hands it the address of the callback that was there before, allowing a chain of UEF handlers to be set up. BUT THEY DON'T HAVE TO HONOR that address: each handler should call the previous address itself, and if one breaks the chain, bap, you get no more error handling. I assume that this is what happens when you cancel Windows Error Reporting: it breaks the UEF chain, which means that the application is shut down immediately and the finally blocks are not run, whereas if you let it run to the end and close it, it calls the next UEF in the chain. .NET will have registered one, which is where AppDomain.UnhandledException is called from (thus even this event is not guaranteed), and I assume that is also where your finally blocks get called from, as I can't see how a managed finally block can run if you never transition back into the CLR (the article does not go into this bit).
I believe this has something to do with changes to how the debugger is attached.
From the .NET Framework 4 Migration Issues document:
You are no longer notified when the debugger fails to start, or when there is no registered debugger that should be started.
What happens is that you choose to start the debugger, but you cancel it. I believe this falls under this category and the application just stops because of this.
I ran this in both release and debug, on both framework 3.5 and 4.0, and I see "in Finally" in all instances. Yes, I ran it from the command line, and went as far as closing my VS sessions. Maybe it's something on your machine or, as Kobi pointed out, maybe it's platform related (I'm on Win7 x64).
