Keep an Application Running even if an unhandled Exception occurs - c#

What ?
I am developing a Console Application that needs to keep running 24/7 No Matter WhatIs there any way to stop a Multi-Threaded Application from getting blown up by some unhandled exception happening in "some thread somewhere" ?
Why ?
Please refrain from giving lessons like "you should manage all your exceptions", "this should never happen" etc. I have my reasons : We are in test deployment and we need to keep this running, log exceptions, and restart all threads again. If anything unplanned happens and causes an unhandled exception to be thrown, it needs to be detected and some method called to restart all threads(atomicity is impossible due due the tier design)
This being said, I am aware it might no be possible to restart an application from "within" if it has blown because of and UnhandledException (which I already implemented).
So far, I have used Quartz.net's FileScan Job and a Flag File to detect stuff like that and restart the application from outwards. But this sounds hacky to me. I would like to find something cleaner and less quick and dirty.
DownVoting / Sniping Warning : I KNOW this might NOT be possible "as is'". Please be creative/helpful rather than abruptly critic and think of this more as an "Open question"

If it is going to run 24/7 anyway, why not just write it as a Windows service and take advantage of the recovery options built right into windows?
This approach has the additional advantage of being able to survive a machine reboot, and it will log failures/restarts in the system event logs.

You need to attach an event handler to UnhandledException event on the Current AppDomain:
AppDomain.CurrentDomain.UnhandledException += UnhandledExceptionHandler
Inside the handler, you will have to somehow save enough state (to a file, database, etc.) for the restarted application to pass along to the new threads. Then instantiate a new instance of your Console application with a call to System.Diagnostics.Process.Start("MyConsoleApp.exe").
Be very careful to introduce logic to avoid a continuous loop of crash/restart/crash/restart.

You can't keep a process running "no matter what". What if the process is killed?
You don't want to keep a process running "no matter what". What if the process state is corrupted in such a way that "bad things" happen if it keeps running?

Well I can think of a few drawbacks in the following solution, but it is good enough to me for the moment :
static void Main()
{
AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(OnUnhandledException);
CS = new ConsoleServer();
CS.Run();
}
public static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
{
Exception exception = (Exception)e.ExceptionObject;
Logger.Log("UNHANDLED EXCEPTION : " + e.ExceptionObject.ToString());
Process.Start(#"C:\xxxx\bin\x86\Release\MySelf.exe");
}

If your using .Net 2.0 and above, the answer is you can't.
In the .NET Framework versions 1.0 and 1.1, an unhandled exception
that occurs in a thread other than the main application thread is
caught by the runtime and therefore does not cause the application to
terminate. Thus, it is possible for the UnhandledException event to be
raised without the application terminating. Starting with the .NET
Framework version 2.0, this backstop for unhandled exceptions in child
threads was removed, because the cumulative effect of such silent
failures included performance degradation, corrupted data, and
lockups, all of which were difficult to debug. For more information,
including a list of cases in which the runtime does not terminate, see
Exceptions in Managed Threads.
Taken from here:
http://msdn.microsoft.com/en-us/library/system.appdomain.unhandledexception.aspx
If you want your application to survive, then you will need very aggressive try/catch around your methods so nothing escapes.
I would advise using a windows service as mentioned by others. It's the same as a console application, but with an extra bit of service layer code on top. You could take your console app and covert it to a service application easily. Just need to override service.start/pause/stop methods.

Related

Application shutting down without notice

I am currently managing a complicated application. It's written in C# and .Net 4.7.2.
Sometimes this program shuts down without notice. No error message even with a try/catch block and MessageBox.Show() in the Main method (I know it's probably not the best way but should work).
There are several threads running at different points, calling external DLLs and sometimes even drivers. So in order to log whether it's another thread that crashes the whole thing, I do this at the beginning :
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;
Application.ThreadException += Application_ThreadException;
Because I'm not sure which one is the correct one. In the methods, I log the Exception (after performing null checks) into a file (using File.AppendText and a timestamped based file).
Still nothing. The application keeps crashing after some random amount of time (between 2 and 6 hours) and I have no log information, no error message and I'm getting kind of lost here.
The app is running in Release mode and I cannot use Visual Studio to run the debugger into it (because that would make it easy). Maybe there's another way to run an external debugger ?
Can someone give me a hint on how to catch up for an exception that would cause an application to crash silently ?
Based on your explanations the only thing that brings to my mind is that you have some fire and forget threads in your application that throw exception sometimes but your application can't keep track of them to log or catch their exceptions.
Make sure all your tasks are awaited correctly and you don't have any async void method.
If you really need some fire and forget actions in your app, at least keep them alive with something like private Task fireAndForgetTaskAliver in your classes.
Another probability could be memory leak in your app that causes stack overflow exception.
The only way to catch an exception that is not caught anywhere in the code is indeed to look it the Windows Event Log, under Applications.
Thanks to Pavel Anikhouski for his comment.

Windows service / A new guard page for the stack cannot be created

I have a windows service that does some intensive work every one minute (actually it is starting a new thread each time in which it syncs to different systems over http). The problem is, that after a few days it suddenly stops without no error message.
I have NLog in place and I have registered for AppDomain.CurrentDomain.UnhandledException. The last entry in the textfile-log is just a normal entry without any problems. Looking in the EventLog, I also can't find any message in the application log, however, there are two entries in the system log.
One basically says that the service has been terminated unexpectedly. Nothing more. The second event (at the same time as the first one) says: "...A new guard page for the stack cannot be created..."
From what I've read, this is probably a stack overflow exception. I'm not parsing any XML and I don't do recursive work. I host a webserver using Gate, Nancy and SignalR and have RavenDB running in embedded mode. Every minute a new task is started using the Taskfactory from .NET 4.0 and I also have a ContinueWith where I re-start a System.Timers.Timer to fire again in one minute.
How can I start investigating this issue? What could be possible reasons for such an error?
Based on the information that you provided, I would at least, at the minimum, do the following:
Pay extra attention to any third party calls, and add additional info logging around those points.
There are some circumstances in which AppDomain.CurrentDomain.UnhandledException won't help you - a StackOverflowException being one of them. I believe the CLR will simply just give you a string in this case instead of a stack trace.
Pay extra attention around areas where more than one thread is introduced.
An example of an often overlooked StackOverflowException is:
private string myString;
public string MyString { get { return MyString; } } //should be myString
I got this on a particular computer and traced it to a c# object referencing itself from within an initializer
Just as a 'for what it is worth' - in my case this error was reported when the code was attempting to write to the Windows Event Log and the interactive user did not have sufficient permission. This was a small console app that logged exceptions to a text file and the event log (if desired). On exception, the text file was being updated but then this error was thrown and not caught by the error handling. Disabling the Event Logging stopped the error occurring.
Just in case any other person is having the same problem, in my case I found that my windows service was trapped in an endless recursive loop accidentally. So If anyone else have this problem, take in consideration method calls that may be causing huge recursive loops.
I think why you might all be stumped is because this MAY BE a SSD hardware fault. I get this error consistently while playing games about every 3-5 hours and its my computers page file failing somehow.. I know it isnt RAM because i replaced my CPU/RAM/MOBO combo trying to battle this. And its not programming because different games and different apps all fail at the same time, unless its windows corruption?
I could be wrong but just an idea.
I have two samsung evo's in raid

application level global exception handler didn't get hit

My .net application has a global exception handler by subscribing to AppDomain.Current.Domain UnhandledException event. On a few occassions i have seen that my application crashes but this global exception handler never gets hit. Not sure if its help but application is doing some COM interop.
My understanding is that as long as I don't have any local catch blocks swallowing the exception, this global exception handler should always be hit. Any ideas on what I might be missing causing this handler never been invoked?
Is this the cause of your problem?
AppDomain.CurrentDomain.UnhandledException not firing without debugging
The CLR is not all-powerful to catch every exception that unmanaged code can cause. Typically an AccessViolationException btw. It can only catch them when the unmanaged code is called from managed code. The scenario that's not supported is the unmanaged code spinning up its own thread and this thread causing a crash. Not terribly unlikely when you work with a COM component.
Since .NET 4.0, a Fatal Execution Engine exception no longer causes the UnhandledException event to fire. The exception was deemed too nasty to allow any more managed code to run. It is. And traditionally, a StackOverflowException causes an immediate abort.
You can diagnose this somewhat from the ExitCode of the process. It contains the exception code of the exception that terminated the process. 0x8013yyyy is an exception caused by managed code. 0xc0000005 is an access violation. Etcetera. You can use adplus, available from the Debugging Tools For Windows download to capture a minidump of the process. Since this is likely to be caused by the COM component, working with the vendor is likely to be important to get this resolved.
Since you are doing COM interop I do strongly suspect that some unmanaged code was running in another thread which did cause an unhandled exception. This will lead to application exit without a call to your unhandled exception handler.
Besides this with .NET 4.0 the policy did get stronger when the application is shut down without further notice.
Under the following conditions your application is shut down without further notice (Environmnt.FailFast).
Pre .NET 4:
StackOverFlowException
.NET 4:
StackoverFlowException
AccessViolationException
You can override the behaviour in .NET 4 by decorating a method with the HandleProcessCorruptedStateExceptionsAttribute or you can add the legacyCorruptedStateExceptionsPolicy tag to your App.config.
If your problem is an uncatched exception in unmanaged code you can either run your application under a debugger or you let it crash und collect a memory dump for post mortem debugging. Debugging crash dumps is usualy done with WindDbg.
After you have downloaded Windbg you have adplus (a vbs script located under Programm Files\Debugging Tools for Windows) which you can attach to your running process to trigger a crash dump when the process terminates due to an exception.
adplus -crash -p yourprocessid
Then you have a much better chance to find out what was going on when your process did terminate. Windows can also be configured to take a crash dump for you via DrWatson on older Windows Versions (Windows Error Reporting)
Crash Dump Generation
Hard core programmers will insist to create their own dump generation tool which basically uses the AEDebug registry key. When this key has a value which points to an existing executable it will be called when an application crashes which can e.g. show the Visual Studio Debugger Chooser Dialog or it can trigger the dump generation for your process.
Suspend Threads
An often overlooked thing is when you create a crash dump with an external tool (it is better to rely on external tools since you do not know how bad your process is corrupted and if it is out memory you are already in a bad situation) that you should suspend all threads from the crashed process before you take the dump.
When you take a big full memory dump it can take several minutes depending on the allocated memory of the faulted process. During this time the application threads can continue to wreak havoc on your application state leaving you with a dump which contains an inconsistent process state which did change during dump generation.
This would happen if your handler throws an exception.
It would also happen if you call Environment.FailFast or if you Abort the UI thread.

Finally Block Not Running?

Ok this is kind of a weird issue and I am hoping someone can shed some light. I have the following code:
static void Main(string[] args)
{
try
{
Console.WriteLine("in try");
throw new EncoderFallbackException();
}
catch (Exception)
{
Console.WriteLine("in Catch");
throw new AbandonedMutexException();
}
finally
{
Console.WriteLine("in Finally");
Console.ReadLine();
}
}
NOW when I compile this to target 3.5(2.0 CLR) it will pop up a window saying "XXX has stopped working". If I now click on the Cancel button it will run the finally, AND if I wait until it is done looking and click on the Close Program button it will also run the finally.
Now what is interesting and confusing is IF I do the same thing compiled against 4.0 Clicking on the Cancel button will run the finally block and clicking on the Close Program button will not.
My question is: Why does the finally run on 2.0 and not on 4.0 when hitting the Close Program button? What are the repercussions of this?
EDIT: I am running this from a command prompt in release mode(built in release mode) on windows 7 32 bit. Error Message: First Result below is running on 3.5 hitting close after windows looks for issue, second is when I run it on 4.0 and do the same thing.
I am able to reproduce the behavior now (I didn't get the exact steps from your question when I was reading it the first time).
One difference I can observe is in the way that the .NET runtime handles the unhandled exception. The CLR 2.0 runs a helper called Microsoft .NET Error Reporting Shim (dw20.exe) whereas the CLR 4.0 starts Windows Error Reporting (WerFault.exe).
I assume that the two have different behavior with respect to terminating the crashing process. WerFault.exe obviously kills the .NET process immediately whereas the .NET Error Reporting Shim somehow closes the application so that the finally block still is executed.
Also have a look at the Event Viewer: WerFault logs an application error notifying that the crashed process was terminated:
Application: ConsoleApplication1.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Threading.AbandonedMutexException
Stack:
at Program.Main(System.String[])
dw20.exe however only logs an information item with event id 1001 to the Event Log and does not terminate the process.
Think about how awful that situation is: something unexpected has happened that no one ever wrote code to handle. Is the right thing to do in that situation to run even more code, that was probably also not built to handle this situation? Possibly not. Often the right thing to do here is to not attempt to run the finally blocks because doing so will make a bad situation even worse. You already know the process is going down; put it out of its misery immediately.
In a scenario where an unhandled exception is going to take down the process, anything can happen. It is implementation-defined what happens in this case: whether the error is reported to Windows error reporting, whether a debugger starts up, and so on. The CLR is perfectly within its rights to attempt to run finally blocks, and is also perfectly within its rights to fail fast. In this scenario all bets are off; different implementations can choose to do different things.
All my knowledge on this subject is taken from this article here: http://msdn.microsoft.com/en-us/magazine/cc793966.aspx - please note it is written for .NET 2.0 but I have a feeling it makes sense for what we were experiencing in this case (more than "because it decided to" anyways)
Quick "I dont have time to read that article" answer (although you should, it's a really good one):
The solution to the problem (if you absolutly HAVE to have your finally blocks run) would be to a) put in a global error handler or b) force .NET to always run finally blocks and do things the way it did (arguably the wrong way) in .NET 1.1 - Place the following in your app.config:
<legacyUnhandledExceptionPolicy enabled="1">
The reason for it:
When an exception is thrown in .NET it starts walking back through the stack looking for exception handlers and when it finds one it then does a second walk back through the stack running finally blocks before running the content of the catch. If it does not find a catch then this second walk never happens thus the finally blocks are never run here which is why a global exception handler will always run finally clauses as the CLR will run them when it finds the catch, NOT when it runs it (which I belive means even if you do a catch/throw your finally blocks will still get run).
The reason the app.config fix works is because for .NET 1.0 and 1.1 the CLR had a global catch in it which would swallow Exceptions before they went unmanaged which would, being a catch of course, trigger the finally blocks to run. Of course there is no way the framework can know enough about said Exception to handle it, take for example a stack overflow, so this is probably the wrong way of doing it.
The next bit is where it gets a bit sticky, and I am making assumptions based off of what the article says here.
If you are in .NET 2.0+ without the legacy exception handling on then your Exception would fall out into the Windows exception handling system (SEH) which seems pretty darn similar to the CLR one, in that it walks back through frames until it fails to find a catch and then calls a series of events called the Unhandled Exception Filter (UEF). This is an event you can subscribe to, but it can only have ONE thing subscribed to it at a time, so when something does subscribe Windows hands it the address of the callback that was there before, allowing you to set up a chain of UEF handlers - BUT THEY DON'T HAVE TO HONOR that address, they should call the address themselves, but if one breaks the chain, bap, you get no more error handling. I assume that this is what is happening when you cancel windows error reporting, it breaks the UEF chain which means that the application is shut down immediately and the finally blocks are not run, however if you let it run to the end and close it, it will call the next UEF in the chain. .NET will have registerd one which is what the AppDomain.UnhandledException is called from (thus even this event is not guaranteed) which I assume is also where you get your finally blocks called from - as I can't see how if you never transition back into the CLR a managed finally block can run (the article does not go into this bit.)
I believe this has something to do with changes to how the debugger is attached.
From the .NET Framework 4 Migration Issues document:
You are no longer notified when the debugger fails to start, or when there is no registered debugger that should be started.
What happens is that you choose to start the debugger, but you cancel it. I believe this falls under this category and the application just stops because of this.
Ran this in both release and debug, in both framework 3.5 and 4.0, I see "in Finally" in all instances, yes running it from command line, went as far as closing my vs sessions, maybe it's something on your machine or as Kobi pointed out, maybe platform related (I'm on Win7 x64)

How to catch ALL exceptions/crashes in a .NET app [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
.NET - What’s the best way to implement a “catch all exceptions handler”
I have a .NET console app app that is crashing and displaying a message to the user.
All of my code is in a try{<code>} catch(Exception e){<stuff>} block, but still errors are occasionally displayed.
In a Win32 app, you can capture all possible exceptions/crashes by installing various exception handlers:
/* C++ exc handlers */
_set_se_translator
SetUnhandledExceptionFilter
_set_purecall_handler
set_terminate
set_unexpected
_set_invalid_parameter_handler
What is the equivalent in the .NET world so I can handle/log/quiet all possible error cases?
You can add an event handler to AppDomain.UnhandledException event, and it'll be called when a exception is thrown and not caught.
Contrary to what some others have posted, there's nothing wrong catching all exceptions. The important thing is to handle them all appropriately. If you have a stack overflow or out of memory condition, the app should shut down for them. Also, keep in mind that OOM conditions can prevent your exception handler from running correctly. For example, if your exception handler displays a dialog with the exception message, if you're out of memory, there may not be enough left for the dialog display. Best to log it and shut down immediately.
As others mentioned, there are the UnhandledException and ThreadException events that you can handle to collection exceptions that might otherwise get missed. Then simply throw an exception handler around your main loop (assuming a winforms app).
Also, you should be aware that OutOfMemoryExceptions aren't always thrown for out of memory conditions. An OOM condition can trigger all sorts of exceptions, in your code, or in the framework, that don't necessarily have anything to do with the fact that the real underlying condition is out of memory. I've frequently seen InvalidOperationException or ArgumentException when the underlying cause is actually out of memory.
This article in codeproject by our host Jeff Atwood is what you need.
Includes the code to catch unhandled exceptions and best pratices for showing information about the crash to the user.
The Global.asax class is your last line of defense.
Look at:
protected void Application_Error(Object sender, EventArgs e)
method
Be aware that some exception are dangerous to catch - or mostly uncatchable,
OutOfMemoryException: anything you do in the catch handler might allocate memory (in the managed or unmanaged side of the CLR) and thus trigger another OOM
StackOverflowException: depending whether the CLR detected it sufficiently early, you might get notified. Worst case scenario, it simply kills the process.
You can use the AppDomain.CurrentDomain.UnhandledException to get an event.
Although catching all exceptions without the plan to properly handle them is surely a bad practice, I think that an application should fail in some graceful way. A crash should not scare the user to death, and at least it should display a description of the error, some information to report to the tech support stuff, and ideally a button to close the application and restart it. In an ideal world, the application should be able to dump on disk the user data, and then try to recover it (but I see that this is asking too much).
Anyway, I usually use:
AppDomain.CurrentDomain.UnhandledException
You may also go with Application.ThreadException Event.
Once I was developing a .NET app running inside a COM based application; this event was the very useful, as AppDomain.CurrentDomain.UnhandledException didn't work in this case.
I think you should rather not even catch all Exception but better let them be shown to the user. The reason for this is that you should only catch Exceptions which you can actually handle. If you run into some Exception which causes the program to stop but still catch it, this might cause much more severe problems.
Also read FAQ: Why does FxCop warn against catch(Exception)?.
Be aware that catching these unhandled exceptions can change the security requirements of your application. Your application may stop running correctly under certain contexts (when run from a network share, etc.). Be sure to test thoroughly.
it doesn't hurt to use both
AppDomain.CurrentDomain.UnhandledException
Application.ThreadException
but keep in mind that exceptions on secondary threads are not caught by these handlers; use SafeThread for secondary threads if needed

Categories

Resources