Is it possible to write a watchdog process to catch application crashes?

We're developing a video game that has, like any application, bugs that can on occasion cause hard crashes. Unfortunately, a number of the crashes we've cataloged so far are out of our control in terms of being able to solve them or work around them, due to the closed-source middleware we're using (Unity 3D).
Whilst we can hope and wait for the middleware developer to fix the problem, we'd like to see if it's possible to at least make the crashes more informative and user friendly. For example, one of the rare crashes our users can have is that certain AV products cause some kind of thread context race condition and cause the game to explode. We'd like to be able to detect the crash and error signature, and provide the user with a link to our wiki or forums on how to resolve it (if possible).
Is it possible to write a lightweight watchdog process or parent process that can respond to crash events on the Windows platforms?

Collecting crash dumps outside the crashing process is essential. You never know whether your unhandled exception handler is affected or not. But there are other options:
Enable WER LocalDumps and write a watchdog (FileSystemWatcher) for the directory where the dumps are stored.
Configure AeDebug and attach your own debugger at the time of the crash.
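The first option can be sketched roughly as follows. This assumes WER LocalDumps has already been enabled (under the HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps registry key) and that dumps land in the default %LOCALAPPDATA%\CrashDumps folder. The class name and the console message are placeholders, and matching a dump to a known error signature is left out.

```csharp
using System;
using System.IO;

static class DumpWatchdog
{
    static void Main()
    {
        // Assumption: the default LocalDumps folder is in use. Change this
        // if the DumpFolder registry value points somewhere else.
        string dumpDir = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "CrashDumps");
        Directory.CreateDirectory(dumpDir);

        using (var watcher = new FileSystemWatcher(dumpDir, "*.dmp"))
        {
            watcher.Created += (sender, e) =>
            {
                // The dump may still be being written when this fires, so a real
                // watchdog would wait or retry before opening the file.
                Console.WriteLine("Crash dump detected: " + e.FullPath);
            };
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching " + dumpDir + ". Press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

In the Created handler the watchdog could inspect the dump and show the user the relevant wiki or forum link.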

Related

Window "capture" application, upon unexpected termination, allows captured windows to disappear, how can I prevent/fix this issue?

I have an application (C# + WPF) that attempts to wrest control of the graphical interface of any process passed to it as an input and resize/reposition for my own purposes.
It does its job rather well, I think. Upon expected termination (the base class inherits from IDisposable) the "captured" process is released - its parent is set to the original, its windowstyle is reset, etc. etc.
In fact, on testing, I can capture, release, recapture, and so on, the same process as many times as I want with no issues.
However, upon unexpected termination (say another process forcefully kills it), the process never regains its graphical interface! I can tell it's still running but I can never set that process back to its original state.
It almost seems like the process doesn't respond to window-based Win32 API calls that set specific window features anymore (for example, I can get information with GetParent, GetWindowThreadProcessId, etc., but calling ShowWindow or related functions does nothing).
Any suggestions on why this is happening? I'm guessing that since I set the parent of the process to my WPF application (which then unexpectedly closes) it causes some issue in trying to recover the initial interface?
This is why it's happening (or, at least, an indication of why I had so much difficulty finding the issue out on my own); can I recover from it? And, if so, how?
Edit -
IInspectable makes a good point in the comments, question adjusted to make better sense for this particular application.

It seems I've gotten my answer; so, for the sake of completeness I'll post what I've gotten here in case anyone else has a similar issue.
According to the information provided by IInspectable in here and here (with more context in the comments), it seems that what I'm trying to do here (assign a new parent cross-process) is essentially unsupported behavior.
My Solution:
Recovering (at least at the point that I'm talking about - i.e. unexpected crashes or exits) probably isn't feasible, as we've already gone off the end in undetermined/unknown behavior. So I've decided to go for the preventative route.
Our current project already makes use of the Nancy framework to communicate across servers/processes so I'm going to "refine" our shutdown procedure a bit for my portion of the program to allow it to exit more gracefully.
In the case of a truly unexpected termination, I'm still at a loss. I could just restart the processes (actually services with a console output, in our case, but w/e) but my application is just a GUI/interface and isn't very important when compared to the function these processes serve. I may make some sort of semaphore file that indicates whether a successful shutdown occurred, and branch my code so that it indicates that the processes are no longer visible until the next time they're restarted.
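A minimal sketch of the semaphore-file idea, assuming a marker file in the temp directory. The file name, class name, and method names are hypothetical and not part of the actual project.

```csharp
using System;
using System.IO;

static class ShutdownMarker
{
    // Hypothetical location for the marker file.
    static readonly string MarkerPath =
        Path.Combine(Path.GetTempPath(), "capture-app.running");

    // Call once at startup. Returns true if the marker from a previous run
    // is still present, i.e. that run never reached EndRun().
    public static bool BeginRun()
    {
        bool previousRunEndedUnexpectedly = File.Exists(MarkerPath);
        File.WriteAllText(MarkerPath, DateTime.UtcNow.ToString("o"));
        return previousRunEndedUnexpectedly;
    }

    // Call at the very end of a graceful shutdown, after the captured
    // windows have been released.
    public static void EndRun()
    {
        // File.Delete does not throw if the file is already gone.
        File.Delete(MarkerPath);
    }
}
```

If BeginRun() returns true, the GUI can mark the previously captured processes as not visible until they are restarted.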

What will be the exception code for a service crash?

I have created a WPF service for tracking a user session. While tracking the user session, I also want to track service crash events. For that I have been checking the Windows event log and identifying the error. But I am confused: it was showing an error there saying that it failed to process a session change!
Is that a service crash? Is there any specific exception code for a service crash?
Can anyone help by suggesting relevant articles or pointers for identifying a system crash?

Not a crash, you are just seeing the .NET framework's ServiceBase class doing its job. In a few specific cases it will catch an exception and create an entry in the application event log. It does so in the code that runs the OnStart(), OnStop(), and similar methods.
Looks like the service's OnSessionChange() method fell over, just a bog-standard file locking error. In all likelihood the service code is a bit clumsy, it needs to open that file in its Main() method so nobody else can mess with it. Probably wasn't tested really well, OnSessionChange() does not fire very often. And certainly little reason to try to log anything, but who knows.
This should not otherwise affect the service process, the service control manager doesn't give much of a hoot if the OnSessionChange notification fails. Nothing it can do about it. So you are seeing this mostly because you started looking, services do tend to misbehave without anybody noticing. It just isn't very visible that they do. Do make sure it wasn't your code that put a lock on the Log.txt file. If you do then you'll have to use FileShare.ReadWrite to prevent the service from falling over.
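For illustration, a reader that opens the log with FileShare.ReadWrite might look like the sketch below. The class and method names are made up, and it assumes the service itself opens the file in a mode that tolerates concurrent readers.

```csharp
using System.IO;

static class LogReader
{
    // Reads the whole log without blocking other processes that want to
    // open the same file for reading or writing while we hold it.
    public static string ReadSharedLog(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

By contrast, File.ReadAllText(path) opens the file with FileShare.Read, which would make the service's attempt to open the log for writing fail while the read is in progress.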

How to unload unused COM objects/libraries after a complete restart?

Here is the thing. I'm connecting via COM to some devices on KNX/EIB. But sometimes (and I want to be ready for the worst case anyway) my application crashes, leaving all objects and libraries exposed somewhere, somehow. I noticed that when I restart the app I have trouble getting a connection again. I get an error from a connection procedure that normally works well. Sometimes this connect procedure works and sometimes it does not, seemingly at random. That is bad! After some time (several minutes) it seems to work again after a series of complete failures. But I think I see a pattern now: it doesn't work after a crash with no clean disconnect. My guess is that there are objects still holding a connection to the device, and that is why I can't get a new connection. This is why I ask this question.
Question:
How do I unload those unused objects to kill undead connections?
How do I make Windows check for unused libraries to be unloaded?
I just want to tell Windows, "I messed up badly and I need to continue my work. Please clean up my mess for me, so I can start fresh! Do I deserve a 2nd chance?"
Edit:
The scenario is the app has crashed and closed. I have no references to anything anymore. No finally clause or anything. The app can only be started again. What can I do to clean up the mess that has been made before, programmatically?
Edit 2:
Hans gave me the hint of killing the responsible server. So for now I solve that by calling taskkill on startup (at least as long as I'm in dev). And it works!
C:\Windows\System32\taskkill.exe /F /IM Falcon.exe
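The same cleanup can be sketched from inside the application with System.Diagnostics instead of shelling out to taskkill. The helper name and the five-second wait are arbitrary choices, and like taskkill /F this terminates the server without giving it a chance to clean up.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

static class StaleServerCleanup
{
    // Kills every running process with the given name (without ".exe")
    // and returns how many were terminated.
    public static int KillStaleServers(string processName)
    {
        int killed = 0;
        foreach (Process process in Process.GetProcessesByName(processName))
        {
            using (process)
            {
                try
                {
                    process.Kill();
                    process.WaitForExit(5000); // give it up to 5 s to disappear
                    killed++;
                }
                catch (InvalidOperationException)
                {
                    // The process had already exited.
                }
                catch (Win32Exception)
                {
                    // Access denied, or the process is already terminating.
                }
            }
        }
        return killed;
    }
}
```

Calling StaleServerCleanup.KillStaleServers("Falcon") at startup would correspond to the taskkill line above.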

This is the failure mode of an out-of-process COM server. If the client program crashes to the desktop without releasing the interface pointers then the server is completely unaware that the client isn't around anymore. And tends to get balky when you try to reconnect, many servers just permit one client.
By far the most common way that programmers induce this failure mode is by using a debugger. They'll click the Red Button or use the Stop Debugging command. Bam, no cleanup of course.
COM garbage-collects unused servers automatically. But that isn't particularly fast, takes an easy 10 minutes before it decides it needs to step in. And doesn't always work for every server, Office programs notoriously don't get cleaned-up for example.
Not much you can do about this when your app keels over in regular usage. Otherwise the kind of problem that killed middle-ware. Still, having such a mishap in a C# program is pretty unusual, the CLR releases interface pointers at program termination even when the app crashed with an exception. You'd have to have the very nasty kind of mishaps to bypass this, critical exceptions like ExecutionEngineException or the one this site is named after.
Don't focus too much on the Stop Debugging induced failures, it is normal and using Task Manager to kill the server is expected and required. Otherwise just be sure to get the nasty bugs out of your code and you won't have a problem. If you need more help then be sure to contact the owner of the server, be sure to have a small repro project available that demonstrates the issue.

How can I terminate an ASP.NET application on IIS following an unrecoverable error?

Suppose I have an unhandled exception (or a known serious, unrecoverable error). The scariest situation is a security breach, but it could apply to anything that means my state is so badly hosed I can't expect to continue safely.
What do I do?
In a traditional application, the usual technique is to end my process quickly, as soon as possible. I'm calling Process.Exit, TerminateProcess, die, or whatever other tool the environment has that means "END. NOW". Eric Lippert's post expresses the reasoning for this attitude well.
In a production ASP.NET application running on IIS, it's not so simple. I can certainly end the current process and cough an error to the event log or wherever. That's essentially what happens with any unhandled exception. But the next time a request comes in, IIS is just going to spin up a new worker process. If my fatal error was a transient problem that's great.
But if my problem persists past the lifetime of my process, the new one won't be any better. It could even be compounded by the initialization code or a reattempt. Plus, if IIS is running multiple worker processes within the same application pool, even killing my process doesn't kill the application. Logically speaking all those other workers may be hosed too and just not know it yet.
So far I've only come up with two options.
End the process and hope for the best. Knowing that the app will just be restarted, this is pretty much the same as "catch(Exception) {}". Hardly satisfying.
"Reaching out" to tell IIS to disable the app, stop IIS, the machine, etc. This seems like a brutal hack. Moreover I'd guess it's likely to require elevated security credentials. During termination of a possibly-compromised process seems like a poor time to have those.

Here is what I can think of:
You can use the advanced Application Pool setting in IIS named "Rapid-Fail Protection": set the Failure Interval as long as you like and Maximum Failures to 1, then throw the exception so that IIS decides the application pool can't work correctly and sends Service Unavailable back to the client, or even resets the connection (depending on your settings). For more detail, see: Failure Settings for an Application Pool. However, you need to be very careful not to overkill. The application must handle all other exceptions properly, so that only the failure you want to terminate the application is detected by IIS; otherwise a single user click might bring down your site.
Another solution is to do it in your own code: record the error in some way, such as creating a file named SystemCrashed, and then terminate the application. In Application_Start, check whether the file exists and, if it does, do nothing but terminate the application. It works like a lock. This needs more code but may be safer than the IIS settings, since there can't be much overkill as long as you get removing the lock right.
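The lock-file approach might look roughly like the sketch below for a classic ASP.NET application with a Global.asax code-behind. The marker location under App_Data, the method name RecordFatalErrorAndTerminate, and the choice of Environment.FailFast are all assumptions, and the application pool identity would need write access to that folder.

```csharp
using System;
using System.IO;
using System.Web;
using System.Web.Hosting;

public class Global : HttpApplication
{
    // Hypothetical marker location inside the site's App_Data folder.
    private static string MarkerPath
    {
        get { return HostingEnvironment.MapPath("~/App_Data/SystemCrashed"); }
    }

    protected void Application_Start(object sender, EventArgs e)
    {
        // Refuse to start while the lock file from an earlier fatal error exists.
        if (File.Exists(MarkerPath))
        {
            throw new InvalidOperationException(
                "Application disabled after a fatal error; remove the SystemCrashed file to re-enable it.");
        }
    }

    // Call from the code path that detects the unrecoverable condition.
    public static void RecordFatalErrorAndTerminate(string reason)
    {
        File.WriteAllText(MarkerPath, DateTime.UtcNow.ToString("o") + " " + reason);
        Environment.FailFast(reason); // ends the worker process immediately
    }
}
```

Removing the lock stays a manual step here: someone deletes the file once the underlying problem has been dealt with.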

C# + Linq program randomly just disappears

I wrote a program that uses Linq to talk to a Microsoft SQL Server. It runs nonstop and from time to time makes some changes to the db, mostly at midnight. This works. But after a few days the process randomly just disappears. There is no exception window or an entry in the system event logs.
Now, I "fixed" it somehow. What I did: I just reconnect to the sql server every time it does some changes.
The SQL Server runs on the same machine, by the way, and there are other programs running that use that SQL Server. So it can't be down or something like that. Besides, I'd expect an exception in that case.
Just in case it's important: There are other clients using that same database.
How is it possible a .net app can just disappear? Shouldn't it throw exceptions? And even if it uses some native code, which this process does not, wouldn't there be a message like "windows terminated this process because of xxxxx"?

How is it possible a .net app can just disappear?
One of three things happened: The program terminated normally, the program terminated itself abnormally (via an exception or a failfast), or some other process terminated the process.
Shouldn't it throw exceptions?
I don't understand the question.
And even if it uses some native code, which this process does not, wouldn't there be a message like "windows terminated this process because of xxxxx"?
Well, first off, it might not be Windows terminating the process. For example, perhaps someone attached a debugger to the process and then instructed the debugger to terminate the process.
Some applications are noisy when they terminate abnormally, and some are not -- in particular, applications which terminate with a failfast by definition do not spend time terminating slowly -- writing to logs and letting you know what happened, and so on. That's because they're terminating as fast as possible.
Now, I "fixed" it somehow. What I did: I just reconnect to the sql server every time it does some changes
Were I in that situation I'd prefer to fix the problem by understanding the problem before I try to fix it.
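One way to start telling the three cases apart is to have the program record its own terminations. This is a minimal sketch with a hypothetical log path and class name. An entry from ProcessExit suggests a normal exit and an entry from UnhandledException points at an exception; if neither appears, the process was most likely failfasted or terminated from outside, since those paths typically do not run these handlers.

```csharp
using System;
using System.IO;

static class TerminationLog
{
    // Hypothetical log location.
    static readonly string LogPath =
        Path.Combine(Path.GetTempPath(), "termination.log");

    // Call once at the start of Main().
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Append("Unhandled exception: " + e.ExceptionObject);

        AppDomain.CurrentDomain.ProcessExit += (sender, e) =>
            Append("Process exiting.");
    }

    static void Append(string message)
    {
        File.AppendAllText(
            LogPath,
            DateTime.UtcNow.ToString("o") + " " + message + Environment.NewLine);
    }
}
```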
