Windows service / A new guard page for the stack cannot be created - c#

I have a Windows service that does some intensive work every minute (it starts a new thread each time, in which it syncs to different systems over HTTP). The problem is that after a few days it suddenly stops without any error message.
I have NLog in place and I have registered for AppDomain.CurrentDomain.UnhandledException. The last entry in the textfile-log is just a normal entry without any problems. Looking in the EventLog, I also can't find any message in the application log, however, there are two entries in the system log.
One basically says that the service has been terminated unexpectedly. Nothing more. The second event (at the same time as the first one) says: "...A new guard page for the stack cannot be created..."
From what I've read, this is probably a stack overflow exception. I'm not parsing any XML and I don't do recursive work. I host a web server using Gate, Nancy and SignalR, and have RavenDB running in embedded mode. Every minute a new task is started using the TaskFactory from .NET 4.0, and I also have a ContinueWith in which I restart a System.Timers.Timer to fire again in one minute.
How can I start investigating this issue? What could be possible reasons for such an error?

Based on the information you provided, I would at minimum do the following:
Pay extra attention to any third-party calls, and add extra info-level logging around those points.
There are some circumstances in which AppDomain.CurrentDomain.UnhandledException won't help you, a StackOverflowException being one of them. Since .NET 2.0 a StackOverflowException cannot be caught with a try/catch block and the process is terminated by default, so you get no stack trace in your own log.
Pay extra attention to areas where more than one thread is introduced.
An often overlooked cause of a StackOverflowException is a property that returns itself:
private string myString;
public string MyString { get { return MyString; } } //should be myString

I got this on one particular computer and traced it to a C# object referencing itself from within an initializer.

Just as a 'for what it is worth' - in my case this error was reported when the code was attempting to write to the Windows Event Log and the interactive user did not have sufficient permission. This was a small console app that logged exceptions to a text file and the event log (if desired). On exception, the text file was being updated but then this error was thrown and not caught by the error handling. Disabling the Event Logging stopped the error occurring.

In case anyone else is having the same problem: in my case I found that my Windows service had accidentally become trapped in an endless recursive loop. So if you see this error, look at method calls that may be causing unbounded recursion.
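For recursion that gets too deep by accident, one option is RuntimeHelpers.EnsureSufficientExecutionStack (available since .NET 4.0), which throws a catchable InsufficientExecutionStackException before the stack is exhausted. A minimal sketch, assuming you can add the check to the suspect method; the RecursionGuardDemo class and its method names are hypothetical and only illustrate the idea:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RecursionGuardDemo
{
    // Deliberately unbounded recursion, standing in for an accidental loop.
    private static int Runaway(int depth)
    {
        // Throws InsufficientExecutionStackException when little stack is left,
        // instead of letting the process die with a StackOverflowException.
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return Runaway(depth + 1) + 1;
    }

    // Returns true if the runaway recursion was stopped by the guard.
    public static bool RunawayWasCaught()
    {
        try
        {
            Runaway(0);
            return false;
        }
        catch (InsufficientExecutionStackException ex)
        {
            // Unlike a stack overflow, this can be logged with a stack trace.
            Console.Error.WriteLine("Recursion too deep: " + ex.Message);
            return true;
        }
    }
}
```

The guard only helps in methods you control; recursion inside third-party code would still take the process down.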

I think the reason you might all be stumped is that this MAY BE an SSD hardware fault. I get this error consistently about every 3-5 hours while playing games, and it's my computer's page file failing somehow. I know it isn't RAM, because I replaced my CPU/RAM/motherboard combo trying to fix this. And it's not a programming error, because different games and different apps all fail at the same time. Unless it's Windows corruption?
I could be wrong, but it's just an idea.
I have two Samsung EVOs in RAID.

Related

Application shutting down without notice

I am currently managing a complicated application. It's written in C# and .Net 4.7.2.
Sometimes this program shuts down without notice. No error message even with a try/catch block and MessageBox.Show() in the Main method (I know it's probably not the best way but should work).
There are several threads running at different points, calling external DLLs and sometimes even drivers. So in order to log whether it's another thread that crashes the whole thing, I do this at the beginning :
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;
Application.ThreadException += Application_ThreadException;
Because I'm not sure which one is the correct one. In the handlers, I log the exception (after performing null checks) to a file (using File.AppendText and a timestamp-based file name).
Still nothing. The application keeps crashing after some random amount of time (between 2 and 6 hours) and I have no log information, no error message and I'm getting kind of lost here.
The app is running in Release mode and I cannot use Visual Studio to run the debugger on it (that would make it easy). Maybe there's another way to attach an external debugger?
Can someone give me a hint on how to catch an exception that causes an application to crash silently?
Based on your explanation, the only thing that comes to mind is that you have some fire-and-forget threads in your application that sometimes throw exceptions, but your application doesn't keep track of them, so it can't log or catch those exceptions.
Make sure all your tasks are awaited correctly and that you don't have any async void methods.
If you really need some fire-and-forget actions in your app, at least keep a reference to them with something like private Task fireAndForgetTaskAliver in your classes.
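One way to keep fire-and-forget work from failing silently is to attach a continuation that observes and logs any exception. This is a rough sketch, not the asker's code: the FireAndForget helper and its log callback are hypothetical names, and it assumes .NET Framework 4.6 or later for Task.CompletedTask.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Starts the work and returns a task that completes once the work has
    // finished and any failure has been passed to 'log'. Keep the returned
    // task in a field if you want to wait for it at shutdown.
    public static Task Run(Func<Task> work, Action<Exception> log)
    {
        Task task;
        try
        {
            task = work();
        }
        catch (Exception ex)
        {
            // The delegate threw synchronously, before returning a task.
            log(ex);
            return Task.CompletedTask;
        }

        return task.ContinueWith(
            t =>
            {
                // Reading t.Exception marks the failure as observed.
                if (t.IsFaulted)
                    log(t.Exception.GetBaseException());
            },
            TaskContinuationOptions.ExecuteSynchronously);
    }
}
```

A call such as fireAndForgetTaskAliver = FireAndForget.Run(() => SyncAsync(), ex => File.AppendAllText(path, ex.ToString())) would then leave a trace in the log; SyncAsync and path are placeholders here.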
Another possibility is a memory leak in your app, or unbounded recursion that ends in a StackOverflowException, which terminates the process without running your handlers.
Often the only way to find an exception that is not caught anywhere in the code is indeed to look in the Windows Event Log, under Application.
Thanks to Pavel Anikhouski for his comment.

Is it possible to write a watchdog process to catch application crashes?

We're developing a video game that has, like any application, bugs that can on occasion cause hard crashes. Unfortunately a number of the crashes we've cataloged so far are out of our control in terms of being able to solve them or work around them, due to the closed-source middleware we're using (Unity 3D).
Whilst we can hope and wait for the middleware developer to fix the problem, we'd like to see if it's possible to at least make the crashes more informative and user friendly. For example, one of the rare crashes our users can have is that certain AV products cause some kind of thread context race condition and cause the game to explode. We'd like to be able to detect the crash and error signature, and provide the user with a link to our wiki or forums on how to resolve it (if possible).
Is it possible to write a lightweight watchdog process or parent process that can respond to crash events on the Windows platforms?
Collecting crash dumps outside the crashing process is essential. You never know whether your unhandled exception handler is affected or not. But there are other options:
Enable WER LocalDumps and write a watchdog (FileSystemWatcher) for the directory where the dumps are stored.
Configure AeDebug and attach your own debugger at the time of the crash.
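The LocalDumps option can be sketched with a FileSystemWatcher. WER LocalDumps is configured under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps, and this sketch assumes the directory passed in matches the DumpFolder value set there. The DumpWatcher class and the onDump callback are hypothetical names:

```csharp
using System;
using System.IO;

public static class DumpWatcher
{
    // Calls onDump with the full path of every new *.dmp file in the directory.
    // The caller owns the returned watcher and should dispose it on shutdown.
    public static FileSystemWatcher Watch(string dumpDirectory, Action<string> onDump)
    {
        var watcher = new FileSystemWatcher(dumpDirectory, "*.dmp");
        watcher.NotifyFilter = NotifyFilters.FileName;
        // Created fires when the file appears; the dump may still be being
        // written, so a real watchdog should wait until it can open the file.
        watcher.Created += (sender, e) => onDump(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

The callback is where the watchdog could inspect the dump and show the user a link to the relevant wiki page. Note that it runs on a thread-pool thread, not on a UI thread.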

Keep an Application Running even if an unhandled Exception occurs

What ?
I am developing a console application that needs to keep running 24/7, no matter what. Is there any way to stop a multi-threaded application from getting blown up by some unhandled exception happening in "some thread somewhere"?
Why ?
Please refrain from giving lessons like "you should manage all your exceptions", "this should never happen", etc. I have my reasons: we are in test deployment and we need to keep this running, log exceptions, and restart all threads again. If anything unplanned happens and causes an unhandled exception to be thrown, it needs to be detected and some method called to restart all threads (atomicity is impossible due to the tier design).
That said, I am aware it might not be possible to restart an application from "within" if it has blown up because of an UnhandledException (a handler for which I have already implemented).
So far, I have used Quartz.net's FileScan job and a flag file to detect things like that and restart the application from outside. But this sounds hacky to me. I would like to find something cleaner and less quick and dirty.
Downvoting / sniping warning: I KNOW this might NOT be possible "as is". Please be creative and helpful rather than abruptly critical, and think of this more as an open question.
If it is going to run 24/7 anyway, why not just write it as a Windows service and take advantage of the recovery options built right into windows?
This approach has the additional advantage of being able to survive a machine reboot, and it will log failures/restarts in the system event logs.
You need to attach an event handler to the UnhandledException event on the current AppDomain:
AppDomain.CurrentDomain.UnhandledException += UnhandledExceptionHandler;
Inside the handler, you will have to somehow save enough state (to a file, database, etc.) for the restarted application to pass along to the new threads. Then instantiate a new instance of your Console application with a call to System.Diagnostics.Process.Start("MyConsoleApp.exe").
Be very careful to introduce logic to avoid a continuous loop of crash/restart/crash/restart.
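The loop-avoidance logic can be as simple as refusing to restart once too many restarts have happened within a time window. A minimal sketch with hypothetical names (RestartLimiter, TryRegisterRestart). It keeps the timestamps in memory, which suits an external watchdog. An application that restarts itself would have to persist them, for example in the same state file:

```csharp
using System;
using System.Collections.Generic;

public sealed class RestartLimiter
{
    private readonly int maxRestarts;
    private readonly TimeSpan window;
    private readonly Queue<DateTime> recent = new Queue<DateTime>();

    public RestartLimiter(int maxRestarts, TimeSpan window)
    {
        this.maxRestarts = maxRestarts;
        this.window = window;
    }

    // Returns true and records the restart if fewer than maxRestarts
    // happened within the window ending at 'now'; otherwise returns false.
    public bool TryRegisterRestart(DateTime now)
    {
        // Forget restarts that are older than the window.
        while (recent.Count > 0 && now - recent.Peek() > window)
            recent.Dequeue();

        if (recent.Count >= maxRestarts)
            return false;

        recent.Enqueue(now);
        return true;
    }
}
```

The handler would call Process.Start only when TryRegisterRestart(DateTime.UtcNow) returns true, and otherwise log the failure and give up.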
You can't keep a process running "no matter what". What if the process is killed?
You don't want to keep a process running "no matter what". What if the process state is corrupted in such a way that "bad things" happen if it keeps running?
Well I can think of a few drawbacks in the following solution, but it is good enough to me for the moment :
static void Main()
{
    AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(OnUnhandledException);
    CS = new ConsoleServer();
    CS.Run();
}

public static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
{
    // ExceptionObject is typed as object, so log it without casting.
    Logger.Log("UNHANDLED EXCEPTION : " + e.ExceptionObject);
    // Start a fresh copy; the current process still terminates after this handler returns.
    Process.Start(@"C:\xxxx\bin\x86\Release\MySelf.exe");
}
If you're using .NET 2.0 or above, the answer is that you can't.
In the .NET Framework versions 1.0 and 1.1, an unhandled exception
that occurs in a thread other than the main application thread is
caught by the runtime and therefore does not cause the application to
terminate. Thus, it is possible for the UnhandledException event to be
raised without the application terminating. Starting with the .NET
Framework version 2.0, this backstop for unhandled exceptions in child
threads was removed, because the cumulative effect of such silent
failures included performance degradation, corrupted data, and
lockups, all of which were difficult to debug. For more information,
including a list of cases in which the runtime does not terminate, see
Exceptions in Managed Threads.
Taken from here:
http://msdn.microsoft.com/en-us/library/system.appdomain.unhandledexception.aspx
If you want your application to survive, then you will need very aggressive try/catch around your methods so nothing escapes.
I would advise using a Windows service, as mentioned by others. It's the same as a console application, but with an extra bit of service-layer code on top. You could take your console app and convert it to a service application easily; you just need to override the service's start/pause/stop methods (OnStart, OnPause and OnStop on ServiceBase).

Error when multiple users access my web app at the same time

I'm using .Net 2008 and Oracle 10g as my database. The problem is that after deploying the application in IIS, when multiple users access the same page at the same time, I get an error and can't get the output.
Note: both users are accessing the same page and the same menu at the same time.
How can I resolve this?
My guess would be a standard thread-safety / synchronization bug, most likely due to some static resource such as a static connection. Obviously this is pure speculation without some more code, but it (=web-sites being highly threaded) is a surprisingly common oversight.
If it is a static resource, then... well, it probably shouldn't be static. Either per-request, or (specifically in the case of connections) scoped to the local code (and let the connection-pooling worry about re-use).
Does it "Work on your machine"? ;)
If not, try to deploy a version locally and attach the debugger to IIS. Point two browsers at the site. Whenever your browsers are (or seem) stuck, open the debugger's Threads window and see where the threads are blocked or blocking. You can also ask the debugger to break when an exception is thrown.
You mean that nothing appears in the Browser?
Look in your program's logs. Any error messages?
Put some trace statements into your code so that you can figure out where it's going.
So the error is saying there's a failure to create a table. Would you expect to create a table for each user? Have a look at the code around the table creation. Consider what the correct behaviour should be when two copies of that code run at the same time.
Again, add tracing to the code at these points so you can see what's happening. Often this is easier than debugging, because when multiple threads are running the debugger gets in the way of reality.

Exception handling best practice in a windows service?

I am currently writing a Windows service that runs entirely in the background and does something every day. My idea is that the service should be very stable, so if something goes wrong it should not stop but try again the next day, and of course log the exception. Can you suggest any best practices for making truly stable Windows services?
I have read the article of Scott Hanselman of exception handling best practice where he writes that there are only few cases when you should swallow an exception. I think somehow that windows service is one of the few cases, but I would be happy to get some confirmation on that.
'Swallowing' an exception is different to 'abandoning a specific task without stopping the entire process'.
In our windows service, we catch exceptions, log their details, then gracefully degrade that task and wait for the next task. We can then use the log to troubleshoot the error while the server is still running.
The question you should be asking is whether your Windows service should be fault tolerant. Remember that any unhandled exception will bring the service down, which results in its immediate unavailability. How do you think your service should behave? Should it try to continue servicing whatever it needs to? Should it be terminated?
Actually, if you have an unexpected exception that is passed all the way to the top level of your service, you should not continue processing; log it and propagate it. If you truly need a "reliable" service, then you'll need a "watchdog" that restarts the original service when it exits.
Note that modern operating systems act as a watchdog, so you don't need a watchdog service in most cases (check out the "Recovery" tab under your Service properties). Historically, critical services would have a second "watchdog" service whose sole purpose is to restart the real service if it fails.
It sounds like your design may be able to make use of the scheduler; just let Windows take care of the "once a day" part and just have your service do the task a single time. If it fails, fine; Windows is responsible for starting it again the next day.
One final note: this level of reliability in a service is rarely needed. In commercial code, I've only seen it used in a couple of antivirus programs and a network filtering program (that had to be running or else all network communication would fail). I've done a couple "watchdog" programs myself, but these were for customers like auto companies who would lose tons of money when their assembly line systems went down. In addition to the software watchdog, these systems also had redundant power supplies, RAIDed hot-swappable hard drives, and a complete duplicate of the entire system for use as an automatic failover.
Just saying: you may want to reconsider how much you really need to increase reliability (keeping in mind that 100% reliability is impossible; it can only be approached, at exponential cost).
In my opinion, you should establish a strong distinction between unrecoverable and recoverable exceptions, i.e., exceptions that prevent the continuation of your service (if your "static" data structures are corrupted) and exceptions that just cause the current operation to fail. To make the distinction clear, you may need separate exception class hierarchies.
This distinction should go along with a strong distinction between the structures of the "supervisor" part of the service (the one that schedules the periodic action) and the part of the service that actually does such periodic action. In case of a recoverable exception, you could abort the running operation and completely reset this last part, obviously logging all the details of the exception to the system event log; on the other hand, if you got an unrecoverable error (supervisor's structures in an inconsistent state and SEH exceptions, of course) you should just log your error and exit, since continuing running in an inconsistent state is much more dangerous than not running at all.
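The split between a supervisor and the periodic operation might look roughly like this. It is only a sketch under assumed names: RecoverableException is a hypothetical marker type for failures that invalidate just the current run, and Supervisor.RunOnce stands in for whatever the scheduling part of the service calls on each tick.

```csharp
using System;

// Hypothetical marker type: failures that only invalidate the current run.
public class RecoverableException : Exception
{
    public RecoverableException(string message, Exception inner = null)
        : base(message, inner) { }
}

public static class Supervisor
{
    // Runs one periodic operation. Returns true when it succeeded and false
    // when it failed recoverably; anything else is logged and rethrown so
    // that the service exits instead of running in an inconsistent state.
    public static bool RunOnce(Action operation, Action<string> log)
    {
        try
        {
            operation();
            return true;
        }
        catch (RecoverableException ex)
        {
            log("Run abandoned, will retry on the next schedule: " + ex.Message);
            return false;
        }
        catch (Exception ex)
        {
            log("Unrecoverable error, shutting down: " + ex);
            throw;
        }
    }
}
```

The operation itself decides what counts as recoverable by wrapping the original exception in a RecoverableException; everything it does not wrap is treated as fatal.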
Swallowing exceptions is rarely a good idea and as Scott says in his article, there really are only a few valid cases where it might be the best option.
My advice would be, firstly, to know what exceptions you're catching and catch those. It'll be more useful to you in the future if you know what you're catching rather than the generic (Exception e).
Once you've caught the exception, as you stated above, write it to a logging service, perhaps email the details to the maintainer of the code, or even fire off another event that sets up a retry of the code, with a limit on the number of attempts before a new message is issued to the code maintainer.
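A bounded retry for one specific exception type could be sketched as below. Retry.Run and the log and notify callbacks are hypothetical names. The sketch retries immediately, whereas a real policy would probably wait between attempts:

```csharp
using System;

public static class Retry
{
    // Tries the action up to maxAttempts times, retrying only on TException.
    // Every failure is logged; if all attempts fail, notify is called once
    // with the last exception. Returns true on success. Exceptions of other
    // types are not caught and propagate to the caller.
    public static bool Run<TException>(
        Action action,
        int maxAttempts,
        Action<Exception> log,
        Action<Exception> notify)
        where TException : Exception
    {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                action();
                return true;
            }
            catch (TException ex)
            {
                last = ex;
                log(ex);
            }
        }

        if (last != null)
            notify(last);
        return false;
    }
}
```

For example, Retry.Run<IOException>(CopyFiles, 3, logger, emailMaintainer) would attempt the copy three times before emailing anyone; CopyFiles, logger and emailMaintainer are placeholders.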
By catching specific exceptions you can do specific things about them. You can also catch the general exception to ensure that exceptions you really didn't expect don't cause a complete system failure.
Once you know about exceptions you weren't aware of before, these can then be refactored into the next release with a more ideal way of handling them.
Like so many things in software development rarely does "one size fit all". If you deem it appropriate to swallow the exception with the intention of retrying at a later date then that's perfectly reasonable. What really does matter is that you clean up after yourself, log and determine a reasonable retry policy before notifying someone.
The Exception Handling Block of the Enterprise Library may prove useful as you can modify your exception policy within config without changing the code.
A service should never stop. There are two classes of errors: errors in the service itself, and errors in data provided to the service. Data errors should be reported, not ignored. These two goals can be accomplished by having the service log errors, by providing a way to transmit error information to the user, and by having the service retry the failed operation after the user (or the programmer, in the case of an error in the service) has corrected what caused the failure (obviously the service will have to be stopped, re-installed, and re-started if a program error is corrected).
