I am currently writing a Windows service that runs entirely in the background and does something every day. My idea is that the service should be very stable: if something goes wrong, it should not stop, but try again the next day and, of course, log the exception. Can you suggest any best practices for making truly stable Windows services?
I have read Scott Hanselman's article on exception-handling best practices, where he writes that there are only a few cases in which you should swallow an exception. I suspect a Windows service is one of those few cases, but I would be happy to get some confirmation on that.
'Swallowing' an exception is different from 'abandoning a specific task without stopping the entire process'.
In our Windows service, we catch exceptions, log their details, then gracefully degrade that task and wait for the next one. We can then use the log to troubleshoot the error while the server is still running.
The question you should be asking is whether your Windows service should be fault tolerant. Remember that any unhandled exception will bring the service down, making it immediately unavailable. How do you think your service should behave? Should it try to continue servicing whatever it needs to? Should it be terminated?
Actually, if an unexpected exception propagates all the way to the top level of your service, you should not continue processing; log it and propagate it. If you truly need a "reliable" service, then you'll need a "watchdog" that restarts the original service when it exits.
Note that modern operating systems act as a watchdog, so you don't need a watchdog service in most cases (check out the "Recovery" tab under your Service properties). Historically, critical services would have a second "watchdog" service whose sole purpose is to restart the real service if it fails.
It sounds like your design may be able to make use of the Windows Task Scheduler: let Windows take care of the "once a day" part and have your program do the task a single time. If it fails, fine; Windows is responsible for starting it again the next day.
One final note: this level of reliability in a service is rarely needed. In commercial code, I've only seen it used in a couple of antivirus programs and a network filtering program (that had to be running or else all network communication would fail). I've written a couple of "watchdog" programs myself, but these were for customers like auto companies who would lose tons of money when their assembly line systems went down. In addition to the software watchdog, these systems also had redundant power supplies, RAIDed hot-swappable hard drives, and a complete duplicate of the entire system for use as an automatic failover.
Just saying: you may want to reconsider how much you really need to increase reliability (keeping in mind that 100% reliability is impossible; it can only be approached, at exponential cost).
In my opinion, you should draw a clear distinction between unrecoverable and recoverable exceptions, i.e., exceptions that prevent your service from continuing (for example, if your "static" data structures are corrupted) and exceptions that only mean the current operation has failed. To make the distinction clear, you may need separate exception class hierarchies.
This distinction should go along with a clear separation between the "supervisor" part of the service (the one that schedules the periodic action) and the part that actually performs that periodic action. In the case of a recoverable exception, you can abort the running operation and completely reset the worker part, obviously logging all the details of the exception to the system event log. On the other hand, if you get an unrecoverable error (the supervisor's structures are in an inconsistent state, or an SEH exception occurs), you should just log the error and exit, since continuing to run in an inconsistent state is much more dangerous than not running at all.
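A minimal sketch of the run-once approach, assuming the daily work is moved into a small console program that Task Scheduler starts once a day; `DailyJob.RunOnce` and its parameters are hypothetical names. The returned exit code is what the scheduler should report as the task's result.

```csharp
using System;

public static class DailyJob
{
    // Runs the job exactly once and returns a process exit code.
    // Intended to be returned from Main of a console app started daily by Task Scheduler.
    public static int RunOnce(Action job, Action<string> log)
    {
        try
        {
            job();
            log("Daily job completed.");
            return 0;
        }
        catch (Exception ex)
        {
            // Top level of a short-lived process: log everything and report failure.
            // Tomorrow's trigger starts a fresh process, so no state carries over.
            log("Daily job failed: " + ex);
            return 1;
        }
    }
}
```

In the console program, `static int Main() => DailyJob.RunOnce(DoTheWork, Console.WriteLine);` would be enough, where `DoTheWork` is your own method.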
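A rough sketch of the recoverable/unrecoverable split and the supervisor/worker separation described above. The exception classes, `IWorker`, `DelegateWorker` and `Supervisor` are all hypothetical names, and the `Action<string>` log stands in for the system event log.

```csharp
using System;

// Hypothetical hierarchy: failures that only doom the current operation.
public class RecoverableException : Exception
{
    public RecoverableException(string message, Exception? inner = null) : base(message, inner) { }
}

// Hypothetical hierarchy: the supervisor's own state can no longer be trusted.
public class UnrecoverableException : Exception
{
    public UnrecoverableException(string message, Exception? inner = null) : base(message, inner) { }
}

public interface IWorker
{
    void DoWork();
}

// Illustration helper: adapts a delegate to IWorker.
public sealed class DelegateWorker : IWorker
{
    private readonly Action _work;
    public DelegateWorker(Action work) => _work = work;
    public void DoWork() => _work();
}

public class Supervisor
{
    private readonly Func<IWorker> _createWorker; // builds a fresh worker for every run
    private readonly Action<string> _log;          // stand-in for the system event log

    public Supervisor(Func<IWorker> createWorker, Action<string> log)
    {
        _createWorker = createWorker;
        _log = log;
    }

    // Runs one scheduled operation; returns false when the service should stop.
    // Exceptions outside both hierarchies are not caught here and propagate.
    public bool RunOnce()
    {
        try
        {
            // A new worker per run: a failed run leaves no half-updated worker behind.
            _createWorker().DoWork();
            return true;
        }
        catch (RecoverableException ex)
        {
            _log("Operation aborted, will run again next time: " + ex);
            return true;
        }
        catch (UnrecoverableException ex)
        {
            _log("Unrecoverable error, shutting down: " + ex);
            return false;
        }
    }
}
```

Exceptions that belong to neither hierarchy are deliberately not caught in `RunOnce`, so an unexpected error still reaches the top level, where it can be logged and allowed to stop the service.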
Swallowing exceptions is rarely a good idea and as Scott says in his article, there really are only a few valid cases where it might be the best option.
My advice would be, first, to know which exceptions you're catching and catch those specifically. It'll be more useful to you in the future to know what you're catching than to use the generic catch (Exception e).
Once you've caught the exception, you can, as you stated above, write it to a logging service, perhaps email the details to the maintainer of the code, or even fire off another event that schedules a retry of the code, with a limit on the number of attempts before a new message is sent to the maintainer.
By catching specific exceptions you can do specific things about them. You can also catch the general exception to ensure that exceptions you really didn't expect don't cause a complete system failure.
Once you know about exceptions you weren't aware of before, you can give them more appropriate handling in the next release.
As with so many things in software development, one size rarely fits all. If you deem it appropriate to swallow the exception with the intention of retrying later, that's perfectly reasonable. What really matters is that you clean up after yourself, log, and decide on a reasonable retry policy before notifying someone.
The Exception Handling Application Block of the Enterprise Library may prove useful, as it lets you change your exception policy in configuration without changing the code.
A service should never stop. There are two classes of errors: errors in the service itself, and errors in the data provided to the service. Data errors should be reported, not ignored. Both goals can be accomplished by having the service log errors, by providing a way to transmit error information to the user, and by having the service retry the failed work after the user (or the programmer, in the case of an error in the service) has corrected what caused the failure (obviously the service will have to be stopped, reinstalled, and restarted if a program error is corrected).
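A sketch of such a retry policy, assuming that network and I/O failures are the transient ones worth retrying; `RetryingTask.Run`, `maxAttempts` and the two callbacks are hypothetical. Specific exceptions are caught first and retried; the general `catch (Exception)` logs and gives up immediately, so an unexpected bug is not retried blindly.

```csharp
using System;
using System.IO;
using System.Net.Http;

public static class RetryingTask
{
    // Returns true if the task eventually succeeded, false if we gave up.
    public static bool Run(Action task, int maxAttempts, Action<string> log, Action<string> notifyMaintainer)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                task();
                return true;
            }
            catch (HttpRequestException ex)   // known transient failure: retry
            {
                log($"Attempt {attempt} failed (network): {ex.Message}");
            }
            catch (IOException ex)            // known transient failure: retry
            {
                log($"Attempt {attempt} failed (I/O): {ex.Message}");
            }
            catch (Exception ex)              // unexpected: log, do not retry, tell a human
            {
                log($"Attempt {attempt} failed unexpectedly: {ex}");
                notifyMaintainer("Unexpected failure: " + ex.Message);
                return false;
            }
        }
        notifyMaintainer($"Task still failing after {maxAttempts} attempts.");
        return false;
    }
}
```

A real service would probably wait between attempts, and `task` itself should release its resources (with `using` or `finally`) so that each retry starts clean.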
Suppose I have an unhandled exception (or a known serious, unrecoverable error). The scariest situation is a security breach, but it could apply to anything that means my state is so badly hosed I can't expect to continue safely.
What do I do?
In a traditional application, the usual technique is to end my process quickly, as soon as possible. I call Process.Exit, TerminateProcess, die, or whatever other tool the environment has that means "END. NOW". Eric Lippert's post expresses the reasoning for this attitude well.
In a production ASP.NET application running on IIS, it's not so simple. I can certainly end the current process and write an error to the event log or wherever. That's essentially what happens with any unhandled exception. But the next time a request comes in, IIS is just going to spin up a new worker process. If my fatal error was a transient problem, that's great.
But if my problem persists past the lifetime of my process, the new one won't be any better. It could even be compounded by the initialization code or a reattempt. Plus, if IIS is running multiple worker processes within the same application pool, even killing my process doesn't kill the application. Logically speaking, all those other workers may be hosed too and just not know it yet.
So far I've only come up with two options.
End the process and hope for the best. Knowing that the app will just be restarted, this is pretty much the same as "catch(Exception) {}". Hardly satisfying.
"Reaching out" to tell IIS to disable the app, stop IIS, the machine, etc. This seems like a brutal hack. Moreover, I'd guess it's likely to require elevated security credentials, and the termination of a possibly compromised process seems like a poor time to be holding those.
Here is what I can think of:
You can use the advanced Application Pool setting in IIS called "Rapid-Fail Protection": set the Failure Interval as long as you like and Maximum Failures to 1, then let the exception be thrown so that IIS concludes this application pool can't work correctly. IIS will then send Service Unavailable back to the client, or even reset the connection, depending on your settings. For more detail, see Failure Settings for an Application Pool. However, you need to be very careful not to overdo it: your application must handle every other exception properly, so that only the errors you actually want to terminate the application are seen by IIS; otherwise a single user click could bring down your site.
Another solution is to do it in your own code: record such an error in some way, for example by creating a file named SystemCrashed, and then terminate the application. On Application_Start, check whether the file exists and, if it does, do nothing but terminate the application again. It works like a lock. This needs more code but may be safer than the IIS settings; there is little risk of overdoing it, as long as you get the removal of the lock right.
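The lock-file idea could look roughly like this; `CrashLock` and its methods are hypothetical names, and the path is up to you, as long as the application pool identity can write to it. In an ASP.NET application you would call `Engage` just before terminating and check `IsEngaged` in `Application_Start`.

```csharp
using System;
using System.IO;

// Hypothetical helper for the "SystemCrashed" lock-file idea.
public static class CrashLock
{
    // Record the fatal error (time and full details) before terminating the application.
    public static void Engage(string lockPath, Exception fatal)
    {
        File.WriteAllText(lockPath, DateTime.UtcNow.ToString("o") + Environment.NewLine + fatal);
    }

    // Check at startup; if true, refuse to serve requests and terminate again.
    public static bool IsEngaged(string lockPath) => File.Exists(lockPath);

    // Called by an administrator (or a deployment script) once the cause has been fixed.
    public static void Release(string lockPath)
    {
        if (File.Exists(lockPath)) File.Delete(lockPath);
    }
}
```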
I have a Windows service that does some intensive work every minute (actually, it starts a new thread each time, in which it syncs with different systems over HTTP). The problem is that after a few days it suddenly stops without any error message.
I have NLog in place and I have registered for AppDomain.CurrentDomain.UnhandledException. The last entry in the text-file log is just a normal entry without any problems. Looking in the Event Log, I also can't find any message in the Application log; however, there are two entries in the System log.
One basically says that the service has been terminated unexpectedly. Nothing more. The second event (at the same time as the first one) says: "...A new guard page for the stack cannot be created..."
From what I've read, this is probably a stack overflow. I'm not parsing any XML and I don't do any recursive work. I host a web server using Gate, Nancy and SignalR, and I have RavenDB running in embedded mode. Every minute a new task is started using the TaskFactory from .NET 4.0, and I also have a ContinueWith in which I restart a System.Timers.Timer to fire again in one minute.
How can I start investigating this issue? What could be possible reasons for such an error?
Based on the information you provided, I would, at a minimum, do the following:
Pay extra attention to any third party calls, and add additional info logging around those points.
There are some circumstances in which AppDomain.CurrentDomain.UnhandledException won't help you, a StackOverflowException being one of them: since .NET 2.0 it cannot be caught, and the CLR terminates the process without running your handler, so you won't get a stack trace from it.
Pay extra attention around areas where more than one thread is introduced.
An often-overlooked cause of a StackOverflowException is:
```csharp
class Example {
    private string myString;
    public string MyString { get { return MyString; } } // should be myString: the getter calls itself until the stack overflows
}
```
I got this on a particular computer and traced it to a C# object referencing itself from within an initializer.
For what it's worth: in my case this error was reported when the code attempted to write to the Windows Event Log and the interactive user did not have sufficient permission. This was a small console app that logged exceptions to a text file and, if desired, to the Event Log. On an exception, the text file was updated, but then this error was thrown and not caught by the error handling. Disabling Event Log logging stopped the error from occurring.
In case anyone else has the same problem: in my case, my Windows service was accidentally trapped in an endless recursive loop. So if you have this problem, look for method calls that may be causing deep or unbounded recursion.
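A sketch of how that kind of self-reference can happen, using hypothetical class names: an initializer that constructs another instance of its own class recurses until the stack is exhausted. Creating the child lazily is one possible fix.

```csharp
using System;

// Never construct this: each instance's property initializer constructs
// another BadNode, so construction recurses until the stack overflows.
public class BadNode
{
    public BadNode Child { get; } = new BadNode();
}

// One fix: create the child only on demand, so construction is not recursive.
public class GoodNode
{
    private GoodNode? _child;

    public GoodNode Child => _child ??= new GoodNode();

    public bool HasChild => _child != null;
}
```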
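One way to take recursion out of the picture is to replace it with an explicit stack, as in this sketch (`TreeNode` and `TreeWalk` are hypothetical names). The recursive version uses thread stack space for every level; the iterative one keeps its pending work on the heap and, with a visited set, also stops on a cyclic structure instead of looping forever.

```csharp
using System;
using System.Collections.Generic;

public class TreeNode
{
    public List<TreeNode> Children { get; } = new List<TreeNode>();
}

public static class TreeWalk
{
    // Stack depth grows with tree depth: a very deep or accidentally
    // cyclic structure ends in a StackOverflowException.
    public static int CountRecursive(TreeNode node)
    {
        int count = 1;
        foreach (var child in node.Children) count += CountRecursive(child);
        return count;
    }

    // Pending nodes live in a heap-allocated Stack<T>; the visited set
    // skips nodes already counted, so cycles terminate.
    public static int CountIterative(TreeNode root)
    {
        var visited = new HashSet<TreeNode>();
        var pending = new Stack<TreeNode>();
        pending.Push(root);
        int count = 0;
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            if (!visited.Add(node)) continue; // already seen: cycle or shared node
            count++;
            foreach (var child in node.Children) pending.Push(child);
        }
        return count;
    }
}
```

With the visited set, a node reachable by two paths is counted once, whereas the recursive version counts it once per path; for a proper tree the results are identical.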
I think you might all be stumped because this MAY BE an SSD hardware fault. I get this error consistently while playing games, about every 3-5 hours, and it's my computer's page file failing somehow. I know it isn't RAM, because I replaced my CPU/RAM/motherboard combo trying to fix this. And it's not programming, because different games and different apps all fail at the same time, unless it's Windows corruption?
I could be wrong, but it's just an idea.
I have two Samsung EVOs in RAID.
During half a year of WinForms MVP development, I designed the following exception-handling strategy. I have an abstract base Presenter class with several Execute methods, each taking a delegate as an input parameter (the signatures vary). Interaction between the View and the Presenter is done via events (input) defined in IView, and by setting public properties (output) or calling methods that are also defined in IView and implemented by the View. Each event handler in the presenter calls one of the Execute methods, passing it the concrete implementation.
In the Execute method I have several catch blocks for the very specific exceptions that may occur (mainly because of problems in widely used external components). Each of these exceptions stops the current operation, is logged, and is shown to the user with a meaningful explanation by calling the View's methods.
Not long ago (in fact VERY not long ago) I started learning WPF MVVM, which at first glance seems to have much in common with MVP. I was looking for handy advice on an exception-handling strategy there (mainly on informing the user about problems), but these questions are difficult to search for: much is said, but mostly in principle. I've found more than 20 examples of "handling" unhandled exceptions in App.xaml.cs. That's all very nice, but tell me sincerely: if you know the exact exceptions that may crash your app, won't you handle them a little earlier (even if you are then forced to close your app)? I'm in no way a fan of catching every possible exception. But quite a lot of exceptions caused by network problems, temporary database unavailability and so on should be handled without closing the application and without scary error icons, giving an ordinary user a chance to repeat the request.
So, as an experiment, I tried almost the same thing as I described earlier: I created events in the ViewModel for passing exceptions along and subscribed the View to them. But, frankly speaking, this approach gives me the creeps.
(It was a very long speech, I know.) The question: how do you handle exceptions, as far as informing the user is concerned, when using MVVM? No, I'm not interested in data validation just now. Any criticism of and/or advice about my MVP approach is also welcome.
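A condensed sketch of the Execute approach described above. The names `IView.ShowError`, `PresenterBase`, `SavePresenter` and `RecordingView` are hypothetical, and `TimeoutException` and `IOException` stand in for whatever specific exceptions your external components actually throw.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical view contract: the presenter reports problems only through it.
public interface IView
{
    void ShowError(string message);
}

public abstract class PresenterBase
{
    protected IView View { get; }
    private readonly Action<Exception> _log;

    protected PresenterBase(IView view, Action<Exception> log)
    {
        View = view;
        _log = log;
    }

    // Every event handler funnels its work through Execute. Only specific,
    // anticipated exceptions are caught; anything else propagates.
    protected void Execute(Action operation)
    {
        try
        {
            operation();
        }
        catch (TimeoutException ex)
        {
            _log(ex);
            View.ShowError("The server did not respond in time. Please try again.");
        }
        catch (IOException ex)
        {
            _log(ex);
            View.ShowError("A file could not be read or written: " + ex.Message);
        }
    }
}

// Example concrete presenter: the View raises an event, the handler calls Execute.
public sealed class SavePresenter : PresenterBase
{
    private readonly Action _save;

    public SavePresenter(IView view, Action<Exception> log, Action save) : base(view, log) => _save = save;

    public void OnSaveRequested() => Execute(_save);
}

// Trivial view that records messages, for trying the presenter out.
public sealed class RecordingView : IView
{
    public List<string> Messages { get; } = new List<string>();
    public void ShowError(string message) => Messages.Add(message);
}
```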
We have a couple of different strategies for different kinds of error conditions in our WPF apps.
For anticipated errors that the code can handle and proceed from without notifying the user, we use normal try/catch blocks.
For errors that are anticipated but that result in a failure from the user's point of view, we expose a Notifications collection on our ViewModels, bound to an ItemsControl on our View, which is templated in a similar way to the notification bars in Firefox/IE/Chrome. Each notification has a show-duration property (the Notifications collection is self-pruning, using a dispatcher timer) and a close button in the view, so notifications can appear for a specific period of time or be closed explicitly by the user. The nice thing about this model is that it can be used for completion messages, warnings and exceptions, and also for some conditions that might not manifest as an exception but are still error conditions from the user's point of view. Notifications are often a good replacement for the message box, as they don't interrupt the user's workflow.
For errors that we don't anticipate, we use Red Gate SmartAssembly to capture full details so users can send them to our support team for analysis. Our view is that catching and continuing after exceptions you haven't anticipated is a very risky strategy: the operation that threw was abandoned partway through, so your app may be left in an inconsistent state (which could result in anything from a weird UI to corrupt data), and there could be side effects that are impossible to predict. It's not a great user experience to have an app crash, but it's a significantly worse experience to have it corrupt data because of an unanticipated state caused by an error that the app ignored. Our strategy is to capture as much detail about the crash as possible, so the user knows we are serious about addressing the problem and that it will be fixed in a future update, rather than just carrying on and heading for potentially worse problems.
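A sketch of a self-pruning notifications collection, with hypothetical names. To keep it independent of WPF, pruning is done by a `Prune(DateTime now)` method that a `DispatcherTimer` tick handler could call; the XAML binding (an `ItemsControl` over `Notifications`) is not shown.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;

public enum NotificationKind { Info, Warning, Error }

public sealed class Notification
{
    public Notification(string text, NotificationKind kind, DateTime expiresAt)
    {
        Text = text;
        Kind = kind;
        ExpiresAt = expiresAt;
    }

    public string Text { get; }
    public NotificationKind Kind { get; }
    public DateTime ExpiresAt { get; }
}

public class NotificationsViewModel
{
    // Bound to an ItemsControl; ObservableCollection raises change notifications for the view.
    public ObservableCollection<Notification> Notifications { get; } = new ObservableCollection<Notification>();

    public void Notify(string text, NotificationKind kind, TimeSpan showFor, DateTime now) =>
        Notifications.Add(new Notification(text, kind, now + showFor));

    // Bound to the close button (e.g. through a command).
    public void Dismiss(Notification notification) => Notifications.Remove(notification);

    // Called periodically, e.g. from a timer tick with the current time.
    public void Prune(DateTime now)
    {
        // ToList() snapshots the expired items so the collection can be modified safely.
        foreach (var expired in Notifications.Where(n => n.ExpiresAt <= now).ToList())
            Notifications.Remove(expired);
    }
}
```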
I agree, leaving the handling of exceptions in your app.xaml.cs is not good, because it is basically too late!
For operations where the likelihood of an exception is relatively high (file handling, network I/O), make sure you actively catch exceptions. I expose them to the view in one of two ways:
For errors that indicate a long-running issue, such as network problems, I expose an 'ErrorState' property.
For transient issues, such as a file not being found, I expose an event.
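A sketch of the two exposure styles, assuming a hypothetical `DocumentViewModel` whose I/O is passed in as delegates: a bindable `ErrorState` property for a long-running problem, cleared once the operation succeeds again, and a `TransientError` event for a one-off failure.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Net.Http;
using System.Runtime.CompilerServices;

public class DocumentViewModel : INotifyPropertyChanged
{
    private string? _errorState;

    public event PropertyChangedEventHandler? PropertyChanged;

    // Raised for one-off problems the view may show briefly.
    public event EventHandler<string>? TransientError;

    // Bindable; non-null while a long-running problem persists.
    public string? ErrorState
    {
        get => _errorState;
        private set { _errorState = value; OnPropertyChanged(); }
    }

    public void Sync(Action fetchFromServer)
    {
        try
        {
            fetchFromServer();
            ErrorState = null; // the problem has gone away
        }
        catch (HttpRequestException ex)
        {
            ErrorState = "Network problem: " + ex.Message;
        }
    }

    public void Open(Action openFile)
    {
        try
        {
            openFile();
        }
        catch (FileNotFoundException ex)
        {
            TransientError?.Invoke(this, "File not found: " + ex.FileName);
        }
    }

    private void OnPropertyChanged([CallerMemberName] string? name = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
}
```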
1)
1 - Only handle exceptions that you can actually do something about, and
2 - You can't do anything about the vast majority of exceptions
a) I assume that by “not handling an exception” the text is suggesting we should let the exception bubble up the stack, where the runtime will abort our application?
b) But why is letting the runtime abort the application preferred over catching the exception, logging it and then informing the user of the failure? The only difference between the two is that in the latter case the application isn't aborted.
For example, if the database goes down, why should the whole program crash (due to not handling an exception) if we can instead catch the exception, log it and notify the user of the failure, and that way keep the program up and running?
2) If you know that an exception potentially raised by some block of code can't be handled, should you put this code inside a try-finally block, or is it better to leave it outside any try-finally block?
Thank you
No, the guideline is not to catch an exception you cannot do anything about except at the top-level of your application or thread.
You should try to avoid letting your application crash - log the information somewhere, give a message to your user explaining what happened, and tell them how to report the error to you. Perhaps also try to save their unsaved data in a recovery file so that the next time the application starts it can offer the option to attempt to recover their lost work.
Try looking at it this way: the database goes down. How do you know? Because you get a timeout, an exception, something. But your application probably isn't the first to get the exception. ADO.NET, LINQ to SQL, Entity Framework or whatever data provider you are using is actually getting the exception and throwing it to your application. To me, this is what that advice is saying: as a component designer, prefer to throw exceptions you can't do anything about.
For the database down example, is there anything the ADO.NET data provider can do? Can it bring a server back up? Repair network connections? Reset permissions? No. So it doesn't handle the exception, it throws it.
The guideline you cite is for component development, not the outer edge of a run-time boundary (a thread or application). At that level, it would be correct to make a decision on how to handle exceptions that have bubbled that far.
I think the person you are quoting suggests that you should let the exception bubble up the stack until something higher up can make sense of it, or until it reaches the top of the call stack, where you do have code that logs it or displays an error message to the user, then exits your program if the error is fatal or carries on if it is not.
Sometimes it may be better not to continue executing the program, for example if you get an OutOfMemoryException or some other situation where the program's actions are undefined: a potential disaster.
I think the key to "Only handle exceptions that you can actually do something about" is that you should only handle the exception if you can carry on from that point in your application.
To take a trivial example:
if you're looking for a file on the user's system and it's not there when it should be, you should raise a "file not found" exception. Now, if you can recover from this (say, by simply recreating the file), then do so and carry on. However, if the file can't be recreated, you shouldn't let your program carry on.
However, this doesn't mean you can't have a top level catch all exception handler in your main program to display a friendly message to the user, perhaps to log the exception or even mail it to you the developer.
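A sketch of the recoverable case, with `SettingsFile.ReadOrCreate` as a hypothetical helper: a missing file is handled by recreating it, while anything else, including a failure to recreate it, propagates to the top-level handler.

```csharp
using System;
using System.IO;

public static class SettingsFile
{
    public static string ReadOrCreate(string path, string defaultContents)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            // We know how to recover: recreate the file with defaults and carry on.
            // If this write fails, we cannot recover here, so that exception propagates.
            File.WriteAllText(path, defaultContents);
            return defaultContents;
        }
        // Other failures (e.g. DirectoryNotFoundException, UnauthorizedAccessException)
        // are not caught and reach the top-level handler.
    }
}
```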
That statement holds true, but it refers to catching exceptions in the deeper layers of an application. Basically, most of the code we write does not need exception handling. Only the client part of the application is responsible for catching the error and presenting it to the user, as well as logging it.
For example, the same business/database code can be used in a web application and in a Windows Forms/WPF application, and logging and handling could differ between them. The deeper layers do not know how errors will be handled, so they need to leave that responsibility to the UI tier.
The point is that you don't want to have try/catch blocks nested everywhere in your code as this tends to hide issues with your code. It is better to only implement exception handling where you understand the error and the desired outcome, else don't handle it and let it bubble up.
As for errors bubbling up, you should have a global exception handler for these uncaught application errors. It is implemented in only one spot in your app, lets you log the error or present it to the user, and works by hooking the application's error event.
Event to hook in a .NET WinForms application:
AppDomain.CurrentDomain.UnhandledException (and Application.ThreadException for exceptions raised on the UI thread)
Event to hook in an ASP.NET application:
HttpApplication.Error
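A sketch of hooking `AppDomain.CurrentDomain.UnhandledException` once at startup; `GlobalErrorHandling`, `Install` and `Describe` are hypothetical names. Since .NET 2.0 this handler cannot keep the process alive; it is a last chance to log.

```csharp
using System;

public static class GlobalErrorHandling
{
    // Call once, as early as possible in Main.
    public static void Install(Action<string> log)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) => log(Describe(e));
    }

    // ExceptionObject is typed object because other languages can throw non-Exception objects.
    public static string Describe(UnhandledExceptionEventArgs e)
    {
        string details = e.ExceptionObject is Exception ex
            ? ex.ToString() // includes type, message and stack trace
            : "Non-exception object: " + e.ExceptionObject;
        return (e.IsTerminating ? "FATAL (process terminating): " : "Unhandled: ") + details;
    }
}
```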
Enjoy!
Without knowing the context of the two statements: assuming both apply to methods and classes, they make sense.
A piece of code that calls a method can only handle the exceptions for which it has enough information about the context. In most cases, a piece of code won't have enough information to handle all exceptions.
Example: a piece of code that calls a method SaveData() can handle a DatabaseStorageException when it knows that it is saving data to a database. On the other hand, if the code is written in a storage-agnostic manner, then catching such a specific exception is not a very good idea. In this case it is better to let the exception propagate up the call stack and let some other code, which has enough context information, handle it.
I am reading some C# books and came across an exercise that I don't know how to do; I'm not even sure what the question means.
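A sketch of the two callers, using the `SaveData()` and `DatabaseStorageException` names from the example; the interface, both classes and `DelegateStore` are hypothetical. Only the caller that knows it sits in front of a database catches the database-specific exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type, named after the example above.
public class DatabaseStorageException : Exception
{
    public DatabaseStorageException(string message) : base(message) { }
}

public interface IDataStore
{
    void SaveData(string data);
}

// Illustration helper: adapts a delegate to IDataStore.
public sealed class DelegateStore : IDataStore
{
    private readonly Action<string> _save;
    public DelegateStore(Action<string> save) => _save = save;
    public void SaveData(string data) => _save(data);
}

// Knows it sits in front of a database, so it has the context to handle a database failure.
public class DatabaseBackedQueue
{
    private readonly IDataStore _database;
    public List<string> Pending { get; } = new List<string>(); // writes to retry later

    public DatabaseBackedQueue(IDataStore database) => _database = database;

    public void Save(string data)
    {
        try
        {
            _database.SaveData(data);
        }
        catch (DatabaseStorageException)
        {
            Pending.Add(data); // database down: keep the data and retry later
        }
    }
}

// Storage-agnostic: no try/catch, so whatever the store throws reaches a caller with more context.
public class ReportPublisher
{
    private readonly IDataStore _store;
    public ReportPublisher(IDataStore store) => _store = store;
    public void Publish(string report) => _store.SaveData(report);
}
```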
Problem:
After working for a company for some time, your skills as a knowledgeable developer are recognized, and you are given the task of “policing” the implementation of exception handling and tracing in the source code (C#) for an enterprise application that is under constant incremental development. The two goals set by the product architect are:
100% of methods in the entire application must have at least a standard exception handler, using try/catch/finally blocks; more complex methods must also have additional exception handling for specific exceptions
All control flow code can optionally write “tracing” information to assist in debugging and instrumentation of the application at run-time in situations where traditional debuggers are not available (eg. on staging and production servers).
(I don't quite understand these criteria. I come from the Java world, where there are two kinds of exceptions, checked and unchecked. Developers must handle checked exceptions, and log them. Unchecked exceptions may still be logged, but most of the time we just let them propagate. In C#, what should I do?)
Question for Problem:
List rules you would create for the development team to follow, and the ways in which you would enforce rules, to achieve these goals.
How would you go about ensuring that all existing code complies with the rules specified by the product architect; in particular, what considerations would impact your planning for the work to ensure all existing code complies?
As you mentioned, Java has checked and unchecked exceptions. For checked exceptions, you have to either declare that your method throws them or handle them in the method. C# does not have that limitation: your method doesn't have to declare which exceptions it could throw.
100% of methods in the entire application must have at least a standard exception handler, using try/catch/finally blocks; more complex methods must also have additional exception handling for specific exceptions
This seems like a stupid requirement. If you have no meaningful way to recover from an exception and continue executing normally, you would ideally allow the exception to bubble up the stack unimpeded. That way, when you log the exception (right before shutting down gracefully, or not so gracefully), you'll have a full stack trace of exactly what caused it. A very common mistake (in the code I've seen) is to use Pokémon exception handling (catching everything everywhere) and to log exceptions too early, so you know something bad happened but not which piece of code triggered it.
You should also take a look at this list of similar questions for a good overview of exception-handling best practices.
And, for good measure, Eric Lippert's "Vexing exceptions".
After you define your application architecture, you should determine how the exceptions generated by your application will be handled. The strategy should meet all security, privacy, and performance requirements. The following are general guidelines for an exception-handling strategy:
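A sketch of the alternative, with hypothetical names: code that cannot recover has no `try`/`catch` at all, and code that catches only to add context wraps the original exception as the `InnerException`, so the full stack trace is still available when the top level logs `ex.ToString()`. If you catch only to log, rethrow with `throw;` rather than `throw ex;`, because the latter resets the stack trace.

```csharp
using System;

public static class OrderImport
{
    // No try/catch: nothing useful can be done at this level.
    public static int ParseQuantity(string text) => int.Parse(text);

    // Catches only to add context; the original exception (and its stack trace)
    // is preserved as InnerException for the top-level handler to log.
    public static int ImportLine(string line, int lineNumber)
    {
        try
        {
            return ParseQuantity(line);
        }
        catch (FormatException ex)
        {
            throw new InvalidOperationException($"Line {lineNumber} is not a valid quantity.", ex);
        }
    }
}
```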
Do not catch exceptions unless some kind of value can be added. In other words, if knowing about the exception is of no use to the user, to you, or to the application, do not catch it.
Do catch exceptions if you want to retry the operation, add relevant information to the exception, hide sensitive information contained in the exception, or display formatted information.
Generally, handle exceptions only at an application boundary (such as the top of a logical layer or tier, the boundary of a service, or the top of a UI layer).
Do not propagate sensitive information across trust boundaries. This is a standard security consideration, but it is frequently overlooked when dealing with exception information. Instead, replace exceptions that contain sensitive information with new exceptions that contain information that can be exposed safely outside the current boundary.
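A sketch of replacing exceptions at a trust boundary, with hypothetical names: full details are logged internally under a correlation ID, and the caller only receives a sanitized exception carrying that ID. The original exception is deliberately not attached as `InnerException`, since that would carry the sensitive details across the boundary.

```csharp
using System;

// Hypothetical exception that is safe to expose outside the service boundary.
public class ServiceUnavailableException : Exception
{
    public ServiceUnavailableException(string message, string correlationId) : base(message)
    {
        CorrelationId = correlationId;
    }

    public string CorrelationId { get; }
}

public static class ServiceBoundary
{
    // Code inside may throw exceptions mentioning server names, connection strings, SQL, etc.
    public static T Invoke<T>(Func<T> operation, Action<string, Exception> logInternally)
    {
        try
        {
            return operation();
        }
        catch (Exception ex)
        {
            string id = Guid.NewGuid().ToString("N");
            logInternally(id, ex); // full details stay on the server side
            throw new ServiceUnavailableException(
                "The request could not be completed. Please try again later or quote reference " + id + ".", id);
        }
    }
}
```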
Make exceptions as accurate as possible and allow for specific actions to be taken when the exception is raised. Sometimes this may require writing custom exceptions with the required attributes.
Error messages that are displayed to users should be relevant and they should suggest the corrective action to take. In most cases, user-displayed error messages should never contain sensitive information, such as stack traces or server names.