As we moved from NSB5 to NSB6, we also looked into removing NServiceBus.Host and using Topshelf instead. Since then, our service no longer shows as stopped when we hit a critical failure.
For example, when we have trouble reaching the database for any reason, I want the service to end, and Services Manager should show it as not running. Instead, it still says Running even though the service has actually stopped, so no recovery action is run either.
This worked when we were using NServiceBus.Host.
I was looking in the wrong direction, towards Topshelf. The answer lies in how NServiceBus is configured to handle critical errors.
EndpointConfiguration.DefineCriticalErrorAction(OnCriticalError);
and
private async Task OnCriticalError(ICriticalErrorContext context)
{
    await context.Stop().ConfigureAwait(false);
}
This worked for me.
We're having an issue in production that is really difficult to diagnose. The symptom is that, when using SignalR, one or more threads on the server periodically hang. When it happens, the other threads seem to continue merrily, unless one of them also experiences this problem.
No error is logged, but creating a dump reveals the threads experiencing an ObjectDisposedException at:
System.Net.Sockets.SafeCloseSocket.GetOrAllocateThreadPoolBoundHandle(Boolean)
System.Net.Sockets.Socket.GetOrAllocateThreadPoolBoundHandleSlow()
System.Net.Sockets.SocketAsyncEventArgs.DoOperationSendSingleBuffer(System.Net.Sockets.SafeCloseSocket)
System.Net.Sockets.Socket.SendAsync(System.Net.Sockets.SocketAsyncEventArgs)
If we disable SignalR on the client side, there is no problem.
We've scaled back the WebSocket functionality to the point where we only create an empty hub, and the problem still persists.
We initialize SignalR with:
services.AddSignalR();
app.UseSignalR(routes =>
{
    routes.MapHub<LiveHub>("/live");
});
The contents of LiveHub.cs are:
public class LiveHub : Hub<ILiveHub>
{
    public Task Ping()
    {
        return Clients.Caller.SendMessage("pong");
    }
}
We are running on AWS behind an Application Load Balancer, but only on one server currently.
Has anyone else had this problem?
We have an Azure-based ASP.NET Web Service that accesses an Azure KeyVault. We are seeing two instances in which a method "hangs" on a first try, and then works a minute or so later.
In both instances, a KeyVault access occurs. In both instances the problem started when we started using the KeyVault in these methods.
We have done very careful logging in the first instance, and cannot see anything else in our code that could cause the hang. The KeyVault access is the primary suspect.
In addition, if we run the app from our local servers (from Visual Studio), the KeyVault access works fine on the "first try". It only produces the "hang" error when it runs in production on Azure, and only on that "first try".
By "hang" I mean that in one instance, which is triggered by an external API, it takes at least 60 seconds (we can tell that because the external API times out.) In the other instance, which is triggered by a page request, several minutes can pass and the page just spins, at which point we assume the DB request or something else has timed out.
When I say "a minute or so later", that's as fast as we have timed the retry.
Is there some kind of issue or function where the KeyVault needs to be "warmed up" before it works on the first try?
Update: I'm looking at the code more carefully, and I see at least a couple of places where we can insert still more logging to get a more exact picture of where the failure occurs. I'm going to do that, and then I'll report back here.
Update: See answer below - major newbie error, has been corrected.
Found the problem, and the solution.
Key Vault access needs to be called from an async task, because there is a multi-second delay.
private async Task<string> GetKeyVaultSecretValue(varSecretParms) {
I don't understand the underlying technology. Apparently, though, if the call is made from within a standard (synchronous) code sequence, the server doesn't like to wait, and so the thread is abandoned or halts.
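One common cause of this kind of hang, which the post itself does not confirm, is blocking on an asynchronous call (for example with .Result or .Wait()) instead of awaiting it. A minimal sketch of awaiting the call all the way up is shown here. FetchSecretAsync is a hypothetical stand-in for the real Key Vault client call, and the other names are illustrative only:

```csharp
using System;
using System.Threading.Tasks;

public static class SecretReader
{
    // Hypothetical stand-in for the real Key Vault client call;
    // the delay simulates the network round trip.
    public static async Task<string> FetchSecretAsync(string secretName)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(100)).ConfigureAwait(false);
        return "secret-for-" + secretName;
    }

    // Awaits the call instead of blocking on .Result or .Wait(),
    // so the calling thread is released while the call is in flight.
    public static async Task<string> GetKeyVaultSecretValueAsync(string secretName)
    {
        string value = await FetchSecretAsync(secretName).ConfigureAwait(false);
        return value;
    }
}
```

For this to help, every caller up the chain (for example the controller action) would also need to return a Task and await the result rather than block on it.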
According to your description, it seems this is because the Web App does not have Always On enabled.
By default, web apps are unloaded if they are idle for some period of time. This lets the system conserve resources. In Basic or Standard mode, you can enable Always On to keep the app loaded all the time.
If possible, please enable Always On and try again.
I have a REST service in a self-hosted ASP.NET Web API application (console).
Some clients poll the server at specific intervals to fetch new data. In general, everything works fine.
The problem is that the server stops responding to requests after some random duration (roughly 30 minutes to 2.5 hours). All client requests start to time out.
The weird thing is that the server doesn't seem to receive the requests anymore (no controller method is invoked). The server doesn't throw any exceptions, and the console app is still responsive. So I can only suppose the problem occurs before the request reaches the API controller.
In the debugger everything seems fine.
How can I diagnose such an issue?
What else can I try to fix the described behavior?
Notes:
Tested on multiple systems
.Net 4.5.1
Asp.Net WebApi 5.1.2
I have found the issue: it is caused by connection leaks. If you send requests and don't close the connections correctly, either after the request finishes or when an exception occurs, the number of open connections will eventually reach its maximum. Either raise the maximum number of open connections in the connection string or (the preferred way) make sure your code handles closing the connection:
SqlConnection myConnection = new SqlConnection(ConnectionString);
try
{
    // Open the same connection that is closed in the finally block
    myConnection.Open();
    someCall(myConnection);
}
finally
{
    // Runs even if someCall throws, so the connection returns to the pool
    myConnection.Close();
}
Credit goes to "How can I solve a connection pool problem between ASP.NET and SQL Server?", where you can read more about this.
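The same guarantee can be expressed with a using block, which disposes the connection (and so returns it to the pool) even when the work throws. This is a minimal sketch, not code from the answer: ConnectionScope and Run are hypothetical names, and the helper is written against IDbConnection so it does not depend on a particular provider.

```csharp
using System;
using System.Data;

public static class ConnectionScope
{
    // Opens a connection, runs the work, and always disposes the connection,
    // even if Open() or the work itself throws.
    public static T Run<T>(Func<IDbConnection> createConnection, Func<IDbConnection, T> work)
    {
        using (IDbConnection connection = createConnection())
        {
            connection.Open();
            return work(connection);
        }
    }
}
```

With SQL Server, the factory would typically be something like () => new SqlConnection(ConnectionString), and the work delegate would contain the call that uses the connection.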
In my case, the issue was caused by never-ending tasks. Due to misuse of the Reactive Extensions API, I randomly created tasks that never completed. It seems that at some point the task scheduler simply couldn't handle them anymore, although I'm not completely sure about that.
Lesson learned: it seems that by doing bad things in your app code (too many tasks, SQL connections ...) you can kill the Web API infrastructure so that it no longer handles requests at any level.
I have MSMQ on Windows 2008. Messages are available in a private queue. I have one WCF subscriber (written in C#) that is installed as a Windows service. The problem is that the WCF subscriber sometimes stops picking up messages from the queue. If I restart the service, it works fine again. I have now attached an IErrorHandler to log the reason and the exception.
To handle this, I want to set the service's recovery property to restart the service on first failure. The problem is: how do I throw the error from the HandleError() method of my IErrorHandler class?
Please tell me the best way to throw an exception in a Windows service so that it gets restarted.
While it is probably better to address the underlying cause of your exceptions, it is certainly valid in certain scenarios to implement a fail fast methodology. Indeed, this ability to kill processes which have become "flawed" in some manner is critical to the concept of fault tolerance.
So, to make a windows service commit suicide:
void KillSelf()
{
    try
    {
        // Code to close open connections/dispose
        // of unmanaged resources etc
        ...
    }
    finally
    {
        Environment.Exit(1);
    }
}
Service recovery options should be set to restart automatically. This will ensure your service comes straight back up again.
As far as I know one cannot throw an exception to restart a windows service.
I usually encapsulate a try catch (with logging) to prevent any exceptions crashing the service, which is the opposite to what you are suggesting.
It may be that you can catch an error and stop the service (not sure) and configure the service to restart if it stops?
I am working on a Windows service. It works fine. When I try to stop the service from services.msc, it shows the following error:
Windows could not stop the xxx service on Local Computer.
The service did not return an error. This could be an internal Windows error or an internal service error.
If the problem persists, contact your system administrator.
If I try to stop it again, it takes a long time and then shows the error below:
Windows could not stop the xxx service on Local Computer.
Error 1061: The service cannot accept control messages at this time.
At this point, the service has stopped. But if I try to reinstall the service, it throws another error:
Windows could not stop the xxx service on Local Computer.
Error 1001: The specified service is marked for deletion.
After closing services.msc, it lets me reinstall the service and again things start working fine.
In OnStop(), I am doing some lengthy operations and it takes time to complete.
Any idea how I can make the whole thing go smoothly?
--Edit--
This is what my OnStop method looks like:
protected override void OnStop()
{
    base.OnStop();
    //Some lengthy operation
}
A Windows service has a default timeout for its OnStart and OnStop events. Normally, if you do time-consuming operations in either of these events, you should start a new thread and let the operation run in the background.
Usually, OnStart and OnStop are only used to initiate a process, and the time-consuming work carries out its execution in a child thread.
Hope this solves your issue.
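As a rough illustration of that advice, the sketch below keeps the lengthy work on a background task and lets the stop path signal cancellation and wait only for a bounded time. StoppableWorker and its members are hypothetical names, and the loop body is a placeholder. In a real service, OnStart would call Start and OnStop would call Stop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class StoppableWorker
{
    private readonly CancellationTokenSource _cancellation = new CancellationTokenSource();
    private Task _work;

    // Intended to be called from OnStart: returns immediately,
    // the work itself runs on a background task.
    public void Start()
    {
        _work = Task.Run(() => RunLoop(_cancellation.Token));
    }

    // Intended to be called from OnStop: signals cancellation and waits
    // a bounded time. Returns true if the work finished within the timeout.
    public bool Stop(TimeSpan timeout)
    {
        _cancellation.Cancel();
        return _work == null || _work.Wait(timeout);
    }

    private static void RunLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Placeholder for one small unit of the lengthy operation;
            // the wait ends early as soon as cancellation is requested.
            token.WaitHandle.WaitOne(TimeSpan.FromMilliseconds(50));
        }
    }
}
```

Breaking the lengthy operation into small units that check the token is what keeps the stop request responsive. If the shutdown work genuinely needs longer, ServiceBase.RequestAdditionalTime can be called from OnStop to ask the Service Control Manager for more time.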
There is a registry entry that controls how much time windows gives services to shut down before giving up: http://support.microsoft.com/kb/146092
Trivially, increasing this time would fix the issue, assuming that your service is actually shutting down after that long operation.