Topshelf service with dependency on RabbitMQ not starting on reboot

I have a Windows service that uses EasyNetQ and RabbitMQ.
The service starts normally from the Service Control Manager.
However, occasionally after a reboot the service does not start, and the event log shows this error:
A timeout was reached (30000 milliseconds)
The <serviceName> service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
I have tried setting the service to a delayed automatic start, and this has not helped.
In addition, I was thinking about configuring the recovery options so that the service restarts on the first, second and subsequent failures. I am not sure whether this will work.
So my question is: how can I determine which dependency is causing my service to fail to start sometimes?

To determine which dependency is causing the error, you could attach a handler to Topshelf's "OnException" (https://topshelf.readthedocs.io/en/latest/configuration/config_api.html#onexception)
and log the exception that caused the error.
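
A minimal sketch of that suggestion, not the asker's actual code: QueueWorker, StartupLog, the log file name and the service name are all hypothetical, and the RabbitMQ connection is only a placeholder comment. It registers an OnException handler and also logs anything thrown from the start callback, since a failure there is what would stop the service from starting.

```csharp
using System;
using System.IO;
using Topshelf;

// Hypothetical worker; the real service would create its EasyNetQ bus in Start.
public class QueueWorker
{
    public bool Start(HostControl hostControl)
    {
        try
        {
            // Connect to RabbitMQ here (placeholder, nothing is connected in this sketch).
            return true;
        }
        catch (Exception ex)
        {
            // Record the failure before rethrowing so the cause ends up in the log.
            StartupLog.Write("Start failed", ex);
            throw;
        }
    }

    public bool Stop(HostControl hostControl) => true;
}

public static class StartupLog
{
    // Assumed log location: a file next to the service executable.
    private static readonly string LogPath =
        Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "startup-errors.log");

    public static void Write(string context, Exception ex) =>
        File.AppendAllText(LogPath, $"{DateTime.UtcNow:o} {context}: {ex}{Environment.NewLine}");
}

public static class Program
{
    public static void Main()
    {
        HostFactory.Run(x =>
        {
            x.Service<QueueWorker>(s =>
            {
                s.ConstructUsing(name => new QueueWorker());
                s.WhenStarted((worker, hostControl) => worker.Start(hostControl));
                s.WhenStopped((worker, hostControl) => worker.Stop(hostControl));
            });

            // Callback for exceptions thrown while the service is running.
            x.OnException(ex => StartupLog.Write("Unhandled exception", ex));

            x.SetServiceName("QueueWorker"); // hypothetical service name
        });
    }
}
```

If the start simply hangs instead of throwing, nothing will be logged by this handler; in that case timestamps written before and after the connection attempt would show where the 30 seconds go.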

Related

A connection attempt failed because the connected party did not properly respond after a period of time in Azure

I have an Azure VM on which a Windows service is installed.
In this Windows service, we register with the Event Hub as described on this website.
I have a method that logs errors like this:
Task IEventProcessor.ProcessErrorAsync(PartitionContext context, Exception error)
{
    log.Error($"Error: {error.Message}");
    return Task.CompletedTask;
}
This Windows service keeps listening to the event hub, and whenever an event is received it performs some processing based on the event (some business logic). While that processing is running, it sometimes throws this error:
Error: A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection
failed because connected host has failed to respond.
After some time, though, it resumes processing as usual. So why do these error messages appear, and is the problem resolving itself? This is not causing any issue for me, but I want to know the reason and avoid it completely in future if possible. I have searched a lot for this online and couldn't find a proper solution.

I think you should take a look at "Azure Event Hubs quotas and limits" and "How exactly does Event Hubs Throttling Work?". From your description of the problem, you may be hitting a throttling threshold.
Hope it helps!

Adding to @ItayPodhajcer's answer.
Add a retry policy. Transient connection errors are very common in cloud-based services. Refer to the Service Bus retry policy document here:
https://learn.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific#service-bus
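
To illustrate the idea only, here is a hand-rolled retry with exponential backoff. It is a sketch under assumptions, not the retry policy built into the Azure client libraries: the Retry class, its method names, the default delay and the choice of which exceptions count as transient are all made up for this example.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class Retry
{
    // Runs the operation, retrying transient failures with a doubling delay.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 5,
        TimeSpan? baseDelay = null)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
            {
                // Wait, then double the delay before the next attempt.
                await Task.Delay(delay).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
            // On the last attempt, or for a non-transient error, the exception propagates.
        }
    }

    // Assumption: these exception types are treated as transient in this sketch.
    public static bool IsTransient(Exception ex) =>
        ex is TimeoutException || ex is SocketException || ex is IOException;
}
```

A call might look like `await Retry.WithBackoffAsync(() => ProcessEventAsync(data))`, where ProcessEventAsync is a placeholder for the business logic. Where the client library exposes its own retry options, configuring those is usually preferable to wrapping calls by hand.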

CommunicationException; Store App -> .NET 4.5 WCF Duplex TCP Service; Instant Timeout

I am having trouble getting a Windows Store App to make calls into a WCF service.
The service is a duplex service using a netTcp binding. The first time the client (a Windows 8.1 Store application) uses the service, it throws an exception:
An exception of type 'System.ServiceModel.CommunicationException'
occurred in mscorlib.dll but was not handled in user code
Additional information: The socket connection was aborted. This could
be caused by an error processing your message or a receive timeout
being exceeded by the remote host, or an underlying network resource
issue. Local socket timeout was '00:09:59.9968452'.
This timeout is nearly equal to my maximum, 10 minutes. The exception, however, happens immediately, and a breakpoint in the service function I am trying to call is never hit. The two do seem to be talking at some level, because altering the security protocol or the endpoint address causes other exceptions (security and connection, as you would expect).
I have tried:
Ensure feature equivalence between Service and Client NetTcpBinding configurations
Raise timeouts, sizes (1-10 minutes for each, 10000000 for max sizes)
Ensure all passed object types are DataContracts with default constructors
Prayer; Considering a burnt offering
Any help would be greatly appreciated. New to WCF and having trouble finding help for the Windows Store / netTCP / Duplex targeted scenario.

If your Windows Store App client and WCF service are on the same computer and you're not running the Windows Store App client from inside Visual Studio, then you need to enable loopback communication. Check out this article:
http://msdn.microsoft.com/en-us/library/windows/apps/dn640582.aspx
At the very bottom it talks about the command line utility:
checknetisolation
Also, try making your method call as simple as possible, with no return value and no arguments. I know you said you checked all of the 'DataContract' attributes, but things can get tricky, for example if you're using polymorphism and the base class doesn't declare a 'KnownType' for a derived class.
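
As an illustration of the 'KnownType' point, here is a small sketch with made-up types (Shape, Circle and the KnownTypeDemo helper are hypothetical). It serializes a derived instance through a serializer created for the base type, which is what happens when a contract member is declared with the base type.

```csharp
using System.IO;
using System.Runtime.Serialization;

// The base contract declares the derived type so the serializer accepts it.
[DataContract]
[KnownType(typeof(Circle))]
public class Shape
{
    [DataMember]
    public string Name { get; set; }
}

[DataContract]
public class Circle : Shape
{
    [DataMember]
    public double Radius { get; set; }
}

public static class KnownTypeDemo
{
    // Round-trips a value through a serializer that only knows the base type.
    public static Shape RoundTrip(Shape shape)
    {
        var serializer = new DataContractSerializer(typeof(Shape));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, shape);
            stream.Position = 0;
            return (Shape)serializer.ReadObject(stream);
        }
    }
}
```

Without the KnownType attribute on Shape, writing a Circle through this serializer throws a SerializationException, which over WCF tends to surface as a dropped connection, not as a clear error message.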

Azure worker role recycling with unhandled Service Bus fault message

I have been running an Azure worker role deployment that uses the Microsoft.ServiceBus 2.2 library to respond to jobs posted from other worker roles and web roles. Recently (suspiciously around the time of the OS update discussed here), the instances of the cluster started constantly recycling, rebooting, running for a short period of time, and then recycling again.
I can confirm that the role instances make it all the way through the OnStart() method of my RoleEntryPoint from the trace messages I have in my diagnostics. Occasionally, the Instances pane of the Azure Management Portal would mention that a recycling role had experienced an "unhandled exception," but would not give more detail. After logging in with remote desktop to one of the instances, the two clues I have are:
Performance counters indicate that \Processor(_Total)\% Processor Time is hovering at 100%, periodically dropping to the mid-80s coinciding with drops in \TCPv4\Connections Established. Some drops in \TCPv4\Connections Established do not correlate with drops in \Processor(_Total)\% Processor Time.
I was able to find, in the Local Server Events in the Server Manager of one of the instances, the following message:
Application: WaWorkerHost.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ServiceBus.Common.CallbackException
Stack:
at Microsoft.ServiceBus.Common.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
There have been no permissions configuration changes associated with the service bus during this time, and this message occurs despite us not having updated any of our VMs. Nonetheless, it also appears that our service is still functioning => jobs are being processed and removed from the Service Bus Queues they are listening to.
Most Googling on these issues turns up suggestions that this is somehow related to IntelliTrace. However, these VMs do not have IntelliTrace enabled.
Does anyone have any ideas on what is going on here?

The Service Bus exceptions turned out to be a red herring as far as the crashing is concerned. They were caused by a namespace conflict in one of the data contracts sent between two different VM roles that had been published at different times. Adding tracing for exceptions thrown during one of the receive retries revealed it. It is still a mystery why it was working at all, and the role recycling has not ceased; only the Service Bus exception has gone away.

I had a similar issue. The main cause was that the Service Bus DLL version could not be resolved. Make sure the version you are redirecting to in your configuration and the version you actually added a reference to are the same.
This can happen with any DLL mismatch, not only with the Service Bus DLL.

Self starting and fault exceptions

What is the best way to handle faults that may occur in a WCF service? Apart from try/catch in the service itself, what if the client faults, for example because the system went down (e.g. MSMQ went down on a cluster)? Things like this will cause the WCF service host to fault.
How can I restart the service safely after a period of x seconds? I tried doing this, but even when I create a new ServiceHost after calling Abort() on the one in the Faulted state, I always get an error saying that the communication channel has faulted or is closed.
What can you recommend as a good solution to restart the service host app if it faults, and to successfully re-establish that host after it being faulted?

Try hosting the WCF service in a Windows service. That way, when the server restarts, your host will be restarted too. You should implement a Windows service class that inherits from ServiceBase and override its OnStart and OnStop methods.
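
A rough sketch of that approach, assuming .NET Framework and endpoints defined in app.config. IPingService, PingService and WcfWindowsService are hypothetical names. The sketch also recreates the host when it faults, because a faulted ServiceHost cannot be reopened and has to be aborted and replaced by a new instance.

```csharp
using System;
using System.ServiceModel;
using System.ServiceProcess;

// Hypothetical contract and implementation, used only for illustration.
[ServiceContract]
public interface IPingService
{
    [OperationContract]
    string Ping();
}

public class PingService : IPingService
{
    public string Ping() => "pong";
}

public class WcfWindowsService : ServiceBase
{
    private readonly object _sync = new object();
    private ServiceHost _host;
    private bool _stopping;

    protected override void OnStart(string[] args)
    {
        lock (_sync)
        {
            _stopping = false;
            OpenHost();
        }
    }

    protected override void OnStop()
    {
        lock (_sync)
        {
            _stopping = true;
            if (_host == null) return;
            try
            {
                // Close() throws on a faulted host, so abort in that case.
                if (_host.State == CommunicationState.Faulted) _host.Abort();
                else _host.Close();
            }
            catch (CommunicationException) { _host.Abort(); }
            catch (TimeoutException) { _host.Abort(); }
            _host = null;
        }
    }

    private void OpenHost()
    {
        // Endpoints and bindings are assumed to come from app.config.
        var host = new ServiceHost(typeof(PingService));
        host.Open();
        // Subscribe only after a successful Open, so that a failing Open
        // does not trigger the handler and loop.
        host.Faulted += OnHostFaulted;
        _host = host;
    }

    private void OnHostFaulted(object sender, EventArgs e)
    {
        lock (_sync)
        {
            if (_stopping) return;
            var faulted = (ServiceHost)sender;
            faulted.Faulted -= OnHostFaulted;
            faulted.Abort(); // a faulted host cannot be closed or reopened
            OpenHost();      // always create a brand-new ServiceHost
        }
    }

    public static void Main()
    {
        ServiceBase.Run(new WcfWindowsService());
    }
}
```

The recovery options of the Windows service (restart on failure) can then cover the case where the whole process dies, while the Faulted handler covers a host that faults inside a process that keeps running.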

Azure Service Bus Binding causes Worker Role to become unhealthy after single fault

I've been working on an Azure worker role that hosts a netTcpRelayBinding WCF service. All seems to work well until one of my connected hosts disconnects unexpectedly. Over the next couple of minutes, the role steadily loses stability and then reports itself as unhealthy.
I'm not sure where I should be looking. I've got IntelliTrace enabled, and I've got some exceptions, which start with the TimeoutException you'd expect, but then continue on. I get these messages:
System.ServiceModel.CommunicationException - Inactivity timeout of (00:00:10) has been exceeded.
System.InvalidProgramException - The Common Language Runtime detected an invalid program
After this, I get a series of CommunicationExceptions and TimeoutExceptions, and eventually the whole host crashes with an OutOfMemoryException.
Things to note: I've got 1 client connected, with no other calls or activity. When that client disconnects unexpectedly, the above consistently happens.
I tried handling the ServiceHost Faulted event, but that seemed to do nothing (I can't see where it was hit in the IntelliTrace logs).
Any ideas on where I should be looking? Surely I don't need to recreate the service every time something like this happens right?

That doesn't sound familiar. Can you reproduce this and how's your role feeling overall, meaning what's the CPU load looking like and what else is going on in parallel?
Cheers,
Clemens
