I'm currently evaluating ASP.NET WebSockets for connecting a few thousand clients that will stay connected to the app pretty much 24x7, except when the server goes offline for patching etc. Generally, the expectation is that the WebSockets should not disconnect unnecessarily and the clients will basically stay connected, pinging the server every few minutes.
While researching ASP.NET WebSocket viability for our new architecture, I came across another Stack Overflow post, IIS App Pool cannot recycle when there is an open ASP.NET 4.5 Websocket, which seems to suggest that IIS doesn't recycle the pool if there is an active WebSocket connection. Is this by design, or did the other person experience an anomaly? If IIS does recycle the pool while WebSocket connections are active, what's the expected behavior? Does Http.sys keep the connections open while the pool recycles, so that things resume as if nothing happened (from the client's perspective)? Should I just create a separate app pool for WebSockets and disable recycling on it?
From my experience, no: the WebSockets on the old worker process are not transitioned to the new worker process. I have observed that the old WebSockets are put into a non-Open state, and it is up to your code to check this and stop maintaining those threads. My application was sending a heartbeat to the client over WebSockets, and if the heartbeat failed I needed to close that (already closed) WebSocket context as soon as possible so the old worker process could fully unload and die.
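To illustrate that check, here is a minimal sketch (not the original code) of a heartbeat loop that abandons a socket once it leaves the Open state; it assumes you already have the System.Net.WebSockets.WebSocket from the AspNetWebSocketContext:

```csharp
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class HeartbeatMonitor
{
    // Sends a periodic heartbeat; once it fails or the state is no longer Open,
    // abort the socket and stop tracking it so the old worker process can unload.
    public static async Task RunAsync(WebSocket socket, CancellationToken ct)
    {
        byte[] ping = Encoding.UTF8.GetBytes("ping");

        try
        {
            while (socket.State == WebSocketState.Open)
            {
                await socket.SendAsync(new ArraySegment<byte>(ping),
                                       WebSocketMessageType.Text,
                                       endOfMessage: true,
                                       cancellationToken: ct);

                await Task.Delay(TimeSpan.FromSeconds(30), ct);
            }
        }
        catch (WebSocketException)
        {
            // Heartbeat failed: the socket most likely belongs to a dying worker process.
        }
        catch (OperationCanceledException)
        {
            // Application is shutting down; fall through and abort the socket.
        }

        if (socket.State != WebSocketState.Closed)
            socket.Abort(); // release the connection so the old process can fully unload
    }
}
```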
Related
I have 30k+ active devices that connect to my application through TCP sticky sessions. Each request then creates its own thread and starts receiving and sending data. Each request takes around 20 minutes to send the complete packets over the network. That means the application needs to create approximately 30k threads.
The problem is that my application sometimes becomes unresponsive and doesn't send or receive messages, and this happens randomly. I was reading HERE that the max limit of threads in each process is 2000. I am wondering whether it is bad design to hold 30k+ sessions and wait for new messages to send and receive. What can be done to manage such a load?
My Assumption/Suggestion:
1: Would creating at most 2000 threads cause the application to slow down because of context switching?
2: Would 28,000 devices be left waiting at any point in time because of the worker thread limit?
System Information:
64 bit
8 CPU cores
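As a rough back-of-the-envelope check on assumption 1 (a sketch; it assumes the CLR's default 1 MB stack reservation per thread):

```csharp
using System;

// Reserved stack address space alone, with one dedicated thread per device.
const int devices = 30_000;
const long stackBytesPerThread = 1L * 1024 * 1024;            // 1 MB default reservation
double gb = devices * stackBytesPerThread / (1024.0 * 1024 * 1024);
Console.WriteLine($"{gb:F1} GB of reserved stack");           // ~29.3 GB
// On 64-bit this is reserved (not committed) address space, so it won't exhaust RAM by
// itself, but the scheduler still has ~30k threads to context-switch between.
```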
Use fewer threads. Use async IO instead. I would strongly recommend using Kestrel to host this using the "pipelines" model, which would deal with threading, buffer management, and most of the TCP nuances for you, so you can just worry about your protocol and application code. It scales far beyond your needs (although for 30k sockets you might want to start using multiple ports, especially if you're behind a load balancer, due to ephemeral port exhaustion). Kestrel would require .NET Core or above, but the same approach is available on .NET Framework via Pipelines.Sockets.Unofficial. One guide to pipelines and Kestrel in this context is in a 4-part series here - other guides are available.
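As a rough sketch of what that looks like with Kestrel's connection abstractions (the handler name, port, and framing logic are placeholders; the actual parsing depends on your protocol):

```csharp
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;

// One async handler per connection instead of one dedicated thread per device.
public class DeviceConnectionHandler : ConnectionHandler
{
    public override async Task OnConnectedAsync(ConnectionContext connection)
    {
        PipeReader input = connection.Transport.Input;
        PipeWriter output = connection.Transport.Output;

        while (true)
        {
            ReadResult result = await input.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // Parse as many complete frames as the buffer holds (protocol-specific;
            // TryParseFrame would be a helper you write yourself):
            // while (TryParseFrame(ref buffer, out var frame)) { /* handle frame, write to output */ }

            // Nothing consumed here, everything examined: ReadAsync waits for more data.
            input.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted)
                break; // the device disconnected
        }

        await input.CompleteAsync();
        await output.CompleteAsync();
    }
}

// Registration (e.g. in Program.cs), listening for raw TCP on port 5000:
// builder.WebHost.ConfigureKestrel(k =>
//     k.ListenAnyIP(5000, l => l.UseConnectionHandler<DeviceConnectionHandler>()));
```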
We have several servers running an ASP.NET Web API application. The Web API is accessed from a Windows desktop client application written in C#. The client uses a SignalR HubConnection object to start a reverse channel with the server through which it receives live feedback (log statements, status, etc.). Last year, one of our users reported that the client was periodically encountering the following error:
An existing connection was forcibly closed by the remote host
This error is received when the HubConnection.Error event is raised. I dug into the Windows Event Viewer logs on the server side and discovered that these errors coincided exactly with the occurrence of the following events:
A process serving application pool 'ASP.NET v4.0' exceeded time limits during shut down. The process id was 'xxxx'.
This event followed 90 seconds after the application pool recycling event:
A worker process with process id of 'xxxx' serving application pool 'ASP.NET v4.0' has requested a recycle because the worker process reached its allowed processing time limit.
So clearly, the old worker process serving the ASP.NET v4.0 application pool was failing to shut down within the ShutdownTimeLimit of 90 seconds. I did some further experiments and discovered that a request being retained in the request queue (the 'signalr/connect' request) was causing the forced shutdown of the worker process.
The version of the SignalR libraries we were using at the time was 1.0.20228.0.
Last year, we upgraded to SignalR version 2.2.31215.272 on both the client and server. That change seems to have resolved the problem described above. The 'signalr/connect' request is still retained for the life of the hub connection, but when the application pool recycles, the client and server gracefully reconnect without any issues. Apparently some fix was made between SignalR V1 and V2 that allows it to handle application pool recycle events in a much more graceful manner.
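For reference, observing that reconnect cycle from the V2 .NET client looks roughly like this (a sketch; the URL, hub name, and method name are placeholders for ours):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNet.SignalR.Client;

class FeedbackClient
{
    static async Task Main()
    {
        var connection = new HubConnection("https://ourserver.example.com/"); // placeholder URL
        IHubProxy hub = connection.CreateHubProxy("FeedbackHub");             // hypothetical hub name

        // Live feedback pushed from the server over the reverse channel.
        hub.On<string>("log", line => Console.WriteLine(line));

        connection.Reconnecting += () => Console.WriteLine("Connection lost (e.g. app pool recycle), reconnecting...");
        connection.Reconnected  += () => Console.WriteLine("Reconnected.");
        connection.Error        += ex => Console.WriteLine("Hub error: " + ex.Message);
        connection.Closed       += () => Console.WriteLine("Connection closed; Start() would need to be called again.");

        await connection.Start();
        Console.ReadLine();
    }
}
```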
Just for my own understanding, why was this issue being caused with V1 of the SignalR libraries, and what changed between V1 and V2 which resolved this issue?
Thanks.
I have developed a WCF service for serving our customers and hosted it on IIS. We have a requirement to log every request received and response sent by the WCF service to a database.
However, we don't want this logging to interrupt the main flow of requests and responses, so we are using threads (Threading.Thread with Thread.IsBackground = true) to call the procedures that insert/log the requests and responses into the database.
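Roughly, the pattern we're using looks like this (a sketch; LogToDatabase, requestXml, and responseXml stand in for our own code):

```csharp
using System.Threading;

static class RequestLogger
{
    // Fire-and-forget logging on a background thread, as described above.
    // If the app pool recycles mid-write, this thread is killed along with the process.
    public static void QueueLog(string requestXml, string responseXml)
    {
        var logThread = new Thread(() => LogToDatabase(requestXml, responseXml))
        {
            IsBackground = true
        };
        logThread.Start();
    }

    // Placeholder for the stored-procedure call that actually writes the row.
    static void LogToDatabase(string requestXml, string responseXml)
    {
        /* ADO.NET insert goes here */
    }
}
```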
I just want to know whether there will be problems with creating threads like this inside a WCF service. If so, what would be a good solution?
Yes, there can be a problem. The application pool in IIS can get recycled, which means that the background thread will be killed even if it's in the middle of some processing.
In reality, that will only be a problem when you update your application (by the time the app pool is stopped due to the idle timeout, the logger should already have finished its work).
So if you can live with lost log entries during updates, you do not have a problem.
I'm using SignalR (0.5.3) Hubs for a chat app where each keystroke is sent to the server (saved in the DB) and relayed to all clients, and a return value (a string token of sorts) is sent back from the server.
It works fine until the app pool recycles; then it stops relaying the keystrokes to the clients (because the in-memory server state is lost, I suppose) and the server doesn't return any values either. At this point, I suppose all requests via SignalR are queued by IIS and then processed once the app pool has been recycled.
My question is how can I handle this scenario so that all clients are aware of the server unavailability/delay due to app pool recycle, notify the user to wait for a while and then resume operation on reconnect?
There are two options.
For 0.5.3 you can detect when the client goes into the "reconnecting" or "disconnected" state and notify the user that there are server issues. Keep in mind that in most situations the client will not actually know it is disconnected if the server just goes away.
OR
If you wait for the next release (1.0alpha) we will take care of the bulk of this for you. On loss of the server we will trigger an onConnectionSlow event, which will then result in the client shifting into the "reconnect" state (if it does not receive any info) until the server comes back online. The client will also know if the server goes away (we're adding this functionality) for edge cases such as an app pool recycle.
Hope this helps!
I have developed middleware that provides RPC functionality to multiple client applications on multiple platforms within our organization. The middleware is written in C# and runs as a Windows NT Service. It handles things like file access to network shares, database access, etc. The middleware is hosted on two high end systems running Windows Server 2008.
When one of our server administrators goes to reboot the machine, primarily to apply Windows Updates, there are serious problems with how the system behaves with regard to my NT service. My service is designed to immediately stop listening for new connections, immediately start refusing new requests on existing connections, and otherwise shut down as rapidly as possible when it receives an OnStop or OnShutdown request from the SCM. Still, to maintain system integrity, operations that are currently in progress are allowed to continue for a reasonable time. Usually the service shuts down inside of 30 seconds (when the service is manually stopped, for example). However, when the system is instructed to restart, my service immediately loses access to network drives and UNC paths, causing data integrity problems for any open files and partial writes to those locations. My service does list Workstation (and thus the SMB Redirector) as a dependency, so I would think that my service would need to be stopped prior to Workstation/Redirector being stopped if Windows were honoring those dependencies.
Basically, my application is forced to crash and burn, failing remote procedure calls and eventually being forced to terminate by the operating system after a timeout period has elapsed (seems to be on the order of 20-30 seconds).
Unlike a Windows application, my Windows NT Service doesn't seem to have any power to stop a system shutdown in progress, delay the system shutdown, or even just the opportunity to save out any pending network share disk writes before being forcibly disconnected and shutdown. How is an NT Service developer supposed to have any kind of application integrity in this environment? Why is it that Forms Applications get all of the opportunity to finish their business prior to shutdown, while services seem to get no such benefits?
I have tried:
Calling SetProcessShutdownParameters via p/invoke to try to get my application notified of the shutdown sooner, to avoid the Redirector shutting down before I do (see the sketch after this list).
Calling ServiceBase.RequestAdditionalTime with a value less than or equal to the two minute limit.
Tweaking the WaitToKillServiceTimeout
Everything I can think of to make my service shutdown faster.
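For reference, the SetProcessShutdownParameters call from the first bullet looks roughly like this (a sketch using the documented kernel32 signature; 0x3FF is simply the highest application shutdown level):

```csharp
using System.ComponentModel;
using System.Runtime.InteropServices;

internal static class ShutdownOrdering
{
    private const uint SHUTDOWN_NORETRY = 0x00000001;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetProcessShutdownParameters(uint dwLevel, uint dwFlags);

    // Application levels 0x300-0x3FF are notified of shutdown before the default
    // level (0x280), so this asks Windows to tell us as early as it will allow.
    public static void RequestEarlyShutdownNotification()
    {
        if (!SetProcessShutdownParameters(0x3FF, SHUTDOWN_NORETRY))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}

// The second bullet is just a call made from OnStop/OnShutdown inside the ServiceBase class:
//     RequestAdditionalTime(120000); // milliseconds, at or below the two-minute limit
```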
But in the end, I still get ~30 seconds of problematic time during which my service doesn't even seem to have been notified of an OnShutdown event yet, but requests are failing because the Redirector is no longer servicing my network share requests.
How is this issue meant to be resolved? What can I do to delay or stop the shutdown, or at least be allowed to shut down my active tasks without the Redirector services disappearing out from under me? I can understand what Microsoft is trying to do to prevent services from dragging their feet and slowing shutdowns, but that seems like a great goal for Windows client operating systems, not for servers. I don't want my servers to shut down fast; I want operational integrity and graceful shutdowns.
Thanks in advance for any help you can provide.
UPDATE: I've done the exact timings. In a test shutdown, I got shutdown notification at 23:55:58 and noticed losing network share connectivity at 23:56:02. So within four seconds, I've lost the ability to save out any active state.
This question on Server Fault should answer yours: https://serverfault.com/questions/34427/windows-service-dependencies. It links to this article, http://blogs.technet.com/askperf/archive/2008/02/04/ws2008-service-shutdown-and-crash-handling.aspx, which should help you get pre-shutdown notification and control service shutdown ordering.