What happens to other users if the .NET worker process crashes? - c#

My knowledge of how processes are handled by the ASP.Net worker process is woefully inadequate. I'm hoping some of the experts out there can fill me in.
If I crash the worker process with a System.OutOfMemoryException, what would the user experience be for other users who were being served by the same process? Would they get a blank screen? 503 error?
I'm going to attempt to test this scenario with some other folks in our lab, but I thought I would float this out there. I will update with our results.
UPDATE: Our results varied. If we artificially induced a OOM exception (for example by loading larger and larger PDFs into memory), other threads being served by that worker process would "hang" temporarily and then complete, while others seemingly would never return. Thank you for your responses.

W3WP.exe is the process
IIS runs all web apps in a generic worker process - w3wp.exe. Whether you write in ASP.NET, or ISAPI, or some other framework, the process that serves the web request is w3wp.exe. In the ASP.NET case, w3wp.exe loads the ASP.NET JIT-compiled DLLs and services the requests through them. In other cases, it works differently. But the key point is, w3wp.exe is the process. This model started in IIS6.0 and continues in IIS7.0.
Unexpected Failures
If the W3WP.exe fails unexpectedly, for any reason, all transactions it was handling will likely get 500 errors (Server error). IIS will start a new worker process in its place (MS calls this "Health Monitoring"), which means the web app will continue to run. Users that did not have a request being served by the failing process at the time of failure, will be unaware of any of this.
The HTTP 500 error that a client receives in this case will be indistinguishable from a 500 error that the client receives in the case of an application error, let's say an uncaught exception in your ASPNET application code.
For those requests that were in the failing process, there's no way to recover them. They will result in 500 errors at the browser. A 503 Server Busy results from IIS actively refusing the connection due to a threshold on the number of connections. A 503 does not result from an application failure, so you shouldn't expect to see 503 for in-flight transactions in the out-of-memory-and-crash scenario. On a heavily loaded system, you may see 503's as the process-crash-and-restart happens, as a secondary effect. If this is really what you're seeing, you need a larger margin of safety to handle the load in the single-error condition.
The Request Queue
IIS has a hand-off approach for requests. As they arrive on the network layer (Http.sys), they are placed in a queue, to be picked up by a worker process. Any requests waiting in the IIS queue to be handled by a WP will continue unaffected, though they might see a slight temporary increase in latency (service time) due to resource contention, since one fewer process is running on the server. Wait time in this queue is generally very very short, on a system that is configured properly.
It is when this queue is full that you will see 503 errors.
Auto restart of W3WP.exe
IIS has an auto-restart (or "nanny") facility, through which it restarts worker processes after they have exceeded configured thresholds, such as memory size, number of requests, or time-of-running. In all those cases, IIS will quiesce and restart worker processes when the configured threshold is reached. These pro-active restarts normally do not result in any disruption of requests. When IIS decides that a restart of a worker process is necessary, it prevents any new requests from arriving at that to-be-quiesced WP. Existing requests are drained: any in-flight transactions in that WP are allowed to complete normally. When all requests in the WP complete, then the WP dies and IIS starts a new one in its place. This new process then immediately begins picking up new requests from the dispatch queue. This is all transparent to users or browsers.
I say normally because it's possible that the worker process has become truly sick at the same time as the threshold has been reached. In that case the w3wp.exe may not respond to IIS within the configured "quiesce" timeout, and thus IIS has to eventually kill the process even though it hasn't reported that all of its in-flight requests have completed. This should be exceedingly rare, because it's two distinct exceptional conditions, but it happens. In this case, the in-flight requests will once again, get 500 errors.
Web gardens
Also - IIS allows multiple worker processes on a single server. MS calls this a "web garden", a play on words from "web farm". If you have a web garden set up, then transactions being served by w3wp.exe instances other than the failing one, will continue unaffected. "Unaffected" presumes though, that the out-of-memory error is localized, and not a system-wide problem.
Bottom Line
The bottom line is that there is no substitute for your own testing. The configuration options are pretty broad - from restart thresholds to web gardens and so on. Also the failure modes tend to be pretty complex and varied, whether it's memory, timeout, too busy, and so on. You'll want to understand what to expect.
ps: this Q&A really belongs on serverfault.com !!
references:
http://blogs.iis.net/thomad/archive/2008/05/07/the-iis-process-model-features.aspx

A new worker thread will be started and the user would not know anything happened. Unless it shuts down completely via rapid fail (http://technet.microsoft.com/en-us/library/cc779127(WS.10).aspx)

If it's an out of memory situation, iis usually just recycles the app pool.

As the other answers say, in most cases everything just restarts, and most users who did not have a pending request at the time will not notice much more than a delay.
However, if your application uses session variables with In-Proc session state, all session variables for all users will be lost when the app pool restarts. This may or may not have a negative effect on the users, depending on what you're doing with the session variables. You can avoid this by switching to StateServer or SQL Server session storage.

Related

Are all the web requests executed in parallel and handled asynchronously?

I am using a WebApi service controller, hosted by IIS,
and i'm trying to understand how this architecture really works:
When a WebPage client is sending an Async requests simultaneously, are all this requests executed in parallel at the WebApi controller ?
At the IIS app pool, i've noticed the queue size is set to 1,000 default value - Does it mean that 1,000 max threads can work in parallel at the same time at the WebApi server?
Or this value is only related to ths IIS queue?
I've read that the IIS maintains some kind of threads queue, is this queue sends its work asynchronously? or all the client requests sent by the IIS to the WebApi service are being sent synchronously?
The queue size you're looking at specifies the maximum number of requests that will be queued for each application pool (which typically maps to one w3wp worker process). Once the queue length is exceeded, 503 "Server Too Busy" errors will be returned.
Within each worker process, a number of threads can/will run. Each request runs on a thread within the worker process (defaulting to a maximum of 250 threads per process, I believe).
So, essentially, each request is processed on its own thread (concurrently - at least, as concurrently as threads get) but all threads for a particular app pool are (typically) managed by a single process. This means that requests are, indeed, executed asynchronously as far as the requests themselves are concerned.
In response to your comment; if you have sessions enabled (which you probably do), then ASP.NET will queue the requests in order maintain a lock on the session for each request. Try hitting your sleeping action in Chrome and then your quick-responding action in Firefox and see what happens. You should see that the two different sessions allow your requests to be executed concurrently.
Yes, all the requests will be executed in parallel using the threads from the CLR thread pool subject to limits. About the queue size set against the app pool, this limit is for IIS to start rejecting requests with a 503 - Service unavailable status code. Even before this happens, your requests will be queued by IIS/ASP.NET. That is because threads cannot be created at will. There is a limit to number of concurrent requests that can run which is set by MaxConcurrentRequestsPerCPU and a few other parameters. For 1000 threads to execute in parallel in a true sense, you will need 1000 CPU cores. Otherwise, threads will need to be time sliced and that adds overhead to the system. Hence, there are limits to number of threads. I believe it is very difficult to comprehensively answer your questions through a single answer here. You will probably need to read up a little bit and a good place to start will be http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx.

Broken connection does not time out

We've got a basic ASP Web API setup: a single ApiController with a number of methods configured as end-points for POST requests. These receive rather large JSON streams which are deserialized through the [FromBody] attribute.
What happens is that if connections are interrupted/terminated, the requests are kept in the IIS Worker Process Request Queue. They don't seem to time out. Worse, as something in the RequestHandler is keeping busy, after a number of failures and "stuck" requests, the CPU consumption is near 100%. Logging indicated that the action methods in the ApiController haven't executed as yet.
To make a large story short. Is there anything which I can do to let these methods time-out so they will be removed from the queue? Normal web.config doesn't seem to work.
in most cases compilation debug = true is the problem..
Here is some more discussion on similar issue..
if this also doesn't help... take a memory dump of w3wp process when CPU consumption is near 100% and do a dump analysis using windows debugging tools..
You can configure the Recycling settings of your application pool. In this settings from IIS you can recycle the application pool using a schedule, request number or a specific MB limit.

Flush HttpRuntime.Cache objects across all worker processes on IIS webserver

I have an asp.net web application running on IIS 7 set-up in web-garden mode. I want to clear runtime cache items across all worker processes using a single-step. I can setup a database key-value, but that would mean a thread executing on each worker process, on each of my load-balanced-scenario web servers will poll for changes on that key-value and flush cache. That would be a very bad mechanism as I flush cache items once per day at max. Also I cannot implement a push notification using the SqlCacheDependency with Service Broker notifications as I have a MySql db. Any thoughts? Is there any dirty work-around? One possible workaround, expose an aspx page, and hit that page multiple times using the ip and port on which the site is hosted instead of the domain name - ex: http://ip.ip.ip.ip:82/CacheClear.aspx, so that a request for that page might be sent to all the worker processes within that webserver, and on Page_Load, clear the cache items. But this is a really dirty hack and may not work in cases when all requests are sent to the same worker process.
You need to setup inter-process communication.
For caching there are two commonly used ways of doing this:
Setup a shared cache (memcached or the like.)
Setup a message queue (e.g. ms-mqueue or rabbitMq) and use it to spread state to the local caches.
A shared cache is the ultimate solution as it means the whole cache is distributed but it is also the most complex: it needs to be set up so the cache load is properly distributed between nodes and make sure it doesn't become a bottle neck.
The second option requires more code on your part but it is easier if you don't want to share the cache content (as in your case.)
The easiest is to setup a listener thread or task to handle the cache clear or individual entries invalidation messages. This thread will be dormant if there are no messages so the impact on performance is minimal.
You can also forgo the listener thread by handling messages as part of the usual iis request pipeline. I.e. set up a filter/module that checks for messages in the queue and processes them before handling the request; but performance wise the first option is (slightly) better.

Windows NT Service shutdown issues

I have developed middleware that provides RPC functionality to multiple client applications on multiple platforms within our organization. The middleware is written in C# and runs as a Windows NT Service. It handles things like file access to network shares, database access, etc. The middleware is hosted on two high end systems running Windows Server 2008.
When one of our server administrators goes to reboot the machine, primarily to do Windows Updates, there are serious problems with how the system behaves in regards to my NT Service. My service is designed to immediately stop listening for new connections, immediately start refusing new requests on existing connections, and otherwise shut down as rapidly as possible in the case of an OnStop or OnShutdown request from the SCM. Still, to maintain system integrity, operations that are currently in progress are allowed to continue for a reasonable time. Usually the server shuts down inside of 30 seconds (when the service is manually stopped for example). However, when the system is instructed to restart, my service immediately loses access to network drives and UNC paths, causing data integrity problems for any open files and partial writes to those locations. My service does list Workstation (and thus SMB Redirector) as a dependency, so I would think that my service would need to be stopped prior to Workstation/Redirector being stopped if Windows were honoring those dependencies.
Basically, my application is forced to crash and burn, failing remote procedure calls and eventually being forced to terminate by the operating system after a timeout period has elapsed (seems to be on the order of 20-30 seconds).
Unlike a Windows application, my Windows NT Service doesn't seem to have any power to stop a system shutdown in progress, delay the system shutdown, or even just the opportunity to save out any pending network share disk writes before being forcibly disconnected and shutdown. How is an NT Service developer supposed to have any kind of application integrity in this environment? Why is it that Forms Applications get all of the opportunity to finish their business prior to shutdown, while services seem to get no such benefits?
I have tried:
Calling SetProcessShutdownParameters via p/invoke to try to notify my application of the shutdown sooner to avoid Redirector shutting down before I do.
Calling ServiceBase.RequestAdditionalTime with a value less than or equal to the two minute limit.
Tweaking the WaitToKillServiceTimeout
Everything I can think of to make my service shutdown faster.
But in the end, I still get ~30 seconds of problematic time in which my service doesn't even seem to have been notified of an OnShutdown event yet, but requests are failing due to redirector no longer servicing my network share requests.
How is this issue meant to be resolved? What can I do to delay or stop the shutdown, or at least be allowed to shut down my active tasks without Redirector services disappearing out from under me? I can understand what Microsoft is trying to do to prevent services from dragging their feet and showing shutdowns, but that seems like a great goal for Windows client operating systems, not for servers. I don't want my servers to shutdown fast, I want operational integrity and graceful shutdowns.
Thanks in advance for any help you can provide.
UPDATE: I've done the exact timings. In a test shutdown, I got shutdown notification at 23:55:58 and noticed losing network share connectivity at 23:56:02. So within four seconds, I've lost the ability to save out any active state.
This question: https://serverfault.com/questions/34427/windows-service-dependencies on ServerFault should answer your own. It links to this article: http://blogs.technet.com/askperf/archive/2008/02/04/ws2008-service-shutdown-and-crash-handling.aspx, which should help you get pre-shutdown notification and service shutdown ordering.

WCF communication between 2 servers crashes after IIS7 process recycle

I am kind of stumped with this one, and was hoping I could find some answers here.
Basically, I have an ASP.NET application that is running across 2 servers. Server A has all of the business logic/data access exposed as web services, and Server B has the website which talks to those services (via WCF, with net.tcp binding).
The problem occurs a few seconds after a recycle of my app pool is initiated by IIS on Server A. The recycle happens after the allotted time (using the default of 29 hours set in IIS).
In the server log (of Server A):
A worker process with process id of
'####' serving application pool
'AppPoolName' has requested a recycle
because the worker process reached its
allowed processing time limit.
I believe that this is normal behavior. The problem is that a few seconds later, I get this exception on Server B:
This channel can no longer be used to
send messages as the output session
was auto-closed due to a
server-initiated shutdown. Either
disable auto-close by setting the
DispatchRuntime.AutomaticInputSessionShutdown
to false, or consider modifying the
shutdown protocol with the remote
server.
This doesn't happen on every recycle; I assume that it happens when someone is hitting the site with a request WHILE the recycle happens.
Furthermore, my application is down until I intervene; this exception continues to occur every time a subsequent request is made to the page. I intervene by editting the web.config (by adding a space or something benign to the end of file) and saving it- I assume that that causes my application to recompile and brings the services back up. I also have experimented with running a batch file that does this for me every time the exception happens ;)
Now, I could barely find any information on this exception, and I've been looking for a while. Most of the information I did find pertains to WCF settings that I am not using.
I already read up on "DispatchRuntime.AutomaticInputSessionShutdown" and I don't think it pertains to this situation. This particular property refers to the service shutting down automatically in response to behavior on the client side, which is not what is happening here. Here, the service is shutdown because of IIS.
I did read this which went through some sort of work around to bring the service back up automatically, but I am really looking to understand what is going on here, not to hack around it!
I have started playing around with the settings in IIS7, specifically turning on/off Overlapped Recycling and increasing the process startup/shutdown times. I am wondering whether it is safe to turn off recycling completely (I believe if I put 0 for the recycling time interval?) But again, I want to know what's going on!
Anyway, if you need more information, let me know. Thanks in advance!
This is probably related to how you open and close WCF connections.
If you open a proxy when your app starts and then continue to use this, a break in the connection, which is caused by a restart on the server side. Results in a error on the client side, since the server that the proxy was talking to is no longer there.
When you restart the client side (changing the web.config) new proxies are created against a server that is running.
The way to fix this is to make sure that you close a WCF connection after you use it.
http://www.codeguru.com/csharp/.net/net_wcf/article.php/c15941/
You should also make sure that you're using the correct SessionMode for your Web Service. I remember having similar trouble with some of my Services until I sorted out the correct mode. This is especially true when you're mixing this with any other authentication mode that is not "None".
This link might have some pointer.
http://msdn.microsoft.com/en-us/library/ms731193.aspx
My suggestion is to simply stop using IIS to host your services. Unless there is something you really need from IIS, I would recommend just writing a standard Windows Service to host your WCF endpoints.
If you can't do that, then by all means turn off recycling. AppPool recycling is mainly there because web developers write crappy code. I know that sounds rather blunt, but if you have enough sense to write code that doesn't leak then there is no reason to have IIS constantly restart your program.

Categories

Resources