We've got a basic ASP.NET Web API setup: a single ApiController with a number of methods configured as endpoints for POST requests. These receive rather large JSON streams, which are deserialized via the [FromBody] attribute.
What happens is that if connections are interrupted or terminated, the requests are kept in the IIS worker process request queue. They don't seem to time out. Worse, because something in the request handler stays busy, after a number of failures and "stuck" requests the CPU consumption approaches 100%. Logging indicates that the action methods in the ApiController haven't executed yet.
To make a long story short: is there anything I can do to make these methods time out so they're removed from the queue? The usual web.config settings don't seem to work.
In most cases, compilation debug="true" is the problem.
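One thing worth checking: ASP.NET ignores the httpRuntime executionTimeout while debug="true" is set, so a web.config along these lines (values illustrative) is the first thing to verify:

<system.web>
  <!-- executionTimeout (seconds) is only enforced when debug="false" -->
  <compilation debug="false" />
  <httpRuntime executionTimeout="110" />
</system.web>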
Here is some more discussion on a similar issue.

If this doesn't help either, take a memory dump of the w3wp process when CPU consumption is near 100% and do a dump analysis using the Windows debugging tools.
You can configure the Recycling settings of your application pool. From IIS, these settings let you recycle the application pool on a schedule, after a given number of requests, or at a specific memory limit in MB.
We have several OData APIs using Entity Framework and AutoMapper. These connect to an on-premises SQL database through a VNet. The GET requests of this API are not asynchronous, per the example found here. Scaling is set to S2, and we have Always On enabled.
Sometimes the requests complete in 500 ms; sometimes the very same requests take 40 seconds. We have tried scaling out, but it offers no tangible benefit. We have tried making the GET actions on the controllers async. We have tried disabling authentication. We have tried looking at the Application Insights call stack in the profiler, but sometimes the code hangs on one call and other times on another; we even found a 39-second call to String.Replace(). We've tried Kudu but can't seem to get any insight from it.
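For reference, the async variant we tried looked roughly like this (a minimal sketch; the entity and context names here are placeholders, not our real model):

// Async EF6-backed GET: awaiting the query frees the request thread
// while SQL executes over the VNet.
using System.Data.Entity;       // ToListAsync()
using System.Threading.Tasks;
using System.Web.Http;

public class ProductsController : ApiController
{
    private readonly AppDbContext _db = new AppDbContext(); // placeholder DbContext

    public async Task<IHttpActionResult> Get()
    {
        var products = await _db.Products.ToListAsync();
        return Ok(products);
    }
}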
On top of this, I alone can bring the server to its knees simply by spamming F5 on a relatively simple request, locking the CPU at 100%. S2 already seems like a generous tier, and we are stunned that the server apparently cannot handle it. Nor is it always the case that low CPU usage on the server equals fast requests; sometimes those requests also take an extraordinary amount of time.
We have tried looking at the Application Insights data but grow even more confused, as some data suggests one thing is at fault while other data suggests it is not.
CPU usage on the App Service plan is high, yet CPU usage in the live metrics usually remains low. That would suggest SQL is at fault. But we have almost ruled SQL out: if we spam an API on one App Service plan and send the same single request to a second App Service plan, the second one returns the result immediately. That in turn suggests the code or the server is at fault.
How can we diagnose this issue and find the bottleneck?
If this has been asked before, my apologies; and this is .NET 2.0 ASMX web services, so again, my apologies =D
A .NET application that only exposes web services. Roughly 10 million messages per day, load balanced between multiple IIS servers. Each incoming message is XML, and each outgoing message is XML (XmlElement). (We have beefy servers that run on steroids.)
I have an SLA that all messages are processed in under X seconds.
One function in the process, the linking method, is now taking 10-20 seconds. It is required for every transaction, but it is not critical that it complete before the web service returns its results. Because of this I suggested throwing it onto another thread, but now I realize that my words, and the eager developers behind them, might not have fully thought this through.
(The original diagram showed the current flow on the left and the attempted flow on the right.) Effectively, what I'm looking for is for the web service to spawn a long-running (10-20 second) thread that keeps executing even after the web service call has returned. This is, effectively, what is going on:
// Fire-and-forget: hand the slow linking step to a new thread so the
// web method can return without waiting the 10-20 seconds.
Thread linkThread = new Thread(delegate()
{
    Linkmembers(GetContext(), ID1, ID2, SomeOtherThing, XMLOrSomething);
});
linkThread.Start();
Using this, we've reduced the time from 19 seconds to 2.1 seconds on our dev boxes, which is quite substantial.
I am worried that with the amount of traffic we get, and if a vendor/outside party decides to throttle us, IIS might decide to recycle the worker process and kill those threads before they're done processing. I agree our solution might not be the "best", but we don't have the time to build a queue system or another Windows service to handle this.
Is there a better way to do this? Any caveats that should be considered?
Thanks.
Apart from the issues you've described, I cannot think of any. That being said, there are ways to fix the problem that do not involve building your own solution from scratch.
Use MSMQ with WCF: create a WCF service with an IIS-hosted MSMQ endpoint (no need for a Windows service as long as WAS is enabled) and make calls to the service from within your ASMX service. You reap all the benefits of reliable queueing without having to build your own.
Plus, if your MSMQ service fails or throws an exception, the message will be reprocessed automatically. If you use DTC and are hitting a database, you can even have the MSMQ transaction flow through to the DB.
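A minimal sketch of what the queued contract could look like (type and parameter names are placeholders; note that netMsmqBinding requires one-way operations):

using System.ServiceModel;

[ServiceContract]
public interface ILinkService
{
    // One-way: the ASMX service drops a message on the queue and
    // returns immediately; MSMQ provides durability and retries.
    [OperationContract(IsOneWay = true)]
    void LinkMembers(int id1, int id2, string payload);
}

public class LinkService : ILinkService
{
    public void LinkMembers(int id1, int id2, string payload)
    {
        // The slow 10-20 second linking work runs here, off the request path.
    }
}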
We have a large process in our application that runs once a month. It typically completes in about 30 minutes and generates 342,000 or so log events. Recently we moved our logging to a centralized model using WCF, and we are now having performance difficulties. Where the previous solution would complete in about 30 minutes, with the new logging it now takes 3 or 4 hours. The problem, it seems, is that the application actually waits for each WCF request to complete before execution continues. The WCF method is already configured as IsOneWay, and I wrapped the client-side call to that WCF method in a separate thread to try to prevent this kind of problem, but it doesn't seem to have worked. I have thought about using the async WCF calls, but before trying something else I thought I would ask here whether there is a better way to handle this.
342,000 log events in 30 minutes, if I did my math correctly, comes out to about 190 log events per second. I think your problem may have to do with the default throttling settings in WCF. Even if your method is one-way, depending on whether you're creating a new proxy for each logged event, the call will still block while the proxy is created and the channel is opened; and if you're using an HTTP-based binding, it will block until the message has been received by the service (an HTTP-based binding sends back a null response for a one-way call once the message is received).

The default WCF throttling limits concurrent instances to 10 on the service side, which means only 10 requests will be handled at a time and any further requests get queued. Pair that with an HTTP binding, and anything after the first 10 requests is going to block at the client until it becomes one of the 10 being handled. Without knowing how your services are configured (instancing mode, etc.) it's hard to say more, but if you're using per-call instancing, I'd recommend setting MaxConcurrentCalls and MaxConcurrentInstances on your ServiceBehavior to something much higher (the defaults are 16 and 10, respectively).
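For example, a service-side behavior along these lines (numbers illustrative) raises those limits:

<behaviors>
  <serviceBehaviors>
    <behavior name="HighVolumeLogging">
      <!-- Raise the low defaults for high-volume one-way logging. -->
      <serviceThrottling maxConcurrentCalls="200"
                         maxConcurrentInstances="200"
                         maxConcurrentSessions="200" />
    </behavior>
  </serviceBehaviors>
</behaviors>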
Also, to build on what others have mentioned about aggregating multiple events and submitting them all at once: I've found it helpful to set up a static Logger.LogEvent(eventData) method. That way it's simple to use throughout your code, and you can control in your LogEvent method how logging behaves across your application, such as how many events get submitted at a time.
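A rough sketch of that shape, assuming a one-way WCF operation that accepts a batch (EventData and LoggingServiceClient are hypothetical names):

using System.Collections.Generic;

public static class Logger
{
    private const int BatchSize = 1000; // tune to your volume
    private static readonly object _sync = new object();
    private static readonly List<EventData> _buffer = new List<EventData>();

    public static void LogEvent(EventData eventData)
    {
        EventData[] batch = null;
        lock (_sync)
        {
            _buffer.Add(eventData);
            if (_buffer.Count >= BatchSize)
            {
                batch = _buffer.ToArray();
                _buffer.Clear();
            }
        }
        if (batch == null) return; // nothing to flush yet

        // Flush outside the lock so other threads can keep logging.
        var client = new LoggingServiceClient(); // generated WCF proxy (hypothetical)
        try { client.LogEvents(batch); client.Close(); }
        catch { client.Abort(); }
    }
}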
Making a call to another process or remote service (i.e. calling a WCF service) is about the most expensive thing you can do in an application. Doing it 342,000 times is just sheer insanity!
If you must log to a centralized service, you need to accumulate batches of log entries and then, only when you have, say, 1,000 or so in memory, send them all to the service in one hit. This will give you a reasonable performance improvement.
log4net has a buffering system that operates outside the context of the calling thread, so it won't hold up your call while it logs. Its usage should be clear from the many appender configuration examples; search for the term bufferSize. It's used on many of the slower appenders (e.g. remoting, email) to keep the source thread moving without waiting on the slower logging medium, and there is also a generic buffering meta-appender that can be placed "in front of" any other appender.
We use it with an AdoNetAppender in a system of similar volume and it works wonderfully.
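A minimal sketch of the buffering setup in the log4net config (bufferSize and names illustrative; the AdoNetAppender itself is defined elsewhere in the file):

<log4net>
  <!-- Buffering meta-appender placed in front of a slower appender. -->
  <appender name="BufferingForwarder"
            type="log4net.Appender.BufferingForwardingAppender">
    <bufferSize value="512" />
    <appender-ref ref="AdoNetAppender" />
  </appender>
  <root>
    <level value="INFO" />
    <appender-ref ref="BufferingForwarder" />
  </root>
</log4net>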
There's always traditional syslog; there are plenty of syslog daemons that run on Windows. It's designed to be a more efficient way of doing centralized logging than WCF, which is designed for less intensive operations, especially if you're not using the TCP WCF binding.

In other words, have a go with it; it's the correct tool for the job.
My knowledge of how requests are handled by the ASP.NET worker process is woefully inadequate. I'm hoping some of the experts out there can fill me in.
If I crash the worker process with a System.OutOfMemoryException, what would the experience be for other users being served by the same process? Would they get a blank screen? A 503 error?
I'm going to attempt to test this scenario with some other folks in our lab, but I thought I would float this out there. I will update with our results.
UPDATE: Our results varied. If we artificially induced an OOM exception (for example, by loading larger and larger PDFs into memory), some requests being served by that worker process would "hang" temporarily and then complete, while others seemingly would never return. Thank you for your responses.
W3WP.exe is the process
IIS runs all web apps in a generic worker process - w3wp.exe. Whether you write in ASP.NET, or ISAPI, or some other framework, the process that serves the web request is w3wp.exe. In the ASP.NET case, w3wp.exe loads the ASP.NET JIT-compiled DLLs and services the requests through them. In other cases, it works differently. But the key point is, w3wp.exe is the process. This model started in IIS6.0 and continues in IIS7.0.
Unexpected Failures
If w3wp.exe fails unexpectedly, for any reason, all transactions it was handling will likely get 500 errors (server error). IIS will start a new worker process in its place (MS calls this "Health Monitoring"), which means the web app will continue to run. Users that did not have a request being served by the failing process at the time of failure will be unaware of any of this.
The HTTP 500 error that a client receives in this case will be indistinguishable from a 500 error the client receives in the case of an application error, say an uncaught exception in your ASP.NET application code.
For those requests that were in the failing process, there's no way to recover them. They will result in 500 errors at the browser. A 503 Server Busy results from IIS actively refusing the connection due to a threshold on the number of connections. A 503 does not result from an application failure, so you shouldn't expect to see 503 for in-flight transactions in the out-of-memory-and-crash scenario. On a heavily loaded system, you may see 503's as the process-crash-and-restart happens, as a secondary effect. If this is really what you're seeing, you need a larger margin of safety to handle the load in the single-error condition.
The Request Queue
IIS has a hand-off approach to requests. As they arrive at the network layer (HTTP.sys), they are placed in a queue, to be picked up by a worker process. Any requests waiting in the IIS queue to be handled by a WP will continue unaffected, though they might see a slight temporary increase in latency (service time) due to resource contention, since one fewer process is running on the server. Wait time in this queue is generally very short on a properly configured system.
It is when this queue is full that you will see 503 errors.
Auto restart of W3WP.exe
IIS has an auto-restart (or "nanny") facility, through which it restarts worker processes after they have exceeded configured thresholds, such as memory size, number of requests, or time-of-running. In all those cases, IIS will quiesce and restart worker processes when the configured threshold is reached. These pro-active restarts normally do not result in any disruption of requests. When IIS decides that a restart of a worker process is necessary, it prevents any new requests from arriving at that to-be-quiesced WP. Existing requests are drained: any in-flight transactions in that WP are allowed to complete normally. When all requests in the WP complete, then the WP dies and IIS starts a new one in its place. This new process then immediately begins picking up new requests from the dispatch queue. This is all transparent to users or browsers.
I say normally because it's possible that the worker process has become truly sick at the same time as the threshold has been reached. In that case the w3wp.exe may not respond to IIS within the configured "quiesce" timeout, and thus IIS has to eventually kill the process even though it hasn't reported that all of its in-flight requests have completed. This should be exceedingly rare, because it's two distinct exceptional conditions, but it happens. In this case, the in-flight requests will once again, get 500 errors.
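For reference, those thresholds live on the application pool, e.g. in applicationHost.config (values illustrative):

<applicationPools>
  <add name="MyAppPool">
    <recycling>
      <!-- Recycle after 29 hours, 100,000 requests, or ~500 MB private memory (in KB). -->
      <periodicRestart time="1.05:00:00" requests="100000" privateMemory="512000" />
    </recycling>
  </add>
</applicationPools>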
Web gardens
Also - IIS allows multiple worker processes on a single server. MS calls this a "web garden", a play on words from "web farm". If you have a web garden set up, then transactions being served by w3wp.exe instances other than the failing one, will continue unaffected. "Unaffected" presumes though, that the out-of-memory error is localized, and not a system-wide problem.
Bottom Line
The bottom line is that there is no substitute for your own testing. The configuration options are pretty broad - from restart thresholds to web gardens and so on. Also the failure modes tend to be pretty complex and varied, whether it's memory, timeout, too busy, and so on. You'll want to understand what to expect.
ps: this Q&A really belongs on serverfault.com!
references:
http://blogs.iis.net/thomad/archive/2008/05/07/the-iis-process-model-features.aspx
A new worker process will be started and the user will not know anything happened, unless the pool shuts down completely via rapid-fail protection (http://technet.microsoft.com/en-us/library/cc779127(WS.10).aspx).
If it's an out-of-memory situation, IIS usually just recycles the app pool.
As the other answers say, in most cases everything just restarts, and most users who did not have a pending request at the time will not notice much more than a delay.
However, if your application uses session variables with In-Proc session state, all session variables for all users will be lost when the app pool restarts. This may or may not have a negative effect on the users, depending on what you're doing with the session variables. You can avoid this by switching to StateServer or SQL Server session storage.
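For example (connection details are placeholders):

<system.web>
  <!-- Out-of-process session state survives app pool recycles. -->
  <sessionState mode="StateServer"
                stateConnectionString="tcpip=localhost:42424"
                timeout="20" />
</system.web>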
I am kind of stumped with this one, and was hoping I could find some answers here.
Basically, I have an ASP.NET application that is running across 2 servers. Server A has all of the business logic/data access exposed as web services, and Server B has the website which talks to those services (via WCF, with net.tcp binding).
The problem occurs a few seconds after a recycle of my app pool is initiated by IIS on Server A. The recycle happens after the allotted time (using the default of 29 hours set in IIS).
In the server log (of Server A):
A worker process with process id of '####' serving application pool 'AppPoolName' has requested a recycle because the worker process reached its allowed processing time limit.
I believe that this is normal behavior. The problem is that a few seconds later, I get this exception on Server B:
This channel can no longer be used to send messages as the output session was auto-closed due to a server-initiated shutdown. Either disable auto-close by setting the DispatchRuntime.AutomaticInputSessionShutdown to false, or consider modifying the shutdown protocol with the remote server.
This doesn't happen on every recycle; I assume that it happens when someone is hitting the site with a request WHILE the recycle happens.
Furthermore, my application is down until I intervene; the exception recurs on every subsequent request to the page. I intervene by editing the web.config (adding a space or something benign to the end of the file) and saving it; I assume that causes my application to recompile and brings the services back up. I have also experimented with running a batch file that does this for me every time the exception happens ;)
Now, I could barely find any information on this exception, and I've been looking for a while. Most of the information I did find pertains to WCF settings that I am not using.
I already read up on DispatchRuntime.AutomaticInputSessionShutdown, and I don't think it pertains to this situation. That particular property refers to the service shutting down automatically in response to behavior on the client side, which is not what is happening here. Here, the service is shut down by IIS.
I did read this, which went through some sort of workaround for bringing the service back up automatically, but I am really looking to understand what is going on here, not to hack around it!
I have started playing around with the settings in IIS7, specifically turning Overlapped Recycling on and off and increasing the process startup/shutdown time limits. I am wondering whether it is safe to turn off recycling completely (by putting 0 for the recycling time interval, I believe?). But again, I want to know what's going on!
Anyway, if you need more information, let me know. Thanks in advance!
This is probably related to how you open and close WCF connections.
If you open a proxy when your app starts and then keep using it, a break in the connection caused by a restart on the server side results in an error on the client side, since the server instance the proxy was talking to is no longer there. When you restart the client side (by changing the web.config), new proxies are created against a server that is running.
The way to fix this is to make sure that you close a WCF connection after you use it.
http://www.codeguru.com/csharp/.net/net_wcf/article.php/c15941/
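The usual client-side pattern looks something like this (the proxy type name is a placeholder); note that Close() itself throws on a faulted channel, hence the Abort():

using System.ServiceModel;

var client = new BusinessServiceClient(); // generated WCF proxy (hypothetical)
try
{
    client.DoWork();
    client.Close();   // graceful channel shutdown
}
catch (CommunicationException)
{
    client.Abort();   // channel is faulted; tear it down hard
}
catch (TimeoutException)
{
    client.Abort();
}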
You should also make sure that you're using the correct SessionMode for your web service. I remember having similar trouble with some of my services until I sorted out the correct mode. This is especially true when you're mixing this with any authentication mode other than "None".
This link might have some pointers: http://msdn.microsoft.com/en-us/library/ms731193.aspx
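SessionMode is set on the service contract; a mismatch between the contract's mode and what the binding provides (net.tcp is session-capable) surfaces as channel errors. A sketch (the interface name is hypothetical):

using System.ServiceModel;

// Pick the mode that matches your binding: net.tcp provides a transport
// session, so Required or Allowed fit; NotAllowed suits sessionless bindings.
[ServiceContract(SessionMode = SessionMode.Required)]
public interface IBusinessService
{
    [OperationContract]
    string DoWork(string input);
}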
My suggestion is to simply stop using IIS to host your services. Unless there is something you really need from IIS, I would recommend just writing a standard Windows Service to host your WCF endpoints.
If you can't do that, then by all means turn off recycling. App pool recycling is mainly there because web developers write crappy code. I know that sounds rather blunt, but if you have enough sense to write code that doesn't leak, there is no reason to have IIS constantly restart your program.
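A minimal sketch of the self-hosting side, assuming the endpoints are defined in app.config (service and type names hypothetical):

using System.ServiceModel;
using System.ServiceProcess;

// Windows service hosting the WCF endpoints outside IIS:
// no app pool, so no recycling.
public class BusinessHost : ServiceBase
{
    private ServiceHost _host;

    protected override void OnStart(string[] args)
    {
        _host = new ServiceHost(typeof(BusinessService)); // endpoints read from app.config
        _host.Open();
    }

    protected override void OnStop()
    {
        if (_host != null)
        {
            _host.Close();
            _host = null;
        }
    }
}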