I've recently deployed an MVC application to an IIS6 web server. One strange behaviour I've been seeing is that load times will randomly blow up to 30sec+ and then return to normal. Our tests have shown this occurring on multiple connections at the same time. Once the wait has passed, the site becomes responsive again. It's completely random when this will occur, but it happens about once every 15 minutes or so.
My first thought was that the application was being restarted by the web server for some reason, but I determined this wasn't the case: process recycling is set to happen very infrequently, and the logging I placed in the application startup confirmed it.
It's also nothing to do with the database connection. This slowdown happens simply by moving between static pages too. I've watched the database with a SQL profiler, and nothing is hitting it when these slowdowns occur.
Finally, I've placed entry and exit logging on my controller actions, and the slowdown always happens outside of the controller. The entry and exit time for a controller action is always appropriately fast.
Does anyone have any ideas of what could be causing this? I've tried running it locally on IIS7 and I haven't had the issue. I can only think it's something to do with our hosting provider.
Is this running on a dedicated server? If not, it might be your hosting provider.
It sounds to me from what you have said that the server is maxing out its CPU every 15 minutes for some reason. It could be something in the code hitting an infinite loop. Have you had a look in the event log for any crashes or errors from the application?
Run the web app under a profiler (e.g. JetBrains) and dump out the results after one of these 30-second lockups occurs. The profiler output should make locating the bottleneck fairly obvious, as it will pinpoint the exact API call which is consuming the time or blocking other threads.
At a guess, it could be memory pressure causing items to be dumped from cache, or garbage collection, although 30 seconds sounds a little excessive for this.
I'm the developer in charge of a C# web application running on IIS 10.0 and I use the FluentScheduler library to schedule my jobs.
This job does a database query and then generates some files. Recently our jobs have been failing because they run for too long and Windows kills the thread (this only happens on specific days with a large influx of data to be processed).
After doing some optimizations on the database access, the thread gets killed much less often, but it still occasionally happens.
The problem is that after having its thread killed (and logging the exception), the job stops running at its scheduled time.
How can I make sure the job keeps running even if this exception does happen?
My code below:
Schedule(new GenerateFiles()).NonReentrant().ToRunOnceAt(DateTime.Now.AddMinutes(10)).AndEvery(30).Seconds();
The 10-minute delay on the first run is there because we cache some information from the database to improve the application's performance, and this job uses data from that cache. I don't know how to make the job begin only after the cache is ready, so I added this delay.
Any other exceptions caught and logged in my jobs do not cause this issue. It's only when the thread runs for too long and Windows kills it that the job stops running again.
Edit: Adding the line in which the application fails (at least that's what the stack trace tells me).
The entire job is quite extensive and I can't really post it here.
// Collect up to 100 not-yet-generated entries, then return the batch.
foreach (var datumToGenerate in context.GenerateData.Include(f => f.Datum))
{
    var datum = datumToGenerate.Datum;
    if (!datum.Generated)
    {
        output.Add(datum);
        i++;
        if (i == 100) return output; // batch is full
    }
}
As you can see, I lowered the number of entries to be processed at a time to 100, but even when setting it to 50 or lower I eventually get the error, as the GenerateData table is quite large even though its entries get deleted after being processed.
Edit2: The code, in fact, fails on any random part of the class. It runs fine for around 10 minutes and then it just crashes. Am I simply screwed??
My colleague and I found the answer to the issue.
It had nothing to do with code.
IIS's application pool has a default idle timeout of 20 minutes, after which the worker process is shut down. We disabled the application pool's idle timeout by setting its value to 0, and that exception never occurred again.
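For reference, here is a minimal sketch of making the same change from code rather than through IIS Manager. It assumes the Microsoft.Web.Administration assembly is referenced and the process runs with administrator rights; the pool name passed in is a placeholder, not something from the post above.

```csharp
using System;
using Microsoft.Web.Administration;

public static class AppPoolIdleTimeout
{
    // Disables idle shutdown for one application pool.
    // poolName is whatever the pool is called on your server (hypothetical here).
    public static void Disable(string poolName)
    {
        using (var serverManager = new ServerManager())
        {
            ApplicationPool pool = serverManager.ApplicationPools[poolName];
            if (pool == null)
                throw new ArgumentException("No application pool named " + poolName);

            // TimeSpan.Zero means the worker process is never shut down for being idle.
            pool.ProcessModel.IdleTimeout = TimeSpan.Zero;
            serverManager.CommitChanges();
        }
    }
}
```

Calling AppPoolIdleTimeout.Disable("MyAppPool") once is enough. Note that this only affects idle shutdown; periodic recycling is a separate setting on the pool.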
I've got an API written in C# (WebForms), backed by a SQL Server 2008 database, that accepts JSON POST data on an AWS EC2 VM. My problem is that the "first" use of this API is rather slow to respond.
What I mean by "first" is that if I were to wait for an hour or so, then post some data, that would be the first. Subsequent posts would process rather quickly in comparison, and I would need to wait another hour or so before experiencing the slow "first" transaction again.
Since only the initial post is slow, it makes me wonder if something is "spinning down" after being idle for some time, and then spinning up again upon first use, adding the extra time.
Things I have tried -
Run program through a performance profiler - This didn't really help. As far as I can see, the program itself doesn't have any obvious parts that run very slowly or inefficiently.
Change configuration to persist at least 1 connection to the database at all times. Again, no real change. I did this by adding "Min Pool Size=1;Max Pool Size=100" to my connection string.
Change configuration to use named pipes instead of TCP. Once again, no real change. I did this by adding "np:" before the server specified in my connection string, e.g. server=np:MyServer;database=MyDatabase;
Is there anything else I can do to diagnose the problem? What else should I be looking for in this scenario?
Chances are your app pool is shutting down after a designated period of non-use. The first call after the shutdown forces everything to get loaded back into memory which explains the lag.
You could play with these settings: http://technet.microsoft.com/en-us/library/cc771956%28v=ws.10%29.aspx to see if you get the desired effect, or set up a Task Scheduler job that makes at least one call every 10 minutes or so by doing a simulated post. A simple PowerShell script could handle that for you and will keep everything 'primed' for the next use.
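The answer suggests PowerShell; since the API in question is C#, the same keep-alive idea could look roughly like this as a small console program. This is only a sketch: the URL and JSON body are placeholders you would supply, and the 10-minute interval is simply the figure mentioned above.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class KeepAlive
{
    // Interval suggested in the answer above: roughly every 10 minutes.
    public static readonly TimeSpan Interval = TimeSpan.FromMinutes(10);

    private static readonly HttpClient Client = new HttpClient();

    // Sends one simulated POST; returns true if the server answered with a 2xx status.
    public static async Task<bool> PingAsync(string url, string jsonBody)
    {
        try
        {
            using (var content = new StringContent(jsonBody, Encoding.UTF8, "application/json"))
            using (HttpResponseMessage response = await Client.PostAsync(url, content))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // server unreachable; the next ping will try again
        }
        catch (TaskCanceledException)
        {
            return false; // request timed out
        }
    }

    // Pings forever, pausing for Interval between calls.
    public static async Task RunAsync(string url, string jsonBody)
    {
        while (true)
        {
            bool ok = await PingAsync(url, jsonBody);
            Console.WriteLine("{0:u} keep-alive {1}", DateTime.UtcNow, ok ? "ok" : "failed");
            await Task.Delay(Interval);
        }
    }
}
```

It would be started with something like KeepAlive.RunAsync(url, "{}").GetAwaiter().GetResult() from a console app or Windows service. It has to run outside the web application itself, otherwise it gets shut down together with the app pool it is meant to keep warm.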
I have a fairly busy site which does around 10m views a month.
One of my app pools seemed to jam up for a few hours and I'm looking for some ideas on how to troubleshoot it. I suspect that it somehow ran out of threads, but I'm not sure how to determine this retroactively. Here's what I know:
The site never went 'down', but around 90% of requests started timing out.
I can see a high number of "HttpException - Request timed out." in the log during the outage
I can't find any SQL errors or code errors that would have caused the timeouts.
The timeouts seem to have been site wide on all pages.
There was one page with a bug on it which would have caused errors on that specific page.
The site had to be restarted.
The site is ASP.NET C# 3.5 WebForms.
Possibilities:
Thread depletion: My thought is that the page causing the error may have somehow started jamming up the available threads?
Global code error: Another possibility is that one of my static classes has an undiscovered bug in it somewhere. This is unlikely, as this has never happened before and I can't find any log errors for these classes, but it is a possibility.
UPDATE
I've managed to trace the issue now while it's occurring. The pages are being loaded normally, but for some reason WebResource.axd and ScriptResource.axd are both taking a minute to load. In the performance counters I can see that ASP.NET Requests Queued spikes at this point.
The first thing I'd try is Sam Saffron's CPU analyzer tool, which should give an indication if there is something common that is happening too much / too long. In part because it doesn't involve any changes; just run it at the server.
After that, there are various other debugging tools available; we've found that some very ghetto approaches can be insanely effective at seeing where time is spent (of course, it'll only work on the 10% of successful results).
You can of course just open the server profiling tools and drag in various .NET / IIS counters, which may help you spot some things.
Between these three options, you should be covered for:
code dropping into a black hole and never coming out (typically threading related)
code running, but too slowly (typically data access related)
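To test the thread-depletion theory from the question directly, one option is to log the state of the CLR thread pool from inside the application while the problem is occurring. A minimal sketch follows; the class and method names are made up for illustration, and where you log the result is up to you.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Number of worker threads currently busy in this process's CLR thread pool.
    public static int BusyWorkerThreads()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // One-line summary suitable for writing to the application log.
    public static string Describe()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return string.Format(
            "worker threads busy: {0}/{1}, IO threads busy: {2}/{3}",
            maxWorkers - freeWorkers, maxWorkers,
            maxIo - freeIo, maxIo);
    }
}
```

If the busy count sits at or near the maximum while requests are queuing, that points towards threads being held by something that never returns rather than towards slow data access.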
Calling a WCF-published orchestration from a C# program usually gives sub-second response times. However, on some occasions it can take 20-50 seconds between the call in the C# program and the first trace message from the orchestration. The C# program that calls the WCF service runs under HIS/HIP (Host Integration Server/CICS Host-Initiated Processing).
Almost every time I restart the HIS/HIP service, we have a very slow response time, and thus a timeout in CICS. I'm also afraid it might happen during the day if things "go cold"; in other words, maybe things are being cached. Even first-time JIT compiles shouldn't take 20-50 seconds, should they? The other thing that seems strange is that the slow part seems to be the load of the orchestration, which runs under the BizTalk service, not the HIP service which I cycled.
The fear is that when we go live, the first user in the morning (or after a "cold spell") will get the timeout. The second time they try it after the timeout, it is always fast.
I've done a few tests by restarting each of the following:
1) BizTalk services
2) IIS
3) HIS/HIP Transaction Integrator (HIP Service)
Restarting any one of them tends to cause about a 20 second delay.
Restarting all 3 is like the kiss of death - about a 60 second delay before first trace appears from orchestration.
The HIP program always gives its first trace quickly, even when the HIP service is restarted. Not sure why restarting HIP slows down the starting of the orchestration.
Thanks,
Neal Walters
I have seen this kind of behavior with the MQSeries adapter as well. After a period of inactivity the COM+ components which enable communication with MQSeries will shut down due to inactivity.
What we had was a 10 minute timer which would force some sort of a keep-alive message. I don't know if you have a non-destructive call which can be sent, or if you can build one into the system just for this purpose.
I had the same problem with a BizTalk flow that needs to respond within 2 seconds: when it had been unused for some time, reloading the DLL into cache caused a timeout.
We found a solution in MS's Orchestration Engine Configuration documentation, where they explain how to avoid unloading of the DLLs:
Using the options SecondsIdleBeforeShutdown and SecondsEmptyBeforeShutdown from AppDomainSpecs and assigning them to the desired DLLs in the ExactAssignmentRules or PatternAssignmentRules sections, you can keep your DLLs permanently loaded, and you may be able to avoid the need for a keep-alive caller application.
Take into account that if you restart the BizTalk host, the dll will be loaded again.
This is a pretty vague question and getting it answered seems like a long shot, but I don't know what else to do.
Ever since I made my website live, every now and then it will just freeze. You click on a link and the browser will just sit there looking like it's trying to connect. The freezing can last up to 2 minutes or so, then everything is just fine. Then a little while later, it will do the same thing.
I track all the exceptions that occur on my website in a log file.
I get these quite a bit ..
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding
And the stack trace shows it leading to some method that's connecting to the database.
I'm assuming the freezing has to do with this timeout problem. My website is hosted on a shared server, and my database is on some other server with about a billion other databases as well.
But even being on a shared server, this freezing problem happens all the time. It's extremely annoying. And I can see this being a pretty catastrophic problem, considering my site is ecommerce based and people are doing transactions on it. The last thing I want is the site freezing when my users hit the 'Submit payment' button, so that they hit it over and over again and their credit card gets charged about 10 extra times.
Does anyone have any suggestions on the best way to handle this?
I am guessing that it has to do with the database connections. Check that they are getting released properly. If they are not, the application will eventually use them all up.
Also check to see if your database has connection pooling configured.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding
That's a SQL command timeout exception - it can be somewhat common if your database is under load. Make sure you're disposing of SqlConnections and SqlCommands - though failing to do that would usually result in a pool timeout exception instead (can't retrieve a connection from the connection pool).
Odds are, someone is running queries that are badly tuned or otherwise sucking resources. It may be your site, but since you're on a shared db server, it could just as easily be someone else's. It could also be blocking, or open transactions - since those would be on your database, that'd be a coding issue. You'll probably need to get your hosting provider involved to track it down or move to a dedicated db server.
You can decrease the CommandTimeout of your SqlCommands - I know that sounds somewhat counter-intuitive, but I often find that it's better to fail early than try for 60 seconds throwing additional load on the server. If your .5 second query isn't done in 5 seconds, odds are it won't be done in 60 either.
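As a rough illustration of the fail-early idea, together with the disposal advice above, the sketch below sets a short CommandTimeout and wraps the connection and command in using blocks. The class name, method names and the 5-second value are illustrative assumptions, not taken from the asker's code.

```csharp
using System;
using System.Data.SqlClient;

public static class FailFastSql
{
    // Builds a command that gives up after timeoutSeconds.
    public static SqlCommand CreateCommand(SqlConnection connection, string sql, int timeoutSeconds)
    {
        var command = new SqlCommand(sql, connection);
        command.CommandTimeout = timeoutSeconds; // in seconds; 0 would mean "wait indefinitely"
        return command;
    }

    // Runs a scalar query; the using blocks return the connection to the pool
    // even when the command throws a timeout exception.
    public static object QueryScalar(string connectionString, string sql, int timeoutSeconds)
    {
        using (var connection = new SqlConnection(connectionString))
        using (SqlCommand command = CreateCommand(connection, sql, timeoutSeconds))
        {
            connection.Open();
            return command.ExecuteScalar();
        }
    }
}
```

A call such as FailFastSql.QueryScalar(connectionString, "SELECT COUNT(*) FROM Orders", 5) would then fail with a SqlException after about 5 seconds instead of holding the request for the full default timeout (the table name here is made up).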
Alternatively, if you're the patient type, you can increase the CommandTimeout - but there's also an IIS timeout of 90 seconds that you'll need to modify if you bump it up too much.
The timeout errors are definitely the source of the freezing pages. When one happens, the page will wait something like a minute for the database connection before it returns the error message. As the web server only handles one page at a time from each user, the entire site will seem to be frozen for that user until the timeout error comes. Even if it only happens for a few users once in a while, it will seem quite severe to them, as they can't access the site at all for a minute or so.
How severe the problem really is depends on how many errors you get. From your description it sounds like you get a bit too many to be normal.
Make sure that all your data readers, command objects and connection objects get disposed properly, so that you don't leave connections open.
Look for deadlock errors in the log also, as they can cause timeouts. If you have queries that lock each other, you may be able to improve them by changing the order that they use the tables.
Check the SQL Server logs, especially for deadlocks.
If you have multiple connections open, one might be waiting on a row that is locked by the other.