I am trying to test some web services that I have exposed. I have created some web performance tests that emulate user activity and put these into a load test with a step load pattern, with the intent of ramping up users to discover the number x of concurrent users at which the response time goes above 10 seconds.
I have tried doing this, but the results are unexpected: over 1,000 500 Internal Server Errors. Weirdly, however, my average response time stays largely the same, at an extremely low value (the blue line in the graph), even though the number of users increases to a maximum of 200 (the red line) and the number of requests/sec also increases (green). Surely this is wrong, as page response time should go up along with these.
Can anyone offer any insight as to what might be going on here, and how I might fix it?
The data set I am working from is a test data set that is tiny, so my only theory is that perhaps all requests are being cached, explaining the snappy response time, but the server is still being inundated, hence the errors.
Apologies for the lack of details - I am new to performance testing. Any questions will be answered straight away. Many thanks :)
I am developing a web API in ASP.NET that does some image processing. Essentially the user application will make GET requests with a few arguments, e.g. an image, quality, text to draw on it, size, etc.
My concern is that I do not know exactly how fast these requests are going to come in. If I spam refresh on a GET request for long enough, I see the memory slowly increase until it hits 1 GB and the app finally throws an OutOfMemoryException. Strangely enough, sometimes before hitting the OOM, I get an ArgumentException (even though I am using a valid request that works otherwise).
My questions are broad and as follows:
1) Is there a good tool to test this sort of mass request? I'd like to be able to spam my server so I can consistently analyze and troubleshoot any problems that arise. I haven't found anything and have just been clicking/pressing Enter in the browser manually.
2) Is there a tool you'd recommend to analyze what specific parts of my program are causing this memory issue? If the Diagnostic Tools in VS are good enough, can you offer some guidance as to what I should be looking for, e.g. investigating the call stack, memory profiling, etc.?
3) Perhaps none of the above questions are even necessary if this one can be answered: can this sort of request be prevented? Maybe my API can ensure that requests are only processed at a rate the server can handle (at the expense of user image load time). I know that catching the exceptions alone isn't going to be enough, so is there something ASP.NET provides for this sort of mass-request prevention? (A rough sketch of what I mean follows this list.)
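To illustrate question 3, here is a rough sketch of the kind of throttling I'm imagining, assuming ASP.NET Web API; the handler name and the concurrency limit are placeholders, not something the framework provides out of the box:

```csharp
// Hypothetical sketch: cap how many image-processing requests run at once.
// Requests over the cap wait briefly, then get a 503 instead of exhausting memory.
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class ThrottlingHandler : DelegatingHandler
{
    // Allow at most 8 concurrent requests (a guess; tune to the server).
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(8);

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Wait up to 2 seconds for a free slot, otherwise shed the load.
        if (!await Gate.WaitAsync(TimeSpan.FromSeconds(2), cancellationToken))
        {
            return new HttpResponseMessage(HttpStatusCode.ServiceUnavailable)
            {
                Content = new StringContent("Server busy, try again shortly.")
            };
        }

        try
        {
            return await base.SendAsync(request, cancellationToken);
        }
        finally
        {
            Gate.Release();
        }
    }
}

// Registration (in WebApiConfig.Register):
//     config.MessageHandlers.Add(new ThrottlingHandler());
```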
Thanks for taking the time to read, any answers are appreciated.
Removed the old question and rewrote it completely because I've worked on this quite a bit to pinpoint the problem. My issue is that I'm writing a custom CMS with a custom server, with very high speed/throughput as a goal, but I'm noticing that some data, or data patterns, cause major slowdowns (response time goes from 0 to 55+ ms). I really need someone better than me to help with this, as I'm absolutely clueless about what is going on. I suspect a bug in the .NET Framework, but I have no idea where it could be; the little .NET code browsing I did didn't suggest the output stream does anything data-specific.
Things that I've tested and am sure aren't the issue:
Size of the content (larger content is faster)
Type of the content (the difference shows up between files of the same content type)
Most of the surrounding code (I made a minimalist project to reproduce the bug, standing at around 15 lines; find the link at the bottom of the post. It includes data to reproduce it: run it, test with two URLs, and see for yourself).
Not an issue with web pages/cache etc.; the issue is reproduced with a single image and Ctrl+F5 in Firefox. Removing the last few bytes of the image fixes it 100% of the time; adding them back causes the issue again.
Not an issue that exists outside of the output stream (replacing it with a target MemoryStream doesn't show the issue)
How to reproduce the issue:
Download & run the project
Use your favorite browser and go to localhost:8080/magicnumber
Replace magicnumber in that URL with whatever you want; you will receive the image back minus that number of bytes.
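For reference, the repro boils down to roughly the following (a reconstruction, not the exact code in the zip; the file name is a placeholder for the sample included there):

```csharp
// Rough reconstruction of the ~15-line repro: serve the pre-gzipped sample,
// minus the number of trailing bytes given in the URL.
using System;
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        byte[] data = File.ReadAllBytes("sample.bin"); // placeholder for the sample file in the zip

        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/");
        listener.Start();

        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();

            // localhost:8080/magicnumber -> trim that many bytes off the end.
            // (Extra browser requests like /favicon.ico would make this parse
            // throw; error handling is omitted for brevity.)
            int trim = int.Parse(ctx.Request.Url.AbsolutePath.Trim('/'));
            int length = data.Length - trim;

            ctx.Response.AddHeader("Content-Encoding", "gzip"); // content is pre-compressed
            ctx.Response.ContentLength64 = length;
            ctx.Response.OutputStream.Write(data, 0, length);
            ctx.Response.OutputStream.Close();
        }
    }
}
```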
My result:
Constant 50ms or so with that image
Getting the magic number up to 1000 doesn't affect this at all
A bit further (I think around 1080-ish?) it suddenly drops to 0 ms.
I'm not sure what is going on, but it seems there are two requests per request, at least when using Ctrl+F5 in Firefox. In the correct case both are 0 ms; in the error case the first remains 0 ms but the other becomes 50 ms. I'm assuming the first one is simply checking whether the file cache is OK and I'm still answering, but Firefox closes the connection or something?
Any help is much appreciated; I'm placing all my rep on a bounty here, as I really need to know whether to go down this path / get more info to report this, or to go lower level and do my own http.sys interop (and, most of all, whether the bug is only on the .NET side or lower level, in which case going lower level won't fix it!).
The sample file is a gzipped array; my content is pre-cached and pre-compressed in the DB, so this is representative of the data I need to send.
https://www.dropbox.com/s/ao63d7din939new/StackOverFlowSlowServerBug.zip
Edit: if I have Fiddler open, the problematic test goes back to 0 ms. I'm not sure what to make of it; so far this means I'm getting a major slowdown when sending some data, determined not by the type of data but by the actual data, and it doesn't happen if I have Fiddler in between. I'm at a loss!
Edit 2: Tested with another browser just to be sure, and it's actually back to 0 ms in IE, so I'm assuming it may not be an HttpListener bug but instead a Firefox bug; I will edit my question and tags toward that if no one suggests otherwise. If this is the case, does anyone know where I should be looking in Firefox's code to understand the issue? (It definitely is an issue even if it's on their side, since once again I'm comparing two files of the same format, one larger than the other; the larger one always takes 0 ms, the smaller one always 55 ms!)
Two requests problem:
Chrome:
First request = favicon
Second request = image
Firefox:
First request = image for the tab
Second request = image
More on this:
http://forums.mozillazine.org/viewtopic.php?t=341179
https://bugzilla.mozilla.org/show_bug.cgi?id=583351
IE:
Only appears to make the one request
If you send the requests through Fiddler, you never get two coming through.
Performance problem:
Firstly, there's a problem with the timer in your demo app. It's restarted every time the async request handler fires, meaning that the timer started for request A will be restarted when request B is received, possibly before request A is finished, so you won't be getting correct values. Create the stopwatch inside the ContinueWith callback instead.
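Something along these lines would keep the timing per request (a standalone sketch, not the demo project's actual code):

```csharp
// Sketch of per-request timing: start a fresh Stopwatch inside the
// continuation so request B cannot reset request A's timer.
using System;
using System.Diagnostics;
using System.Net;
using System.Text;

class TimedListener
{
    static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/");
        listener.Start();

        while (true)
        {
            listener.GetContextAsync().ContinueWith(task =>
            {
                var sw = Stopwatch.StartNew();            // one stopwatch per request
                HttpListenerContext context = task.Result;

                byte[] body = Encoding.UTF8.GetBytes("hello"); // stand-in for the real response
                context.Response.ContentLength64 = body.Length;
                context.Response.OutputStream.Write(body, 0, body.Length);
                context.Response.OutputStream.Close();

                sw.Stop();
                Console.WriteLine("Handled in {0} ms", sw.ElapsedMilliseconds);
            }).Wait(); // serialise accepts so the sketch stays simple
        }
    }
}
```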
Secondly, I can't see any way that "magicnumber" will really affect performance (unless it causes an exception to be thrown, I guess). The only way I can cause performance to degrade is by issuing a lot of concurrent requests and causing the wait lock to be continually hit.
In summary: I don't think there's a problem with the HttpListener class
The ReadWriteTimeout for HttpWebRequest seems to default to 5 minutes.
Is there a reason why it is that high? I was trying to set the timeout of an API call to 10 seconds, but it was spinning for over 2 minutes.
When I set this to 30 seconds, it now times out in a reasonable amount of time.
Is it dangerous to set this too low?
I can't imagine something taking longer than 20-30 seconds in my application (small 2-30kb payloads).
Reference: http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.readwritetimeout.aspx
Sure there's a reason for a 5 minute time-out. Picture this contraption: a robotic tape retrieval system, used by the International Centre for Radio Astronomy Research. It stores 32.5 petabytes of historical data. When its server gets an HttpWebRequest, the machine sends the robot on its way to retrieve the tape with the data. This takes a while, as you might imagine.
These systems were quite common a decade ago, around the time .NET was designed. Not so much today; the unrelenting improvements in hard disk storage capacity have made them close to obsolete, although more than 5 petabytes of SAN storage still sets you back a rather major chunk of money. If speed is not essential, then tape is hard to beat.
Clearly .NET cannot possibly reliably declare a timeout when it doesn't know anything about what's happening on the other end of the wire, so the default is high. If you have good reason to believe that there's an upper limit in your particular setup, then don't hesitate to lower it. Do make it an editable setting, though; you can't predict the future.
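For example (a sketch; the appSettings key name is invented, not anything .NET defines):

```csharp
// Sketch: keep the ReadWriteTimeout editable via configuration instead of
// hard-coding it. "ApiReadWriteTimeoutMs" is an invented appSettings key.
using System;
using System.Configuration;
using System.Net;

static class ApiClient
{
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);

        string configured = ConfigurationManager.AppSettings["ApiReadWriteTimeoutMs"];
        int timeoutMs;
        if (int.TryParse(configured, out timeoutMs))
        {
            // ReadWriteTimeout is in milliseconds, e.g. 10000 for 10 seconds.
            request.ReadWriteTimeout = timeoutMs;
        }
        // Otherwise the framework default (300,000 ms = 5 minutes) applies.

        return request;
    }
}
```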
You can't possibly know what connection speed the users who connect to your website have. And as the creator of the framework, you can't know what the developer will host either. This class has existed since .NET 1.1, so for a very long time, and back then users had slower connections too.
Finding a good default value is very difficult. You don't want to set it too high, to avoid security flaws, and you don't want to set it too low, because that would result in a million (an exaggeration) threads and questions about aborted requests.
I'm sorry I can't give you any official sources, but this is just reasonable.
Why 5 minutes? Why not?
JustAnotherUserYouMayKnow explained it to you pretty well.
But as usual, you have the freedom to change this default value to one that suits your particular case, so feel free to follow the path that Christian pointed out.
Setting a default value is not an easy task at all when we are talking about millions of users and maybe millions of billions of possible scenarios involved.
The bottom line is that it isn't so much important why it's 5 minutes, but rather how you can adjust it to your own needs.
Well, by setting it that low you may or may not introduce a series of issues. While you may be able to reach the site within a reasonable time, others may not.
A perfect example is Verizon: they route traffic through a series of proxy servers which can drastically slow a connection down. The reason I bring this example up is that our application specified a one-minute timeout before it throws an exception.
Our server has no issues with large amounts of requests; it handles them quite easily. However, some of our users throughout the world receive this error: Error 10060 (a connection timeout).
The issue can stem from an incorrect proxy configuration or an invalid registry key that handles the timeout setting.
You'd think that one minute would indeed be long enough, but it actually isn't, as this customer's particular network doesn't move the data through quickly enough, thus causing an error.
So you asked:
Why is the HttpWebRequest ReadWrite Timeout Defaulted to five minutes?
They are attempting to account for the lowest common denominator.
Simply put, each network and client may have a vast degree of traffic or delays as data moves to the desired location. If it can't get to the destination within your socket's timeout, your user will experience an exception.
Some really important things to know about a network:
Some networks are configured with a limited hop count / time to live.
Proxies and firewalls that do heavy data filtering and security checks may delay your traffic.
Some areas do not have high-speed fiber or cable; they may rely on satellite or DSL.
Each network protocol is different.
Those are a few variables that you have to consider. If we are talking about the Internet, each client has a home network, which connects to an ISP, which connects to the Internet, which connects to you. So you have several forms of traffic being aggregated.
If we are talking about an intranet, with most modern-day technology the odds of the timeout being an issue are slim, but it's still possible.
Also, each individual computer can contribute to or cause an issue. In Windows 8 the default timeout specified for the browser is one minute; in some cases those users may experience exceptions with your application, your site, or others. So you'd manually alter the ServerTimeOut and TimeOut keys in the registry to assign a longer value.
In short:
Client Machines may pose a problem in reaching your site within your allocated time.
Network / ISP may incur a problem for some users.
Your Server may be configured incorrectly or not allocate the right amount of time.
These are all variables that need to be accounted for, as they will impact access to your application. Unfortunately you won't know for certain until it's launched and users begin to utilize your site.
You also won't know whether the time you specified will be enough, but it defaults to a higher number because there is so much variation across the world that it has to consider the lowest common denominator, and your goal is to reach as many people as possible.
By the way very nice question, and some great answers so far as well.
I have a fairly busy site which does around 10m views a month.
One of my app pools seemed to jam up for a few hours, and I'm looking for some ideas on how to troubleshoot it. I suspect that it somehow ran out of threads, but I'm not sure how to determine this retroactively. Here's what I know:
The site never went 'down', but around 90% of requests started timing out.
I can see a high number of "HttpException - Request timed out." in the log during the outage
I can't find any SQL errors or code errors that would have caused the timeouts.
The timeouts seem to have been site wide on all pages.
There was one page with a bug on it which would have caused errors on that specific page.
The site had to be restarted.
The site is ASP.NET C# 3.5 WebForms.
Possibilities:
Thread depletion: my thought is that the page causing the errors may have somehow started tying up the available threads? (See the sketch after this list for one way I'm thinking of confirming this next time.)
Global code error: another possibility is that one of my static classes has an undiscovered bug in it somewhere. This is unlikely as this has never happened before, and I can't find any log errors for these classes, but it is a possibility.
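To check the thread-depletion theory the next time this happens, something like the following could log the thread pool headroom periodically (a sketch; the interval, the Trace target, and where it's started from are arbitrary choices, not anything already in the site):

```csharp
// Sketch: periodically log worker/IOCP thread headroom so a future jam can be
// confirmed retroactively from the logs.
using System;
using System.Diagnostics;
using System.Threading;

public static class ThreadPoolMonitor
{
    private static Timer _timer; // keep a reference so the timer isn't collected

    public static void Start() // e.g. called from Application_Start
    {
        _timer = new Timer(_ =>
        {
            int workerAvail, iocpAvail, workerMax, iocpMax;
            ThreadPool.GetAvailableThreads(out workerAvail, out iocpAvail);
            ThreadPool.GetMaxThreads(out workerMax, out iocpMax);

            Trace.WriteLine(string.Format(
                "Worker threads in use: {0}/{1}, IOCP threads in use: {2}/{3}",
                workerMax - workerAvail, workerMax,
                iocpMax - iocpAvail, iocpMax));
        }, null, TimeSpan.Zero, TimeSpan.FromSeconds(30));
    }
}
```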
UPDATE
I've managed to trace the issue now while it's occurring. The pages are being loaded normally but for some reason WebResource.axd and ScriptResource.axd are both taking a minute to load. In the performance counters I can see ASP.NET Requests Queued spikes at this point.
The first thing I'd try is Sam Saffron's CPU analyzer tool, which should give an indication of whether there is something common that is happening too much or for too long, in part because it doesn't involve any changes; just run it on the server.
After that, there are various other debugging tools available; we've found that some very ghetto approaches can be insanely effective at seeing where time is spent (of course, it'll only work on the 10% of successful results).
You can of course just open the server profiling tools and drag in various .NET / IIS counters, which may help you spot some things.
Between these three options, you should be covered for:
code dropping into a black hole and never coming out (typically threading related)
code running, but too slowly (typically data access related)
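As one example of the "ghetto" end of the spectrum mentioned above (a sketch, not something already in your site): time every request in Global.asax and log only the slow ones, which helps separate those two cases.

```csharp
// Sketch: stamp each request in Global.asax and log anything suspiciously slow.
// The 1000 ms threshold and the Trace target are arbitrary choices.
using System;
using System.Diagnostics;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        Context.Items["__requestTimer"] = Stopwatch.StartNew();
    }

    protected void Application_EndRequest(object sender, EventArgs e)
    {
        var sw = Context.Items["__requestTimer"] as Stopwatch;
        if (sw == null) return;

        sw.Stop();
        if (sw.ElapsedMilliseconds > 1000) // only log slow requests
        {
            Trace.WriteLine(string.Format("SLOW {0} ms: {1}",
                sw.ElapsedMilliseconds, Context.Request.RawUrl));
        }
    }
}
```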
Ok, I'm currently writing a scheduling web app for a large company, and it needs to be fast. Normal fast (<1s) doesn't cut it with these guys, so we're aiming for <0.5s, which is hard to achieve when using postbacks.
My question is: does anyone have a suggestion of how best to buffer calendar/schedule data to speed load times?
My plan is to load the selected week's data, and another week on either side, and use these to buffer the output: i.e. it will never have to load the week you've asked for, it'll always have that in memory, and it'll buffer the weeks on either side for when you next change.
However, I'm not sure exactly how to achieve this. The async loading is simple when using AJAX page methods, but it's a question of where to store the data (temporarily) after it loads: I am currently using a static class with a dictionary to do it, but this is probably not the best way when it comes to scaling to a large user base.
Any suggestions?
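To make the question concrete, here is roughly the shape I mean, but using ASP.NET's built-in cache rather than a bare static dictionary (a sketch only; the class names, the 20-minute sliding window, and the stubbed query are placeholders):

```csharp
// Sketch: cache each week of appointments under a per-user key with a sliding
// expiration, instead of holding everything in a static dictionary forever.
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public class Appointment
{
    // A few fields per appointment, as described in the question.
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public string Title { get; set; }
}

public static class WeekBuffer
{
    public static List<Appointment> GetWeek(int userId, DateTime weekStart)
    {
        string key = string.Format("week:{0}:{1:yyyyMMdd}", userId, weekStart);

        var cached = HttpRuntime.Cache[key] as List<Appointment>;
        if (cached != null)
            return cached;

        List<Appointment> week = LoadWeekFromDatabase(userId, weekStart);

        // Keep the week for 20 minutes of inactivity, then let it fall out,
        // so memory use tracks active users rather than growing forever.
        HttpRuntime.Cache.Insert(key, week, null,
            Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(20));

        return week;
    }

    private static List<Appointment> LoadWeekFromDatabase(int userId, DateTime weekStart)
    {
        // Placeholder for the real query against the scheduling tables.
        return new List<Appointment>();
    }
}
```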
EDIT
The amount of data loaded is not particularly high: there are a few fields on each appointment, which are converted to a small container class and processed to organise the dates and calculate the concurrent appointments, and it's unlikely there'll be more than ~30 appointments a week given the domain. However, the database is under very high load from other areas of the application (this is a very large-scale system with thousands of users transferring a large volume of information around).
So are you putting your buffered content on the client or the server here? I would think the thing to do would be to chuck the data for the previous and next weeks into a JavaScript data structure on the page and then let the client arrange it for you. Then you could just bounce back to the server asynchronously for the next week when one of your buffered neighbour weeks is opened, so you're always a week ahead as you have said, assuming that the data will only be accessed in a week-by-week way.
I would also, for the sake of experimentation, see what happens if you put a lot more calendar data into the page to process with JavaScript - this type of data can often be pretty small, even a lot of information barely adding up to the equivalent of a small image in terms of data transfer - and you may well find that you can have quite a bit of information cached ahead of time.
It can be really easy to assume that because you have a tool like Ajax you should be using it the whole time, but then I do use a hammer for pretty much all jobs around the home, so I'm a fine one to talk on that front.
The buffering won't help on the first page, though - only on subsequent back/forward requests.
Tbh I don't think there's much point, as you'll want to support hyperlinks and redirects from other sources as much as or more than just back/forward. You might also want to "jump" to a month. Forcing users to page back and forwards to get to the month they want is actually going to take longer and be more frustrating than a <1s response time to go straight to the page they want.
You're better off caching data generally (using something like Velocity) so that you almost never hit the db, but even that's going to be hard with lots of users.
My recommendation is to get it working, then use a profiling tool (like ANTS Profiler) to see which bits of code you can optimise once it's functionally correct.