I am trying to simulate X number of concurrent requests for a WCF Service and measure the response time for each request. I want to have all the requests hit the Service at more or less the same time.
As the first step I spawned X number of Threads, using the Thread class, and have invoked the Start method. To synchronize all the requests, on the Thread callback I open the connection and have a Monitor.Wait to hold the request from being fired, till all the Threads are created and started. Once all the Threads are started, I call Monitor.PulseAll to trigger the method invocation on the WCF Client Proxy.
When I execute the requests this way, I see a huge delay in the response. A request that should take just a few milliseconds is taking about a second.
I also noticed a huge lag between the time the request is dispatched and the time it is received at the service method. I measured this by sending the client timestamp as a parameter value to the service method for each request.
I have the following settings. Assume "X" to be the number of concurrent requests I want to fire. Also note that with the following settings I don't get any Denial of Service issues.
The call chain is as follows: Client -> Service1 -> Service2 -> Service3
All Services are PerCall with Concurrency set to Multiple.
Throttling set to X Concurrent calls, X Concurrent Instances.
MaxConnections, ListenBacklog on the Service to X.
Min/Max Threads of ThreadPool set to X on both Client and Server (I have applied the patch provided by Microsoft).
I am not sure if the response time I'm measuring is accurate. Am I missing something very trivial?
Any inputs on this would be of great help.
Thanks.
-Krishnan
I did find the answer by myself, the hard way. All this while, the way I was measuring the response time was wrong. One should spawn X threads, where X is the number of concurrent users one wants to simulate. In each thread, open the connection only once and use a while loop that only executes the WCF method under test for a given duration. Measure the response time of each call, accumulate it, and average it over the number of calls executed within the given duration.
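The measurement loop described above can be sketched as follows. This is a minimal illustration in Python rather than .NET, and `call_service` is a hypothetical stand-in for the WCF method under test (here it just sleeps for a millisecond). The names `worker` and `run_load_test` are likewise assumptions made for the example.

```python
import threading
import time


def call_service():
    """Hypothetical stand-in for the service method being measured."""
    time.sleep(0.001)


def worker(duration_s, latencies, lock):
    # A real client would open its connection/proxy once here, outside the loop.
    deadline = time.perf_counter() + duration_s
    local = []
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        call_service()
        local.append(time.perf_counter() - start)
    # Merge per-thread samples once, so the lock is not part of the timed path.
    with lock:
        latencies.extend(local)


def run_load_test(users, duration_s):
    """Run `users` threads for `duration_s` seconds; return (calls, average latency)."""
    latencies = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(duration_s, latencies, lock))
        for _ in range(users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    average = sum(latencies) / len(latencies) if latencies else 0.0
    return len(latencies), average


if __name__ == "__main__":
    calls, avg = run_load_test(users=4, duration_s=0.2)
    print(f"{calls} calls, average {avg * 1000:.2f} ms")
```

Keeping the samples in a per-thread list and merging them at the end avoids adding lock contention to the very thing being timed.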
If all your outgoing calls are coming from a single process, it is likely that the runtime is either consolidating multiple requests onto a single open channel or capping the number of concurrent requests to a single target service. You may have better results if you move each simulated client into its own process, use a named EventWaitHandle to synchronize them, and call Set() on it to unleash them all at once.
There is also a limit to the number of simultaneous pending (TCP SYN or SYN/ACK state) outgoing TCP connections allowed by desktop builds of Windows; if you encounter this limit you will get event 4226 logged and the additional simultaneous connections will be deferred.
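The "hold everyone, then release at once" idea can be sketched as below. Note the assumption: this is an in-process stand-in using Python threads and `threading.Event`, not the cross-process named EventWaitHandle suggested above, so it only illustrates the gating pattern itself. `run_gated` is a hypothetical name.

```python
import threading
import time


def run_gated(n_clients):
    """Release n_clients waiting workers together; return (release spread, count)."""
    ready = threading.Barrier(n_clients + 1)  # all workers plus the coordinator
    gate = threading.Event()                  # plays the role of the wait handle
    release_times = []
    lock = threading.Lock()

    def client():
        # Connection setup would happen here, before waiting at the gate.
        ready.wait()   # tell the coordinator this client is set up
        gate.wait()    # block until the coordinator releases everyone
        released_at = time.perf_counter()
        with lock:
            release_times.append(released_at)
        # The actual request would be fired here.

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    ready.wait()  # returns once every client has finished its setup
    gate.set()    # release all waiting clients
    for t in threads:
        t.join()
    return max(release_times) - min(release_times), len(release_times)


if __name__ == "__main__":
    spread, count = run_gated(8)
    print(f"{count} clients released within {spread * 1000:.3f} ms")
```

The reported spread gives a rough idea of how "simultaneous" the release really is before any network effects are involved.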
I have an Azure HttpTrigger function which processes POST requests and scales out under heavy load. The issue is that the caller of the function waits only 3 seconds for an HTTP 200 status code.
But when an Azure function scales out, it takes 4-6 seconds until the request gets processed. If the caller sends a request during the scale-out, it may cancel the request, and my service never gets to process it, which is the worst-case scenario.
Is there a way to prevent that? My ideal scenario would be an immediate HTTP 202 answer to the caller. But I'm afraid that this is not possible during a scale out process.
A scale-out requires your app to be loaded onto another instance, so those requests incur some delay while the app loads on the new instance.
As described in the Official Documentation:
Consumption Plan is the true serverless hosting plan since it enables scaling to zero when idle state, but some requests might have additional latency at startup.
To get constant low latency with autoscaling, you should move to the premium hosting plan which avoids cold starts with perpetually warm instances.
I have a design question on how to best approach a process within an existing DotNet2 web service I have inherited.
At a high level the process at the moment is as follows
Client
User starts new request via web service call from client/server app
Request and tasks created and saved to database.
New thread created that begins processing the request
Request ID returned to client (client polls using ID).
Thread
Loads up request detail and the multiple tasks
For each task it requests XML via another service (approx 2 sec wait)
Passes XML to another service for processing (approx 3 sec wait)
Repeat until all tasks complete
Marks request completed (client will know its finished)
Overall this takes approximately 30 seconds for a request of 6 tasks. With each task being performed sequentially it is clearly inefficient.
Would it be better to break out each task again on a separate thread and then when they are all complete mark the request as completed?
My reservation is that I am immediately multiplying the number of threads by a factor of up to 6-10 (the number of tasks), and I am concerned about how this would affect IIS. I estimate that I could cut a normal 30 second call down to around 5 seconds if each task were processed concurrently, but would this design suffer under load?
The current design is operating well and users have no problem with the time taken to process but I would prefer it work faster if possible.
Is this just a completely bad design and if so is there a better approach? I am limited by the current DotNet version at present.
Thanks
If you are worried about IIS performance you probably want to keep the jobs outside of IIS, so IMO I would consider queueing the tasks and creating a separate service to do the work. This approach would be more scalable in that you could add or remove front end IIS servers or task processors to address a varying load. A large-scale system would most certainly perform the processing off of the front end server.
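As a rough illustration of the queueing idea, the sketch below pushes a request's tasks onto a queue that a fixed number of workers drain, and the request counts as complete once every task has been handled. It is written in Python for brevity and runs in one process. In the design suggested above the workers would live in a separate service. `process_task` and `run_request` are hypothetical names, and `process_task` stands in for the "fetch XML, then process it" step.

```python
import queue
import threading


def process_task(task):
    """Hypothetical stand-in for fetching the XML and passing it on for processing."""
    return f"processed:{task}"


def run_request(tasks, worker_count=3):
    """Process tasks with a fixed number of workers; return results keyed by task."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            task = work.get()
            if task is None:          # sentinel: no more work for this worker
                work.task_done()
                break
            try:
                outcome = process_task(task)
            except Exception as exc:  # record the failure instead of killing the worker
                outcome = exc
            with lock:
                results[task] = outcome
            work.task_done()

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for task in tasks:
        work.put(task)
    work.join()                       # every task handled: request can be marked completed
    for _ in workers:
        work.put(None)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_request(["t1", "t2", "t3", "t4", "t5", "t6"]))
```

Because the worker count is fixed, the number of threads no longer grows with the number of tasks per request, which addresses the concern about multiplying threads under load.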
I have my OWIN application hosted as a Windows Service and I am getting a lot of timeouts from different clients. I have some metrics in place around request/response time, but the numbers differ widely. For example, I can see a client taking around one minute to perform a request that appears to take 3-4 seconds on the server. I am therefore assuming that the number of requests that can be accepted has reached its limit and that subsequent incoming requests get queued. Am I right? If so, is there any way I can monitor the number of incoming requests at a given time and the size of the queue (that is, the number of requests waiting to be served)?
I am playing around with https://msdn.microsoft.com/en-us/library/microsoft.owin.host.httplistener.owinhttplistener.setrequestprocessinglimits(v=vs.113).aspx but it doesn't appear to have any effect.
Any feedback is much appreciated.
Thanks!
HttpListener is built on top of Http.Sys so you need to use its performance counters and ETW traces to get this level of information.
https://msdn.microsoft.com/en-us/library/windows/desktop/cc307239%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
http://blogs.msdn.com/b/wndp/archive/2007/01/18/event-tracing-in-http-sys-part-1-capturing-a-trace.aspx
I am using a WebApi service controller, hosted by IIS,
and I'm trying to understand how this architecture really works:
When a web page client sends several async requests simultaneously, are all these requests executed in parallel at the WebApi controller?
At the IIS app pool, I've noticed the queue size is set to a default value of 1,000. Does that mean that a maximum of 1,000 threads can work in parallel on the WebApi server?
Or is this value related only to the IIS queue?
I've read that IIS maintains some kind of thread queue. Does this queue dispatch its work asynchronously, or are all the client requests that IIS passes to the WebApi service sent synchronously?
The queue size you're looking at specifies the maximum number of requests that will be queued for each application pool (which typically maps to one w3wp worker process). Once the queue length is exceeded, 503 "Server Too Busy" errors will be returned.
Within each worker process, a number of threads can/will run. Each request runs on a thread within the worker process (defaulting to a maximum of 250 threads per process, I believe).
So, essentially, each request is processed on its own thread (concurrently - at least, as concurrently as threads get) but all threads for a particular app pool are (typically) managed by a single process. This means that requests are, indeed, executed asynchronously as far as the requests themselves are concerned.
In response to your comment: if you have sessions enabled (which you probably do), then ASP.NET will queue the requests in order to maintain a lock on the session for each request. Try hitting your sleeping action in Chrome and then your quick-responding action in Firefox and see what happens. You should see that the two different sessions allow your requests to be executed concurrently.
Yes, all the requests will be executed in parallel using the threads from the CLR thread pool subject to limits. About the queue size set against the app pool, this limit is for IIS to start rejecting requests with a 503 - Service unavailable status code. Even before this happens, your requests will be queued by IIS/ASP.NET. That is because threads cannot be created at will. There is a limit to number of concurrent requests that can run which is set by MaxConcurrentRequestsPerCPU and a few other parameters. For 1000 threads to execute in parallel in a true sense, you will need 1000 CPU cores. Otherwise, threads will need to be time sliced and that adds overhead to the system. Hence, there are limits to number of threads. I believe it is very difficult to comprehensively answer your questions through a single answer here. You will probably need to read up a little bit and a good place to start will be http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx.
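The interplay between a concurrency limit and a bounded queue can be pictured with a small model. This is a deliberately simplified sketch in Python, not how IIS or ASP.NET is actually implemented. `AppPoolModel`, `max_concurrent` and `queue_limit` are hypothetical names chosen for the example.

```python
from collections import deque


class AppPoolModel:
    """Toy model: run up to max_concurrent requests, queue up to queue_limit, reject the rest."""

    def __init__(self, max_concurrent, queue_limit):
        self.max_concurrent = max_concurrent
        self.queue_limit = queue_limit
        self.running = set()
        self.queued = deque()

    def arrive(self, request_id):
        """Admit a request and report what happened to it."""
        if len(self.running) < self.max_concurrent:
            self.running.add(request_id)
            return "running"
        if len(self.queued) < self.queue_limit:
            self.queued.append(request_id)
            return "queued"
        return "503"  # queue is full, so the request is rejected

    def finish(self, request_id):
        """Complete a running request and promote the oldest queued one, if any."""
        self.running.remove(request_id)
        if self.queued:
            self.running.add(self.queued.popleft())


if __name__ == "__main__":
    pool = AppPoolModel(max_concurrent=2, queue_limit=2)
    print([pool.arrive(i) for i in range(5)])
    pool.finish(0)
    print(sorted(pool.running), list(pool.queued))
```

In this model a request only sees a 503 once both the running slots and the queue are full. Before that point it simply waits, which shows up to the client as added latency rather than as an error.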
Given a batch of synchronous web requests executed sequentially, it takes N seconds to complete them while receiving B bytes per second. Doing exactly the same with asynchronous web requests, which makes it possible to execute all of them in parallel, takes less than N seconds but still receives B bytes per second.
Running a simple test, with 12 web requests - using both the synchronous and parallel approach, confirms that they both receive B bytes per second (using Resource Monitor).
My question is therefore: shouldn't the approach that executes the web requests in parallel receive more than B bytes per second, to make up for being faster than the synchronous approach? Otherwise the synchronous approach would both run longer AND receive more bytes in total than the parallel approach.
These requests are not processed on your machine (unless connecting to localhost). This means that for each request to be fully processed, your machine will have to wait for a response.
Consider sending an invitation for your birthday party to friend #1, and after receiving a response, send one to friend #2, etcetera. It would be faster to send the invitation to all friends, and then wait for all of them to respond. Especially if friend #1 happens to be on holiday.
I don't know why the number of bytes per second is identical; perhaps some node in the network limits the speed. But the parallel approach can at least send out every request up front and overlap the wait times instead of adding them up.
I don't understand how the synchronicity could affect the total number of bytes received. You're talking about bytes/second, but not the number of seconds spent at that transfer speed.
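A small worked example may make the relationship between rate, time and total bytes concrete. It is an idealized model in Python with made-up numbers: it assumes each request takes a fixed time on its own, that parallel requests overlap perfectly, and that no link is bandwidth-limited. `summarize` is a hypothetical helper.

```python
def summarize(durations_s, sizes_bytes, parallel):
    """Return (elapsed seconds, total bytes, average bytes per second)."""
    # Sequential: waits add up. Idealized parallel: waits overlap, so the slowest one dominates.
    elapsed = max(durations_s) if parallel else sum(durations_s)
    total = sum(sizes_bytes)  # the responses are the same size either way
    return elapsed, total, total / elapsed


if __name__ == "__main__":
    durations = [1] * 12     # 12 requests, 1 second each (illustrative numbers)
    sizes = [100_000] * 12   # 100,000 bytes per response (illustrative numbers)
    print(summarize(durations, sizes, parallel=False))
    print(summarize(durations, sizes, parallel=True))
```

The total number of bytes is fixed by the responses, so the average rate is simply total bytes divided by elapsed time. If both approaches really show the same rate, then either they take about the same time (for instance because the link is the bottleneck) or the monitor is showing something other than the average over the whole run.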