I am developing a C# WCF service which calls a backend C# server application.
We are running performance tests on the service.
For example, each test might consist of these steps:
- Logon
- Create application object (it gets created in a SQL database by the server application)
- Delete application object
- Logoff
We run the test with 100 concurrent users (i.e. unique client threads) with no ramp-up and no user wait time between test steps.
We have done quite a bit of optimisation on the server side, so the tests run quite well when we run them repeatedly - say 100 concurrent threads, each thread repeating the test steps 25 times - typically averaging about 1 second response time for each step, which is OK.
However, when we run tests with 100 concurrent users but run the test steps only once in each thread, the results are inconsistent: sometimes the steps take quite a bit longer, with the average elapsed time reaching 5 seconds for a single step.
It appears that under a sudden burst of activity, the service returns inconsistent performance results.
I have tried several things to optimise performance (for compatibility with the client, the WCF binding has to be BasicHttpBinding):
- varying the serviceThrottling maxConcurrentCalls and maxConcurrentSessions parameters in the WCF configuration
- using a semaphore to limit the number of concurrent requests inside the WCF service
- implementing the methods inside the WCF service as Tasks (the .NET version is 4.5)
- making the methods async Tasks
- tuning the size of the ThreadPool using ThreadPool.SetMinThreads (see the sketch after this list)
- using a custom attribute to extend WCF with a custom ThreadPool, as per this MSDN article ( http://msdn.microsoft.com/en-us/magazine/cc163321.aspx )
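For reference, here is a minimal sketch of the ThreadPool warm-up, under the assumption that 100 roughly matches the expected burst of concurrent clients (the number is illustrative, not a recommendation):

    using System.Threading;

    // Call this once at service start-up. Raising the minimum lets the pool
    // grow immediately under a burst, instead of injecting roughly one new
    // thread every 500 ms once the current minimum has been reached.
    int workerMin, iocpMin;
    ThreadPool.GetMinThreads(out workerMin, out iocpMin);
    if (workerMin < 100)
    {
        // 100 matches the expected number of concurrent clients; illustrative only.
        ThreadPool.SetMinThreads(100, iocpMin);
    }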
I have found that running tests and varying these parameters helps tune the application's performance, but we still have the problem that results are poorer and more inconsistent when 100 concurrent client threads each run the test steps only once.
My question is: what are the best ways of tuning a C# WCF service so that it responds well to a sudden burst of client activity?
Thanks.
Decided to post as an answer instead, so here are the good things to do:
1 - Check the concurrent connection limit. Sometimes a small server is limited to between 2 and 50 connections, which is very low. Your server admin should know what to do.
2 - Load balancing with WCF is possible and helps a lot when the load is split over multiple servers.
3 - Have the IIS host server do ONLY IIS work, i.e. don't run SQL on it as well.
4 - Do not open the WCF service connection, query, and close the connection on every single request. A handshake is needed every time, and with multiple users that becomes a lot of lost time. Instead, open the connection once when the application starts and close it on exit (or on error, obviously); see the sketch after this list.
5 - Use smaller types inside the service. Try to avoid types such as decimal and Int64. Decimal is 128 bits and Int64 is 64 bits, and they perform more slowly than float/double/int. Obviously, if you absolutely need them, use them, but try to limit that.
6 - A single big method makes the overall timing slower for everyone, as the waiting line grows faster, slowing IIS, and you might lose new connections to timeouts if you have a lot of users. Smaller methods take longer overall because of the extra back and forth of data, BUT users will see more progress that way and will feel the software is faster, even though it really is not.
BasicHttpBinding works great in any case.
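A minimal sketch of point 4, assuming a hypothetical IBackendService contract: create the expensive ChannelFactory and channel once at application start and reuse them for every request.

    using System;
    using System.ServiceModel;

    public class BackendClientHolder : IDisposable
    {
        private readonly ChannelFactory<IBackendService> _factory;

        // Created once and reused for all calls, so the connection
        // handshake is not paid on every request.
        public IBackendService Client { get; private set; }

        public BackendClientHolder(string address)
        {
            _factory = new ChannelFactory<IBackendService>(
                new BasicHttpBinding(), new EndpointAddress(address));
            Client = _factory.CreateChannel();
        }

        public void Dispose()
        {
            try { _factory.Close(); }
            catch { _factory.Abort(); } // Abort if the channel has faulted
        }
    }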
Related
I'm working on widening the capabilities of an already existing Selenium-based web application testing system. The system currently can only execute tests one at a time. I'd like to increase the system's throughput by adding some parallelism to test execution.
To that end I've been looking into different technologies and am interested in Microsoft's Orleans. However, after spending a day with Orleans' documentation I couldn't find out whether the technology supports the scenario where I need to run a specific scalable actor a maximum of n times on one computer.
In my case that would be the actor executing a test. Typically with Selenium, test execution requires the creation of a WebDriver object, which then opens and controls a web browser instance. Those web browsers all have to be open as separate processes, not just new windows of the same process. With no upper limit, and let's say a cluster of 3 nodes and a test plan of 60 tests, I'd have 20 web browsers opening simultaneously on each machine, which would likely lead to a major decrease in performance or a complete crash. I'd expect the system to have an upper limit of 3 test-executing actors per machine and to store the rest of the tests in a queue, running them only after another worker actor finishes its test.
How would I go about implementing that with the Orleans technology?
For Orleans, the right solution is NOT to muck around with MaxActiveThreads. You should almost never change that. Instead, you can just have 9 regular grains total: use 9 different grain ids (0...8), and the system will spread them randomly across the 3 servers; when you send a new test request, just send it to a random grain out of those 9.
Alternatively, use a StatelessWorker grain: just one grain type, but with its max activations per server set to 3. That way you will have exactly 3 activations per server (9 total) and the system will auto load balance; see the sketch below.
http://dotnet.github.io/orleans/Documentation/Advanced-Concepts/StatelessWorker.html
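A minimal sketch of the StatelessWorker approach, with a hypothetical ITestRunnerGrain interface (names and the test body are illustrative):

    using System.Threading.Tasks;
    using Orleans;
    using Orleans.Concurrency;

    public interface ITestRunnerGrain : IGrainWithIntegerKey
    {
        Task RunTest(string testName);
    }

    // At most 3 activations of this grain per silo; with 3 silos that gives
    // 9 concurrent test runners, and Orleans load balances automatically.
    [StatelessWorker(3)]
    public class TestRunnerGrain : Grain, ITestRunnerGrain
    {
        public Task RunTest(string testName)
        {
            // Create the WebDriver, drive the browser, dispose it afterwards.
            return Task.Run(() => { /* Selenium test body (hypothetical) */ });
        }
    }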
There is no built-in persistent queueing in Orleans. You will either have to queue on the client/sender side, or send all 60 requests at once and have each grain store the requests in an internal in-memory queue, starting the next one when work on the previous one has finished.
Orleans will do most of your requirements out of the box.
It creates as many grains in parallel as you have configured in the silo (the default is the number of CPU cores; see MaxActiveThreads).
All other requests will be queued by Orleans.
You can configure it in the Orleans config-file:
<Scheduler MaxActiveThreads="3"/>
Your problem: you want to parallelize your test execution.
The solution is to use a Selenium hub and nodes (Selenium Grid); you don't need Orleans for that.
Ok - here is the scenario:
I host a server application on Amazon AWS hosted Windows instances. (I do not have access to the source code, so I cannot resolve the issues from within the application's source code.)
These specific instances are able to build up CPU credits during times of idle CPU (less than 10-20% usage) and then spend those credits during times of increased compute requirement.
My server application, however, typically runs at around 15-20% CPU usage when no clients are connected. During that time I would rather lower the CPU usage to around 5% through throttling, maintaining enough CPU throughput to accept a TCP socket from incoming clients.
When a connected client is detected, I would like to remove the throttle and allow full access to the reserve of AWS CPU Credits.
I have code in place that can suspend and resume processes via C# using Windows API calls.
I am, however, a bit fuzzy on how to accurately attain a target CPU usage for that process.
What I am doing so far, which is having moderate success:
Looping inside another application, I:
- check the CPU usage of the server application using performance counters (I don't like these - they require a 100-1000 ms wait in order to return a % value)
- determine whether the current value is above or below the target value; if above, I increase an int value called 'sleep' by 10 ms
- if below, 'sleep' is decreased by 10 ms
Then the application will call
Process.Suspend();
Thread.Sleep(sleep);
Process.Resume();
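For what it's worth, here is the duty-cycle idea sketched in one place. It assumes the undocumented ntdll exports NtSuspendProcess/NtResumeProcess, which is what most C# suspend/resume wrappers use; treat it as illustrative, not production code:

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;
    using System.Threading;

    class ProcessThrottler
    {
        [DllImport("ntdll.dll")]
        static extern int NtSuspendProcess(IntPtr processHandle);

        [DllImport("ntdll.dll")]
        static extern int NtResumeProcess(IntPtr processHandle);

        // Alternately freeze and release the target; the ratio of the two
        // sleeps approximates the CPU duty cycle the target is allowed.
        public static void Throttle(Process target, int suspendMs, int runMs)
        {
            while (!target.HasExited)
            {
                NtSuspendProcess(target.Handle);
                Thread.Sleep(suspendMs);  // time the process spends frozen
                NtResumeProcess(target.Handle);
                Thread.Sleep(runMs);      // time the process is allowed to run
            }
        }
    }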
Like I said - this is having moderate success.
But there are several reasons I don't like it:
1. It requires a semi-rapid loop in an external application: this might end up just shifting CPU usage to that application.
2. I'm sure there are better mathematical solutions to work out the ideal sleep time.
I came across this application : http://mion.faireal.net/BES/
It seems to do everything I want, except that I need to be able to control it, and I am not a C++ developer.
It also seems able to achieve accurate CPU throttling without consuming much CPU itself.
Can someone suggest CPU throttling techniques?
Remember - I cannot modify the source code of the application being throttled. At most, I could inject code into it, but it occurs to me that if I inject suspend code into it, then the resume code could not fire, etc.
An external agent program might be the best way to go.
I am creating a Windows application (using Windows Forms) which calls a web service to fetch data. I have to fetch information for 200+ clients, and for each client I have to fetch all the users' information. A client can have 50 to 100 users. So, after getting the list of all clients, I am calling the web service in a loop, once per client, to fetch the users listing. This is a long process: a full data fetch currently takes 40-50 minutes, and I want to reduce that. Please let me know which approach - multithreading or anything else - is best suited to my application.
Thanks in advance.
If you are in control of the web service, add a method that returns all the clients at once instead of one by one, to avoid round trips, as Michael suggested.
If not, make as many requests at the same time as possible (not in sequence) to avoid as much latency as possible. Each request costs at least one round trip (so at least your ping's worth of delay); if you make 150 requests in sequence, you'll spend your ping to the server x 150 "just waiting on the network". If you split those requests into 4 batches and run the batches in parallel, then you'll only wait about 150/4 x ping time. So the more requests you make concurrently, the less you wait.
I suggest you avoid calling the service in a loop for every client to get the details; instead, do that loop on the server and return all the data in one shot. Otherwise you will suffer a lot of useless latency caused by the thousands of calls, not just the server time or data-transfer time.
This is also a pattern, called Remote Facade, described by Martin Fowler (and related to the Gang of Four's Facade pattern):
any object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done [...] Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call.
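Sketched below, assuming you control the service (the contract and type names are hypothetical): one coarse-grained method returns every client together with its users, replacing the 200+ individual calls:

    using System.Collections.Generic;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [ServiceContract]
    public interface IClientDirectory
    {
        // One round trip returns every client together with its users.
        [OperationContract]
        List<ClientWithUsers> GetAllClientsWithUsers();
    }

    [DataContract]
    public class ClientWithUsers
    {
        [DataMember] public int ClientId { get; set; }
        [DataMember] public string ClientName { get; set; }
        [DataMember] public List<string> UserNames { get; set; }
    }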
In case you're not in control of the web service, you could try using a Parallel.ForEach loop instead of a foreach loop to query the web service; a sketch follows below.
MSDN has a tutorial on how to use it: http://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
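A minimal sketch of that, with a cap on the degree of parallelism so the web service isn't flooded (clients, service, User and GetUsersForClient stand in for your own types and calls):

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    var usersByClient = new ConcurrentDictionary<int, IList<User>>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = 8 }; // illustrative cap

    // If the calls go over HTTP, also raise ServicePointManager.DefaultConnectionLimit,
    // which otherwise caps concurrent connections to one host (2 by default).
    Parallel.ForEach(clients, options, client =>
    {
        // Each iteration is one round trip; running iterations in parallel
        // overlaps the network latency instead of summing it.
        usersByClient[client.Id] = service.GetUsersForClient(client.Id);
    });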
I have a nice, fast task-scheduling component (a Windows service as it happens, but this is irrelevant); it subscribes to an in-memory queue of things to do.
The queue is populated really fast ... and when I say fast I mean fast ... so fast that I'm experiencing problems with one particular part.
Each item in the queue gets a "category" attached to it and is then passed to a WCF endpoint to be processed and saved in a remote db.
This is presenting a bit of a problem.
The "queue" can be processed in the millions of items per minute whereas the WCF endpoint will only realistically handle about 1000 to 1200 items per second and many of those are "stacked" in order to wait for a slot to dump them to the db.
My WCF client is configured so that the call is fire and forget (deliberate) my problem is that when the call is made occasionally a timeout occurs and thats when the headaches begin.
The thread just seems to stop after timeout no dropping in to my catch block nothing ... just sits there, whats even more confusing is that this is an intermittent thing, this only happens when the queue is dealing with extreme loads and the WCF endpoint is over taxed, and even in that scenario it's only about once a fortnight this happens.
This code is constantly running on the server, round the clock 24/7.
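For context, "fire and forget" in WCF usually means a one-way operation, sketched here with a hypothetical contract. One detail that may be relevant to the hang: even a one-way call over HTTP blocks until the transport accepts the message, so an overtaxed endpoint can still stall the caller.

    using System.ServiceModel;

    [ServiceContract]
    public interface IItemSink
    {
        // IsOneWay = true: the client does not wait for a reply message,
        // but it does wait for the request to be accepted by the transport.
        [OperationContract(IsOneWay = true)]
        void Process(string category, byte[] payload);
    }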
So ... my question ...
How can I identify the edge case that is causing my problem so that I can resolve it?
Some extra info:
The client calling the WCF endpoint seems to "throttle itself" automatically, because I'm limiting the number of threads making calls and the code hangs about until a call is considered complete (I'm thinking this is an HTTP-level thing, as I'm not asking the service for a result of my method call).
The db is talked to with EF, which seems never to open more than a fixed number of connections (quite a low number too, which is cool), and the WCF endpoint, from call reception back, seems super reliable.
The problem seems to lie between the queue processor and the WCF endpoint.
The queue processor has a single instance of my WCF endpoint client, which it reuses for all calls ... (is it good practice to rebuild this endpoint per call? - bear in mind the number of calls here).
Final note:
It's a peculiar "module" of functionality, under heavy load for hours at a time it's stable, but for some reason this odd thing happens resulting in the whole lot just stopping and not recovering. The call is wrapped in a try catch, but seemingly even if the catch is hit (which isn't guaranteed) the code doesn't recover / drop out as expected ... it just hangs.
Any ideas?
Please let me know if there's anything else I can add to help resolve this.
Edit 1:
binding - basicHttpBinding
error handling - no code written other than wrapping the WCF call in a try catch.
Seemingly my solution is to increase the timeout settings in the client config to allow the server more time to respond.
The net result is that while the database is busy saving data (effectively the slowest part of this process), the calling client sits and waits (on all threads, though seemingly not as long as I would have liked).
This issue seems to be the net result of a lot of multithreaded calls to the WCF service without giving it enough time to respond.
The high load is not continuous; the service usage seems to spike and then tail off, and adding to the expected response time allows spikes to be filtered through as they happen. A sketch of the client-side timeouts follows.
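For reference, a sketch of raising the client-side timeouts in code (the values are illustrative; the same settings can equally go in the client config file):

    using System;
    using System.ServiceModel;

    var binding = new BasicHttpBinding
    {
        // SendTimeout is the usual culprit for client-side TimeoutExceptions;
        // the WCF defaults are 1 minute (send/open/close) and 10 minutes (receive).
        SendTimeout = TimeSpan.FromMinutes(5),
        ReceiveTimeout = TimeSpan.FromMinutes(10),
        OpenTimeout = TimeSpan.FromMinutes(1),
        CloseTimeout = TimeSpan.FromMinutes(1)
    };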
A key note:
Way too many calls will result in the server / service treating them as a DoS-type attack, in which case it may simply terminate the connection.
This isn't what I'm getting, but some fine tuning and time may result in this ...
Time for some bigger servers !!!
I was just wondering what will give the best performance.
Let's say we have 3 physical servers, each with 32 cores and 64 GB RAM, and the application is a "standard" ASP.NET application. Load balancing is already in place.
Setup #1 - One application consumes all
- One IIS server with 1 application running on each physical server (a total of 3 application "endpoints")
Setup #2 - Shared resources
- One IIS server with 16 applications in a web farm on each physical server (a total of 48 application "endpoints")
Setup #3 - Virtualization
- 15 virtual servers on each physical server (a total of 45 application endpoints)
What would have the best performance, and why?
It depends! Much depends on what the application is doing and where it spends its time.
In broad terms, though:
If an application is compute-bound -- i.e. the time taken to retrieve data from an external source such as a database is small -- then in most cases setup #1 will likely be fastest. IIS is itself highly multi-threaded, and giving it control of the machine's resources will allow it to self-tune.
If the application is data-bound -- i.e. more than (say) 40% of the time taken for each request is spent getting and waiting for data -- then setup #2 may be better. This is especially the case for less well-written applications that do synchronous in-process database accesses: even if a thread is sitting around waiting for a database access to complete, it's still consuming resources.
As discussed here: How to increase thread-pool threads on IIS 7.0, you'll run out of thread pool threads eventually. However, as discussed on MSDN here: http://blogs.msdn.com/b/david.wang/archive/2006/03/14/thoughts-on-application-pools-running-out-of-threads.aspx by creating multiple IIS worker processes you're really just papering over the cracks of larger underlying issues.
Unless there are other reasons -- such as manageability -- I'd not recommend setup #3, as the overhead of managing additional operating systems in entire virtual machines is quite considerable.
So: monitor your system, use something like the MiniProfiler (http://code.google.com/p/mvc-mini-profiler/) to figure out where the issues in the code lie, and use asynchronous non-blocking calls whenever you can; a sketch of that last point follows.
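To illustrate the last point, a sketch of an asynchronous controller action, assuming ASP.NET MVC and EF6's ToListAsync (the controller and context names are hypothetical): the worker thread is returned to the pool while the database call is in flight, which is exactly what helps in the data-bound case above.

    using System.Data.Entity;       // EF6: ToListAsync
    using System.Threading.Tasks;
    using System.Web.Mvc;

    public class OrdersController : Controller
    {
        private readonly ShopContext db = new ShopContext(); // hypothetical EF context

        public async Task<ActionResult> Index()
        {
            // The IIS worker thread is returned to the pool while the query
            // runs, so it can serve other requests instead of blocking.
            var orders = await db.Orders.ToListAsync();
            return View(orders);
        }
    }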
It really depends on your application; you have to design for each architecture and performance-test your setups. Some applications will run fast on setup #1 and not on the other setups, and the other way around. There are many more things you can optimize for performance in IIS. The key thing is to design your application for monitoring and scaling.