I'm working on extending the capabilities of an existing Selenium-based web application testing system. The system can currently only execute tests one at a time; I'd like to increase its throughput by adding some parallelism to test execution.
To that end I've been looking into different technologies and am interested in Microsoft's Orleans. However, after spending a day with the Orleans documentation I couldn't determine whether the technology supports a scenario where I need to run a specific scalable actor at most n times on one computer.
In my case that would be the actor executing a test. Typically with Selenium, test execution requires creating a WebDriver object, which then opens and controls a web browser instance. Those web browsers all have to be opened as separate processes, not just new windows of the same process. With no upper limit, and let's say a cluster of 3 nodes and a test plan of 60 tests, I'd have 20 web browsers opening simultaneously on each machine, which would likely lead to a major drop in performance or a complete crash. I'd expect the system to have an upper limit of 3 test-executing actors per machine and to store the remaining tests in a queue, running them only after another worker actor finishes its test.
How would I go about implementing that with the Orleans technology?
For Orleans, the right solution is NOT to muck around with MaxActiveThreads. You should almost never change that. Instead, you can just have 9 regular grains total, using 9 different grain ids (0...8); the system will spread them randomly across the 3 servers, and when you send a new test request, you just send it to a random grain out of those 9.
Alternatively, you can use a StatelessWorker grain: just one grain, but with its max activations per server set to 3. That way you will have exactly 3 activations per server (9 total) and the system will auto-load-balance.
http://dotnet.github.io/orleans/Documentation/Advanced-Concepts/StatelessWorker.html
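A minimal sketch of that approach, assuming Orleans' [StatelessWorker] attribute and an ITestRunnerGrain interface invented here for illustration (only the attribute, the Grain base class and IGrainWithIntegerKey come from Orleans; the test logic is a placeholder):

```csharp
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

// Hypothetical grain interface for running one Selenium test.
public interface ITestRunnerGrain : IGrainWithIntegerKey
{
    Task<string> RunTest(string testName);
}

// At most 3 activations of this grain per silo; since each activation handles
// one call at a time, that caps concurrent browsers per machine at 3.
[StatelessWorker(3)]
public class TestRunnerGrain : Grain, ITestRunnerGrain
{
    public async Task<string> RunTest(string testName)
    {
        // Placeholder for the real work: create a WebDriver, run the test, dispose the driver.
        await Task.Delay(1000);
        return "Passed";
    }
}
```

The attribute's argument is the per-silo activation cap; each activation processes one request at a time by default.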
There is no built-in persistent queueing in Orleans. You will either have to queue on the client/sender side, or send all 60 requests at once and have each grain store the requests in an internal in-memory queue, starting the next one when work on the previous one has finished.
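If you go the client-side route, a rough sketch could look like the following, reusing the hypothetical ITestRunnerGrain from above; the cap of 9 in-flight tests is an assumption matching the 3-node / 3-per-node example, and it limits the total across the cluster rather than per machine:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Orleans;

public static class TestDispatcher
{
    // Allow at most 9 tests in flight at once across the whole cluster.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(9);

    public static async Task RunAll(IReadOnlyList<string> tests, IGrainFactory factory)
    {
        var running = tests.Select(async (test, i) =>
        {
            await Gate.WaitAsync();
            try
            {
                // Round-robin over grain ids 0..8; Orleans spreads those
                // activations across the cluster.
                var grain = factory.GetGrain<ITestRunnerGrain>(i % 9);
                await grain.RunTest(test);
            }
            finally
            {
                Gate.Release();
            }
        });
        await Task.WhenAll(running);
    }
}
```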
Orleans will cover most of your requirements out of the box.
It runs as many grains in parallel as you have configured in the silo (the default is the number of CPU cores; see MaxActiveThreads).
All other requests will be queued by Orleans.
You can configure it in the Orleans config-file:
<Scheduler MaxActiveThreads="3"/>
Your problem: you want to parallelize your test execution.
The solution is to use a Selenium hub with nodes (Selenium Grid). You don't need Orleans for that.
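For illustration, a hedged sketch of what a test would do against a Selenium Grid hub; the hub URL is an assumption, and the per-machine browser cap comes from each node's maxSession/maxInstances configuration rather than from the test runner:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

class GridExample
{
    static void Main()
    {
        // The hub address is a placeholder; adjust to your grid setup.
        var hubUrl = new Uri("http://localhost:4444/wd/hub");
        var options = new ChromeOptions();

        // The hub forwards the session to a free node, so the number of browsers
        // per machine is capped by the node configuration, not by the caller.
        using (IWebDriver driver = new RemoteWebDriver(hubUrl, options.ToCapabilities()))
        {
            driver.Navigate().GoToUrl("http://example.com");
            Console.WriteLine(driver.Title);
        }
    }
}
```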
I have 2 WebJobs in my WebApp. Let's call them WJ1 and WJ2; both WebJobs are triggered by queues (Q1 and Q2 respectively). Since Q2 gets a lot of messages and I would like to process them quickly, I scale out my WebApp to 10 instances (it runs with only 1 instance at times when no messages are expected).
The problem is that since Q2 receives overwhelmingly more messages than Q1, WJ2 takes all the resources and the processing of Q1 lags behind its expected schedule. Would there be a way of assigning WJ1 a higher priority than WJ2, so that whenever there is a message in Q1 it will be processed before taking any more from Q2?
Both WebJobs are separate (and both have 2 functions each, triggered by the aforementioned queues and by timers) and can be started and stopped independently, if that is of any help.
Also, WJ1 would be happy with just one instance, since the messages are expected to be processed one after the other.
I have read about and considered splitting the WebJobs into 2 different WebApps and limiting WebApp2 to 9 of the 10 available instances to run WJ2. However, I don't like that option because WJ2 takes 3 hours or so to finish its burst of messages while WJ1 should be done within an hour or less, so it makes no sense to prevent WJ2 from taking all the available capacity once WJ1 is done.
Also, if WJ1 were a singleton, would it then have higher priority? (Probably not, but worth asking.)
Thanks!
Simply scaling the compute resources might not be sufficient to prevent loss of performance under load. You might also need to scale storage queues and other resources to prevent a single point of the overall processing chain from becoming a bottleneck. Also, consider other limitations, such as the maximum throughput of storage and other services that the application and the background tasks rely on.
Please go through the "Scaling and performance considerations" guidance for WebJobs.
Secondly, you should try to keep your WebJobs smaller - 3 hours and 1 hour are very long timespans.
It means that your WJ2 might hold all the resources for 3 hours! It's worth an in-depth performance review of the WebJob to identify whether, and how many, resources are being held unnecessarily. Smaller WebJobs will allow the different instances to scale independently according to their own usage and free resources as soon as possible.
Suppose two machines are running the same code, but you want to offset the timing of the code being run so that there's no possibility of them running simultaneously, and by simultaneously I mean running within 5 seconds of each other.
One could generate a random number of seconds to wait prior to starting the code, but both machines may generate the same number.
Is there an algorithm to independently guarantee different random numbers?
In order to guarantee that the apps don't run at the same time, you need some sort of communication between the two. This could be as simple as someone setting a configuration value to run at a specific time (or to delay by a set number of seconds if you can guarantee they will start at the same time). Or it might require calling into a database (or similar) to determine when it is going to start.
It sounds like you're looking for a scheduler. You'd want a third service (the scheduler) which maintains when applications are supposed to/allowed to start. I would avoid having the applications talk directly to each other, as this will become a nightmare as your requirements become more complex (a third computer gets added, another program has to follow similar scheduling rules, etc.).
Have the programs send something unique (the MAC address of the machine, a GUID that only gets generated once and stored in a config file, etc.) to the scheduling service, and have it respond with how many seconds (if any) that program has to wait to begin its main execution loop. Or better yet, give the scheduler permissions on both machines to run the program at specified times.
You can't do this in pure isolation though - let's say that you have one program uniquely decide to wait 5 seconds, and the other wait 7 seconds - but what happens when the counter for program 2 is started 2 seconds before program 1? A scheduler can take care of that for you.
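A very small in-process sketch of that idea (every name here is hypothetical; in practice the scheduler would sit behind a small web service or a shared database so both machines can reach it):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scheduler: each program sends a unique id (MAC address, GUID, ...)
// and gets back an absolute start time. Times are handed out at least 6 seconds
// apart, so no two programs are told to start within 5 seconds of each other.
public class StartScheduler
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, DateTime> _assigned = new Dictionary<string, DateTime>();
    private DateTime _nextSlot = DateTime.UtcNow;

    public DateTime GetStartTime(string clientId)
    {
        lock (_lock)
        {
            DateTime slot;
            // A program that asks twice gets the same answer.
            if (_assigned.TryGetValue(clientId, out slot))
                return slot;

            slot = _nextSlot > DateTime.UtcNow ? _nextSlot : DateTime.UtcNow;
            _assigned[clientId] = slot;
            _nextSlot = slot.AddSeconds(6); // more than 5 seconds of separation, with margin
            return slot;
        }
    }
}
```

Each program would call GetStartTime with its unique id and wait until the returned time before entering its main loop.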
As pointed out in the comments and the other answer, true randomness can't provide any guarantee of values falling into particular ranges when generated independently in parallel.
Assuming your goal is to not run multiple processes at the same time, you can force each machine to pick a different time slot to run the process in.
If you can get consensus between these machines on the current time and each machine's "index", then you can run your program in assigned slots, possibly with a random offset within the time slot.
I.e., use a time service to synchronize clocks (the default behavior of most operating systems for machines connected to pretty much any network) and pre-assign sequential IDs to the machines (along with knowledge of the total count). Then let the machine with a given ID run in a time slot like the following (assuming count < 60, otherwise adjust the start time based on the count; leave enough margin to avoid overlaps when small clock drift accumulates between synchronization intervals):
(start of the hour + (ID * minutes) + random_offset(0, 30 seconds))
This way no communication between the machines is needed.
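A minimal sketch of that calculation, assuming the machine ID and a synchronized clock are already in place as described:

```csharp
using System;

public static class TimeSlot
{
    // Returns the next slot start for this machine: the top of the hour
    // plus ID minutes, plus a random offset of 0-30 seconds.
    public static DateTime NextRun(int machineId, Random random)
    {
        DateTime now = DateTime.UtcNow;
        DateTime hourStart = new DateTime(now.Year, now.Month, now.Day, now.Hour, 0, 0, DateTimeKind.Utc);

        DateTime slot = hourStart
            .AddMinutes(machineId)
            .AddSeconds(random.Next(0, 31));

        // If this hour's slot has already passed, use the same slot next hour.
        return slot > now ? slot : slot.AddHours(1);
    }
}
```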
Have both apps read a local config file, wait the number of seconds specified, and then start running.
Put 0 in one and 6+ in the other; they won't start within 5 seconds of each other. (Adjust the 6+ as necessary to cater for variations in machine loads, speeds, etc.)
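For example (the appSettings key name here is made up):

```csharp
using System;
using System.Configuration; // reference System.Configuration.dll
using System.Threading;

class Program
{
    static void Main()
    {
        // e.g. <add key="StartDelaySeconds" value="0"/> on one machine, "6" on the other.
        int delay = int.Parse(ConfigurationManager.AppSettings["StartDelaySeconds"]);
        Thread.Sleep(TimeSpan.FromSeconds(delay));

        // ... start the real work here ...
    }
}
```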
Not really an algorithm, but you could create two arrays of numbers that are completely disjoint and then have each app grab a number from its array (randomly) before it starts.
What is the penalty for them running at the same time?
The reason I ask is that even if you offset the starting times, one could start before the other has finished. If the data they are processing grows, this gets more likely as time goes on and the 5-second rule becomes obsolete.
If they use the same resources, then it would be best to use those resources for the coordination, e.g. set a flag in the database, or check whether there is enough memory available to run.
I am developing a C# WCF service which calls a back-end C# server application.
We are running performance tests on the service.
For example each test might consist of these steps
- Logon
- Create application object ( gets created in Sql database by server application )
- Delete application object
- Logoff
We run the test with 100 concurrent users (i.e. unique client threads) with no ramp-up and no user wait time between test steps.
We have done quite a bit of optimisation on the server side, and the tests run quite well when we run them repeatedly - say 100 concurrent threads, each thread repeating the test steps 25 times - typically giving an average response time of about 1 second for each step in the test, which is OK.
However, when we run tests with 100 concurrent users but only run the test steps once in each thread, the results are inconsistent - sometimes the steps take quite a bit longer, with the average elapsed time for a step reaching 5 seconds.
It appears that under a sudden burst of activity, the service returns inconsistent performance results.
I have tried several things to optimise performance (for compatibility with the client, the WCF binding has to be BasicHttpBinding):
- varying the serviceThrottling maxConcurrentCalls and maxConcurrentSessions parameters in the WCF configuration (a programmatic sketch of these settings follows this list)
- using a semaphore to limit the number of concurrent requests inside the WCF service
- implementing the methods inside the WCF service as Tasks (.NET version is 4.5)
- making the methods async Tasks
- tuning the size of the ThreadPool using SetMinThreads
- using a custom attribute to extend WCF to implement a custom ThreadPool as per this MSDN article (http://msdn.microsoft.com/en-us/magazine/cc163321.aspx)
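For reference, a rough sketch of the throttling and thread-pool settings from the list above, applied programmatically in a self-hosted setup; the service contract, address and numbers are placeholders, not recommendations (under IIS the throttling values would go in the serviceThrottling config element instead):

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;
using System.Threading;

[ServiceContract]
public interface ITestService
{
    [OperationContract]
    string Ping(string input);
}

public class TestService : ITestService
{
    public string Ping(string input) { return input; }
}

class HostSetup
{
    static void Main()
    {
        // Let the thread pool handle a burst without its usual thread-injection ramp-up.
        ThreadPool.SetMinThreads(100, 100);

        var baseAddress = new Uri("http://localhost:8080/TestService"); // placeholder address
        var host = new ServiceHost(typeof(TestService), baseAddress);
        host.AddServiceEndpoint(typeof(ITestService), new BasicHttpBinding(), "");

        // Equivalent of the <serviceThrottling> element, set in code.
        var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }
        throttle.MaxConcurrentCalls = 200;
        throttle.MaxConcurrentSessions = 200;
        throttle.MaxConcurrentInstances = 200;

        host.Open();
        Console.WriteLine("Listening. Press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```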
I have found that running tests and varying parameters can be used to tune the application's performance, but we still have the problem that results are poorer and more inconsistent when we run, say, 100 concurrent client threads each repeating the test steps only once.
My question is: what are the best ways of tuning a C# WCF service so that it responds well to a sudden burst of client activity?
Thanks.
Decided to post as an answer instead, so here are the good things to do:
1 - Check the concurrent connection limit. Sometimes a small server might be limited to between 2 and 50 connections, which is very low. Your server admin should know what to do.
2 - Load balancing with WCF is possible and helps a lot when the load is split over multiple servers.
3 - Have the IIS host server doing ONLY IIS work, i.e. don't have SQL Server running on it either.
4 - Do not open a WCF service connection, run the query, and close the connection on every single request. A handshake is needed every single time, and over time with multiple users that adds up to a lot of lost time. Instead, open the connection once when the application starts and close it on exit (or on error, obviously); see the sketch after this list.
5 - Use smaller types inside the service. Try to avoid types such as decimal and Int64: decimal is 128 bits and Int64 is 64 bits, and they perform slower than float/double/int. Obviously, if you absolutely need them, use them, but try to limit their use.
6 - A single big method makes the overall timing slower for everyone, as the waiting line grows faster, slows IIS, and might lose new connections to timeouts if you have a lot of users. Smaller methods will take longer overall because of the extra back and forth of data, BUT users will see more progress that way and will feel the software is faster even though it really isn't.
BasicHttpBinding works great in any case.
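For point 4, a hedged sketch of what reusing the channel factory looks like on the client side; ITestService and the endpoint address are stand-ins for the real contract and URL:

```csharp
using System;
using System.ServiceModel;

[ServiceContract]
public interface ITestService
{
    [OperationContract]
    string Ping(string input);
}

public static class ServiceClient
{
    // The factory is the expensive part; create it once and reuse it for the
    // lifetime of the application.
    private static readonly ChannelFactory<ITestService> Factory =
        new ChannelFactory<ITestService>(
            new BasicHttpBinding(),
            new EndpointAddress("http://localhost:8080/TestService")); // placeholder address

    public static string Ping(string input)
    {
        ITestService channel = Factory.CreateChannel();
        try
        {
            return channel.Ping(input);
        }
        finally
        {
            // Channels are cheap compared to the factory; close (or abort) them per call.
            var client = (IClientChannel)channel;
            if (client.State == CommunicationState.Faulted)
                client.Abort();
            else
                client.Close();
        }
    }
}
```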
Original Question
Is there a heuristic or algorithm to programmatically find out how many threads I can open in order to obtain maximum throughput for an async operation, such as writing to a socket?
Further explained question
I'm assisting an algorithms professor at my college, and he posted an assignment where the students are supposed to learn the basics of distributed computing; in his words: Sockets... The assignment is to create a "server" that listens on a given port, receives a string, performs a simple operation on it (I think it's supposed to count its length) and returns Ok or Rejected... The "server" must be able to handle a minimum of 60k submissions per second... My job is to create a little app to simulate 60k clients...
I've managed to automate the distribution of the servers and the clients across a university lab in order to test 10 servers at a time (the network infrastructure became the bottleneck). The problem here is: one lab is homogeneous, two labs are not! If not tuned correctly, the "client" usually can't simulate 60k users and report back to me, especially when the lab is an older one, AND I would like to provide the client to the students so they could test their own "server" more reliably... The ability to determine the optimal number of threads to spawn has now become vital! PS: fire-and-forget is not an option because the client also tests whether the returned value is correct; e.g. if I send "Short sentence" I know the result will be "Rejected" and I have to check it...
A class has 60 students... and there's a morning class and a night class, so each week there will be 120 "servers" to test, because as the semester moves along the "server" part will have to do more, while the client won't (it will always only send a string and receive "Ok"/"Rejected")... So there's enough work to be done to justify all this work I'm doing...
Edit1
- Changed from Console to an async operation
- I don't want the maximum number of threads, I want the number that will provide maximum throughput! I imagine that on a 6-core PC the number will be higher than on a 2-core PC
Edit2
- I'm building a simple console app to perform some tests against another app... one of those is a specific kind of load test (a RUDY attack) where I have to simulate a lot of clients performing the attack... The thing is that there's a curve between throughput and number of threads, where after a given point, opening more threads actually decreases my throughput...
Edit3
Added more context to the initial question...
The Windows console isn't really meant to be used by more than one thread; otherwise you get interleaved writes. So the thread count for maximum console output would be one.
It's when you're doing computation that multiple threads make sense. Then, it's rarely useful to use more than one thread per logical processor - or one background thread plus one UI thread for UI apps on a single-core processor.
It depends entirely on the situation - so the actual answer to your question of "is there a magical algorithm that will give me the perfect setup for max throughput?" is ... no.
Sure, more cores mean more threads that can run and less context switching. That said, you've edited your question to include an IO-bound example. IO-bound operations generally make use of completion ports for async operations. So, in that particular case, your main concern for achieving maximum throughput would be to stop using your own dedicated threads for such operations.
Since you changed the question, I'll provide another answer.
It depends on the workload. If you're doing compute-heavy tasks, then use every logical processor. If you're doing IO, then use async calls rather than spawning new threads.
Of course, .NET has a way of managing this for you - the Thread Pool. Use it. Don't worry about how many threads you need, just kick off tasks.
If you are actually trying to do something productive (instead of just printing to the console), you should use System.Threading.Tasks.Task.Factory.StartNew. You can start as many tasks as you want. The runtime will try to distribute them amongst the available hardware threads as well as it can.
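In that spirit, a rough sketch of a client simulator built on tasks and async socket IO rather than dedicated threads; the host, port, payload and the concurrency cap are all assumptions to tune, not recommendations:

```csharp
using System;
using System.Linq;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class LoadClient
{
    // Cap on concurrent connections so one machine doesn't exhaust sockets; tune as needed.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1000);

    static void Main()
    {
        RunAsync("localhost", 9000, 60000).GetAwaiter().GetResult();
    }

    static async Task RunAsync(string host, int port, int clients)
    {
        var tasks = Enumerable.Range(0, clients).Select(async i =>
        {
            await Gate.WaitAsync();
            try
            {
                using (var client = new TcpClient())
                {
                    await client.ConnectAsync(host, port);
                    NetworkStream stream = client.GetStream();

                    byte[] request = Encoding.ASCII.GetBytes("Short sentence\n");
                    await stream.WriteAsync(request, 0, request.Length);

                    byte[] buffer = new byte[256];
                    int read = await stream.ReadAsync(buffer, 0, buffer.Length);
                    string reply = Encoding.ASCII.GetString(buffer, 0, read);

                    // Check the reply instead of fire-and-forget.
                    if (reply.Trim() != "Rejected")
                        Console.WriteLine("Client " + i + " got unexpected reply: " + reply);
                }
            }
            finally
            {
                Gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}
```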
There is a multi-threaded batch processing program that creates multiple worker threads to process each batch.
Now, to scale the application to handle 100 million records, we need to use a server farm to do the processing of each batch. Is there native support in C# for handling requests running on a server farm? Any thoughts on how to set up the C# executable to work with this setup?
You can either create a manager that distributes the work, like fejesjoco said, or you can make your apps smart enough to grab only a certain number of units of work to process. When they have completed processing those units, have them contact the DB server to get the next batch. Rinse and repeat until done.
As a side note, most distributed worker systems run like this (a minimal worker-loop sketch follows these steps):
1. Work is queued on the server in batches.
2. Worker processes check in with the server to get a batch to operate on; the available batch is marked as being processed by that worker.
3. (optional) Worker processes check back in with the server with a status report (i.e. 10% done, 20% done, etc.).
4. The worker process completes the work and submits the results.
5. Go to step 2.
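A minimal sketch of that loop; IBatchServer and the batch shape are hypothetical stand-ins for whatever coordination service (web service, database table, message queue) you put in front of the farm:

```csharp
using System;
using System.Threading;

// Hypothetical coordination API exposed by the server.
public interface IBatchServer
{
    Batch CheckOut(string workerId);                                        // step 2: claim a batch
    void ReportProgress(string workerId, Guid batchId, int percent);        // step 3
    void SubmitResults(string workerId, Guid batchId, string results);      // step 4
}

public class Batch
{
    public Guid Id { get; set; }
    public string[] Records { get; set; }
}

public class Worker
{
    public void Run(IBatchServer server, string workerId)
    {
        while (true) // step 5: loop back to step 2
        {
            Batch batch = server.CheckOut(workerId);
            if (batch == null)
            {
                // Nothing queued right now; back off and try again.
                Thread.Sleep(TimeSpan.FromSeconds(30));
                continue;
            }

            for (int i = 0; i < batch.Records.Length; i++)
            {
                Process(batch.Records[i]); // your existing per-record logic
                server.ReportProgress(workerId, batch.Id, (i + 1) * 100 / batch.Records.Length);
            }

            server.SubmitResults(workerId, batch.Id, "done");
        }
    }

    private void Process(string record)
    {
        // Placeholder for the existing batch-processing work.
    }
}
```

This keeps the workers stateless with respect to each other, so you can add or remove them without recoding anything, as noted below.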
Another option is to have 3 workers process the exact same data set. This would allow you to compare results. If 2 or more have identical results then you accept those results. If all 3 have different results then you know there is a problem and you need to inspect the data/code. Usually this only happens when the workers are outside of your control (like SETI) or you are running massive calculations and want to correct for potential hardware issues.
Sometimes there is a management app which displays the current number of workers and the progress through the entire set. If you know roughly how long an individual batch takes, then you can detect when a worker has died and let a new process take over the same batch.
This allows you to add or remove as many individual workers as you want without having to recode anything.
I don't think there's built-in support for clustering. In the simplest case, you might try creating a simple manager application which divides the input among the servers; your processes then won't need to know about each other, so there's no need to rewrite anything.
Why not deploy the app using a distributed framework? I'd recommend the CloudIQ Platform. You can use the platform to distribute your code to any number of servers. It also handles the load balancing, so you would only need to submit your jobs to the framework, and it will handle job distribution to the individual machines. It also monitors application execution, so if one of the machines suffers a failure, the jobs running there will be restarted on another machine in the group.
Check out the Community link for downloads, forums, etc.