There is a multi-threaded batch processing program that creates multiple worker threads to process each batch.
Now, to scale the application to handle 100 million records, we need to use a server farm to do the processing of each batch. Is there native support in C# for handling requests running on a server farm? Any thoughts on how to set up the C# executable to work with this setup?
You can either create a manager that distributes the work, like fejesjoco said, or you can make your apps smart enough to only grab a certain number of units of work to process. When they have completed processing those units, have them contact the db server to get the next batch. Rinse and repeat until done.
As a side note, most distributed worker systems run like this:
1. Work is queued on the server in batches.
2. Worker processes check in with the server to get a batch to operate on; that batch is marked as being processed by that worker.
3. (Optional) Worker processes check back in with the server with a status report (e.g. 10% done, 20% done, etc.).
4. The worker process completes the work and submits the results.
5. Go to step 2.
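Here is a minimal sketch of that loop in C#; IBatchRepository, Batch and ProcessRecord are illustrative placeholders for your own data access and processing code, not anything from the question.

using System.Collections.Generic;

// All of the types below are illustrative placeholders.
public interface IBatchRepository
{
    Batch ClaimNextBatch(string workerId);                                   // step 2: mark the batch as owned by this worker
    void ReportProgress(int batchId, string workerId, int done, int total);  // step 3 (optional)
    void CompleteBatch(int batchId, string workerId);                        // step 4: submit the results
}

public class Batch
{
    public int Id { get; set; }
    public List<string> Records { get; set; } = new List<string>();
}

public static class Worker
{
    public static void Run(string workerId, IBatchRepository repo)
    {
        while (true)
        {
            var batch = repo.ClaimNextBatch(workerId);
            if (batch == null) break;                    // queue drained, we're done

            for (int i = 0; i < batch.Records.Count; i++)
            {
                ProcessRecord(batch.Records[i]);
                if (i > 0 && i % 100 == 0)               // occasional status report
                    repo.ReportProgress(batch.Id, workerId, i, batch.Records.Count);
            }

            repo.CompleteBatch(batch.Id, workerId);      // then loop back to step 2
        }
    }

    private static void ProcessRecord(string record)
    {
        // your existing per-record processing goes here
    }
}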
Another option is to have 3 workers process the exact same data set. This would allow you to compare results. If 2 or more have identical results then you accept those results. If all 3 have different results then you know there is a problem and you need to inspect the data/code. Usually this only happens when the workers are outside of your control (like SETI) or you are running massive calculations and want to correct for potential hardware issues.
Sometimes there is a management app which displays the current number of workers and the progress through the entire set. If you know roughly how long an individual batch takes, then you can detect when a worker has died and let a new process pick up the same batch.
This allows you to add or remove as many individual workers as you want without having to recode anything.
I don't think there's built-in support for clustering. In the simplest case, you might try creating a simple manager application which divides the input among the servers; your processes will not need to know about each other, so there is no need to rewrite anything.
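As an illustration, such a manager could just split the record ids into one contiguous chunk per server; the method and server list below are only a sketch, not an existing API.

using System;
using System.Collections.Generic;
using System.Linq;

public static class WorkPartitioner
{
    // Splits the record ids into one contiguous chunk per server.
    public static Dictionary<string, List<long>> Split(IReadOnlyList<long> recordIds, IReadOnlyList<string> servers)
    {
        int chunkSize = (int)Math.Ceiling(recordIds.Count / (double)servers.Count);
        return servers
            .Select((server, i) => new { server, chunk = recordIds.Skip(i * chunkSize).Take(chunkSize).ToList() })
            .ToDictionary(x => x.server, x => x.chunk);
    }
}

Each server then runs the unchanged executable against only its own chunk of ids.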
Why not deploy the app using a distributed framework? I'd recommend the CloudIQ Platform. You can use the platform to distribute your code to any number of servers. It also handles the load balancing, so you would only need to submit your jobs to the framework, and it will handle job distribution to the individual machines. It also monitors application execution, so if one of the machines suffers a failure, the jobs running there will be restarted on another machine in the group.
Check out the Community link for downloads, forums, etc.
I'm working on widening the capabilities of an already existing Selenium-based web application testing system. The system currently can only execute tests one at a time. I'd like to increase the system's throughput by adding some parallelism to test execution.
To that end I've been looking into different technologies and am interested in Microsoft's Orleans. However, after spending a day with Orleans' documentation I couldn't find whether the technology supports the scenario where I need to run a specific scalable actor a maximum of n times on one computer.
In my case it would be the actor executing a test. Typically with Selenium, test execution requires creating a WebDriver object which then opens and controls a web browser instance. Those web browsers all have to be opened as separate processes, not just new windows of the same process. With no upper limit, and let's say a cluster of 3 nodes and a test plan of 60 tests, I'd have 20 web browsers opening simultaneously on each machine, which would likely lead to a major decrease in performance or a complete crash. I'd expect the system to have an upper limit of 3 test-executing actors per machine and store the rest of the tests in a queue, running them only after another worker actor finishes its test.
How would I go about implementing that with the Orleans technology?
For Orleans, the right solution is NOT to muck around with MaxActiveThreads; you should almost never change that. Instead, you can simply have 9 regular grains in total, addressed by 9 different grain ids (0...8). The system will spread them randomly across the 3 servers, and when you send a new test request, just send it to a random grain out of those 9.
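A rough sketch of that first option; ITestRunnerGrain and RunTest are made-up names, while IGrainFactory.GetGrain is the standard Orleans call.

using System;
using System.Threading.Tasks;
using Orleans;

// Hypothetical grain interface for running one Selenium test.
public interface ITestRunnerGrain : IGrainWithIntegerKey
{
    Task RunTest(string testName);
}

public static class TestDispatcher
{
    private static readonly Random Rng = new Random();

    // Sends each test to one of 9 fixed grain ids; Orleans places those
    // 9 grains across the cluster, giving roughly 3 per server on 3 nodes.
    public static Task Dispatch(IGrainFactory grains, string testName)
    {
        var grain = grains.GetGrain<ITestRunnerGrain>(Rng.Next(9));
        return grain.RunTest(testName);
    }
}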
Alternatively, you can have a StatelessWorker grain: just one grain id, but with its max activations per server set to 3. That way you will have exactly 3 activations per server (9 in total) and the system will auto load balance.
http://dotnet.github.io/orleans/Documentation/Advanced-Concepts/StatelessWorker.html
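And a sketch of the StatelessWorker alternative, reusing the hypothetical ITestRunnerGrain interface from the previous snippet; the attribute argument is the per-silo activation cap.

using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

// At most 3 activations of this grain per silo; with 3 silos that is 9 in total.
[StatelessWorker(3)]
public class TestRunnerGrain : Grain, ITestRunnerGrain
{
    public async Task RunTest(string testName)
    {
        // Create the WebDriver / browser here and run the test.
        // Because only 3 activations exist per silo, at most 3 browsers
        // are open on any one machine at a time.
        await Task.CompletedTask; // placeholder for the real Selenium work
    }
}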
There is no built-in persistent queueing in Orleans. You will either have to queue on the client/sender side, or send all 60 requests at once and have each grain store the requests in its internal in-memory queue, starting the next one when the work on the previous one has finished.
Orleans will do most of what you need out of the box.
It creates as many grains in parallel as you have configured in the silo (the default is the number of CPU cores; see MaxActiveThreads).
All other requests will be queued by Orleans.
You can configure it in the Orleans config-file:
<Scheduler MaxActiveThreads="3"/>
Your problem: You want to parallelize your test-execution.
The solution is to use a Selenium Grid hub and nodes. You don't need Orleans for that.
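For example, with a Grid hub running, each test creates a RemoteWebDriver against the hub instead of a local driver, and the hub and its nodes decide where the browser actually opens (the hub URL below is a placeholder).

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

public static class GridExample
{
    public static void RunOneTest()
    {
        var options = new ChromeOptions();

        // Point this at your own Selenium Grid hub.
        using (IWebDriver driver = new RemoteWebDriver(new Uri("http://your-hub:4444/wd/hub"), options))
        {
            driver.Navigate().GoToUrl("https://example.com");
            // ... run the usual test steps against 'driver' ...
        }
    }
}

Each node can be configured with a maximum number of concurrent sessions, which gives you the per-machine cap you are after.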
I have a pretty simple queue which happens to have heaps of messages on it (by design). Heaps == .. say ... thousands.
Right now I'm playing around with using Azure Web Jobs with a Queue Trigger to process the messages. Works fine.
I'm worried about performance though. Let's assume my method that processes a message takes 1 sec. With so many messages, this all adds up.
I know I can manually pop a number of messages at the same time and then process them in parallel .. but I'm not sure how to do this with WebJobs?
I'm assuming the solution is to scale out, which means I would create 25 instances of the WebJob? Or is there a better way where I can trigger on a message but pop 25 or so messages at once and process them in parallel myself?
NOTE: Most of the delay is I/O (i.e. a REST call to a 3rd party), not CPU.
I'm thinking: create 25 tasks and await Task.WhenAll(tasks) to process all the data that I get back.
So - what are my options, please?
NOTE #2: If the solution is to scale out .. then I also need to make sure that my WebJob project only has one function in it, right? Otherwise all the functions (read: triggers, etc.) will also be scaled out.
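For reference, this is roughly the manual approach I had in mind, using the storage SDK's CloudQueue directly; ProcessMessageAsync is a placeholder for the REST call.

using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static class ManualBatchProcessor
{
    public static async Task ProcessBatchAsync(string connectionString, string queueName)
    {
        var queue = CloudStorageAccount.Parse(connectionString)
                                       .CreateCloudQueueClient()
                                       .GetQueueReference(queueName);

        // Pop up to 25 messages in one call (the storage API allows up to 32);
        // an overload also lets you pass a longer visibility timeout.
        var messages = (await queue.GetMessagesAsync(25)).ToList();

        // Fan out the I/O-bound work and wait for all of it.
        await Task.WhenAll(messages.Select(ProcessMessageAsync));

        // Delete only after successful processing.
        await Task.WhenAll(messages.Select(m => queue.DeleteMessageAsync(m)));
    }

    private static Task ProcessMessageAsync(CloudQueueMessage message)
    {
        // Placeholder for the REST call to the 3rd party.
        return Task.CompletedTask;
    }
}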
Azure WebJobs has a default configuration of processing 16 queue messages in parallel, and this number is configurable (Queues.BatchSize). The WebJobs framework runs up to that many copies of your function concurrently, and because it fetches the next batch as soon as the in-flight count drops below NewBatchThreshold (default 8), the effective maximum is 16 + 8 = 24 concurrent messages per instance.
Moreover, you can launch multiple instances (i.e. scale out) of the Azure App Service/Website hosting the WebJob, subject to a maximum of 20 instances. However, scaling out instances (unlike the parallel dequeue above) has a pricing impact, so please check on that.
Thus, theoretically, you could be processing 24 * 20 = 480 messages in parallel by configuration alone on a standard WebJob function, without any custom code.
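For reference, the knobs mentioned above look roughly like this with the WebJobs SDK's JobHostConfiguration (the numbers are just examples, and the AzureWebJobsStorage connection string is assumed to be configured).

using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var config = new JobHostConfiguration();

        // How many queue messages are pulled and processed in parallel
        // per instance (default 16, maximum 32).
        config.Queues.BatchSize = 32;

        // How many times a message is retried before it is moved to the poison queue.
        config.Queues.MaxDequeueCount = 5;

        new JobHost(config).RunAndBlock();
    }
}

public class Functions
{
    // Up to the configured batch size of these run concurrently in one instance.
    public static void ProcessQueueMessage([QueueTrigger("myqueue")] string message)
    {
        // call the 3rd-party REST API here
    }
}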
I am processing my SSAS Cube programmatically. I process the dimensions in parallel (I manage the parallel calls to .Process() myself) and once they're all finished, I process the measure group partitions in parallel (again managing the parallelism myself).
As far as I can see, this is a direct replication of what I would otherwise do in SSMS (same process types, etc.). The only difference I can see is that I'm processing ALL of the dimensions in parallel and ALL of the measure group partitions in parallel thereafter. If you right-click and process several objects within SSMS, it appears to only process 2 in parallel at any one time (inferred from the text indicating that processing has not started in all processing windows other than 2). But if anything, I would expect my code to be faster, not slower, than SSMS.
I have wrapped the processing action with "starting" and "finishing" debug messages and everything is as expected. It is the work done by .Process() that seems to be much slower than SSMS.
On a Cube that normally takes just under 1 hour to process, it is taking 7.5 hours.
On a cube that normally takes just under 3 minutes to process, it is taking 6.5 minutes.
As far as I can tell, the processing of dimensions is about the same but the measure groups are significantly slower. However, the latter are much much larger of course so it might just be that the difference is not as obvious to me.
I'm at a loss for ideas and would appreciate any help! Am I missing a setting? Is managing the parallelism myself and processing multiple in parallel as opposed to 2 causing a problem?
If you can provide your code I'm happy to look, but my guess is that you are calling dimension.Process() on parallel threads expecting it to process in parallel on the server. It won't. It will process serially due to locking, because you are executing separate processing batches and separate transactions.
Any reason not to process everything (rather than incrementally processing just recent partitions or something)? Let's start simple and see if this is all you need. Can you get the database object and just do a ProcessFull? That will properly process in parallel all dimensions and measure groups.
database.Process(ProcessType.ProcessFull)
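A minimal AMO sketch of that; the server and database names are placeholders.

using Microsoft.AnalysisServices;

class CubeProcessor
{
    static void Main()
    {
        var server = new Server();
        server.Connect("Data Source=localhost");                            // placeholder server
        Database database = server.Databases.FindByName("MyCubeDatabase");  // placeholder name

        // One ProcessFull on the database lets the server work out the
        // dependencies and process dimensions and partitions in parallel.
        database.Process(ProcessType.ProcessFull);

        server.Disconnect();
    }
}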
If you do need incremental processing then review this link for using ExecuteCaptureLog(true,true) to run multiple ProcessUpdate commands in parallel and in a transaction:
https://jesseorosz.wordpress.com/2006/11/20/how-to-process-dimensions-in-parallel-using-amo/
I would recommend including the partitions you want to process in that transactional batch. It will know the right dependencies automatically. Also make sure to include a ProcessIndexes on the cube object in that batch so flexible aggs and indexes on old partitions get rebuilt after the dimension ProcessUpdate.
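A sketch of that transactional batch along the lines of the linked post; again, the server and database names are placeholders, and ProcessData for the partitions is just one example of the process type you might pick.

using Microsoft.AnalysisServices;

class IncrementalProcessor
{
    static void Main()
    {
        var server = new Server();
        server.Connect("Data Source=localhost");                            // placeholder server
        Database database = server.Databases.FindByName("MyCubeDatabase");  // placeholder name

        // Capture the processing commands instead of executing them one by one.
        server.CaptureXml = true;

        foreach (Dimension dimension in database.Dimensions)
            dimension.Process(ProcessType.ProcessUpdate);

        // Include the partitions and a ProcessIndexes on the cube in the same batch.
        foreach (Cube cube in database.Cubes)
        {
            foreach (MeasureGroup measureGroup in cube.MeasureGroups)
                foreach (Partition partition in measureGroup.Partitions)
                    partition.Process(ProcessType.ProcessData);

            cube.Process(ProcessType.ProcessIndexes);
        }

        // Stop capturing and execute everything above as one transactional, parallel batch.
        server.CaptureXml = false;
        server.ExecuteCaptureLog(true, true);

        server.Disconnect();
    }
}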
I currently have a C# console app where multiple instances run at the same time. The app accesses values in a database and processes them. While a row is being processed it is flagged so that no other instance attempts to process it at the same time. My question is: what is an efficient and graceful way to unflag those values in the event an instance of the program crashes? If an instance crashed, I would only want to unflag the values that were being processed by that instance of the program.
Thanks
The potential solution will depend heavily on how you start the console applications.
In our case, the applications are started based on configuration records in the database. When one of these applications performs a lock, it uses the primary key from the database configuration record to perform the lock.
When the application starts up, the first thing it does is release all locks on the records that it previously locked.
To control all of the child processes, we have a service that uses the information from the configuration tables to start the processes and then keeps an eye on them, restarting them when they fail.
Each of the processes is also responsible for updating a status table in the database with the last time it was available with a maximum allowed delay of 2 minutes (for heavy processing). This status table is used by sysadmins to watch for problems, but it could also be used to manually release locks in case of a repeating failure in a given process.
If you don't have a structured approach like this, it can be very difficult to automatically unlock records, unless you have a solid enough profile of your application's performance to know that, say, any lock over 5 minutes old is invalid because processing a record should take about 15 seconds on average and never more than 2 minutes.
To be able to handle any kind of crash, even a power failure, I would suggest additionally timestamping the records and, after some reasonable timeout, treating records as unlocked even if they are still flagged.
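A rough sketch of that idea with plain ADO.NET; the table and column names (WorkItems, LockedBy, LockedAtUtc, Processed) are made up. Each instance claims a row by writing its own id plus a UTC timestamp, and the claim query treats any flag older than the timeout as free again, so a crashed instance's rows get picked up automatically.

using System;
using System.Data.SqlClient;

public static class RecordClaimer
{
    // Claims one unprocessed record for this instance, ignoring flags that
    // have gone stale (e.g. left behind by a crashed instance).
    public static int? ClaimNext(string connectionString, string instanceId, TimeSpan lockTimeout)
    {
        const string sql = @"
            UPDATE TOP (1) WorkItems
            SET LockedBy = @instanceId, LockedAtUtc = SYSUTCDATETIME()
            OUTPUT inserted.Id
            WHERE Processed = 0
              AND (LockedBy IS NULL
                   OR LockedAtUtc < DATEADD(SECOND, -@timeoutSeconds, SYSUTCDATETIME()));";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@instanceId", instanceId);
            command.Parameters.AddWithValue("@timeoutSeconds", (int)lockTimeout.TotalSeconds);

            connection.Open();
            object id = command.ExecuteScalar();   // null if nothing was claimable
            return id == null ? (int?)null : (int)id;
        }
    }
}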
I have an existing application written in C++ that currently does a number of tasks: reading transactions from a database for all customers, processing them, and writing the results back.
What I want to do is have multiple instances of this running in parallel on separate machines to increase transaction capacity, assigning a certain subset of customers to each instance of the app so that there is no contention or data sharing required, and hence no locking or synchronisation.
What I want to do, though, is have multiple instances running on the same machine as well as distributed across other machines, so if I have a quad-core box, there would be four instances of the application running, each utilising one of the CPUs.
I will be wrapping the C++ code in a .NET C# interface and managing all these processes, local and distributed, from a parent C# management service responsible for creating, starting and stopping the processes, as well as all communication and management between them.
What I want to know is if I create four instances each on a separate background thread on a quad core box, whether or not the CLR and .NET will automatically take care of spreading the load across the four CPUs on each box or whether I need to do something to make use of the parallel processing capability?
If you mean that you will be running your application in four processes on the same box, then it is the operating system (Windows) which controls how these processes are allocated CPU time. If the processes are doing similar work, then generally they will get roughly equal processor time.
But, have you considered using four threads within a single process? Threads are much more lightweight than processes, and you wouldn't then need a separate management service, i.e., you would have one process (with four threads) instead of 5 processes. Do you come from a unix background by any chance?
You can set the processor affinity when launching the process via the Process object (or ProcessThread, depending on how you are launching the app).
Here is an SO post which covers the subject (I didn't vote to close as a duplicate (yet) because I'm not 100% sure if this is exactly what you are after).
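If you do decide to pin each worker process to its own core, a minimal sketch looks like this (the executable name is a placeholder; as noted above, Windows will usually balance the processes fine without it).

using System;
using System.Diagnostics;

class Launcher
{
    static void Main()
    {
        int cores = Environment.ProcessorCount;       // e.g. 4 on the quad-core box

        for (int core = 0; core < cores; core++)
        {
            // "Worker.exe" stands in for your wrapped C++/C# worker process.
            Process worker = Process.Start("Worker.exe");

            // Pin this process to a single core via an affinity bitmask.
            worker.ProcessorAffinity = (IntPtr)(1 << core);
        }
    }
}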