Parallelization Considerations

Parallelization Considerations - c#

I want to get the community's perspective on this. If I have a process which is heavily DB/IO bound, how smart would it be to parallelize individual process paths using the Task Parallel library?
I'll use an example ... if I have a bunch of items, and I need to do the following operations
Query a DB for a list of items
Do some aggregation operations to group certain items based on a dynamic list of parameters.
For each grouped result, Query the database for something based on the aggregated result.
For each grouped result, Do some numeric calculations (3 and 4 would happen sequentially).
Do some inserts and updates for the result calculated in #3
Do some inserts and updates for each item returned in #1
Logically speaking, I can parallelize into a graph of tasks at steps #3, #5, #6 as one item has no bearing on the result of the previous. However, each of these will be waiting on the database (sql server) which is fine and I understand that we can only process as far as the SQL server will let us.
But I want to logically distribute the task on the local machine so that it processes as fast as the Database lets us without having to wait for anything on our end. I've done some mock prototype where I substitute the db calls with Thread.Sleeps (I also tried some variations with .SpinWait, which was a million times faster), and the parallel version is waaaaay faster than the current implementation which is completely serial and not parallel at all.
What I'm afraid of is putting too much strain on the SQL server ... are there any considerations I should consider before I go too far down this path?

If the parallel version is much faster than the serial version, I would not worry about the strain on your SQL server...unless of course the tasks you are performing are low priority compared to some other significant or time-critical operations that are also performed on the DB server.
Your description of tasks is not well understood by me, but it almost sounds like more of those tasks should have been performed directly in the database (I presume there are details that make that not possible?)

Another option would be to create a pipeline so that step 3 for the second group happening at the same time as step 4 for the first group. And if you can overlap the updates at step 5, do that too. That way you're doing concurrent SQL accesses and processing, but not over-taxing the database because you only have two concurrent operations going on at once.
So you do steps 1 and 2 sequentially (I presume) to get a collection of groups that require further processing. Then. your main thread starts:
for each group
query the database
place the results of the query into the calc queue
A second thread services the results queue:
while not end of data
Dequeue result from calc queue
Do numeric calculations
place the results of the query into the update queue
A third thread services the update queue:
while not end of data
Dequeue result from update queue
Update database
The System.Collections.Concurrent.BlockingCollection<T> is a very effective queue for this kind of thing.
The nice thing here is that if you can scale it if you want by adding multiple calculation threads or query/update threads if the SQL Server can handle more concurrent transactions.
I use something very similar to this in a daily merge/update program, with very good results. That particular process doesn't use SQL server, but rather standard file I/O, but the concepts translate very well.

Related

Improve performance of event sourcing projections to RDBMS (SQL) via .NET

I'm currently working on a prototype in C# that utilises CQRS and event sourcing and I've hit a performance bottleneck in my projections to an SQL database.
My first prototype was built with Entity Framework 6, code first. This choice was made primarily to get going and because the read side would benefit from LINQ.
Every (applicable) event is consumed by multiple projections, which either create or update the corresponding entity.
Such a projection currently look like this:
public async Task HandleAsync(ItemPlacedIntoStock #event)
{
var bookingList = new BookingList();
bookingList.Date = #event.Date;
bookingList.DeltaItemQuantity = #event.Quantity;
bookingList.IncomingItemQuantity = #event.Quantity;
bookingList.OutgoingItemQuantity = 0;
bookingList.Item = #event.Item;
bookingList.Location = #event.Location;
bookingList.Warehouse = #event.Warehouse;
using (var repository = new BookingListRepository())
{
repository.Add(bookingList);
await repository.Save();
}
}
This isn't very well performing, most likely for the reason that I call DbContext.SaveChanges() in the IRepository.Save() method. One for each event.
What options should I explore next? I don't want to spent days chasing ideas that might prove to be only marginally better.
I currently see the following options:
Stick with EF, but batch process the events (i.e. new/save context every X number of events) as long as the projection is running behind.
Try to do more low-level SQL, for example with ADO.NET.
Don't use SQL to store the projections (i.e. use NoSQL)
I expect to see millions of events because we plan to source a large legacy application and migrate data in the form of events. New projections will also be added often enough so the processing speed is an actual issue.
Benchmarks:
The current solution (EF, save after every event) processes ~200 events per second (per projection). It does not scale directly with the number of active projections (i.e. N projections process less than N * 200 events/second).
When the projections aren't saving the context, the number of events/second increases marginally (less than double)
When the projections don't do anything (single return statement), the processing speed of my prototype pipeline is ~30.000 events/second globally
Updated benchmarks
Single-threaded inserts via ADO.NET TableAdapter (new DataSet and new TableAdapter on each iteration): ~2.500 inserts/second. Did not test with projection pipeline but standalone
Single-threaded inserts via ADO.NET TableAdapter that does not SELECT after inserting: ~3.000 inserts/second
Single-threaded ADO.NET TableAdapter batch-insert of 10.000 rows (single dataset, 10.000 rows in-memory): >10.000 inserts/second (my sample size and window was too small)

I've seen performance improvements of several orders of magnitude, even with Entity Framework, when batching the commits and improving my overall projection engine.
Each projection is a separate subscription on the Event Store. This allows each projection to run at its maximum speed. Theoretical maximum of my pipeline on my machine was 40.000 events per second (possibly more, I ran out of events to sample with)
Each projection maintains a queue of events and deserialises the json to POCOs. Multiple deserialisations per projection run in parallel. Also switched to json.net from data contract serialisation.
Each projection supports the notion of a unit of work. The unit of work is committed after processing 1000 events or if the deserialisation-queue is empty (i.e. I am either at the head position or experienced a buffer underrun). This means that a projection commits more often if it is only a few events behind.
Made use of async TPL processing with interleaving of fetching, queueing, processing, tracking and committing.
This was achieved by using the following technologies and tools:
The ordered, queued and parallel deserialisation into POCOs is done via a TPL DataFlow TransformBlock with a BoundedCapacity somewhere over 100. Maximum degree of parallelism was Environment.ProcessorCount (i.e. 4 or 8). I saw a massive increase in performance with a queue size of 100-200 vs. 10: from 200-300 events to 10.000 events per second. This most likely means that a buffer of 10 was causing too many underruns and thus committed the unit of work too often.
Processing is dispatched asynchronously from a linked ActionBlock
Each time an event is deserialised, I increment a counter for pending events
Each time an event is processed, I increment a counter for processed events
The unit of work is committed after 1000 processed events, or whenever the deserialisation buffer runs out (number of pending events = number of processed events). I reduce both counters by the number of processed events. I don't reset them to 0 because other threads might have increased the number of pending events.
The values of a batch size of 1000 events and queue size of 200 are the result of experimentation. This also shows further options for improvement by tweaking these values for each projection independently. A projection that adds a new row for every event slows down considerably when using a batch size of 10.000 - while other projections that merely update a few entities benefit from a larger batch size.
The deserialisation queue size is also vital for good performance.
So, TL;DR:
Entity framework is fast enough to handle up to 10.000 modifications per second - on parallel threads, each.
Utilise your unit of work and avoid committing every single change - especially in CQRS, where the projection is the only thread making any changes to the data.
Properly interleave parallel tasks, don't just blindly async everything.

As the author of Projac, I suggest you have a look at what it has to offer, and steal what feels appropriate. I built it specifically because LINQ/EF are poor choices on the read model/projection side ...

Saving one record at a time to SQL Server is always going to be poorly performing. You have two options;
Table Variable Parameters
Use a table variable to save multiple records to a stored procedure in a single call
ADO Bulk Copy
Use the Bulk Insert ADO library to bulk copy the data in
Neither of which will benefit from being in EF apart from connection handling.
I would do neither if your data is simple key-value pairs; using an RDBMS is probably not a good fit. Probably Mongo\Raven or other flat data store would be better performing.

SSAS Processing via c# (Microsoft.AnalysisServices) Much Slower than SSMS

I am processing my SSAS Cube programmatically. I process the dimensions in parallel (I manage the parallel calls to .Process() myself) and once they're all finished, I process the measure group partitions in parallel (again managing the parallelism myself).
As far as I can see, this is a direct replication of what I would otherwise do in SSMS (same process types etc.) The only difference I can see is that I'm processing ALL of the dimensions in parallel and ALL of the measure group partitions in parallel thereafter. If you right click process several objects within SSMS, it appears to only process 2 in parallel at any one time (inferred from the text that indicates process has not started in all processing windows other than 2). But if anything, I would expect my code to be faster, not slower than SSMS.
I have wrapped the processing action with "starting" and "finishing" debug messages and everything is as expected. It is the work done by .Process() that seems to be much slower than SSMS.
On a Cube that normally takes just under 1 hour to process, it is taking 7.5 hours.
On a cube that normally takes just under 3 minutes to process, it is taking 6.5 minutes.
As far as I can tell, the processing of dimensions is about the same but the measure groups are significantly slower. However, the latter are much much larger of course so it might just be that the difference is not as obvious to me.
I'm at a loss for ideas and would appreciate any help! Am I missing a setting? Is managing the parallelism myself and processing multiple in parallel as opposed to 2 causing a problem?

If you can provide your code I'm happy to look but my guess is that you are calling dimension.Process() in parallel threads expecting it to process in parallel on the server. It won't. It will process in serial due to locking because you are executing separate processing batches and separate transactions.
Any reason not to process everything (rather than incrementally processing just recent partitions or something)? Let's start simple and see if this is all you need. Can you get the database object and just do a ProcessFull? That will properly process in parallel all dimensions and measure groups.
database.Process(ProcessType.ProcessFull)
If you do need incremental processing then review this link for using ExecuteCaptureLog(true,true) to run multiple ProcessUpdate commands in parallel and in a transaction:
https://jesseorosz.wordpress.com/2006/11/20/how-to-process-dimensions-in-parallel-using-amo/
I would recommend including the partitions you want to process in that transactional batch. It will know the right dependencies automatically. Also make sure to include a ProcessIndexes on the cube object in that batch so flexible aggs and indexes on old partitions get rebuilt after the dimension ProcessUpdate.

Best way to execute oracle SQL statements in parallel

I have asked a similar query before but now I would appreciate specifics. I have 5-11 SQL that need ran in a C# .NET 4.5 web application, currently they are done sequentially, which results in slow response times.
Talking to various architects/DBA they all tell me this can be improved by running the queries in parallel, but never give the specifics of how, when I ask they become very vague ;0)
Is there some function available in Oracle that I could call to pass queries to run in parallel?
Or I have been looking into ASYNC/AWAIT functionality, however the examples on the web are confusing (most involve returning control to the UI, then updating some text on the screen when the task finally completes), I would like to know how to call several methods for them to execute their SQL in parallel and then wait for all of them to complete before proceeding.
If anyone could point me in the direction of good documentation or provide specific examples I would appreciate it!!!!
Updated with sample code, could someone point out how to update this to async to wait for all the various calls to complete:
private CDTInspection GetDetailsInner(CDTInspection tInspection)
{
//Call Method one to get data
tInspection = Method1(tInspection);
//Call Method two to get data
Method2(tInspection);
//Call Method three to get data
Method3(tInspection);
//Call Method four to get data
Method4(tInspection);
return tInspection;
}
private void method2(CDTInspection tInspection)
{
//Create the parameter list
//Execute the query
//MarshalResults
}

You can create jobs using DBMS_SCHEDULER to run independently. Read more from the documentation about DBMS_SCHEDULER.
For example, you could run jobs in parallel as:
BEGIN
DBMS_SCHEDULER.RUN_JOB('pkg1.proc1', false);
DBMS_SCHEDULER.RUN_JOB('pkg2.proc2', false);
DBMS_SCHEDULER.RUN_JOB('pkg3.proc3', false);
END;
/

If you would like to run your 5-11 queries in parallel within your application you will have to start multiple threads and execute the queries within the threads in parallel.
However, if you want the database to execute a query in parallel on the database server(s), usually useful if the query is long running and you want to speed up the individual query execution time, then you can use Parallel Execution.
Parallel execution benefits systems with all of the following characteristics:
Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
Sufficient I/O bandwidth
Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
Sufficient memory to support additional memory-intensive processes, such as sorting, hashing, and I/O buffers
The easiest way to implement parallel execution is via a hint:
SELECT /*+ PARALLEL */ col1, col2, col3 FROM mytable;
However, this might not be the best way as it would change your query and has other downsides (like what if you want to deactivate parallelism again, you would have to change the query again). Another way is to specify on table level:
ALTER TABLE mytable PARALLEL;
That would allow to simply deactivate parallel execution again if it is not wanted anymore without changing the query itself.

Task Scheduling / Load Balance Pattern

I've run into this a few times recently at work. Where we have to develop an application that completes a series of items on a schedule, sometimes this schedule is configurable by the end user, other times its set in Config File. Either way, this task is something that should only be executed once, by a single machine. This isnt generally difficult, until you introduce the need for SOA/Geo Redundancy. In this particular case there are a total of 4 (could be 400) instances of this application running. There are two in each data center on opposite sides of the US.
I'm investigating successful patterns for this sort of thing. My current solution has each physical location determining if it should be active or dormant. We do this by checking a Session object that is maintained to another server. If DataCenter A is the live setup, then the logic auto-magically prevents the instances in DataCenter B from performing any execution. (We dont want the work to traverse the MPLS between DCs)
The two remaining instances in DC A will then query the Database for any jobs that need to be executed in the next 3 hours and cache them. A separate timer runs every second checking for jobs that need executed.
If it finds one it will execute a stored procedure first, that forces a full table lock, queries for the job that needs to be executed, checks the "StartedByInstance" Column for a value, if it doesnt find a value then it marks that record as being executed by InstanceX. Only then will it actually execute the job.
My direct questions are:
Is this a good pattern?
Are there any better patterns?
Are there any libraries/apis that would be of interest?
Thanks!

Parallel processing of database queue

There is small system, where a database table as queue on MSSQL 2005. Several applications are writing to this table, and one application is reading and processing in a FIFO manner.
I have to make it a little bit more advanced to be able to create a distributed system, where several processing application can run. The result should be that 2-10 processing application should be able to run and they should not interfere each other during work.
My idea is to extend the queue table with a row showing that a process is already working on it. The processing application will first update the table with it's idetifyer, and then asks for the updated records.
So something like this:
start transaction
update top(10) queue set processing = 'myid' where processing is null
select * from processing where processing = 'myid'
end transaction
After processing, it sets the processing column of the table to something else, like 'done', or whatever.
I have three questions about this approach.
First: can this work in this form?
Second: if it is working, is it effective? Do you have any other ideas to create such a distribution?
Third: In MSSQL the locking is row based, but after an amount of rows are locked, the lock is extended to the whole table. So the second application cannot access it, until the first application does not release the transaction. How big can be the selection (top x) in order to not lock the whole table, only create row locks?

This will work, but you'll probably find you'll run into blocking or deadlocks where multiple processes try and read/update the same data. I wrote a procedure to do exactly this for one of our systems which uses some interesting locking semantics to ensure this type of thing runs with no blocking or deadlocks, described here.

This approach looks reasonable to me, and is similar to one I have used in the past - successfully.
Also, the row/ table will only be locked while the update and select operations take place, so I doubt the row vs table question is really a major consideration.
Unless the processing overhead of your app is so low as to be negligible, I'd keep the "top" value low - perhaps just 1. Of course that entirely depends on the details of your app.
Having said all that, I'm not a DBA, and so will also be interested in any more expert answers

In regards to your question about locking. You can use a locking hint to force it to lock only rows
update mytable with (rowlock) set x=y where a=b

Biggest problem with this approach is that you increase the number of 'updates' to the table. Try this with just one process consuming (update + delete) and others inserting data in the table and you will find that at around a million records, it starts to crumble.
I would rather have one consumer for the DB and use message queues to deliver processing data to other consumers.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.