Improve performance of event sourcing projections to RDBMS (SQL) via .NET - c#

I'm currently working on a prototype in C# that utilises CQRS and event sourcing and I've hit a performance bottleneck in my projections to an SQL database.
My first prototype was built with Entity Framework 6, code first. This choice was made primarily to get going and because the read side would benefit from LINQ.
Every (applicable) event is consumed by multiple projections, which either create or update the corresponding entity.
Such a projection currently looks like this:
public async Task HandleAsync(ItemPlacedIntoStock @event)
{
    var bookingList = new BookingList();
    bookingList.Date = @event.Date;
    bookingList.DeltaItemQuantity = @event.Quantity;
    bookingList.IncomingItemQuantity = @event.Quantity;
    bookingList.OutgoingItemQuantity = 0;
    bookingList.Item = @event.Item;
    bookingList.Location = @event.Location;
    bookingList.Warehouse = @event.Warehouse;

    using (var repository = new BookingListRepository())
    {
        repository.Add(bookingList);
        await repository.Save();
    }
}
This doesn't perform very well, most likely because I call DbContext.SaveChanges() in the IRepository.Save() method - once for each event.
What options should I explore next? I don't want to spend days chasing ideas that might prove to be only marginally better.
I currently see the following options:
Stick with EF, but batch process the events (i.e. new/save context every X number of events) as long as the projection is running behind.
Try to do more low-level SQL, for example with ADO.NET.
Don't use SQL to store the projections (i.e. use NoSQL)
I expect to see millions of events because we plan to source a large legacy application and migrate its data in the form of events. New projections will also be added often enough that processing speed is an actual issue.
Benchmarks:
The current solution (EF, save after every event) processes ~200 events per second (per projection). It does not scale directly with the number of active projections (i.e. N projections process less than N * 200 events/second).
When the projections aren't saving the context, the number of events/second increases marginally (less than double)
When the projections don't do anything (single return statement), the processing speed of my prototype pipeline is ~30.000 events/second globally
Updated benchmarks
Single-threaded inserts via ADO.NET TableAdapter (new DataSet and new TableAdapter on each iteration): ~2.500 inserts/second. Did not test with projection pipeline but standalone
Single-threaded inserts via ADO.NET TableAdapter that does not SELECT after inserting: ~3.000 inserts/second
Single-threaded ADO.NET TableAdapter batch-insert of 10.000 rows (single dataset, 10.000 rows in-memory): >10.000 inserts/second (my sample size and window was too small)

I've seen performance improvements of several orders of magnitude, even with Entity Framework, when batching the commits and improving my overall projection engine.
Each projection is a separate subscription on the Event Store. This allows each projection to run at its maximum speed. Theoretical maximum of my pipeline on my machine was 40.000 events per second (possibly more, I ran out of events to sample with)
Each projection maintains a queue of events and deserialises the json to POCOs. Multiple deserialisations per projection run in parallel. Also switched to json.net from data contract serialisation.
Each projection supports the notion of a unit of work. The unit of work is committed after processing 1000 events or if the deserialisation-queue is empty (i.e. I am either at the head position or experienced a buffer underrun). This means that a projection commits more often if it is only a few events behind.
Made use of async TPL processing with interleaving of fetching, queueing, processing, tracking and committing.
This was achieved by using the following technologies and tools:
The ordered, queued and parallel deserialisation into POCOs is done via a TPL Dataflow TransformBlock with a BoundedCapacity somewhere over 100. Maximum degree of parallelism was Environment.ProcessorCount (i.e. 4 or 8). I saw a massive increase in performance with a queue size of 100-200 vs. 10: from 200-300 to 10.000 events per second. This most likely means that a buffer of 10 was causing too many underruns and thus committed the unit of work too often. (See the sketch after this list.)
Processing is dispatched asynchronously from a linked ActionBlock
Each time an event is deserialised, I increment a counter for pending events
Each time an event is processed, I increment a counter for processed events
The unit of work is committed after 1000 processed events, or whenever the deserialisation buffer runs out (number of pending events = number of processed events). I reduce both counters by the number of processed events. I don't reset them to 0 because other threads might have increased the number of pending events.
The values of a batch size of 1000 events and queue size of 200 are the result of experimentation. This also shows further options for improvement by tweaking these values for each projection independently. A projection that adds a new row for every event slows down considerably when using a batch size of 10.000 - while other projections that merely update a few entities benefit from a larger batch size.
The deserialisation queue size is also vital for good performance.
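For illustration, here is a condensed sketch of that deserialise/process/commit shape, assuming Json.NET for deserialisation. RecordedEvent, IEvent, ProjectEvent and CommitUnitOfWork are invented placeholders for the subscription payload, the event contracts and the projection-specific work, not my production code:

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using Newtonsoft.Json;

public class RecordedEvent { public string Json; public string TypeName; }  // placeholder payload
public interface IEvent { }                                                 // placeholder contract

public class ProjectionPipeline
{
    private int _pending;    // deserialised but not yet processed
    private int _processed;  // processed but not yet committed

    public ITargetBlock<RecordedEvent> Build()
    {
        // Ordered, bounded, parallel deserialisation into POCOs.
        var deserialise = new TransformBlock<RecordedEvent, IEvent>(recorded =>
        {
            Interlocked.Increment(ref _pending);
            return (IEvent)JsonConvert.DeserializeObject(recorded.Json, Type.GetType(recorded.TypeName));
        },
        new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 200,
            MaxDegreeOfParallelism = Environment.ProcessorCount
        });

        // Single-threaded processing keeps the unit of work simple.
        var process = new ActionBlock<IEvent>(async @event =>
        {
            await ProjectEvent(@event);
            var processed = Interlocked.Increment(ref _processed);

            // Commit after 1000 events, or when the buffer has drained
            // (pending == processed means head position / buffer underrun).
            if (processed >= 1000 || processed == Volatile.Read(ref _pending))
            {
                await CommitUnitOfWork();
                Interlocked.Add(ref _pending, -processed);
                Interlocked.Add(ref _processed, -processed);
            }
        },
        new ExecutionDataflowBlockOptions { BoundedCapacity = 200 });

        deserialise.LinkTo(process, new DataflowLinkOptions { PropagateCompletion = true });
        return deserialise;
    }

    private Task ProjectEvent(IEvent @event) => Task.CompletedTask;  // projection-specific
    private Task CommitUnitOfWork() => Task.CompletedTask;           // projection-specific
}

The subscription simply posts recorded events into the returned block; completing the block and flushing the final partial batch is left out for brevity.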
So, TL;DR:
Entity Framework is fast enough to handle up to 10.000 modifications per second - on each of several parallel threads.
Utilise your unit of work and avoid committing every single change - especially in CQRS, where the projection is the only thread making any changes to the data (see the sketch below).
Properly interleave parallel tasks, don't just blindly async everything.
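To make the unit-of-work point concrete, here is a minimal sketch of the batched-commit idea with EF6. BookingListContext, its BookingLists set and the threshold of 1000 events are placeholders; the real projection also commits whenever its deserialisation queue runs dry:

using System.Threading.Tasks;

public class BookingListProjection
{
    private BookingListContext _context = new BookingListContext();  // placeholder DbContext
    private int _uncommitted;

    public async Task HandleAsync(ItemPlacedIntoStock @event)
    {
        _context.BookingLists.Add(new BookingList
        {
            Date = @event.Date,
            DeltaItemQuantity = @event.Quantity,
            IncomingItemQuantity = @event.Quantity,
            OutgoingItemQuantity = 0,
            Item = @event.Item,
            Location = @event.Location,
            Warehouse = @event.Warehouse
        });

        if (++_uncommitted >= 1000)
        {
            await CommitAsync();
        }
    }

    public async Task CommitAsync()  // also called when the queue is empty
    {
        await _context.SaveChangesAsync();    // one SaveChanges per batch instead of one per event
        _context.Dispose();
        _context = new BookingListContext();  // fresh context = empty change tracker
        _uncommitted = 0;
    }
}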

As the author of Projac, I suggest you have a look at what it has to offer, and steal what feels appropriate. I built it specifically because LINQ/EF are poor choices on the read model/projection side ...

Saving one record at a time to SQL Server is always going to perform poorly. You have two options:
Table-valued parameters
Use a table-valued parameter to pass multiple records to a stored procedure in a single call
ADO.NET bulk copy
Use the ADO.NET SqlBulkCopy class to bulk copy the data in
Neither of these will benefit from being in EF, apart from connection handling.
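For example, a rough sketch of the bulk copy option with SqlBulkCopy, assuming the BookingList table from the question (the table name, the column subset and the connection string are assumptions on my part):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class BookingListBulkWriter
{
    // Writes a whole batch of events to the read-model table in one bulk operation
    // instead of one INSERT per event.
    public static async Task WriteAsync(IEnumerable<ItemPlacedIntoStock> events, string connectionString)
    {
        // Shape an in-memory table that matches the destination columns.
        var table = new DataTable();
        table.Columns.Add("Date", typeof(DateTime));
        table.Columns.Add("IncomingItemQuantity", typeof(int));
        table.Columns.Add("OutgoingItemQuantity", typeof(int));
        // ... add the remaining BookingList columns here ...

        foreach (var e in events)
        {
            table.Rows.Add(e.Date, e.Quantity, 0);
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.BookingList";  // assumed table name
            bulk.BatchSize = 5000;                          // rows per round trip
            await bulk.WriteToServerAsync(table);
        }
    }
}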
I would do neither if your data is simple key-value pairs; an RDBMS is probably not a good fit. Mongo/Raven or another flat data store would probably perform better.

Related

EF.NET Core: Multiple insert streams within one transaction

I have a lot of rows (300k+) to upsert into a SQL Server database in the shortest possible time, so the idea was to use parallelization: partition the data and pump it into SQL asynchronously, X threads at a time, 100 rows per context, with the context being recycled to minimize tracking overhead. However, that means more than one connection is used in parallel, so CommittableTransaction/TransactionScope would escalate to a distributed transaction, which causes the parallelized transaction enlistment to throw the infamous "This platform does not support distributed transactions." exception.
I do need the ability to commit/rollback the entire set of upserts. It's part of a batch upload process, and any error should roll back the changes to the previously working/stable condition, application-wise.
What are my options? Short of using one connection and no parallelization?
Note: the problem is not as simple as a batch of insert commands; if that were the case, I would just generate the inserts and run them on the server as a query, or indeed use SqlBulkCopy. About half of the rows are updates and half are inserts, where new keys are generated by SQL Server and need to be obtained and re-keyed onto child objects that are inserted next; the rows are spread over about a dozen tables in a 3-level hierarchy.
Nope. Totally wrong approach. Do NOT use EF for that - bulk-insert ETL is not what object-relational mappers are made for, and a lot of their design decisions are not productive for it. You also wouldn't use a small car instead of a truck to transport 20 tons of goods.
300k rows are trivial if you use the SqlBulkCopy API in some form.
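One common shape for that (an assumption on my part, not something the question's schema dictates) is to bulk copy everything into a staging table on a single connection and let the server do the insert/update split in one MERGE, all inside one local transaction. Items, ItemsStaging and the columns below are invented names:

using System.Data;
using System.Data.SqlClient;

public static class BulkUpsert
{
    public static void Run(DataTable rows, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                // 1. Bulk copy into the staging table - fast, and on a single local transaction.
                using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                {
                    bulk.DestinationTableName = "dbo.ItemsStaging";
                    bulk.WriteToServer(rows);
                }

                // 2. Upsert from staging into the real table server-side.
                const string mergeSql = @"
                    MERGE dbo.Items AS target
                    USING dbo.ItemsStaging AS source ON target.BusinessKey = source.BusinessKey
                    WHEN MATCHED THEN UPDATE SET target.Quantity = source.Quantity
                    WHEN NOT MATCHED THEN INSERT (BusinessKey, Quantity)
                                          VALUES (source.BusinessKey, source.Quantity);";
                using (var merge = new SqlCommand(mergeSql, connection, transaction))
                {
                    merge.ExecuteNonQuery();
                }

                transaction.Commit();
            }
        }
    }
}

The keys SQL Server generates for inserted parents can be read back via an OUTPUT clause on the MERGE and applied to the child rows before their own bulk copy, which keeps everything on one connection and avoids the distributed transaction.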

Processing large resultset with NHibernate

I have following task to do: calculate interest for all active accounts. In the past I was doing things like this using Ado.Net and stored procedures.
This time I've tried to do that with NHibernate, because it seemed that complex algorithms would be done easier with pure POCO.
So I want to do following (pseudocode):
foreach account in accounts
    calculate interest
    save account with new interest
I'm aware that NHibernate was not designed for processing large data volumes. For me it is sufficient to be able to organize such a loop without having all accounts in memory at once.
To minimize memory usage I would use IStatelessSession for external loop instead of plain ISession.
I've tried the approach proposed by Ayende. There are two problems:
CreateQuery is using "magic strings";
more important: it doesn't work as described.
My program works, but after switching on ODBC tracing I saw in the debugger that all fetches were done before the lambda expression in .List was executed for the first time.
I found myself another solution: session.Query with .AsEnumerable(), which I used in the foreach. Again, two problems:
I would prefer IQueryOver over IQueryable
still doesn't work as described (all fetches before first interest calculation).
I don't know why, but IQueryOver doesn't have AsEnumerable. It also doesn't have a List method with an argument (like CreateQuery). I've tried .Future, but again:
the documentation of Future doesn't describe a streaming feature;
it still doesn't work as I need (all fetches before the first interest calculation).
In summary: is there any equivalent in NHibernate to dataReader.Read() from Ado.Net?
My best alternative to a pure NHibernate approach would be a main loop using dataReader.Read() and then loading each account by id inside that Ado.Net loop. However, performance will suffer - reading each account by key is slower than the sequence of fetches done in the outer loop.
I'm using NHibernate version 4.0.0.4000.
While it is true that NH was not designed with large-volume processing in mind, you can always circumvent this restriction with application-layer batch processing. I have found that, depending on the size of the object graph of the relevant entity, performance will suffer after a certain number of objects have been loaded into memory (in one small project I could load 100.000 objects and performance would remain acceptable, in another with only 1500 objects any additional Load() would crawl).
In the past I have used paging to handle batch processing, when IStatelessSession result sets are too limited (as they don't load proxies etc).
So you make a count query in the beginning, make up some arbitrary batch size and then start doing your work on the batch. This way you can neatly avoid the n+1 select problem, assuming that for each batch you explicitly fetch-join everything needed.
The caveat is that for this to work efficiently you will need to evict the processed entities of each batch from the ISession when you are done. And this means that you will have to commit-transaction on each batch. If you can live with multiple flush+commits then this could work for you.
Otherwise you will have to go with IStatelessSession, although there are no lazy queries there: "from Books" means "select * from dbo.Books" or something equivalent, and all results are fetched into memory.
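A rough sketch of the paging approach described above; Account, its IsActive flag, the Interest property, CalculateInterest and the batch size of 500 are invented placeholders:

public void CalculateInterestForActiveAccounts(NHibernate.ISessionFactory sessionFactory)
{
    const int batchSize = 500;  // arbitrary, tune per entity

    using (var session = sessionFactory.OpenSession())
    {
        var total = session.QueryOver<Account>().Where(a => a.IsActive).RowCount();

        for (var offset = 0; offset < total; offset += batchSize)
        {
            using (var tx = session.BeginTransaction())
            {
                var batch = session.QueryOver<Account>()
                                   .Where(a => a.IsActive)
                                   .OrderBy(a => a.Id).Asc  // stable paging order
                                   .Skip(offset)
                                   .Take(batchSize)
                                   .List();

                foreach (var account in batch)
                {
                    account.Interest = CalculateInterest(account);  // placeholder calculation
                }

                tx.Commit();   // flush + commit per batch
            }

            session.Clear();   // evict the processed entities from the session
        }
    }
}

Any fetch-joins needed to avoid n+1 selects would go onto the batch query.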

Nhibernate large transactions, flushes vs locks

I am facing the challenge of maintaining an incredibly large transaction using NHibernate. Let us say I am saving a large number of entities. If I do not flush every N entities, say 10000, then performance gets killed by an overcrowded NH session. If I do flush, I place locks at the DB level which, in combination with the read committed isolation level, affect the working application. Also note that in reality I import an entity whose business logic is one of the hearts of the system, and its import touches around 10 tables. That makes a stateless session a bad idea, due to the manual maintenance of cascades.
Moving the BL to a stored procedure is a big challenge for two reasons:
there is already complicated OO business logic in the domain classes of the application,
duplicated BL would be introduced.
Ideally I would want to flush the session to some file and, only once preparation of the data is completed, execute its contents. Is that possible?
Any other suggestions/best practices are more than welcome.
Your scenario is a typical ORM batch problem. In general, no ORM is meant to be used for stuff like that. If you want high batch-processing performance (no everlasting locks and possible deadlocks) you should not use the ORM to insert thousands of records.
Instead, use native batch inserts, which will always be a lot faster (like SqlBulkCopy for MSSQL).
Anyway, if you want to use NHibernate for this, try to make use of the batch size setting.
Call Save or Update on all your objects and only call session.Flush once at the end. This will hold all your objects in memory...
Depending on the batch size, NHibernate will try to create insert/update batches of that size, meaning far fewer round trips to the database and therefore fewer locks, or at least locks that aren't held as long...
In general, if you use normal transactions, your operations should only start locking the database the moment the first insert statement is executed on the server. It might work differently if you work with TransactionScope.
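A minimal sketch of that, assuming the property is set in code (it can just as well go into hibernate.cfg.xml); the batch size of 100 and entitiesToImport are placeholders:

var cfg = new NHibernate.Cfg.Configuration().Configure();
cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "100");  // adonet.batch_size

var sessionFactory = cfg.BuildSessionFactory();

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    session.FlushMode = NHibernate.FlushMode.Commit;  // no automatic flushing before queries

    foreach (var entity in entitiesToImport)          // placeholder collection
    {
        session.SaveOrUpdate(entity);                 // queued in the session, nothing sent yet
    }

    tx.Commit();  // single flush; the statements go out in batches of 100
}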
Here are some additional reads of how to improve batch processing.
http://fabiomaulo.blogspot.de/2011/03/nhibernate-32-batching-improvement.html
NHibernate performance insert
http://zvolkov.com/clog/2010/07/16?s=Insert+or+Update+records+in+bulk+with+NHibernate+batching

Effect of Many(!) Small Queries Being Performed

So I am troubleshooting some performance problems on a legacy application, and I have uncovered a pretty specific problem (there may be others).
Essentially, the application is using an object relational mapper to fetch data, but it is doing so in a very inefficient/incorrect way. In effect, it is performing a series of entity graph fetches to fill a datagrid in the UI, and on databinding the grid (it is ASP.Net Webforms) it is doing additional fetches, which lead to other fetches, etc.
The net effect of this is that many, many tiny queries are being performed. SQL Profiler shows that a certain page performs over 10,000 queries to fill a single grid. No query takes over 10ms to complete, and most of them register as 0ms in Profiler. Each query uses and releases one connection, and the series of queries is single-threaded (per HTTP request).
I am very familiar with the ORM, and know exactly how to fix the problem.
My question is: what is the exact effect of having many, many small queries being executed in an application? In what ways does it/can it stress the different components of the system?
For example, what is the effect on the webserver's CPU and memory? Would it flood the connection pool and cause blocking? What would be the impact on the database server's memory, CPU and I/O?
I am looking for relatively general answers, mainly because I want to start monitoring the areas that are likely to be the most affected (I need to measure => fix => re-measure). Concurrent use of the system at peak would likely be around 100-200 users.
It will depend on the database, but generally there is a parse phase for each query. If the query uses bind variables, it will probably be cached; if not, you wear the hit of a parse, and that often means short locks on resources, i.e. BAD. In Oracle, CPU and blocking are much more prevalent at the parse than the execute; SQL Server less so, but it's worse at the execute. Obviously doing 10K of anything over a network is going to be a terrible solution, especially x 200 users. The volume I'm sure is fine, but that frequency will really highlight all the overhead in comms latency and the like. Connection pools are generally in the hundreds, not tens of thousands, and now you have tens of thousands of objects all being created, queued, managed, destroyed, garbage collected, etc.
But I'm sure you already know all this deep down. Ditch the ORM for this part and write a stored procedure that executes a single query to return your result set, then bind that to the grid.
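A sketch of what that might look like in the WebForms code-behind; usp_GetGridData, the @CustomerId parameter, connectionString and ResultsGrid are hypothetical names:

using System.Data;
using System.Data.SqlClient;

protected void BindGrid()
{
    var table = new DataTable();

    using (var connection = new SqlConnection(connectionString))      // assumed connection string
    using (var command = new SqlCommand("dbo.usp_GetGridData", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@CustomerId", customerId);   // assumed filter parameter

        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(table);  // one round trip instead of 10,000+
        }
    }

    ResultsGrid.DataSource = table;
    ResultsGrid.DataBind();
}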

Parallelization Considerations

I want to get the community's perspective on this. If I have a process which is heavily DB/IO bound, how smart would it be to parallelize individual process paths using the Task Parallel library?
I'll use an example ... if I have a bunch of items, and I need to do the following operations
Query a DB for a list of items
Do some aggregation operations to group certain items based on a dynamic list of parameters.
For each grouped result, Query the database for something based on the aggregated result.
For each grouped result, Do some numeric calculations (3 and 4 would happen sequentially).
Do some inserts and updates for the result calculated in #3
Do some inserts and updates for each item returned in #1
Logically speaking, I can parallelize into a graph of tasks at steps #3, #5, #6 as one item has no bearing on the result of the previous. However, each of these will be waiting on the database (sql server) which is fine and I understand that we can only process as far as the SQL server will let us.
But I want to logically distribute the work on the local machine so that it processes as fast as the database lets us, without having to wait for anything on our end. I've done a mock prototype where I substitute the DB calls with Thread.Sleep (I also tried some variations with SpinWait, which was a million times faster), and the parallel version is way faster than the current implementation, which is completely serial and not parallel at all.
What I'm afraid of is putting too much strain on the SQL server ... are there any considerations I should consider before I go too far down this path?
If the parallel version is much faster than the serial version, I would not worry about the strain on your SQL server...unless of course the tasks you are performing are low priority compared to some other significant or time-critical operations that are also performed on the DB server.
I don't fully understand your description of the tasks, but it almost sounds like more of them should be performed directly in the database (I presume there are details that make that impossible?).
Another option would be to create a pipeline so that step 3 for the second group happens at the same time as step 4 for the first group. And if you can overlap the updates at step 5, do that too. That way you're doing concurrent SQL accesses and processing, but not over-taxing the database, because you only have two concurrent operations going on at once.
So you do steps 1 and 2 sequentially (I presume) to get a collection of groups that require further processing. Then your main thread starts:
for each group
    query the database
    place the results of the query into the calc queue
A second thread services the results queue:
while not end of data
    Dequeue result from calc queue
    Do numeric calculations
    place the results of the calculation into the update queue
A third thread services the update queue:
while not end of data
    Dequeue result from update queue
    Update database
The System.Collections.Concurrent.BlockingCollection<T> is a very effective queue for this kind of thing.
The nice thing here is that you can scale it if you want, by adding multiple calculation threads or query/update threads if the SQL Server can handle more concurrent transactions.
I use something very similar to this in a daily merge/update program, with very good results. That particular process doesn't use SQL server, but rather standard file I/O, but the concepts translate very well.
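A condensed sketch of that pipeline with BlockingCollection<T>; QueryResult, CalcResult, groups, QueryDatabase, Calculate and UpdateDatabase are placeholders for the real types and steps:

using System.Collections.Concurrent;
using System.Threading.Tasks;

var calcQueue = new BlockingCollection<QueryResult>(boundedCapacity: 100);
var updateQueue = new BlockingCollection<CalcResult>(boundedCapacity: 100);

// Main thread: query the database for each group (step 3).
var querier = Task.Run(() =>
{
    foreach (var grp in groups)
        calcQueue.Add(QueryDatabase(grp));
    calcQueue.CompleteAdding();
});

// Second thread: numeric calculations (step 4).
var calculator = Task.Run(() =>
{
    foreach (var result in calcQueue.GetConsumingEnumerable())
        updateQueue.Add(Calculate(result));
    updateQueue.CompleteAdding();
});

// Third thread: inserts/updates (step 5).
var updater = Task.Run(() =>
{
    foreach (var calc in updateQueue.GetConsumingEnumerable())
        UpdateDatabase(calc);
});

Task.WaitAll(querier, calculator, updater);

Scaling up is then just a matter of starting more calculation or update consumers against the same collections, as long as the SQL Server keeps up.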
