How to remove entities from context concurrently? - c#

Assume we have a List of entities to be removed:
var items = this.itemRepository.GetRange(1, 1000).ToList();
Instead of a simple loop to delete the items, I want to remove them concurrently, like this:
items.AsParallel().ForAll(item =>
{
this.itemRepository.Remove(item);
});
this.UnitOfWork.Commit();
How would you suggest performing a delete like this?

The EF context is not thread-safe, so I wouldn't do this.
If you need performance, note that parallel != faster. Parallel code allows you to do more at the same time, but each individual unit of work takes at least as long as before (sometimes longer, due to context switching and similar overhead); instead of doing 5 things one at a time, you are doing 5 things 5 at a time. This gets you more effective use of the available hardware, and is less about raw performance than about scalability and responsiveness.
As of EF 6 there is an asynchronous API for async/await support, but this still allows only one "concurrent" operation per context. If you want to run multiple asynchronous reads, you will need one context per read operation; this simply puts the onus of managing integrity on the database.
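For example, running two reads concurrently means giving each its own context. This is only a sketch: MyDbContext, Items, and IsActive are illustrative names standing in for your own EF 6 model.

```csharp
// Sketch only: MyDbContext and Items are hypothetical names.
using (var ctx1 = new MyDbContext())
using (var ctx2 = new MyDbContext())
{
    // Each context supports exactly one in-flight async operation,
    // so concurrent reads each get their own context.
    var activeTask   = ctx1.Items.Where(i => i.IsActive).ToListAsync();
    var archivedTask = ctx2.Items.Where(i => !i.IsActive).ToListAsync();

    await Task.WhenAll(activeTask, archivedTask);
}
```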

EF6 added a RemoveRange method to the DbSet
This is more efficient than removing objects one at a time, because by default Entity Framework calls DetectChanges on every Remove call, whereas RemoveRange calls it just once.
However, when you call SaveChanges EF still executes individual delete statements. There is a proposal to Batch CUD in a future version.
References:
Entity Framework 6: The Ninja Edition
RemoveRange - implemented in EF6. (with a bug fixed in 6.1)

Upgrade to EF6 if you are not already using it, then you can do the following:
this.itemRepository.RemoveRange(items);

Related

Best way of dealing with shared state in a real time system in dotnet core background service

I have a background service (IHostedService) in .NET Core 3.1 that takes requests from 100s of clients (machines in a factory) using sockets (home-rolled). My issue is that multiple calls can come in on different threads to the same method on a class which has access to an object (shared state). This is common in the codebase. The requests also have to be processed in the correct order.
The reason that this is not in a database is due to performance reasons (real time system). I know I can use a lock, but I don't want to have locks all over the code base.
What is a standard way to handle this situation? Do you use an in-memory database? An in-memory cache? Or do I just have to add locks everywhere?
public class Machine
{
    public MachineState State { get; set; }

    // Gets called by multiple threads from multiple clients
    public bool CheckMachineStatus()
    {
        return State.IsRunning;
    }

    // Gets called by multiple threads from multiple clients
    public void SetMachineStatus()
    {
        State = MachineState.Stopped;
    }
}
Update
Here's an example. I have a console app that talks to a machine via sockets, for weighing products. When the console app initializes, it loads data into memory (information about the products being weighed). All of this is done on the main thread, to preserve data integrity.
When a call comes in from the weigher on thread 1, it is switched to the main thread to access the product information and to finish any other work, like raising events for other parts of the system.
Currently this switching from threads 1, 2, ..., N to the main thread is done by a home-rolled solution, written to avoid having locking code all over the code base. It dates from .NET 1.1; since moving to .NET Core 3.1, I have been wondering whether there is a framework, library, tool, technique, etc. that might handle this for us, or just a better way.
This is an existing system that I'm still learning. Hope this makes sense.
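The hand-rolled thread-switching described above maps fairly directly onto System.Threading.Channels in modern .NET: producers on any thread write requests into a channel, and a single consumer drains it in order, so the shared state stays confined to one thread. A minimal sketch (the int requests and running total are purely illustrative):

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Producers on many threads write requests into a channel; a single
// consumer task drains it in order, so the shared state (the running
// total here) is only ever touched by one thread -- no locks needed.
var channel = Channel.CreateUnbounded<int>(
    new UnboundedChannelOptions { SingleReader = true });

int total = 0; // shared state, owned exclusively by the consumer
var consumer = Task.Run(async () =>
{
    await foreach (var request in channel.Reader.ReadAllAsync())
        total += request; // processed sequentially, in arrival order
});

// Simulate requests arriving concurrently from many client threads.
Parallel.For(1, 101, i => channel.Writer.TryWrite(i));
channel.Writer.Complete();
await consumer;

Console.WriteLine(total); // 5050 (sum of 1..100)
```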
Using an in-memory database is an option, as long as you are willing to delegate all concurrency-inducing situations to the database, and do nothing using code. For example if you must update a value in the database depending on some condition, then the condition should be checked by the database, not by your own code.
Adding locks everywhere is also an option, but one that will almost certainly lead to unmaintainable code quite quickly. The code will probably be riddled with hidden bugs from the get-go, bugs that you will discover one by one over time, usually under the most unfortunate of circumstances.
You must realize that you are dealing with a difficult problem, with no magic solutions available. Managing shared state in a multithreaded application has always been a source of pain.
My suggestion is to encapsulate all this complexity inside thread-safe classes, that the rest of your application can safely invoke. How you make these classes thread-safe depends on the situation.
Using locks is the most flexible option, but not always the most efficient because it has the potential of creating contention.
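A minimal sketch of that lock-based encapsulation (the names mirror the question's Machine/MachineState, but the layout is an assumption):

```csharp
using System;

public enum MachineStatus { Stopped, Running }

// All access to the shared status goes through this class, so the
// rest of the application never needs to take a lock itself.
public class ThreadSafeMachine
{
    private readonly object _gate = new object();
    private MachineStatus _status = MachineStatus.Running;

    public bool IsRunning
    {
        get { lock (_gate) { return _status == MachineStatus.Running; } }
    }

    public void Stop()
    {
        lock (_gate) { _status = MachineStatus.Stopped; }
    }
}
```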
Using thread-safe collections, like ConcurrentDictionary for example, is less flexible, because the thread-safety guarantees they offer are limited to the integrity of their internal state. If, for example, you must update one collection based on a condition obtained from another collection, then the whole operation cannot be made atomic just by using thread-safe collections. On the other hand, these collections offer better performance than simple locks.
Using immutable collections, like ImmutableQueue for example, is another interesting option. They are less efficient, both memory- and CPU-wise, than the concurrent collections (adding/removing is in many cases O(log n) instead of O(1)), and no more flexible than them, but they are very efficient specifically at providing snapshots of actively processed data. For atomically updating an immutable collection there is the handy ImmutableInterlocked.Update method. It updates a reference to an immutable collection with an updated version of the same collection, without using locks. In case of contention with other threads it may invoke the supplied transformation multiple times, until it wins the race.
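A short sketch of the ImmutableInterlocked.Update pattern; the retry-on-contention loop is handled for you:

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using System.Threading.Tasks;

// A shared immutable queue updated without locks: Update re-runs the
// transformation until its compare-and-swap wins any race with other
// threads, then publishes the new collection reference atomically.
ImmutableQueue<int> queue = ImmutableQueue<int>.Empty;

Parallel.For(0, 1000, i =>
    ImmutableInterlocked.Update(ref queue, q => q.Enqueue(i)));

Console.WriteLine(queue.Count()); // 1000
```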

Multiple simultaneous SaveChangesAsync() with EntityFramework

I have a question, guys: what would happen if I (using Entity Framework) call SaveChangesAsync() multiple times simultaneously? What would happen when more than one thread tries to actually write data to the database at the same time? How is it handled? I'm working on a project where I have to fetch-and-save periodically, and I'm afraid that the time it takes to process each package of data might in some cases be longer than the interval between calls. I'm new to Entity Framework and I find it fascinating, but I still have my doubts about what happens behind the scenes. Any help will be much appreciated.
If "simultaneously" means from multiple threads, then you can't do that, because EF is not thread-safe.
Here are some examples that might help you.
https://github.com/mjrousos/MultiThreadedEFCoreSample
Reference here:
https://learn.microsoft.com/fr-fr/ef/core/miscellaneous/async
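The two safe patterns can be sketched as follows. This is illustrative only: MyDbContext, Orders, firstOrder, secondOrder, and batches are hypothetical names standing in for your own model.

```csharp
// Option 1: one context, operations strictly awaited in sequence.
using (var ctx = new MyDbContext())
{
    ctx.Orders.Add(firstOrder);
    await ctx.SaveChangesAsync();   // complete before the next operation
    ctx.Orders.Add(secondOrder);
    await ctx.SaveChangesAsync();
}

// Option 2: genuinely concurrent work, one short-lived context per task.
var tasks = batches.Select(async batch =>
{
    using var ctx = new MyDbContext();
    ctx.Orders.AddRange(batch);
    await ctx.SaveChangesAsync();   // each context is used by one task only
});
await Task.WhenAll(tasks);
```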

Processing large resultset with NHibernate

I have following task to do: calculate interest for all active accounts. In the past I was doing things like this using Ado.Net and stored procedures.
This time I've tried to do that with NHibernate, because it seemed that complex algorithms would be done easier with pure POCO.
So I want to do following (pseudocode):
foreach account in accounts
calculate interest
save account with new interest
I'm aware that NHibernate was not designed for processing large data volumes. For me it is sufficient to be able to organize such a loop without holding all accounts in memory at once.
To minimize memory usage I would use IStatelessSession for external loop instead of plain ISession.
I've tried approach proposed by Ayende. There are two problems:
CreateQuery is using "magic strings";
more important: it doesn't work as described.
My program works, but after switching on ODBC tracing I saw in the debugger that all fetches were done before the lambda expression in .List was executed for the first time.
I've found myself another solution: session.Query returning .AsEnumerable() which I've used in foreach. Again two problems:
I would prefer IQueryOver over IQueryable
still doesn't work as described (all fetches before first interest calculation).
I don't know why but IQueryOver doesn't have AsEnumerable. It also doesn't have List method with argument (like CreateQuery). I've tried .Future but again:
documentation of Future doesn't describe streaming feature
still doesn't work as I need (all fetches before first interest calculation).
In summary: is there any equivalent in NHibernate to dataReader.Read() from Ado.Net?
My best alternative to pure NHibernate approach would be main loop using dataReader.Read() and then Load account with id from Ado.Net loop. However performance will suffer - reading each account via key is slower than sequence of fetches done in outer loop.
I'm using NHibernate version 4.0.0.4000.
While it is true that NH was not designed with large-volume processing in mind, you can always circumvent this restriction with application-layer batch processing. I have found that, depending on the size of the object graph of the relevant entity, performance will suffer after a certain number of objects have been loaded into memory (in one small project I could load 100,000 objects and performance would remain acceptable; in another, with only 1,500 objects, any additional Load() would crawl).
In the past I have used paging to handle batch processing, when IStatelessSession result sets are too limited (as they don't load proxies, etc.).
So you make a count query in the beginning, make up some arbitrary batch size and then start doing your work on the batch. This way you can neatly avoid the n+1 select problem, assuming that for each batch you explicitly fetch-join everything needed.
The caveat is that for this to work efficiently you will need to evict the processed entities of each batch from the ISession when you are done. And this means that you will have to commit-transaction on each batch. If you can live with multiple flush+commits then this could work for you.
Otherwise you will have to go with IStatelessSession, although there are no lazy queries there: "from Books" means "select * from dbo.Books" or something equivalent, and all results are fetched into memory.
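The paging approach described above might be sketched like this, assuming an Account entity with an Id property and a hypothetical CalculateInterest helper (session setup omitted):

```csharp
const int batchSize = 500;
int page = 0;
while (true)
{
    using (var tx = session.BeginTransaction())
    {
        // Stable ordering is required for paging to be deterministic.
        var batch = session.QueryOver<Account>()
            .OrderBy(a => a.Id).Asc
            .Skip(page * batchSize)
            .Take(batchSize)
            .List();

        if (batch.Count == 0) { tx.Commit(); break; }

        foreach (var account in batch)
            account.Interest = CalculateInterest(account); // hypothetical

        tx.Commit();      // flush + commit once per batch
        session.Clear();  // evict the processed entities from the session
    }
    page++;
}
```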

Entity Framework 6 Fails Under High Load

I am stress testing my website. It uses Entity Framework 6.
I have 10 threads. This is what they are doing:
Fetch some data from the web.
Create new database context.
Create/Update records in the database using Database.SqlQuery(sql).ToList() to read and Database.ExecuteSqlCommand(sql) to write (about 200 records/second)
Close context
It crashes within 2 minutes with a database deadlock exception (consistently on a read!).
I have tried wrapping steps 2-4 in a Transaction, but this did not help.
I have read that as of EF6, ExecuteSqlCommand is wrapped in a transaction by default (https://msdn.microsoft.com/en-us/data/dn456843.aspx). How do I turn this behavior off?
I don't even understand why my transactions are deadlocked, they are read/writing independent rows.
Is there a database setting I can flip somewhere increase the size of my pending transaction queue?
I doubt EF has anything to do with it. Even though you are reading/writing independent rows, locks can escalate and lock entire pages. If you are not careful with your database design, and with how you perform the reads and writes (order is important), you can deadlock with EF or any other access technique.
What transaction type is being used?
.NET's TransactionScope defaults to Serializable, at least in my applications, which admittedly do not use EF. Serializable transactions deadlock much more easily in my experience than other isolation levels such as ReadCommitted.
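If TransactionScope is in play, you can request a lower isolation level explicitly rather than accepting the Serializable default. A sketch using System.Transactions (your EF reads/writes would go inside the scope):

```csharp
using System;
using System.Transactions;

// Explicitly request READ COMMITTED instead of TransactionScope's
// Serializable default, which deadlocks far more readily under load.
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TransactionManager.DefaultTimeout
};

IsolationLevel observed;
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    observed = Transaction.Current.IsolationLevel;
    // ... reads and writes through your EF context would go here ...
    scope.Complete();
}
Console.WriteLine(observed); // ReadCommitted
```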

ASP.NET MVC multiple threads access database simultaneously

I am building an ASP.NET MVC 4 app using Entity Framework where multiple threads can access a table at the same time (add, delete rows, etc.). Right now I am doing using (UserDBContext db = new UserDBContext()) within each controller (so a new DbContext is created for each request, since the MVC framework serves each request on a separate thread). From what I read, this is safe; however, I am curious about:
What happens when two threads access the same table, but not the same row? Are both threads allowed to access simultaneously?
What happens when two threads modify the same row? say, one tries to read while the other tries to delete? Is one thread blocked (put to sleep), then gets waken up automatically when the other is done?
Thanks!
1: Locking in the database. Guaranteeing correct multi-user scenarios is one of the top priorities of databases. Learn the basics; there are good books.
2: Locking. Again. One will have to wait.
This is extremely fundamental, so I would suggest you take 2 steps back, get something like "SQL for Dummies", and learn about the ACID guarantees that any decent database provides. NOTHING in here has to do with EF, by the way. This is all handled at the database's internal level.
