We have been using the Generic Repository pattern, which I see some voices call an antipattern, but it's better to start with something than to sit and wait for everything to be perfect :-)
Scenario 1
var placeStatus = await _placeService.AddAsync(oPlace, false); // false = only add to the context, don't hit SaveChanges
var orgStatus = await _organizationService.AddAsync(oOrganization, false);
_unitOfWork.SaveChanges();
Vs
Task<short> placeTask = _placeService.AddAsync(oPlace, true);
Task<short> orgTask = _organizationService.AddAsync(oOrganization, true);
await Task.WhenAll(placeTask, orgTask);
With my limited knowledge, I assume SaveChanges() handles rollback internally in the first case, whereas I would have to handle rollback myself in the second. I also assume parallel execution from await Task.WhenAll.
1) So is SaveChanges() parallel, or more performant than the second approach, whether or not atomicity is an issue? And am I on the right track if I go with the second one?
Scenario 2
Task<Place> placeTask = _placeCore.SelectByIdAsync(id);
Task<Organization> organizationTask = _organizationCore.SelectByIdAsync(id);
await Task.WhenAll(placeTask, organizationTask);
2) Can I skip joins (which might break the whole concept of a generic repo) in the generic repository pattern by using await as in Scenario 2?
Any links, book references or stories would be helpful.
Thanks
You cannot have two queries running in parallel on the same DataContext. As noted in the comments this won't work in the current version of EF. You either need to create separate data contexts for specific scenarios (which makes code significantly more complex and should not be done without clear benefit) or switch to serial queries.
The proper way to use EF is to make your Add/Update/Delete methods non-async and not call SaveChanges from them, and to make your Select methods async. Your SaveChanges should be async and call into the DataContext's SaveChangesAsync. SaveChanges will batch your inserts, updates and deletes together.
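For illustration, a minimal sketch of that shape; the UnitOfWork class and member names below are hypothetical, not taken from the code above:

public class UnitOfWork
{
    private readonly DbContext _context;

    public UnitOfWork(DbContext context)
    {
        _context = context;
    }

    // Add/Update/Delete stay synchronous: they only touch the change tracker
    public void Add<TEntity>(TEntity entity) where TEntity : class
    {
        _context.Set<TEntity>().Add(entity);
    }

    // the single async call that actually hits the database; EF sends the
    // pending inserts, updates and deletes together
    public Task<int> SaveChangesAsync()
    {
        return _context.SaveChangesAsync();
    }
}

// usage: two adds, one awaited database round trip
unitOfWork.Add(oPlace);
unitOfWork.Add(oOrganization);
await unitOfWork.SaveChangesAsync();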
Here is an example of multiple inserts via EF 6:
foreach (var item in arryTags)
{
    // some logic that builds oContentTagMapper from item
    _contentTagMapperRepository.Insert(oContentTagMapper);
}
_unitOfWork.SaveChanges();
Using Unit of Work, tracing in SQL Server Profiler shows the following:
Overall it seems that EF fires the inserts within milliseconds of each other, so for Scenario 1 I guess Unit of Work is ideal.
For Scenario 2, joins will most probably do the job.
Related
My goal is to speed up a query, and I thought to leverage parallelism. Let's assume that I have 2,000 items in the ids list. I split them into 4 lists, each with 500 ids, and I want to open 4 threads that will each make a DB call and then unite the results. To achieve that I used Parallel.ForEach, but it did not improve the performance of the query because, apparently, it is not well suited to IO-bound operations: Parallel execution for IO bound operations
The code in the if block uses Parallel.ForEach, vs the code in the else block that does the same in a regular foreach.
The problem is that the method containing this query is not async (because it sits in a very legacy component) and cannot be changed to async. Basically, I want to do parallel IO-bound work inside a non-async method (via Entity Framework).
What are the best practices to achieve this goal? I saw that maybe I can use Task.WaitAll() for that. I don't mind blocking the thread that runs this query; I am more concerned that something will go wrong with a Task.WaitAll() called from a non-async method.
I use Entity Framework as the ORM over a SQL database. For each thread I open a separate context, because the context is not thread safe.
Maybe the lock that I use is what causes the problem; I could change it to a ConcurrentDictionary.
The scenario depicted in the code below is simplified from the one I need to improve. In our real application I do need to read the related entities after I have loaded their ids, and to perform a complicated calculation on them.
Code:
// ids.Bucketize(bucketSize: 500) -> splits one big list into several lists of 500 ids each
IEnumerable<IEnumerable<long>> idsToLoad = ids.Bucketize(bucketSize: 500);
var allLoadedIds = new List<long>(); // declared here so the snippet is self-contained

if (ShouldLoadDataInParallel())
{
    object parallelismLock = new object();

    Parallel.ForEach(idsToLoad,
        new ParallelOptions { MaxDegreeOfParallelism = 4 },
        (IEnumerable<long> bucket) =>
        {
            List<long> loadedIds = GetIdsQueryResult(bucket);
            lock (parallelismLock)
            {
                allLoadedIds.AddRange(loadedIds);
            }
        });
}
else
{
    foreach (IEnumerable<long> bucket in idsToLoad)
    {
        List<long> loadedIds = GetIdsQueryResult(bucket);
        allLoadedIds.AddRange(loadedIds);
    }
}
What are the best practices [for running multiple queries in parallel]?
Parallel.ForEach with a separate DbContext/SqlConnection per iteration is a fine approach.
It's just that running your queries in parallel is not really helpful here.
If your 4 queries hit 4 separate databases, then you might get a nice improvement. But there's many reasons why running 4 separate queries in parallel on a single instance might not be faster than running a single large query. Among these are blocking, resource contention, server-side query parallelism, and duplicating work between the queries.
My goal is to speed up a query, and I thought to leverage parallelism
And so this is not usually a good approach to speeding up a query. There are, however, many good ways to speed up queries, so if you post a new question with the details of the query and perhaps some sample data you might get some better suggestions.
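For reference, a minimal sketch of the separate-context-per-bucket approach; MyContext and its Entities set are assumed stand-ins for the question's actual context:

using System.Collections.Concurrent;

var allLoadedIds = new ConcurrentBag<long>();

Parallel.ForEach(idsToLoad,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    bucket =>
    {
        // each worker owns its own context; a DbContext is not thread safe
        using (var context = new MyContext())
        {
            var loadedIds = context.Entities
                .Where(e => bucket.Contains(e.Id))
                .Select(e => e.Id)
                .ToList();

            foreach (var id in loadedIds)
                allLoadedIds.Add(id);
        }
    });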
I have the following code, intended to break a bulk EF save into smaller chunks, ostensibly to improve performance.
var allTasks = arrayOfConfigLists
    .Select(configList => Task.Run(() => SaveConfigurations(configList)))
    .ToArray();

Task.WaitAll(allTasks);
Each call to SaveConfigurations creates a new context that runs to completion.
private static void SaveConfigurations(List<Configuration> configs)
{
    using (var dc = new ConfigContext())
    {
        dc.Configuration.AutoDetectChangesEnabled = false;
        dc.SaveConfigurations(configs);
    }
}
As it stands, the code runs relatively efficiently, considering this might not be the optimal way of doing things. If one of the SaveConfigurations fails, however, I realized I would need to roll back any other configurations that were saved to the database.
After doing some research, I upgraded my existing frameworks to 4.5.1 and utilized the new TransactionScopeAsyncFlowOption.Enabled option to deal with async calls. I made the following change:
using (var scope =
    new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
{
    //... allTasks code snippet from above
    scope.Complete();
}
At this point, I started aggregating all kinds of interesting errors:
The operation is not valid for the state of the transaction.
The underlying provider failed on Open.
Network access for Distributed Transaction Manager (MSDTC) has been disabled.
The transaction manager has disabled its support for remote/network transactions.
What I don't understand is why introducing TransactionScope would create so many issues. I assume I have a fundamental misunderstanding of how async calls interact with EF, and how TransactionScope wraps those calls, but I can't figure it out. And I really have no clue what the MSDTC exception pertains to.
Any thoughts as to how I could have rollback functionality with asynchronous calls made to the same database? Is there a better way to handle this situation?
Update:
After reviewing the documentation here, I see that Database.BeginTransaction() is the preferred EF call. However, this assumes that all of my changes will occur within the same context, which they won't. Short of creating a dummy context and passing a transaction around, I don't believe this solves my issue.
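(For reference, the single-context pattern that documentation describes looks roughly like the sketch below, where configsA and configsB are placeholders; it only works because every change goes through one context:)

using (var dc = new ConfigContext())
using (var transaction = dc.Database.BeginTransaction())
{
    dc.SaveConfigurations(configsA);
    dc.SaveConfigurations(configsB);

    // commits both batches atomically; disposing without Commit rolls back
    transaction.Commit();
}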
This has nothing to do with async. You are writing on multiple connections and want that to be atomic. That requires distributed transactions. There is no way around that.
You also might run into distributed deadlocks this way that will only be resolved by timeouts.
Probably, the best approach is to stop using multiple connections. If performance is such a concern consider making the writes using one of the well known bulk DML techniques which do not involve EF.
You can use MARS to make concurrent writes on the same connection, but they are really executed serially on the server. This might provide a small speedup due to pipelining effects, but it's likely not worth the trouble.
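For example, a minimal sketch of one such bulk technique, SqlBulkCopy; the table and column names are assumptions:

using System.Data;
using System.Data.SqlClient;

// build an in-memory table matching the destination schema
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Value", typeof(string));
foreach (var config in configs)
    table.Rows.Add(config.Name, config.Value);

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Configurations";
        // a single connection and server, so no MSDTC involvement
        bulkCopy.WriteToServer(table);
    }
}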
How about this:
This will create only one context and attach all the entities to it.
See entity framework bulk insertion
If anything goes wrong during the insertion, the entire transaction will be rolled back. If you want a more transaction-like pattern, implement the Unit of Work pattern.
As far as I know, Entity Framework itself implements the Unit of Work pattern.
public void SaveConfigurations(List<Configuration> configs)
{
    try
    {
        using (var dc = new ConfigContext())
        {
            dc.Configuration.AutoDetectChangesEnabled = false;
            foreach (var singleConfig in configs)
            {
                // Do not invoke dc.SaveChanges inside the loop.
                // Assuming SaveConfiguration is your table: this adds the
                // entity to the DbSet<T> but inserts nothing into the DB
                // until SaveChanges is invoked.
                dc.SaveConfiguration.Add(singleConfig);
            }
            dc.Configuration.AutoDetectChangesEnabled = true;
            dc.SaveChanges();
        }
    }
    catch (Exception)
    {
        throw; // rethrow without destroying the stack trace
    }
}
I'm building a web site using ASP.NET MVC 5.
Currently I'm using dependency injection to inject a per-request DbContext into my controllers.
But EF is not thread safe, so one DbContext can't be used in parallel queries.
Is it worth changing my website so that just this page uses something like this?
using (var ctx = new dbcontext())
{
    // create a task query like ToListAsync
}
using (var ctx2 = new dbcontext())
{
    // create a task query like ToListAsync
}
using (var ctx3 = new dbcontext())
{
    // create a task query like ToListAsync
}
// ...
using (var ctx20 = new dbcontext())
{
    // create a task query like ToListAsync
}
and then:
await Task.WhenAll(t1, t2, t3, ..., t20);
or should I just use one DbContext per request and do something like this:
var query1Result = await query1.ToListAsync();
var query2Result = await query2.ToListAsync();
var query3Result = await query3.ToListAsync();
var query4Result = await query4.ToListAsync();
// ...
var query19Result = await query19.ToListAsync();
var query20Result = await query20.ToListAsync();
In the first case there would be many connections to the DB opened and closed.
In the second case there would be one connection, but everything happens sequentially.
Which case is better, and why?
But EF is not thread safe so one dbcontext can't be used in parallel queries.
"Thread safety" is completely different than "supports multiple concurrent operations".
Is it worth changing my website so that just this page uses something like this?
Which case is better, and why?
Only you can answer that question.
However, there is some general guidance.
First, operations against a database are generally I/O-bound, not CPU-bound. Note that there are plenty of exceptions to this rule.
Second, if all/most of the operations are hitting the same database, there's definite contention going on at the file level.
Third, if the database is on a traditional (i.e., not solid-state) hard drive, there's even more contention going on at the disk platter level.
So, all this is to say that if your backend is just a regular SQL Server, then you probably won't see any benefit (i.e., faster response times) from concurrent database operations when the server is under normal load. In fact, in this scenario, you probably won't see any benefit from asynchronous database calls at all (as compared to synchronous calls).
However, if your backend is more modern, say, an Azure SQL instance (especially one running on SSDs), then concurrent database operations may indeed speed up your requests.
If you really have to deal with a lot of queries, you can run them in parallel. I would use Parallel.ForEach.
First of all, ask yourself: do you really have performance problems? If not, do it as usual, in one DbContext. That is the easiest and safest way.
If you do have problems, let's try this:
If your queries are read-only, you can run many threads in parallel. Creating new DbContexts and opening new connections is generally not a big problem. Also, you can run all your read-only queries with AsNoTracking, so EF will not cache the entities in the context, as in the sketch below.
But think twice: it's harder to debug and find problems in parallel-executing code, so your operations must be very simple.
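A minimal sketch of that read-only shape, with a hypothetical MyDbContext, Items set and Category filter:

using System.Collections.Concurrent;

var categories = new[] { "A", "B", "C", "D" };
var results = new ConcurrentBag<List<Item>>();

Parallel.ForEach(categories, category =>
{
    // one context per worker; AsNoTracking skips change-tracker caching
    using (var context = new MyDbContext())
    {
        results.Add(context.Items
            .AsNoTracking()
            .Where(i => i.Category == category)
            .ToList());
    }
});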
I've got a database entity type Entity, a long list of Thingy, and a method
private Task<Entity> MakeEntity(Thingy thingy) {
...
}
MakeEntity does lots of stuff and is CPU bound. I would like to convert all my thingies to entities and save them in a db.context. Considering that
I do want to finish as fast as possible
the amount of entities is large, and I want to use the database effectively, so I want to start saving changes early and let the remote database do its thing
how can I do this performantly? What I would really like is to keep looping while the database does its work, offering it all the newly made entities, until it has processed them all. What's the best route there? I've run into SaveChanges throwing if it's called concurrently, so I can't do that. What I'd really like is a thread pool of eight threads (or rather, as many threads as I have cores) doing the CPU-bound work, and a single thread doing the SaveChanges().
This is a kind of "asynchronous stream", which is always a bit awkward.
In this case (assuming you really do want to multithread on ASP.NET, which is not recommended in general), I'd say TPL Dataflow is your best option. You can use a TransformBlock with MaxDegreeOfParallelism set to 8 (or unbounded, for that matter), and link it to an ActionBlock that does the SaveChanges.
Remember, use synchronous signatures (not async/await) for CPU-bound code, and asynchronous methods for I/O-bound code (i.e., SaveChangesAsync).
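To make that concrete, a minimal sketch under these assumptions: a synchronous MakeEntitySync doing the CPU-bound work, thingies as the input list, and a MyContext exposing an Entities set:

using System.Threading.Tasks.Dataflow;

var db = new MyContext();

// CPU-bound stage: synchronous delegate, up to 8 items in parallel
var makeBlock = new TransformBlock<Thingy, Entity>(
    thingy => MakeEntitySync(thingy),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

// I/O-bound stage: async delegate; the default MaxDegreeOfParallelism of 1
// guarantees SaveChangesAsync is never called concurrently
var saveBlock = new ActionBlock<Entity>(async entity =>
{
    db.Entities.Add(entity);
    await db.SaveChangesAsync();
});

makeBlock.LinkTo(saveBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var thingy in thingies)
    makeBlock.Post(thingy);
makeBlock.Complete();
await saveBlock.Completion;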
You could set up a pipeline of N CPU workers feeding into a database worker. The database worker could batch items up.
Since MakeEntity is CPU bound there is no need to use async and await there. await does not create tasks or threads (a common misconception).
var thingies = ...;
var entities = thingies.AsParallel().WithDegreeOfParallelism(8).Select(MakeEntity);
var batches = CreateBatches(entities, batchSize: 100);
foreach (var batch in batches) {
    Insert(batch);
}
You need to provide a method that creates batches from an IEnumerable; implementations are readily available on the web.
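For completeness, a typical implementation looks like the sketch below (newer .NET versions ship Enumerable.Chunk, which does the same thing):

static IEnumerable<List<T>> CreateBatches<T>(IEnumerable<T> source, int batchSize)
{
    var batch = new List<T>(batchSize);
    foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == batchSize)
        {
            yield return batch;
            batch = new List<T>(batchSize);
        }
    }
    if (batch.Count > 0)
        yield return batch; // emit the final, possibly smaller batch
}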
If you don't need batching for the database part you can delete that code.
For the database part you probably don't need async IO because it seems to be a low-frequency operation.
I'm looking for an execution context which plays nicely with async/await and with the TPL at the same time in the following way (expected behavior):
async Task<string> ReadContext(string slot)
{
    // Perform some async code
    ...
    return Context.Read(slot);
}
(1) Playing nicely with async/await
Context.Store("slot", "value");
await DoSomeAsync();
Assert.AreEqual("value", Context.Read("slot"));
Context.Store("slot", "value");
var value = await ReadContext("slot");
Assert.AreEqual("value", value);
(2) Playing nicely with Task.Run()
Context.Store("slot", "value");
var task = Task.Run(() => ReadContext("slot"));
Assert.IsNull(task.Result);
(3) Playing nicely with an awaited Task
Context.Store("slot", "value");
var value = await Task.Run(() => ReadContext("slot"));
Assert.AreEqual("value", value);
(3) is not essential but would be nice. I use CallContext now, but it fails at (2): the values stored in it are accessible even in manually run tasks, even in those run using Task.Factory.StartNew(..., LongRunning), which should force the task onto a separate thread.
Is there any way to accomplish that?
Your real question is in your comment:
I need a place to store NHibernate sessions in ASP.NET application. HttpContext works fine (and respects async/await) if I'm inside a request context, but it is unavailable once I jump into manually ran tasks.
First off, you should be avoiding "manually run tasks" in an ASP.NET application at all; I have a blog post on the subject.
Secondly, storing things in HttpContext.Items is sort of a hack. It can be useful in a handful of situations, but IMO managing NHibernate sessions is something that should be designed properly into your application. That means you should be passing around the session (or a service providing access to the session) either in your method calls or injected into each type that needs it.
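A minimal sketch of that injection approach, assuming an NHibernate ISession behind a hypothetical ISessionProvider:

// hypothetical abstraction owning the NHibernate session for the current request
public interface ISessionProvider
{
    ISession Session { get; }
}

public class OrderService
{
    private readonly ISessionProvider _sessions;

    // the session access point is injected, not read from ambient state
    public OrderService(ISessionProvider sessions)
    {
        _sessions = sessions;
    }

    public Order Load(int id)
    {
        return _sessions.Session.Get<Order>(id);
    }
}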
So, I really think that a "context" like you're looking for is a wrong solution. Even if it were possible, which it is not.
As @Noseratio noted, requirements (2) and (3) cannot both be met. Either the code executing in the Task.Run has access or it does not; it can't be both.
As you've discovered, requirements (1) and (3) can be met by the logical call context (note to Googlers: this only works under .NET 4.5 and only if you store immutable data; details on my blog).
There is no easy way to satisfy (1) and (2) unless you manually remove the data (FreeNamedDataSlot) at the beginning of the code in Task.Run. I think there may be another solution but it would require custom awaitables at every await, which is completely cumbersome, brittle, and unmaintainable.
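For completeness, that manual removal would look roughly like this, reusing the Context/ReadContext helpers from the question:

using System.Runtime.Remoting.Messaging;

Context.Store("slot", "value");

var task = Task.Run(() =>
{
    // drop the logical-call-context value inherited from the parent
    CallContext.FreeNamedDataSlot("slot");
    return ReadContext("slot"); // now yields null, satisfying requirement (2)
});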