I have the following code, intended to break a bulk EF save into smaller chunks, ostensibly to improve performance.
var allTasks = arrayOfConfigLists
    .Select(configList => Task.Run(() => SaveConfigurations(configList)))
    .ToArray();
Task.WaitAll(allTasks);
Each call to SaveConfigurations creates a new context that runs to completion.
private static void SaveConfigurations(List<Configuration> configs)
{
    using (var dc = new ConfigContext())
    {
        dc.Configuration.AutoDetectChangesEnabled = false;
        dc.SaveConfigurations(configs);
    }
}
As it stands, the code runs relatively efficiently, considering this might not be the optimal way of doing things. If one of the SaveConfigurations fails, however, I realized I would need to roll back any other configurations that were saved to the database.
After doing some research, I upgraded the project to .NET Framework 4.5.1 and used the new TransactionScopeAsyncFlowOption.Enabled option to deal with async calls. I made the following change:
using (var scope =
    new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
{
    //... allTasks code snippet from above
    scope.Complete();
}
At this point, I started aggregating all kinds of interesting errors:
The operation is not valid for the state of the transaction.
The underlying provider failed on Open.
Network access for Distributed Transaction Manager (MSDTC) has been disabled.
The transaction manager has disabled its support for remote/network transactions.
What I don't understand is why introducing TransactionScope would create so many issues. I assume I have a fundamental misunderstanding of how async calls interact with EF, and how TransactionScope wraps those calls, but I can't figure it out. And I really have no clue what the MSDTC exception pertains to.
Any thoughts as to how I could have rollback functionality with asynchronous calls made to the same database? Is there a better way to handle this situation?
Update:
After reviewing the documentation here, I see that Database.BeginTransaction() is the preferred EF call. However, this assumes that all of my changes will occur within the same context, which it won't. Short of creating a dummy context and passing around a transaction, I don't believe this solves my issue.
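For reference, here is the single-context shape the documentation describes, as a minimal sketch using the ConfigContext and SaveConfigurations names from above; the transaction is tied to that one context's connection, which is exactly why it doesn't fit my multi-context approach:

// Sketch only: Database.BeginTransaction scopes the transaction to this
// single context/connection, so all work that must be atomic has to flow
// through this one context.
using (var dc = new ConfigContext())
using (var transaction = dc.Database.BeginTransaction())
{
    try
    {
        dc.Configuration.AutoDetectChangesEnabled = false;
        foreach (var configList in arrayOfConfigLists)
        {
            dc.SaveConfigurations(configList);
        }
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}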
This has nothing to do with async. You are writing on multiple connections and want that to be atomic. That requires distributed transactions. There is no way around that.
You also might run into distributed deadlocks this way that will only be resolved by timeouts.
Probably, the best approach is to stop using multiple connections. If performance is such a concern consider making the writes using one of the well known bulk DML techniques which do not involve EF.
You can use MARS to make concurrent writes on the same connection, but they are really executed serially on the server. This would still provide a small speedup due to pipelining effects, but it is likely not worth the trouble.
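To illustrate the bulk DML route, here is a rough sketch using SqlBulkCopy on a single connection and transaction (the connection string, the table name, and the DataTable built from your Configuration objects are assumptions, not code from the question):

// Rough sketch: bulk insert outside EF, made atomic with one local transaction.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
    {
        bulkCopy.DestinationTableName = "dbo.Configurations"; // assumed table name
        bulkCopy.WriteToServer(configurationsDataTable);      // DataTable built from the Configuration list
        transaction.Commit(); // all rows inserted atomically, or none at all
    }
}

Because everything goes through one connection, a local SQL Server transaction is enough and MSDTC never gets involved.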
How about this:
This creates only one context and attaches the entities to it.
See entity framework bulk insertion.
If anything goes wrong during the insertion, the entire transaction will be rolled back. If you want a more transaction-like pattern, implement the Unit of Work pattern;
as far as I know, Entity Framework itself already implements the Unit of Work pattern.
public void SaveConfigurations(List<Configuration> configs)
{
    try
    {
        using (var dc = new ConfigContext())
        {
            dc.Configuration.AutoDetectChangesEnabled = false;
            foreach (var singleConfig in configs)
            {
                // Do not invoke dc.SaveChanges inside the loop.
                // Assuming SaveConfiguration is your DbSet (table):
                // Add only attaches the entity to the context; nothing is
                // written to the database until SaveChanges is called.
                dc.SaveConfiguration.Add(singleConfig);
            }
            dc.Configuration.AutoDetectChangesEnabled = true;
            dc.SaveChanges();
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace.
        throw;
    }
}
Related
I'm not sure that the current implementation of our DAL is correct, and I think the number of connections to the DB is lower than it should be.
I've noticed that there's no MARS inside the connection string,
and when I try to perform multiple calls on the same repository I get:
System.InvalidOperationException:A second operation was started on
this context instance before a previous operation completed. This is
usually caused by different threads concurrently using the same
instance of DbContext. For more information on how to avoid threading
issues with DbContext, see
https://go.microsoft.com/fwlink/?linkid=2097913.
Does this mean that I cannot use the same repository to execute operations in parallel (for example, getting some details)?
For example:
foreach(var i in mainProducts) // retrieved from EF with a .ToListAsync
{
i.Details = await repo.GetDetailsFromIdAsync(i.Id);
}
I know that sometimes I can just return a single object from the query, but there are some cases where I can't.
Thanks
I'm building a web site using asp.net mvc 5.
Currently I'm using dependency injection to inject a per-request DbContext into my controllers.
But EF is not thread safe, so one DbContext can't be used in parallel queries.
Is it worth changing my website so that just this page uses something like this?
using (var ctx = new dbcontext()) {
    //creating a task query like ToListAsync
}
using (var ctx2 = new dbcontext()) {
    //creating a task query like ToListAsync
}
using (var ctx3 = new dbcontext()) {
    //creating a task query like ToListAsync
}
.
.
.
.
.
.
.
using (var ctx20 = new dbcontext()) {
    //creating a task query like ToListAsync
}
and then:
Task.WhenAll(t1,t2,t3,......,t20)
Or should I just use one DbContext per request and do something like this:
var query1result = await query1.ToListAsync();
var query2result = await query2.ToListAsync();
var query3result = await query3.ToListAsync();
var query4result = await query4.ToListAsync();
.
.
.
.
.
var query19Result = await query19.ToListAsync();
var query20Result = await query20.ToListAsync();
In the first case there would be many connections to the DB being opened and closed.
In the second case there would be one connection, but everything happens sequentially.
Which case is better, and why?
But EF is not thread safe, so one DbContext can't be used in parallel queries.
"Thread safety" is completely different than "supports multiple concurrent operations".
Is it worth changing my website so that just this page uses something like this?
Which case is better, and why?
Only you can answer that question.
However, there is some general guidance.
First, operations against a database are generally I/O-bound, not CPU-bound. Note that there are plenty of exceptions to this rule.
Second, if all/most of the operations are hitting the same database, there's a definite contention going on at the file level.
Third, if the database is on a traditional (i.e., not solid-state) hard drive, there's even more contention going on at the disk platter level.
So, all this is to say that if your backend is just a regular SQL Server, then you probably won't see any benefit (i.e., faster response times) from concurrent database operations when the server is under normal load. In fact, in this scenario, you probably won't see any benefit from asynchronous database calls at all (as compared to synchronous calls).
However, if your backend is more modern, say, an Azure SQL instance (especially one running on SSDs), then concurrent database operations may indeed speed up your requests.
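If you do decide to run the queries concurrently, the usual shape is one short-lived context per query, roughly like this sketch (MyDbContext, Orders, Customers and PageViewModel are placeholder names, not from the question):

// Sketch: each query gets its own context, so no DbContext instance is ever
// used by two operations at the same time. ToListAsync requires
// "using System.Data.Entity;" in EF6.
private async Task<List<Order>> GetOrdersAsync()
{
    using (var ctx = new MyDbContext())
    {
        return await ctx.Orders.ToListAsync();
    }
}

private async Task<List<Customer>> GetCustomersAsync()
{
    using (var ctx = new MyDbContext())
    {
        return await ctx.Customers.ToListAsync();
    }
}

public async Task<ActionResult> Index()
{
    var ordersTask = GetOrdersAsync();
    var customersTask = GetCustomersAsync();

    // Both queries run concurrently, each on its own connection.
    await Task.WhenAll(ordersTask, customersTask);

    return View(new PageViewModel
    {
        Orders = ordersTask.Result,
        Customers = customersTask.Result
    });
}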
If you really have to deal with a lot of queries, you can run them in parallel. I would use Parallel.ForEach.
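For example, something along these lines, with a separate context per iteration (MyDbContext, Products and listOfIdBatches are placeholders):

// Sketch: Parallel.ForEach with one context per iteration. This fans out
// synchronous EF calls across threads; it does not combine with async/await.
Parallel.ForEach(listOfIdBatches, batch =>
{
    using (var ctx = new MyDbContext())
    {
        foreach (var id in batch)
        {
            var product = ctx.Products.Find(id);
            // ... work with product ...
        }
    }
});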
First of all, ask yourself: do you really have performance problems? If not, do it the usual way, in one DbContext. It's the easiest and safest way.
If you do have problems, let's try this:
If your queries are read-only, you can run many threads in parallel. Creating new DbContexts and opening new connections is generally not a big problem. Also, you can run all your read-only queries with AsNoTracking, so EF will not track the entities in the context.
But think twice: it's more difficult to debug and find problems in code that executes in parallel, so your operations must be very simple.
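A minimal sketch of the read-only shape described above, with its own context and AsNoTracking (the context and entity names are placeholders):

private static async Task<List<Customer>> GetActiveCustomersAsync()
{
    using (var ctx = new MyDbContext())
    {
        // AsNoTracking: EF does not track these entities in the context,
        // which is cheaper for read-only queries.
        return await ctx.Customers
            .AsNoTracking()
            .Where(c => c.IsActive)
            .ToListAsync();
    }
}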
We have been using the generic repository pattern, which I see some voices call an antipattern, but it's better to start with something than to sit and wait for everything to be perfect :-)
Scenario 1
var placeStatus = await _placeService.AddAsync(oPlace, false); // false: just add to the context, don't call SaveChanges
var orgStatus = await _organizationService.AddAsync(oOrganization, false);
_unitOfWork.SaveChanges();
Vs
Task<short> placeTask = _placeService.AddAsync(oPlace, true);
Task<short> orgTask = _organizationService.AddAsync(oOrganization, true);
await Task.WhenAll(placeTask, orgTask);
With my limited knowledge I assume SaveChanges() handles rollback internally in the first case, whereas I would have to handle rollback myself in the second case. I also assume the second case runs in parallel because of await Task.WhenAll.
1) So is SaveChanges() parallel, or more performant than the second approach (whether or not atomicity is an issue), and am I on the right track if I go with the second one?
Scenario 2
Task<Place> placeTask= _placeCore.SelectByIdAsync(id);
Task<Organization> organizationTask = _organizationCore.SelectByIdAsync(id);
await Task.WhenAll(placeTask, organizationTask);
2) Can I skip joins (which might break the whole concept of a generic repository) in the generic repository pattern by using await as in Scenario 2?
Any links, book references, or stories would be helpful.
Thanks
You cannot have two queries running in parallel on the same DataContext. As noted in the comments this won't work in the current version of EF. You either need to create separate data contexts for specific scenarios (which makes code significantly more complex and should not be done without clear benefit) or switch to serial queries.
The proper way to use EF is to use non-async, non-SaveChanges calling Add/Update/Delete methods and async Select methods. Your SaveChanges should be async and call into the DataContext's SaveChangesAsync. SaveChanges will batch your inserts, updates and deletes together.
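A minimal sketch of that shape (AppDbContext, Places and Organizations are placeholder names; Place and Organization come from the question):

public async Task AddPlaceAndOrganizationAsync(Place oPlace, Organization oOrganization)
{
    using (var ctx = new AppDbContext())
    {
        ctx.Places.Add(oPlace);               // synchronous: only touches the change tracker
        ctx.Organizations.Add(oOrganization); // synchronous: only touches the change tracker

        // One async call sends both inserts together and commits them in a
        // single transaction; if it throws, neither row is saved.
        await ctx.SaveChangesAsync();
    }
}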
Here are some multiple inserts via EF 6:
foreach (var item in arryTags)
{
    // Some Logic
    _contentTagMapperRepository.Insert(oContentTagMapper);
}
_unitOfWork.SaveChanges();
Using Unit of Work, with a trace on the profiler:
So it seems that overall EF issues the inserts within millisecond intervals, so for Scenario 1 I guess the Unit of Work approach is ideal.
For Scenario 2, most probably joins will do the task.
I am a new .NET developer and I am using LINQ to Entities for the first time. As I practice with it, I would like to ask a question about Entity Framework that I could not find an answer to on Google. I am connecting to the entities in my code as follows:
using(MyEntities context = new MyEntities())
Why should I use a using block for connecting to the entities? Is there any alternative, like defining a helper class to take care of this?
You use using statements to make sure any resources you are done with get disposed of, so they don't use any memory when they don't need to. If you would have many connections to Entity Framework (EF) open (for instance), your app could consume a whole lot of memory and cause some trouble on the machine it's running on.
A using statement calls Dispose() on an object that implements IDisposable (an interface). You could avoid using the using statement by calling Dispose() explicitly. (But you might forget, so it's easy to just use the using for an object that you want to dispose once you're done with it.)
You could set up a connection in a helper class by, for instance, using a singleton pattern to make sure you only ever have at most one object holding an Entity connection. But in the case of EF this might cause other problems with change tracking. You could also set up a Repository (C# EF Repository Pattern) that is the only class to open and close these connections to EF, keeping it neatly in one place.
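As a rough illustration of the repository idea (all names here are made up, not from the question), the repository becomes the one place that creates and disposes the context:

// Sketch: the repository owns the context lifetime, so callers never create
// or dispose MyEntities directly.
public class CustomerRepository : IDisposable
{
    private readonly MyEntities _context = new MyEntities();

    public Customer GetById(int id)
    {
        return _context.Customers.Find(id);
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}

// Usage: the repository itself is disposed with a using block.
using (var repo = new CustomerRepository())
{
    var customer = repo.GetById(42);
}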
using is a handy shortcut for calling .Dispose() at the end of an object's usage. It wraps your code in a try-finally block, so even if an exception occurs, all of the object's resources (like an open connection) will be disposed. This is necessary since a connection, for example, is a really limited resource and should be treated with care.
So instead of coding manually
//Define context object here
try
{
    //Do some job, which might lead to exception
}
finally
{
    context.Dispose();
}
you're doing just simple
using(var context = new MyEntities())
{
//Do some job, which might lead to exception
} //Resources will be disposed here
Effectively it's the same, but easier to write.
Additionally, you can (and generally should; it's good practice) apply using to an object of any class that implements the IDisposable interface.
Some readings:
Entity Framework and context dispose
Entity Framework 4 - lifespan/scope of context in a winform application //though about winforms, still interesting discussion
http://msdn.microsoft.com/en-us/library/system.idisposable%28v=vs.110%29.aspx
using only makes sure that your unmanaged resources will be safely disposed at the end of the using block; you don't need to dispose manually in a finally block.
Normal exception handling still needs to be organized, and it's up to you how to do it.
try
{
    using (var context = new MyEntities()) //Connection is not opened yet...
    {
        var data = context.Collection.ToList(); //Here Entity will try to open the connection.
                                                 //If it's impossible, you will get an EntityException.
    } //Resources will be disposed here
}
//Something about Entity
catch (EntityException ex)
{
    //Let's log it...
    Logger.Log.Error("Some problem with entity...", ex);
    //Show a message, or notify the user in some other way
    MessageBox.Show("Problem with database.");
}
//You don't know what happened, but will check the log
catch (Exception ex)
{
    Logger.Log.Error("Unknown problem...", ex);
    //Show a message, or notify the user in some other way
    MessageBox.Show("Problem. If it persists, contact your admins.");
}
But this is only a stub. What really should be done depends heavily on your application and DB access architecture...
I have a C# Windows service which talks to multiple databases on an MS SQL Server. It is multi-threaded and has many functions, each with a long list of database operations, and each of these functions runs under its own transaction. A typical function looks like this:
public void DoSomeDBWork()
{
    using (TransactionScope ts = new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        DatabaseUpdate1();
        DatabaseUpdate2();
        DatabaseUpdate3();
        DatabaseUpdate4();
        DatabaseUpdate5();
        DatabaseUpdate6();
    }
}
Under heavy load we are experiencing deadlocks. My question is: if I write some C# code to automatically resubmit the DatabaseUpdate in case of a deadlock, will it hold back resources for the uncommitted operations? For example, if a deadlock exception occurs in DatabaseUpdate6() and I retry it 3 times with a wait of 3 seconds, during this time will all the uncommitted operations (DatabaseUpdate1 to 5) hold on to their resources, which might further increase the chances of more deadlocks? Is it even good practice to retry in case of deadlocks?
You are barking up the wrong tree.
A deadlock means the entire transaction scope is undone. Depending on your application, you may be able to restart from the using block, i.e. with a new TransactionScope, but this is very unlikely to be correct. The reason you are seeing a deadlock is that someone else has changed data that you were changing too. Since most of these updates apply a change to a value previously read from the database, the deadlock is a clear indication that whatever you read has since changed. So applying your updates without reading again would overwrite whatever was changed by the other transaction, causing lost updates. This is why a deadlock can almost never be 'automatically' retried: the new data has to be reloaded from the DB, and if user action was involved (e.g. a form edit) then the user has to be notified and has to re-validate the changes; only then can the update be tried again. Only certain types of automatic processing actions can be retried, and they are never retried as in 'try to write again'; they always act in a loop of 'read-update-write', and a deadlock simply causes the loop to run again. Since they always start with 'read', they are automatically self-correcting.
That being said, your code most likely deadlocks because you are using the Serializable isolation level when it is not required: using new TransactionScope() Considered Harmful. You should override the transaction options to use the ReadCommitted isolation level; Serializable is almost never required and is a guaranteed way to achieve deadlocks.
The second issue is: why does Serializable deadlock? It deadlocks because of table scans, which indicate you don't have proper indexes in place for your reads and updates.
The last issue is that you use RequiresNew, which is, again, incorrect in 99% of cases. Unless you have a really deep understanding of what's going on and a bulletproof case for requiring a standalone transaction, you should always use Required and enlist in the encompassing transaction of the caller.
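To make that concrete, here is a sketch of the scope with explicit options instead of the defaults (the timeout value is only illustrative):

// Sketch: ReadCommitted instead of the Serializable default, and Required so
// the scope enlists in the caller's ambient transaction if one exists.
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TransactionManager.DefaultTimeout
};

using (var ts = new TransactionScope(TransactionScopeOption.Required, options))
{
    DatabaseUpdate1();
    DatabaseUpdate2();
    // ... remaining updates ...
    ts.Complete();
}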
This doesn't cover everything in your question, but it addresses the subject of retries. The idea of retrying transactions, database or not, is dangerous, and you should not read this if the word "idempotent" means nothing to you. (Frankly, I don't know enough about it either, but my management had the final word and off I went to write in retries for deadlocks. I spoke to a couple of the smartest guys I know in this area and they all came back to me with "BAD, BAD", so I don't feel good about committing that source.) Disclaimer aside, I had to do it, so I may as well make it fun. Here's something I wrote recently to retry MySQL deadlocks a specified number of times before throwing.
Using an anonymous method, you only have to have one receiver that can dynamically handle method signatures and generic return types. You'll also need a similar one for void returns that just uses Action. For MSSQL it'll look pretty much identical, I think, minus the 'My'.
The handler that does the retry:
private T AttemptActionReturnObject<T>(Func<T> action)
{
    var attemptCount = 0;
    do
    {
        attemptCount++;
        try
        {
            return action();
        }
        catch (MySqlException ex)
        {
            if (attemptCount <= DB_DEADLOCK_RETRY_COUNT)
            {
                switch (ex.Number)
                {
                    case 1205: // (ER_LOCK_WAIT_TIMEOUT) Lock wait timeout exceeded
                    case 1213: // (ER_LOCK_DEADLOCK) Deadlock found when trying to get lock
                        Thread.Sleep(attemptCount * 1000);
                        break;
                    default:
                        throw;
                }
            }
            else
            {
                throw;
            }
        }
    } while (true);
}
Wrap your method call with a delegate or lambda:
public int ExecuteNonQuery(MySqlConnection connection, string commandText, params MySqlParameter[] commandParameters)
{
    try
    {
        return AttemptActionReturnObject(() => MySqlHelper.ExecuteNonQuery(connection, commandText, commandParameters));
    }
    catch (Exception ex)
    {
        throw new Exception(ex.ToString() + " For SQL Statement:" + commandText);
    }
}
It may also look like this:
return AttemptActionReturnObject(delegate { return MySqlHelper.ExecuteNonQuery(connection, commandText, commandParameters); });
When SQL Server detects a deadlock, it kills one thread and reports an error. If your thread is killed, it automatically rolls back any uncommitted transactions - in your case, ALL of the DatabaseUpdate*() calls that were already run during this most recent transaction.
The ways to deal with this depend entirely on your environment. If you have something like a control table or a string table which is not updated but frequently read, you can use NOLOCK... cue kicking and screaming... It is actually quite useful when you aren't worried about time-sensitive or transaction-sensitive information. However, when you are dealing with volatile or stateful information you cannot use NOLOCK, because it will lead to unexpected behavior.
There are two ways to handle deadlocks that I use: either restart the transaction from the beginning when you detect a failure, or read in your variables right before you use them and execute afterwards. The second is something of a resource hog and causes a significant decrease in performance, so it should not be used for high-volume functionality.
I think different database servers may respond to a deadlock differently; however, with SQL Server, if two transactions are deadlocked, one is elected by the server as the deadlock victim (error 1205) and that transaction is rolled back. This means, of course, that the other transaction is able to proceed.
If you're the deadlock victim, you will have to redo all your database updates, not just DatabaseUpdate6.
In response to comments about avoiding deadlocks with hints such as NOLOCK, I would strongly recommend against it.
Deadlocks are simply a fact of life. Imagine, two users each submitting a manual journal entry into an accounting system
The first entry does a credit of the bank account & a debit of the receivables.
The second entry does a debit of the receivables and a credit of the bank.
Now imagine both transactions play at the same time (something that rarely if ever happens in testing)
transaction 1 locks the bank account
transaction 2 locks the a/r account.
transaction 1 tries to lock receivables and blocks waiting on transaction 2.
transaction 2 tries to lock the bank and a deadlock is automatically and instantly detected.
one of the transactions is elected as a victim of a deadlock and is rolled back. The other transaction proceeds as if nothing happened.
Deadlocks are a reality and the way to respond to them is quite straightforward: "please hang up and try your call again."
See MSDN for more information on Handling Deadlocks with SQL Server