I am having problems with a particular implementation I have been working on.
I have a basic method that creates a new context, queries a table and gets the "LastNumberUsed" from the table, performs some basic checks on this number before finally incrementing and writing back - all within a transaction.
I have written a basic test application that uses Parallel.For to execute this method 5 times.
Using IsolationLevel.Serializable, I get a lot of deadlock errors when running this code.
I have read a bit on this subject and tried changing the isolation level to snapshot. I no longer get deadlocks but instead find I get isolation update conflict errors.
I'm really at a loss what to do. Each transaction takes approximately 0.009 seconds to complete so I have been toying with the idea of wrapping the code in a try..catch, checking for a deadlock error and running again but this feels like a messy solution.
Does anybody have any ideas (or preferably experience) of how to deal with this problem?
I have created a console application to demonstrate this.
In program main I run the following code:
Parallel.For(0, totalRequests,
    x => TestContract(x, contractId, incrementBy, maxRetries));
The method TestContract looks like this:
//Define the context
using (var context = new Entities())
{
    //Define a new transaction
    var options = new TransactionOptions {IsolationLevel = IsolationLevel.Serializable};
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    {
        //Get the contract details
        var contract = (
            from c in context.ContractRanges
            where c.ContractId == contractId
            select c).FirstOrDefault();
        //Simulate activity
        System.Threading.Thread.Sleep(50);
        //Increment the contract number
        contract.Number++;
        //Save the changes made to the context
        context.SaveChanges();
        //Complete the scope
        scope.Complete();
    }
}
}
Putting the Isolation Level aside for a moment, let's focus on what your code is doing:
You are running 5 Tasks in Parallel that make a call to TestContract passing the same contractId for all of them, right?
In TestContract, you fetch the contract by its id, do some work with it, then increment the Number property of the contract.
All this within a transaction boundary.
Why deadlocks?
In order to understand why you are running into a deadlock, it's important to understand what the Serializable Isolation Level means.
The documentation of SQL Server Isolation Levels says the following about Serializable (emphasis mine):
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current
transaction until the current transaction completes.
Range locks are placed in the range of key values that match the
search conditions of each statement executed in a transaction. This
blocks other transactions from updating or inserting any rows that
would qualify for any of the statements executed by the current
transaction. This means that if any of the statements in a transaction
are executed a second time, they will read the same set of rows. The
range locks are held until the transaction completes. This is the most
restrictive of the isolation levels because it locks entire ranges of
keys and holds the locks until the transaction completes. Because
concurrency is lower, use this option only when necessary. This option
has the same effect as setting HOLDLOCK on all tables in all SELECT
statements in a transaction.
Going back to your code, for the sake of this example, let's say you have only two tasks running in parallel, TaskA and TaskB with contractId=123, all under a transaction with Serializable Isolation Level.
Let's try to describe what is going on with the code in this execution:
TaskA Starts
TaskB Starts
TaskA Creates a Transaction 1234 with Serializable Isolation Level
TaskB Creates Transaction 5678 with Serializable Isolation Level
TaskA makes a SELECT * FROM ContractRanges WHERE ContractId = 123.
At this point, SQL Server places a shared lock on the row of the ContractRanges table where ContractId = 123, to prevent other transactions from modifying that data.
TaskB makes the same SELECT statement and also places a shared lock on the ContractId = 123 row of the ContractRanges table.
So, at this point, we have two locks on that same row, one for each transaction that you created.
TaskA then increments the Number of the contract.
TaskB increments the Number property of the contract.
TaskA calls SaveChanges, which sends the UPDATE for that row to the database.
To modify the Number value, transaction 1234 needs to change a row that is locked by transaction 5678, so SQL Server waits for that lock to be released before it can apply the change you requested.
TaskB then also calls SaveChanges and, as happened with TaskA, tries to increment the Number of contract 123. It finds a lock on that row held by transaction 1234 from TaskA.
Now transaction 1234 from TaskA is waiting for the lock held by transaction 5678 to be released, and transaction 5678 is waiting for the lock held by transaction 1234 to be released. We are in a deadlock: neither transaction will ever be able to finish, because they are blocking each other.
When SQL Server detects a deadlock, it chooses one of the transactions as the victim, kills it, and allows the other one to proceed.
Going back to the isolation level, I don't have enough details about what you are trying to do to say whether you really need Serializable, but there is a good chance that you don't. Serializable is the safest and strictest isolation level, and it achieves that by sacrificing concurrency, as we saw.
If you really need Serializable guarantees you really should not be trying to update the Number of the same contract concurrently.
The Snapshot Isolation alternative
You said:
I have read a bit on this subject and tried changing the isolation level to snapshot. I no longer get deadlocks but instead find I get isolation update conflict errors.
That's exactly the behavior that you want, should you choose to use Snapshot Isolation. That's because Snapshot uses an Optimistic Concurrency model.
Here is how it's defined on the same MSDN docs (again, emphasis mine):
Specifies that data read by any statement in a transaction will be the
transactionally consistent version of the data that existed at the
start of the transaction. The transaction can only recognize data
modifications that were committed before the start of the transaction.
Data modifications made by other transactions after the start of the
current transaction are not visible to statements executing in the
current transaction. The effect is as if the statements in a
transaction get a snapshot of the committed data as it existed at the
start of the transaction.
Except when a database is being recovered, SNAPSHOT transactions do not request locks when reading data. SNAPSHOT transactions reading data do not block other transactions from writing data. Transactions writing data do not block SNAPSHOT transactions from reading data.
During the roll-back phase of a database recovery,
SNAPSHOT transactions will request a lock if an attempt is made to
read data that is locked by another transaction that is being rolled
back. The SNAPSHOT transaction is blocked until that transaction has
been rolled back. The lock is released immediately after it has been
granted.
The ALLOW_SNAPSHOT_ISOLATION database option must be set to
ON before you can start a transaction that uses the SNAPSHOT isolation
level. If a transaction using the SNAPSHOT isolation level accesses
data in multiple databases, ALLOW_SNAPSHOT_ISOLATION must be set to ON
in each database.
A transaction cannot be set to SNAPSHOT isolation
level that started with another isolation level; doing so will cause
the transaction to abort. If a transaction starts in the SNAPSHOT
isolation level, you can change it to another isolation level and then
back to SNAPSHOT. A transaction starts the first time it accesses
data.
A transaction running under SNAPSHOT isolation level can view
changes made by that transaction. For example, if the transaction
performs an UPDATE on a table and then issues a SELECT statement
against the same table, the modified data will be included in the
result set.
Let's try to describe what is going on with the code when it executes under Snapshot Isolation:
Let's say the initial value of Number is 2 for contract 123
TaskA Starts
TaskB Starts
TaskA Creates a Transaction 1234 with Snapshot Isolation Level
TaskB Creates Transaction 5678 with Snapshot Isolation Level
In both snapshots, Number = 2 for Contract 123.
TaskA makes a SELECT * FROM ContractRanges WHERE ContractId = 123. As we are running under Snapshot isolation, no locks are taken.
TaskB makes the same SELECT statement and also takes no locks.
TaskA then increments the Number of the contract to 3.
TaskB increments the Number property of the contract to 3.
TaskA calls SaveChanges, which, in turn, causes SQL Server to compare the snapshot created when the transaction started with the current state of the database, as well as with the uncommitted changes made under this transaction. As it doesn't find any conflicts, it commits the transaction, and now Number has a value of 3 in the database.
TaskB then also calls SaveChanges and tries to commit its transaction. When SQL Server compares the transaction's snapshot values with the ones currently in the database, it sees a conflict: in the snapshot, Number had a value of 2, and now it has a value of 3. It then throws the update exception.
Again, there were no deadlocks, but TaskB failed this time because TaskA modified the data that TaskB was also using.
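The "first committer wins" outcome in the walk-through above can be imitated in memory. The sketch below is not how SQL Server implements snapshot isolation; VersionedRow and SnapshotDemo are hypothetical names introduced only for illustration. A write is rejected when the row has changed since it was read, which is the outcome TaskB runs into.

```csharp
using System;

// Hypothetical in-memory stand-in for a single database row; not a SQL Server API.
public sealed class VersionedRow
{
    private readonly object _gate = new object();
    private int _value;
    private long _version;

    public VersionedRow(int initialValue) { _value = initialValue; }

    // Returns the value together with the version it was read at (the "snapshot").
    public (int Value, long Version) Read()
    {
        lock (_gate) { return (_value, _version); }
    }

    // Commits only if nobody else has committed since 'readVersion' was taken.
    public bool TryCommit(long readVersion, int newValue)
    {
        lock (_gate)
        {
            if (_version != readVersion) return false; // update conflict
            _value = newValue;
            _version++;
            return true;
        }
    }
}

public static class SnapshotDemo
{
    // Replays the TaskA/TaskB timeline sequentially so the outcome is deterministic.
    public static (bool TaskACommitted, bool TaskBCommitted, int FinalNumber) Run()
    {
        var row = new VersionedRow(2);
        var a = row.Read(); // TaskA sees Number = 2
        var b = row.Read(); // TaskB sees Number = 2
        bool aOk = row.TryCommit(a.Version, a.Value + 1); // no conflict, commits
        bool bOk = row.TryCommit(b.Version, b.Value + 1); // row changed, rejected
        return (aOk, bOk, row.Read().Value);
    }
}
```

Calling SnapshotDemo.Run() reports that TaskA committed, TaskB did not, and Number ended up as 3 rather than 4, so TaskB's increment is not silently lost; it has to be redone against fresh data.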
How to fix this
Now that we have covered what is going on with your code when you run it under the Serializable and Snapshot isolation levels, what can you do to fix it?
Well, the first thing you should consider is whether it really makes sense for you to be concurrently modifying the same Contract record. This is the first big smell that I saw in your code, and I would try to understand that first. You probably need to discuss this with your business to understand whether they really need this concurrency on the contract.
Assuming you really need this to happen concurrently, as we saw, you can't really use Serializable, as that would lead to deadlocks like the ones you saw. So, we are left with Snapshot isolation.
Now, how you handle an OptimisticConcurrencyException when you catch one is up to you and your business to decide.
For example, one way to handle it is to simply let the user decide what to do: display an error message saying that the data they are trying to change has been modified, and ask them whether they want to refresh the screen to get the latest version of the data and, if needed, try to perform the same action again.
If that is not the case, and it's OK for you to retry, another option is to have retry logic in your code that performs the operation again when an OptimisticConcurrencyException is thrown. This is based on the assumption that, the second time around, there won't be a concurrent transaction modifying the same data and the operation will succeed.
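The retry option could be sketched roughly as below. This is a minimal, library-agnostic helper, not the only way to do it: ConflictRetry, its parameters, and the isRetryable classifier are names I made up for illustration. The important assumption is that the delegate you pass in re-runs the whole read-update-write unit (new context, new transaction, fresh read) on every attempt.

```csharp
using System;
using System.Threading;

public static class ConflictRetry
{
    // Runs 'unitOfWork' until it succeeds or 'maxRetries' extra attempts are used up.
    // 'unitOfWork' is expected to open its own context and transaction and to re-read
    // the row on every attempt. 'isRetryable' is a caller-supplied classifier
    // (hypothetical); with SQL Server it might test SqlException.Number for
    // 1205 (deadlock victim) or 3960 (snapshot update conflict).
    public static T Execute<T>(Func<T> unitOfWork, Func<Exception, bool> isRetryable,
                               int maxRetries, int baseDelayMs = 10)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return unitOfWork();
            }
            catch (Exception ex) when (attempt < maxRetries && isRetryable(ex))
            {
                // Back off a little longer after each failed attempt.
                Thread.Sleep(baseDelayMs * (attempt + 1));
            }
        }
    }
}
```

In the test application from the question, the body of TestContract would become the unitOfWork delegate and the existing maxRetries argument would be passed through. Once the retries are exhausted, or when the classifier rejects the exception, the original exception propagates to the caller unchanged.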
Related
I am calling a few stored procedures (functions in PostgreSQL) using ExecuteNonQuery inside a C# transaction. My SPs look like this:
CREATE OR REPLACE FUNCTION public.setuserstatus(
    par_userid integer,
    par_status character varying)
RETURNS void
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$
BEGIN
    UPDATE public.user
    SET status = par_status
    WHERE userid = par_userid;
END;
$BODY$;
So what is the best practice for calling this function? Is the code below enough, or should I run it inside a transaction and use commit and rollback? Please suggest.
using (var conn = new NpgsqlConnection(_connectionString))
{
    await conn.OpenAsync(ct);
    using (var cmd = new NpgsqlCommand("irnutil.setcrdsstatusforapmaccount", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue(ApmAccountIdParameterName, accountId);
        cmd.Parameters.AddWithValue(CrdsStatusParameterName, crdsStatus.ToString());
        await cmd.ExecuteNonQueryAsync(ct);
    }
}
Also, the Npgsql documentation says "However, for maximum portability it's recommended to set the transaction on your commands." Does that mean we should use a transaction here?
https://www.npgsql.org/doc/transactions.html
It depends on what you want to achieve.
The general idea of transaction is - A unit of work that is performed against a Database.
However, you will not want to use a transaction over select queries.
For example:
select * from some table where condition = {condition}
Will not be a candidate for a proper transaction block.
Your example of update, is a good candidate.
Now comes the question, whether I should use it at code level or Database level?
The answer is again a question - What do you want to achieve?
If you perform it at the database level, you will lose your ability to log errors. Yes, the transaction, if written correctly to satisfy your conditions, will roll back the operation, but you will have no further details, meaning that you will have to use a profiler to determine the cause of the failure. On the other hand, your transaction will be safe and secured from the 'outside world': if someone changes your code or a harmful attack takes place, they will not be able to change the transaction, and the data will be safer.
If you perform it at the code level, you will be able to log the exception and review the cause of the failure, and it will be easier to handle programmatically. However, it will be exposed to code changes and possible malware attacks, as I mentioned above.
A deadlock may occur during the execution of a transaction, meaning that if you execute a block of SQL wrapped in a transaction, you cannot execute it again until the transaction has finished its job, or you will encounter a deadlock and an exception will be thrown. In that case it is recommended to create a queue to monitor your calls to the transaction and check whether you are clear to make another execution or not.
Again - this is all your decision according to your needs.
I want to know what are the practical differences of executing a transaction in the same database context between these 3 ways:
1) Multiple operations with one single SaveChanges(), without explicitly using a sql transaction
using (TestDbContext db = new TestDbContext())
{
    // first operation
    // second operation
    db.SaveChanges();
}
2) Multiple operations with one single SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
    // operation 1
    // operation 2
    db.SaveChanges();
    trans.Commit();
}
3) Multiple operations with multiple SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
    // operation 1
    db.SaveChanges();
    // operation 2
    db.SaveChanges();
    trans.Commit();
}
In (2) and (3), if Commit() is supposed to actually execute the requested SQL queries against the database, is it really different to, say, save changes for each operation versus saving changes for all operations at once?
And if (1) can also allow multiple operations to be safely executed in the same database context so what's the main use of manually starting a transaction? I'd say we can manually provide try/catch block to roll back the transaction if something bad happens, but AFAIK, SaveChanges() also covers it, automatically, at least with SQLServer.
** UPDATED: Another thing is: Should I make db context and transaction variables class-level or these should be local to containing methods only?
If you do not start a transaction, it is implicit. Meaning, all SaveChanges() you perform will be available in the database immediately after the call.
If you start a transaction, SaveChanges() still performs the updates, but the data is not available to other connections until a commit is called.
You can test this yourself by setting break points, creating new objects, adding them to the context, and performing a SaveChanges(). You will see the ID property will have a value after that call, but there will be no corresponding row in the database until you perform a commit on the transaction.
As far as your second question goes, it really depends on concurrency needs, what your class is doing and how much data you're working with. It's not so much a scoping issue as it is a code execution issue.
Contexts are not thread safe, so as long as you only have one thread in your application access the context, you can make it at a broader scope. But then, if other instances of the application are accessing the data, you're going to have to make sure you refresh the data to the latest model. You also should consider that the more of the model you have loaded into memory, the slower saves are going to be over time.
I tend to create my contexts as close to the operations that are to be performed as possible, and dispose them soon after.
Your question doesn't really seem to be about entity framework at all, and is more regarding sql transactions. A sql transaction is a single 'atomic' change. That is to say that either all the changes are committed, or none are committed.
You don't really have an example which covers the scenario, but if you added another example like:
using (TestDbContext db = new TestDbContext())
{
    // operation 1
    db.SaveChanges();
    // operation 2
    db.SaveChanges();
}
...in this example, if your first operation saved successfully, but the second operation failed, you could have a situation where data committed at the first step is potentially invalid.
That's why you would use a sql transaction, to wrap both SaveChanges into a single operation that means either all data is committed, or none is committed.
Here is a quote: "So, if you are working with only one object context then you have already built-in support for database transactions when using the ObjectContext.SaveChanges method." I found it here: http://www.luisrocha.net/2011/08/managing-transactions-with-entity.html
So according to that, I don't have to use TransactionScope in the code below, right?
if (isLastCallSuccess)
{
    if (condition1) //it's clear, no transaction needed
    {
        product.Property1 = true;
        context.SaveChanges();
    }
    else if (condition2)
    {
        using (TransactionScope scope = new TransactionScope()) //do I need it?
        {
            context.DeleteObject(item); //deleting
            context.AddObject("product", new product //adding
            {
                Id = oldObject.Id,
                Property1 = true
            });
            context.SaveChanges(System.Data.Objects.SaveOptions.DetectChangesBeforeSave);
            scope.Complete();
            context.AcceptAllChanges();
        }
    }
}
What the quote means is that a single call to SaveChanges is automatically wrapped in a transaction, so it's atomic. You, however, are calling SaveChanges multiple times, so in order for the larger operation to be atomic you'll need to use the transaction as you currently have it.
So yes, you need it.
I would personally keep the TransactionScope in, so that everything commits as a whole unit or rolls back upon an error (i.e. your save or add fails). If concurrency is a major part of your application, this will benefit your users by keeping the data consistent.
I believe in your scenario you do need to use a transaction. SaveChanges creates an implicit transaction such that when it goes to persist a change to any of the objects, and that change cannot be persisted, it rolls back all other changes it attempted to make. But the transaction created by SaveChanges only lives as long as the call itself. If you are calling SaveChanges twice and want the actions of the first call to rollback if the second call fails, then yes, you need a transaction that wraps both calls, which the code you posted does just that.
I disagree; because you have multiple operations on your data, and you would want to make sure that the operations either succeed completely or fail completely (atomic). It also is good practice to make sure you are atomic.
If your delete worked, but your add failed, you would be left with a database in a bad state. At least if you had a transaction, the database would be back to the original state before you attempted the first operation.
EDIT:
Just for completion, inside a transaction, having the ability to rollback a transaction at any point is crucial, when you start to manipulate multiple tables in the same method/process.
From how I am reading it, you are worried about the delete and adding not committing to the database and if there is a fail then rolling the transaction back.
I don't think you need to wrap your insert and delete in a transaction because, as mentioned above, it all happens in one SaveChanges() call, which implicitly has transaction management. So if it did fail, the changes would be rolled back.
I'm using TransactionScope to manage transactions in EF. I need ReadCommitted behavior, but it doesn't work as expected:
using (var trans = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions()
    { IsolationLevel = IsolationLevel.ReadCommitted }))
{
    var c1 = customerRepository.Get(1);
    c1.FirstName = "Modified";
    customerRepository.Save();
    var c2 = customerRepository.Get(1);
    Assert.AreNotEqual("Modified", c2.FirstName);
    trans.Complete();
}
Although I haven't yet committed the transaction when getting the second instance, its FirstName is already modified.
You're inside the same transaction. The transaction isolation level refers to different transactions.
You can't isolate a transaction from itself, only from other transactions.
Try opening two different transaction scopes (e.g. with two apps running at the same time) and you'll see the effect of isolation between them. You can do this by debugging two different apps at the same time and pausing them before committing the scope.
See SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
As you can see, when each transaction isolation level is explained, it always refers to other transactions:
READ UNCOMMITTED
Specifies that statements can read rows that have been modified by other transactions but not yet committed.
READ COMMITTED
Specifies that statements cannot read data that has been modified but not committed by other transactions.
REPEATABLE READ
Specifies that statements cannot read data that has been modified but not yet committed by other transactions and ...
and so on.
I have a C# Windows service which talks to multiple databases on a MS SQL Server. It is multi-threaded and has many functions, each with a long list of database operations; each of these functions runs under its own transaction. So a typical function looks like this:
public void DoSomeDBWork()
{
    using (TransactionScope ts = new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        DatabaseUpdate1();
        DatabaseUpdate2();
        DatabaseUpdate3();
        DatabaseUpdate4();
        DatabaseUpdate5();
        DatabaseUpdate6();
    }
}
Under heavy load we are experiencing deadlocks. My question is: if I write some C# code to automatically resubmit the DatabaseUpdate in case of a deadlock, will it hold on to resources for the uncommitted operations? For example, if a deadlock exception occurs in DatabaseUpdate6() and I retry it 3 times with a wait of 3 seconds, will all the uncommitted operations (DatabaseUpdates 1 to 5) hold on to their resources during this time, which might further increase the chance of more deadlocks? Is it even good practice to retry in case of deadlocks?
You are barking up the wrong tree.
A deadlock means the entire transaction scope is undone. Depending on your application, you may be able to restart from the using block, i.e. with a new TransactionScope, but this is very, very unlikely to be correct. The reason you are seeing a deadlock is that someone else has changed data that you were changing too. Since most of these updates apply a change to a value previously read from the database, the deadlock is a clear indication that whatever you read has changed. Applying your updates without reading again would overwrite whatever the other transaction changed, causing lost updates.
This is why a deadlock can almost never be retried 'automatically': the new data has to be reloaded from the database and, if user action was involved (e.g. a form edit), the user has to be notified and has to re-validate the changes; only then can the update be tried again. Only certain types of automatic processing can be retried, and they are never retried as 'try to write again'. They always act in a 'read-update-write' loop, a deadlock causes the loop to run again, and since the loop always starts with a read, they are automatically self-correcting.
That being said, your code most likely deadlocks because it uses the serializable isolation level when it is not required: see "using new TransactionScope() Considered Harmful". You must override the transaction options to use the ReadCommitted isolation level; serializable is almost never required and is a guaranteed way to get deadlocks.
The second issue is: why does serializable deadlock? It deadlocks because of table scans, which indicate you don't have proper indexes in place for your reads and your updates.
The last issue is that you use RequiresNew, which is, again, incorrect in 99% of cases. Unless you have a really deep understanding of what's going on and a bulletproof case for requiring a standalone transaction, you should always use Required and enlist in the encompassing transaction of the caller.
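As a rough sketch of the two recommendations above, a small factory can replace the bare constructor calls. The class and method names (Scopes, ReadCommitted) are mine, not part of System.Transactions. For context, a TransactionScope created with the parameterless constructor runs at Serializable, which is why the options are spelled out here.

```csharp
using System;
using System.Transactions;

public static class Scopes
{
    // Builds a scope that joins the caller's ambient transaction if one exists
    // (Required) and asks for ReadCommitted instead of the Serializable default.
    public static TransactionScope ReadCommitted()
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadCommitted,
            Timeout = TransactionManager.DefaultTimeout
        };
        return new TransactionScope(TransactionScopeOption.Required, options);
    }
}
```

DoSomeDBWork would then open its scope with using (var ts = Scopes.ReadCommitted()) and call ts.Complete() after the last update. One thing to keep in mind: a Required scope that joins an existing ambient transaction must request the same isolation level as that transaction, otherwise the constructor throws.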
This doesn't cover everything in your question, only the subject of retries. The idea of retrying transactions, database or not, is dangerous, and you should not read this if the word "idempotent" means nothing to you. (Frankly, I don't know enough about it either, but my management had the final word and off I went to write in retries for deadlocks. I spoke to a couple of the smartest guys I know in this area and they all came back to me with "BAD BAD", so I don't feel good about committing that source.) Disclaimer aside, I had to do it, so I may as well make it fun. Here's something I wrote recently to retry MySQL deadlocks a specified number of times before throwing and returning.
Using an anonymous method, you only need one receiver that can dynamically handle method signatures and generic return types. You'll also need a similar one for void returns that just uses Action(). For MSSQL it'll look pretty much identical, I think, minus the 'my'.
The handler that does the retry:
//
private T AttemptActionReturnObject<T>(Func<T> action)
{
    var attemptCount = 0;
    do
    {
        attemptCount++;
        try
        {
            return action();
        }
        catch (MySqlException ex)
        {
            if (attemptCount <= DB_DEADLOCK_RETRY_COUNT)
            {
                switch (ex.Number)
                {
                    case 1205: //(ER_LOCK_WAIT_TIMEOUT) Lock wait timeout exceeded
                    case 1213: //(ER_LOCK_DEADLOCK) Deadlock found when trying to get lock
                        Thread.Sleep(attemptCount*1000);
                        break;
                    default:
                        throw;
                }
            }
            else
            {
                throw;
            }
        }
    } while (true);
}
Wrap your method call with delegate or lambda
public int ExecuteNonQuery(MySqlConnection connection, string commandText, params MySqlParameter[] commandParameters)
{
    try
    {
        return AttemptActionReturnObject( () => MySqlHelper.ExecuteNonQuery(connection, commandText, commandParameters) );
    }
    catch (Exception ex)
    {
        throw new Exception(ex.ToString() + " For SQL Statement:" + commandText);
    }
}
it may also look like this:
return AttemptActionReturnObject(delegate { return MySqlHelper.ExecuteNonQuery(connection, commandText, commandParameters); });
When SQL Server detects a deadlock, it kills one thread and reports an error. If your thread is killed, it automatically rolls back any uncommitted transactions: in your case, ALL of the DatabaseUpdate*() calls that had already run during this most recent transaction.
The ways to deal with this depend entirely on your environment. If you have something like a control table or a string table, which is not updated but is frequently read, you can use NOLOCK... cue kicking and screaming... It is actually quite useful when you aren't worried about time- or transaction-sensitive information. However, when you are dealing with volatile or stateful information you cannot use NOLOCK, because it will lead to unexpected behavior.
There are two ways to handle deadlocks that I use. Either restart the transaction from the beginning when you detect a failure, or read in your variables before you use them and execute afterwards. The second is something of a resource hog and significantly decreases performance, so it should not be used for high-volume functionality.
I think different database servers may respond to a deadlock differently; however, with SQL Server, if two transactions are deadlocked, one is elected by the server as the deadlock victim (error 1205) and that transaction is rolled back. This means, of course, that the other transaction is able to proceed.
If you're the deadlock victim, you will have to redo all your database updates, not just update6.
In response to comments about avoiding deadlocks with hints such as NOLOCK, I would strongly recommend against it.
Deadlocks are simply a fact of life. Imagine two users each submitting a manual journal entry into an accounting system.
The first entry credits the bank account and debits receivables.
The second entry debits receivables (A/R) and credits the bank account.
Now imagine both transactions run at the same time (something that rarely, if ever, happens in testing):
Transaction 1 locks the bank account.
Transaction 2 locks the A/R account.
Transaction 1 tries to lock receivables and blocks, waiting on transaction 2.
Transaction 2 tries to lock the bank account, and a deadlock is automatically and instantly detected.
One of the transactions is elected as the deadlock victim and is rolled back. The other transaction proceeds as if nothing happened.
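To make "a deadlock is detected" a bit more concrete: conceptually, the server looks for a cycle among transactions that are waiting on each other. The sketch below is a toy wait-for graph, not SQL Server's actual detection or victim-selection algorithm; WaitForGraph and FindCycle are hypothetical names, and each transaction is assumed to wait on at most one other.

```csharp
using System;
using System.Collections.Generic;

public static class WaitForGraph
{
    // 'waitsOn[t]' is the transaction that 't' is blocked behind.
    // Returns the transactions that form a cycle, or an empty list if there is none.
    public static List<int> FindCycle(IDictionary<int, int> waitsOn)
    {
        foreach (var start in waitsOn.Keys)
        {
            var path = new List<int>();
            var seen = new HashSet<int>();
            int current = start;

            // Follow the "waits on" chain until it ends or revisits a transaction.
            while (waitsOn.ContainsKey(current) && seen.Add(current))
            {
                path.Add(current);
                current = waitsOn[current];
            }

            // If the chain stopped at a transaction that is itself still waiting,
            // it must have been visited already, so the chain closed on itself.
            if (waitsOn.ContainsKey(current))
            {
                int first = path.IndexOf(current);
                return path.GetRange(first, path.Count - first);
            }
        }
        return new List<int>();
    }
}
```

For the journal-entry example, the graph is "1 waits on 2" and "2 waits on 1", so FindCycle returns both transactions. A real server would then pick one of them as the victim and roll it back.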
Deadlocks are a reality, and the way to respond to them is quite straightforward: "please hang up and try your call again."
See MSDN for more information on Handling Deadlocks with SQL Server