I am executing several long-running SQL queries as part of a reporting module. These queries are constructed dynamically at run-time. Depending on the user's input, they may be single or multi-statement, have one or more parameters and operate on one or more database tables - in other words, their form cannot be easily anticipated.
Currently, I am just executing these statements on an ordinary SqlConnection, i.e.
using (SqlConnection cn = new SqlConnection(ConnectionString))
{
    cn.Open();
    // command 1
    // command 2
    // ...
    // command N
}
Because these queries (really query batches) can take a while to execute, I am concerned about locks on tables holding up reads/writes for other users. It is not a problem if the data for these reports changes during the execution of the batch; the report queries should never take precedence over other operations on those tables, nor should they lock them.
For most long-running/multi-statement operations that involve modifying data, I would use transactions. The difference here is that these report queries are not modifying any data. Would I be correct in wrapping these report queries in an SqlTransaction in order to control their isolation level?
i.e:
using (SqlConnection cn = new SqlConnection(ConnectionString))
{
    cn.Open();
    using (SqlTransaction tr = cn.BeginTransaction(IsolationLevel.ReadUncommitted))
    {
        // command 1
        // command 2
        // ...
        // command N
        tr.Commit();
    }
}
Would this achieve my desired outcome? Is it correct to commit a transaction, even though no data has been modified? Is there another approach?
Another approach might be to issue, against the connection:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
which achieves the same intent, without messing with a transaction. Or you could use the WITH(NOLOCK) hint on the tables in your query, which has the advantage of not changing the connection at all.
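For example, the hint goes on each table reference in the query (table and column names here are hypothetical):
SELECT o.OrderId, o.Total
FROM dbo.Orders o WITH (NOLOCK)
JOIN dbo.Customers c WITH (NOLOCK) ON c.CustomerId = o.CustomerId;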
Importantly, note (unusually) that however it gets changed (transaction, TransactionScope, explicit SET, etc.), the isolation level is not reset between uses of the same underlying connection when it is fetched from the pool. This means that if your code changes the isolation level (directly or indirectly), none of your code can know what the isolation level of a "new" connection is:
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    // The isolation level here could be ANYTHING: the default if this is
    // a brand-new connection, or whatever the last user of this pooled
    // connection left it set to.
}
Which makes WITH(NOLOCK) quite tempting.
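If you do stay with the SET approach, one defensive option (a sketch; the pool will not do this for you) is to re-issue the level you expect every time you open a connection, so your code never inherits whatever the previous user of the pooled connection left behind:
using (var cn = new SqlConnection(connectionString))
{
    cn.Open();
    // Re-establish the level this code expects, since a pooled
    // connection may arrive with a non-default isolation level.
    using (var cmd = new SqlCommand(
        "SET TRANSACTION ISOLATION LEVEL READ COMMITTED;", cn))
    {
        cmd.ExecuteNonQuery();
    }
    // ... report commands here ...
}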
I agree with Marc, but alternatively you could use the NOLOCK query hint on the affected tables. This would give you the ability to control it on a table-by-table basis.
The problem with running any queries without taking shared locks is that you leave yourself open to "non-deterministic" results, and business decisions should not be made on this data.
A better approach may be to investigate the SNAPSHOT or READ_COMMITTED_SNAPSHOT isolation levels. These give you protection against transactional anomalies without taking shared locks. The trade-off is that they increase IO against TempDB, where the row versions are kept. Either of these levels can be applied to the session as Marc suggested, or to the table as I suggested.
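As a sketch of what the session-level option looks like: snapshot isolation must first be enabled on the database (MyReportingDb is a placeholder name):
ALTER DATABASE MyReportingDb SET ALLOW_SNAPSHOT_ISOLATION ON;
and then the reporting batch can run under a snapshot transaction:
using (SqlConnection cn = new SqlConnection(ConnectionString))
{
    cn.Open();
    // Readers see a consistent point-in-time view without blocking writers,
    // at the cost of row-version traffic in TempDB.
    using (SqlTransaction tr = cn.BeginTransaction(IsolationLevel.Snapshot))
    {
        // report commands here, each with its Transaction property set to tr
        tr.Commit();
    }
}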
Hope this helps
Related
I have many classes, each associated with a specific table. At a higher level I have parent transactions that call various methods, so the child methods must be able to work inside a transient transaction; it is often not known beforehand whether a method will at some stage be part of a transaction, since we try to keep them generic.
Then in some cases we need Dapper queries, e.g. to turn identity insert on/off. I understood that Dapper requires passing a transaction as a parameter, otherwise it will not be enlisted in the transaction (turns out I was wrong; see below).
The DbContext (pooling) is set up per "component/DLL", and since a connection is only enlisted when it is opened inside a transaction, a context scope is used to ensure it is opened for this transaction. Furthermore, that helps when calling these same methods from e.g. health checks, which otherwise complain about too many open connections when many of them hit the same connections opened by services. Having this scope in the methods also helps when calling them as parallel work, so that they run more cleanly on parallel threads.
In other words, these methods can be called from parents that may be parallel jobs, singletons requiring a service scope, or parent transactions that require a hierarchy of transients.
The problem was: for some reason, transaction in the following code is always null.
try {
    using TransactionScope scope = new TransactionScope(TransactionScopeOption.Required,
        System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled);
    using Context localcontext = new Context(new DbContextOptionsBuilder<Context>()
        .UseSqlServer(_options.ConnectionString).Options);
    // just for safety:
    localcontext.Database.GetDbConnection().Open();
    // the following line is only for dapper input:
    IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
    await localcontext.Database.GetDbConnection()
        .ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON",
            null, (System.Data.IDbTransaction)transaction);
}
(which I took from here: Pass current transaction to DbCommand, and here: https://github.com/zzzprojects/Dapper.Transaction)
UPDATE / SOLUTION:
OK. So... when using TransactionScope, the transaction parameter does not have to be passed to Dapper for the command to enlist in the transaction. That was the clue.
Remove this line
IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
If there's an active TransactionScope, your SqlConnection will be automatically enlisted in it. The whole point of TransactionScope is that your data access methods can be completely free of transaction handling. The transaction is then orchestrated in some outer business layer or controller method.
The reason CurrentTransaction is null is that there are two different ways to handle transactions. If you want the current System.Transactions.Transaction, you get it with System.Transactions.Transaction.Current.
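For example, inside an active scope the ambient transaction is visible like this (a minimal sketch):
using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
{
    // This is the ambient System.Transactions transaction; it is unrelated
    // to DbContext.Database.CurrentTransaction, which is why that was null.
    System.Transactions.Transaction ambient = System.Transactions.Transaction.Current;
    Console.WriteLine(ambient != null); // True inside the scope
    scope.Complete();
}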
Stepping back, there are 3 separate ways to manage transactions with SqlConnection.
TSQL Transactions: You can use the TSQL API directly, issuing BEGIN TRAN, COMMIT TRAN, etc.
ADO.NET Transactions: SqlConnection.BeginTransaction, IDbTransaction, SqlTransaction, etc. This is a wrapper over the TSQL API, and is a PITA because it introduces a useless requirement to pass the SqlTransaction to each SqlCommand that you want to enlist in the transaction. But enlisting TSQL commands in the current transaction is not optional, and never has been. And that's a pain, because methods that use SqlCommand may not know whether there is a transaction. Dapper and EF both wrap this API in their transaction handling methods.
System.Transactions Transactions: Partly because of this, System.Transactions was introduced in .NET 2.0 as a new, unified way to handle transactions in .NET, and SqlClient added support for it. The main innovation of System.Transactions was adding "ambient" transactions, so code can be agnostic about whether there's a transaction and the right thing will just happen. When opening a SqlConnection, if there is a current Transaction, the SqlConnection will be enlisted in it, and the changes made using the SqlConnection will not be committed until the Transaction is committed. There is no need for your ADO.NET code to know about the Transaction. Dapper and EF are both built on top of ADO.NET and SqlClient, so this all just works.
It's easier to explain what's wrong by showing what the code should be:
using (var connection = new SqlConnection(_connectionString))
{
    await connection.ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON");
}
Where ExecuteAsync comes from Dapper.
There's no reason to create a transaction, much less a transaction scope, to execute a single command.
There's no reason to create a DbContext just to open a connection to the database either, or to execute raw SQL commands. DbContext isn't a database connection; its job is to map objects to relational data. There are no objects involved here.
To execute multiple commands there's no reason to use multiple connections. Just execute the commands one after the other. If it's really necessary, use an explicit database transaction around those commands. Or create the connection inside a single transaction scope.
Let's say you have an array with those commands, e.g. something read from a script file:
string[] commands = new[] { ... };

using (var connection = new SqlConnection(_connectionString))
{
    await connection.OpenAsync();
    using (var transaction = connection.BeginTransaction())
    {
        foreach (var sql in commands)
        {
            await connection.ExecuteAsync(sql, transaction: transaction);
        }
        transaction.Commit();
    }
}
Doing the same thing using a TransactionScope only requires opening the connection inside the transaction scope.
string[] commands = new[] { ... };

using (var scope = new TransactionScope(TransactionScopeOption.Required,
    System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled))
using (var connection = new SqlConnection(_connectionString))
{
    await connection.OpenAsync();
    foreach (var sql in commands)
    {
        await connection.ExecuteAsync(sql);
    }
    scope.Complete();
}
I am calling a few stored procedures (functions in PostgreSQL) using ExecuteNonQuery inside a C# transaction. My SPs look like:
CREATE OR REPLACE FUNCTION public.setuserstatus(
par_userid integer,
par_status character varying)
RETURNS void
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$
BEGIN
UPDATE public.user
SET status = par_status
WHERE userid = par_userid;
END;
$BODY$;
So what is the best practice for calling this function? Is the code below enough, or should I run it inside a transaction and use commit and rollback? Please suggest.
using (var conn = new NpgsqlConnection(_connectionString))
{
    await conn.OpenAsync(ct);
    using (var cmd = new NpgsqlCommand("irnutil.setcrdsstatusforapmaccount", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue(ApmAccountIdParameterName, accountId);
        cmd.Parameters.AddWithValue(CrdsStatusParameterName, crdsStatus.ToString());
        await cmd.ExecuteNonQueryAsync(ct);
    }
}
Also, the Npgsql documentation says "However, for maximum portability it's recommended to set the transaction on your commands." Does that mean we should use a transaction here?
https://www.npgsql.org/doc/transactions.html
It depends on what you want to achieve.
The general idea of a transaction is a unit of work performed against a database.
However, you will not want to use a transaction for plain select queries.
For example:
select * from some_table where condition = {condition}
is not a candidate for a proper transaction block.
Your update example is a good candidate.
Now comes the question: should I use it at code level or database level?
The answer is again a question - what do you want to achieve?
If you perform it at database level, you lose the ability to log errors. Yes, a correctly written transaction will roll back the operation on failure, but you will have no further details, meaning you will have to use a profiler to determine the cause of the failure. On the other hand, your transaction will be safe and secured from the 'outside world': if someone changes your code or a harmful attack takes place, they will not be able to change the transaction, and the data will be safer.
If you perform it at code level, you will be able to log the exception and review the cause of the failure, and it will be easier to handle programmatically. However, it will be exposed to code changes and possible malware attacks, as mentioned above.
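If you go the code-level route, a minimal sketch with Npgsql (reusing names from the question; _logger is a placeholder for whatever logging you use) might look like:
using (var conn = new NpgsqlConnection(_connectionString))
{
    await conn.OpenAsync(ct);
    using (var tx = await conn.BeginTransactionAsync(ct))
    {
        try
        {
            using (var cmd = new NpgsqlCommand(
                "SELECT public.setuserstatus(@id, @status)", conn, tx))
            {
                cmd.Parameters.AddWithValue("id", accountId);
                cmd.Parameters.AddWithValue("status", crdsStatus.ToString());
                await cmd.ExecuteNonQueryAsync(ct);
            }
            await tx.CommitAsync(ct);
        }
        catch (Exception ex)
        {
            await tx.RollbackAsync(ct);
            // Logging here is exactly what database-level handling cannot give you.
            _logger.LogError(ex, "setuserstatus failed");
            throw;
        }
    }
}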
A deadlock situation may also occur during the execution of a transaction: if you execute a block of SQL wrapped in a transaction, you cannot execute it again until the transaction has finished its job; otherwise you may hit a deadlock and an exception will be thrown. In that case it is recommended to create a queue to monitor your calls to the transaction and check whether you are clear to make another execution.
Again - this is all your decision according to your needs.
How are the connections handled in a single transaction when we open close connections for each statement in C#?
The scenario is with the same connection string, and the connection is opened and closed multiple times, once for every statement.
Consider the following example
void updateSomething()
{
    using (SqlConnection connection = new SqlConnection(
        "Integrated Security=SSPI;Initial Catalog=Northwind"))
    {
        connection.Open();
        // Execute the statements
        connection.Close();
    }
}
When I'm executing the following code:
void SomeMethod()
{
    using (TransactionScope scope = new TransactionScope())
    {
        for (int i = 0; i < 10; i++)
        {
            this.updateSomething();
        }
        scope.Complete();
    }
}
The recommendation is to use a connection Open/Close for each statement. That is because we are not actually creating connections, we are just using one from the pool.
Why is this the case? I get that we want to hold the connection for as little time as possible, but the thing is that in most transactions we are going to need it again a moment later, for the next statement.
Is it only to avoid holding the connection during demanding computation between the statements, if any exists (which it shouldn't, as that would keep the database in the transaction state for much longer than needed)?
Wouldn't it make sense to keep one connection open for the duration of the transaction?
The recommendation is to use a connection Open/Close for each statement.
Without seeing that comment in context, I'm guessing this recommendation is because of what you said: creating and destroying SqlConnection objects does not mean you are creating and destroying network connections.
I think the motive behind "use one for each statement" is to just not worry about trying to be efficient about when you create SqlConnection objects. Don't go out of your way to keep one alive and pass it around all throughout your code thinking you are avoiding tearing down a network connection. There's just no point.
In your example, it won't really make a difference. You can use the same SqlConnection object for each query if you'd like, as long as you are making them sequentially and not in parallel. It's probably even slightly more efficient since you save the computational time of creating an SqlConnection object. But that saved time will likely not even be noticeable.
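For instance, reusing one SqlConnection sequentially inside the scope is perfectly fine (a sketch of the earlier example, minus the per-statement open/close):
void SomeMethod()
{
    using (TransactionScope scope = new TransactionScope())
    using (SqlConnection connection = new SqlConnection(
        "Integrated Security=SSPI;Initial Catalog=Northwind"))
    {
        connection.Open(); // opened once; still enlists in the ambient transaction
        for (int i = 0; i < 10; i++)
        {
            // execute statement i on the same connection, sequentially
        }
        scope.Complete();
    }
}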
I want to know what are the practical differences of executing a transaction in the same database context between these 3 ways:
1) Multiple operations with one single SaveChanges(), without explicitly using a sql transaction
using (TestDbContext db = new TestDbContext())
{
    // first operation
    // second operation
    db.SaveChanges();
}
2) Multiple operations with one single SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
    // operation 1
    // operation 2
    db.SaveChanges();
    trans.Commit();
}
3) Multiple operations with multiple SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
    // operation 1
    db.SaveChanges();
    // operation 2
    db.SaveChanges();
    trans.Commit();
}
In (2) and (3), if Commit() is what actually executes the requested SQL queries against the database, is there really a difference between saving changes per operation and saving changes for all operations at once?
And if (1) also allows multiple operations to be safely executed in the same database context, what's the main use of manually starting a transaction? I'd say we can provide a try/catch block to roll back the transaction if something bad happens, but AFAIK SaveChanges() also covers that automatically, at least with SQL Server.
UPDATED: Another thing: should I make the db context and transaction variables class-level, or should they be local to their containing methods only?
If you do not start a transaction, it is implicit. Meaning, all SaveChanges() you perform will be available in the database immediately after the call.
If you start a transaction, SaveChanges() still performs the updates, but the data is not available to other connections until a commit is called.
You can test this yourself by setting break points, creating new objects, adding them to the context, and performing a SaveChanges(). You will see the ID property will have a value after that call, but there will be no corresponding row in the database until you perform a commit on the transaction.
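A minimal sketch of that experiment (the Item entity and Items set are made-up names):
using (var db = new TestDbContext())
using (var trans = db.Database.BeginTransaction())
{
    var item = new Item { Name = "test" }; // hypothetical entity
    db.Items.Add(item);
    db.SaveChanges();
    // item.Id is populated here, but other connections cannot see the row yet.
    trans.Commit(); // or trans.Rollback() to discard the insert
}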
As far as your second question goes, it really depends on concurrency needs, what your class is doing and how much data you're working with. It's not so much a scoping issue as it is a code execution issue.
Contexts are not thread safe, so as long as you only have one thread in your application accessing the context, you can give it a broader scope. But then, if other instances of the application are accessing the data, you're going to have to make sure you refresh the data to the latest model. You should also consider that the more of the model you have loaded into memory, the slower saves are going to be over time.
I tend to create my contexts as close to the operations that are to be performed as possible, and dispose them soon after.
Your question doesn't really seem to be about Entity Framework at all; it's more about SQL transactions. A SQL transaction is a single 'atomic' change, which is to say that either all the changes are committed, or none are.
You don't really have an example which covers the scenario, but if you added another example like:
using (TestDbContext db = new TestDbContext())
{
    // operation 1
    db.SaveChanges();
    // operation 2
    db.SaveChanges();
}
...in this example, if your first operation saved successfully, but the second operation failed, you could have a situation where data committed at the first step is potentially invalid.
That's why you would use a sql transaction, to wrap both SaveChanges into a single operation that means either all data is committed, or none is committed.
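In code, that wrapper is option (3) with explicit error handling added (a sketch):
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
    try
    {
        // operation 1
        db.SaveChanges();
        // operation 2
        db.SaveChanges();
        trans.Commit(); // both operations become visible together
    }
    catch
    {
        trans.Rollback(); // neither operation is committed
        throw;
    }
}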
Consider this scenario: suppose I have a WPF window which has several objects bound to its controls (Schedule, Customer, Contract, ScheduleDetails and Signer specifically), each representing a table in the backend database. When the user requests a save, I want the information he/she entered to be saved in one atomic operation; in other words, all the save operations should be in one transaction, so that either all succeed or all fail.
My question is: what is the most efficient way to implement the transaction in this scenario?
The most efficient is to use BeginTransaction etc. on your DbConnection, but this isn't convenient, as you must use the same connection everywhere, and each DbCommand needs the transaction set on it.
The simplest is TransactionScope, and if you are on SQL Server 2005 or above, you will rarely notice a significant performance difference between this and BeginTransaction:
using (var tran = new TransactionScope())
{
    SaveA();
    SaveB();
    SaveC();
    SaveD();
    tran.Complete();
}
Here it doesn't matter whether SaveA etc. use the same connection, as SqlConnection will enlist into a TransactionScope automatically.
Alternatively, let an ORM handle this for you; most will create transactions for saving a group of changes.
One thing to watch: TransactionScope relies on services (DTC) that may not be available on all clients (relevant since you mention WPF, which is client-side).