How are connections handled within a single transaction when we open and close a connection for each statement in C#?
The scenario uses the same connection string, and the connection is opened and closed multiple times, once for every statement.
Consider the following example:
void updateSomething() {
using (SqlConnection connection = new SqlConnection(
"Integrated Security=SSPI;Initial Catalog=Northwind"))
{
connection.Open();
// Execute the statements
connection.Close();
}
}
When I'm executing the following code:
void SomeMethod()
{
using(TransactionScope scope = new TransactionScope())
{
for(int i=0; i < 10; i++)
{
this.updateSomething();
}
scope.Complete();
}
}
The recommendation is to use a connection Open/Close for each statement. That is because we are not actually creating connections, we are just using one from the pool.
Why is this the case? I get it that we hold the connection for as little time as we can, but the thing is that in most transactions, we are going to get it in the next moment during the next statement.
Is it only to avoid holding the connection during any demanding computation between the statements, if such exists (which it shouldn't, as it would keep the database locked in the transaction state for much longer than needed)?
Wouldn't it make sense to keep one connection open for the duration of the transaction?
The recommendation is to use a connection Open/Close for each statement.
Without seeing that comment in context, I'm guessing this recommendation is because of what you said: creating and destroying SqlConnection objects does not mean you are creating and destroying network connections.
I think the motive behind "use one for each statement" is to just not worry about trying to be efficient about when you create SqlConnection objects. Don't go out of your way to keep one alive and pass it around all throughout your code thinking you are avoiding tearing down a network connection. There's just no point.
In your example, it won't really make a difference. You can use the same SqlConnection object for each query if you'd like, as long as you are making them sequentially and not in parallel. It's probably even slightly more efficient since you save the computational time of creating an SqlConnection object. But that saved time will likely not even be noticeable.
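As a rough sketch of the "reuse one SqlConnection sequentially" option mentioned above, the loop from the question could look like this. The connection string parameter, the dbo.Counters table and the UPDATE statement are hypothetical placeholders, and the Microsoft.Data.SqlClient package is an assumption (System.Data.SqlClient exposes the same types used here).

```csharp
using System.Transactions;
using Microsoft.Data.SqlClient; // assumption: System.Data.SqlClient would work the same way

public static class SingleConnectionExample
{
    // connectionString and the dbo.Counters table are hypothetical placeholders.
    public static void UpdateTenTimes(string connectionString)
    {
        using (var scope = new TransactionScope())
        using (var connection = new SqlConnection(connectionString))
        {
            // Opening inside the scope enlists the connection in the ambient transaction.
            connection.Open();

            // The statements run one after another on the same connection.
            for (int i = 0; i < 10; i++)
            {
                using (var command = new SqlCommand(
                    "UPDATE dbo.Counters SET Value = Value + 1 WHERE Id = @id", connection))
                {
                    command.Parameters.AddWithValue("@id", i);
                    command.ExecuteNonQuery();
                }
            }

            // Marks the transaction as ready to commit; the commit itself
            // happens when the scope is disposed.
            scope.Complete();
        }
    }
}
```

The connection is disposed before the scope because stacked using statements unwind in reverse order.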
I have many classes, each associated with a specific table. At a higher level I have parent transactions that call various methods, so the child methods must be able to work inside a transient transaction. It is often not known beforehand whether a method will be part of a transaction at some stage, since we try to keep the methods generic.
Then in some cases we need Dapper queries, e.g. to turn identity on/off. I understood that Dapper requires the transaction to be passed as a parameter, otherwise the query will not be enlisted in the transaction (it turns out I was wrong, see below).
The DbContext (pooling) is set per component/DLL. Since a connection is only enlisted when it is opened inside a transaction, a locally scoped context is used to ensure the connection is opened for this transaction. Furthermore, that helps when calling these same methods from e.g. HealthChecks, which otherwise complain about too many open connections when many of them use the same connections opened by services. Having this scope in the methods also helps when calling them from parallel work, so that they run more cleanly on parallel threads.
In other words, this way these methods can be called from parents that are parallel jobs, singletons requiring a service scope, or parent transactions that require a transient hierarchy.
The problem was: for some reason, transaction in the following code is always null.
try {
using TransactionScope scope = new TransactionScope(TransactionScopeOption.Required,
System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled);
using Context localcontext = new Context(new DbContextOptionsBuilder<Context>()
.UseSqlServer(_options.ConnectionString).Options);
// just for safety:
localcontext.Database.GetDbConnection().Open();
// the following line is only for dapper input:
IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
await localcontext.Database.GetDbConnection()
.ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON",
null, (System.Data.IDbTransaction)transaction);
}
(which I took from here: Pass current transaction to DbCommand and here: https://github.com/zzzprojects/Dapper.Transaction)
UPDATE / SOLUTION:
Ok. So... when using transaction scope the transaction parameter does not have to be passed to Dapper to ensure it enlists in the transaction. That was the clue.
Remove this line
IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
If there's an active TransactionScope, your SqlConnection will be automatically enlisted in it. The whole point of TransactionScope is that your data access methods can be completely free of transaction handling. Then in some outer business layer or controller method, the transaction is orchestrated.
The reason CurrentTransaction is null is that there are two different ways to handle transactions. If you want the current System.Transactions.Transaction, you get it with System.Transactions.Transaction.Current.
Stepping back, there are 3 separate ways to manage transactions with SqlConnection.
TSQL Transactions: You can use the TSQL API directly, issuing BEGIN TRAN, COMMIT TRAN, etc.
ADO.NET Transactions: SqlConnection.BeginTransaction, IDbTransaction, SqlTransaction, etc. This is a wrapper over the TSQL API, and is a PITA because it introduces a useless requirement to pass the SqlTransaction to each SqlCommand that you want to enlist in the transaction. But enlisting TSQL commands in the current transaction is not optional, and never has been. And that's a pain because methods that use SqlCommand may not know whether there is a transaction. Dapper and EF both wrap this API in their transaction handling methods.
System.Transactions Transactions: Partly because of this, System.Transactions was introduced in .NET 2.0 as a new and unified way to handle transactions in .NET, and SqlClient added support for it. The main innovation of System.Transactions was adding "ambient" transactions, so code can be agnostic about whether there's a transaction and the right thing will just happen. When opening a SqlConnection, if there is a current Transaction, the SqlConnection will be enlisted in it, and the changes made using the SqlConnection will not be committed until the Transaction is committed. And there is no need for your ADO.NET code to know about the Transaction. Dapper and EF are both built on top of ADO.NET and SqlClient, so this all just works.
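To see the ambient transaction without involving a database at all, here is a minimal sketch. The class and method names are made up for illustration; only TransactionScope and Transaction.Current come from System.Transactions. It assumes the calling code is not already running inside another scope.

```csharp
using System;
using System.Transactions;

public static class AmbientTransactionDemo
{
    // True when an ambient System.Transactions transaction is active on this flow.
    public static bool HasAmbientTransaction() => Transaction.Current != null;

    // Checks for an ambient transaction before, inside and after a TransactionScope.
    public static (bool before, bool inside, bool after) Probe()
    {
        bool before = HasAmbientTransaction();
        bool inside;

        using (var scope = new TransactionScope())
        {
            // A SqlConnection opened at this point would enlist in this transaction
            // without any transaction object being passed around.
            inside = HasAmbientTransaction();
            scope.Complete();
        }

        bool after = HasAmbientTransaction();
        return (before, inside, after);
    }
}
```

This is the value to inspect instead of DbContext.Database.CurrentTransaction when the transaction was started by a TransactionScope.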
It's easier to explain what's wrong by showing what the code should be:
using(var connection=new SqlConnection(_connectionString))
{
await connection.ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON");
}
Where ExecuteAsync comes from Dapper.
There's no reason to create a transaction, much less a transaction scope, to execute a single command.
There's no reason to create a DbContext just to open a connection to the database either, or to execute raw SQL commands. DbContext isn't a database connection; its job is to map objects to relational data. There are no objects involved here.
To execute multiple commands there's no reason to use multiple connections. Just execute the commands one after the other. If it's really necessary, use an explicit database transaction around those commands. Or create the connection inside a single transaction scope.
Let's say you have an array with those commands, e.g. something read from a script file:
string[] commands=new[]{...};
using(var connection=new SqlConnection(_connectionString))
{
await connection.OpenAsync();
using (var transaction = connection.BeginTransaction())
{
foreach(var sql in commands)
{
await connection.ExecuteAsync(sql,transaction:transaction);
}
transaction.Commit();
}
}
Doing the same thing using a TransactionScope only requires opening the connection inside the transaction scope.
string[] commands=new[]{...};
using( var scope = new TransactionScope(TransactionScopeOption.Required,
System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled))
using(var connection=new SqlConnection(_connectionString))
{
await connection.OpenAsync();
foreach(var sql in commands)
{
await connection.ExecuteAsync(sql);
}
scope.Complete();
}
When implementing the repository pattern using Dapper ORM I am currently doing the following:
private readonly ConnectionStrings _connectionStrings;
private IDbConnection _db;
public CustomerRepository(IOptions<ConnectionStrings> connectionStrings)
{
_connectionStrings = connectionStrings.Value;
_db = new SqlConnection(_connectionStrings.DefaultConnection);
}
public Customer Find(int id)
{
return this._db.Query<Customer>("SELECT * FROM Contacts WHERE Id = @Id", new { id }).SingleOrDefault();
}
Can someone please tell me if I should be doing it this way or if I should be using a using statement with a new SqlConnection in every single repository function.
I am assuming my above code will need something like UnitOfWork to be effective right? And also some way of disposing the connection when done running all of the repository functions needed.
The recommended approach is to use using statements. User paulwhit explained their usage well in this answer:
The reason for the "using" statement is to ensure that the object is disposed as soon as it goes out of scope, and it doesn't require explicit code to ensure that this happens.
The essential difference between having using statements in your methods and having the connection be a class member is that the using statement makes sure that once you're done with your operations, and have exited the block, your connection is closed and disposed of properly. This removes any possibility of error on the part of the developer, and generally makes everything neater.
An important additional benefit of the using statement in this situation is that it ensures the connection is disposed of, even if there is an exception (though it is worth noting that this isn't strictly the only way to achieve this). According to the documentation:
The using statement ensures that Dispose is called even if an exception occurs while you are calling methods on the object.
If you were to have the connection be a class member, then an unhandled exception in the middle of a method that caused your program to exit early would possibly leave the connection open. This is of course not a good thing.
So to sum up, unless you have a very good reason not to, go with the using statement.
In general when a type implements IDisposable (and hence works with using) it can sometimes be useful to wrap it in another type, having that other type also implement IDisposable and have its Dispose() call the wrapped object, and then use using (or another mechanism to call Dispose()) on it.
The question is whether this is one of those sometimes.
It's not. In particular note that SqlConnection implements pooling behind the scenes, so (unless you explicitly opt-out in your connection string) instead of Dispose() shutting down the entire connection to the server what actually happens is that an object internal to the assembly of SqlConnection that handles the details of the connection is put into a pool to use again the next time an SqlConnection with the same connection string is opened.
This means that your application will get as efficient use of as few connections as possible over many uses of the SqlConnection class. But you are stymying that by keeping the connections out of the pool by not returning to the pool as promptly as possible.
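Putting the advice from these answers together, the repository from the question could be sketched with a using statement per method as follows. This is only an illustration: the Customer shape is hypothetical, the constructor takes a plain connection string instead of IOptions<ConnectionStrings> to keep the sketch short, and the Dapper and Microsoft.Data.SqlClient packages are assumed.

```csharp
using System.Data;
using System.Linq;
using Dapper;
using Microsoft.Data.SqlClient; // assumption: System.Data.SqlClient would work the same way

// Hypothetical shape; the real Customer class is not shown in the question.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CustomerRepository
{
    private readonly string _connectionString;

    // Only the connection string is kept as state, never an open connection.
    public CustomerRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public Customer Find(int id)
    {
        // A new SqlConnection object per call; the pool supplies the physical
        // connection, and Dispose hands it back as soon as the block exits.
        using (IDbConnection db = new SqlConnection(_connectionString))
        {
            // Dapper opens the connection for the query if it is still closed.
            return db.Query<Customer>(
                "SELECT * FROM Contacts WHERE Id = @Id", new { Id = id })
                .SingleOrDefault();
        }
    }
}
```

With this shape the repository itself holds nothing that needs disposing, so it does not have to implement IDisposable.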
If I am calling some data access methods from multiple threads, do I need to lock the code around the DB calls to ensure consistency, or are the using statements below atomic?
public static DataRow GetData(Int32 id)
{
using (SqlConnection con = new SqlConnection(connectionString))
{
con.Open();
SqlCommand cmd = ...
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(...)
cmd.Parameters.Add(...)
DataTable dt = new DataTable();
return new SqlDataAdapter(cmd).FillWithRetry(dt, sqlGetEmail.CommandText);
}
}
I don't think one thread can affect the connection object defined and 'in use' from another.
Using statements have nothing to do with thread-safety (or lack of).
They merely ensure that Dispose method of the used object is called when the block ends; but are otherwise equivalent to a manual try..finally Dispose.
In this particular case: since a new connection is opened on each thread then it is 'thread safe'. It still might not be atomic wrt. the database or other shared state.
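To make the try..finally equivalence concrete, here is a small sketch that needs no database. TrackedResource and UsingDemo are hypothetical names invented for the illustration; both methods throw inside the guarded region and still end up with a disposed resource.

```csharp
using System;

// Hypothetical stand-in for a connection: it only records that Dispose ran.
public sealed class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class UsingDemo
{
    // Form 1: the using statement.
    public static TrackedResource WithUsing()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the demo can hand the resource back for inspection.
        }
        return resource;
    }

    // Form 2: the hand-written try/finally that using is shorthand for.
    public static TrackedResource WithTryFinally()
    {
        var resource = new TrackedResource();
        try
        {
            try
            {
                throw new InvalidOperationException("simulated failure");
            }
            finally
            {
                resource.Dispose();
            }
        }
        catch (InvalidOperationException)
        {
            // Same as above: swallowed for the sake of the demo.
        }
        return resource;
    }
}
```

Neither form takes a lock or serializes access, which is why using on its own says nothing about thread safety.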
I don't think one thread can affect the connection object defined and 'in use' from another.
So I suppose you are worried about concurrent access to your connection. The using statement is not a lock. If you want exclusive use of a connection or any other instance, you should use:
lock(myConnection)
{
// your code
}
What the using keyword is for is described here
However, I think there is another misunderstanding here. In your example the connection is a local variable, which is instantiated every time control flow enters your GetData method (even on different threads). So even if control flow enters the method multiple times (and on different threads), no shared connection instance is used; each call creates its own instance.
It would be different if the connection instance were a parameter. Then you should worry about concurrency and use a lock.
Conclusion: In your sample you do NOT need to worry about your connection instance concurrent access, and you do not need to use any locking semantics.
Incidentally, your use of using in the sample is correct: because the connection is IDisposable, you should put a deterministic dispose guard around its usage with try/finally, or better, with its shortcut, the using keyword.
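The "each call creates its own instance" point can be checked with a sketch like the one below. FakeConnection is a hypothetical stand-in for SqlConnection so that the example runs without a database; GetData mirrors the shape of the method in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for SqlConnection; each instance gets a unique Id.
public sealed class FakeConnection : IDisposable
{
    public Guid Id { get; } = Guid.NewGuid();
    public void Dispose() { }
}

public static class PerCallConnectionDemo
{
    // Same shape as GetData in the question: the connection is a local variable.
    public static Guid GetData()
    {
        using (var con = new FakeConnection())
        {
            return con.Id;
        }
    }

    // Calls GetData from many threads and counts how many distinct
    // connection instances were observed.
    public static int CountDistinctConnections(int calls)
    {
        var seen = new ConcurrentDictionary<Guid, bool>();
        Parallel.For(0, calls, i =>
        {
            seen.TryAdd(GetData(), true);
        });
        return seen.Count;
    }
}
```

The number of distinct instances equals the number of calls, so no two threads ever touch the same connection object and no lock is needed around it.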
I am executing several long-running SQL queries as part of a reporting module. These queries are constructed dynamically at run-time. Depending on the user's input, they may be single or multi-statement, have one or more parameters and operate on one or more database tables - in other words, their form cannot be easily anticipated.
Currently, I am just executing these statements on an ordinary SqlConnection, i.e.
using (SqlConnection cn = new SqlConnection(ConnectionString)) {
cn.Open();
// command 1
// command 2
// ...
// command N
}
Because these queries (really query batches) can take a while to execute, I am concerned about locks on tables holding up reads/writes for other users. It is not a problem if the data for these reports changes during the execution of the batch; the report queries should never take precedence over other operations on those tables, nor should they lock them.
For most long-running/multi-statement operations that involve modifying data, I would use transactions. The difference here is that these report queries are not modifying any data. Would I be correct in wrapping these report queries in an SqlTransaction in order to control their isolation level?
i.e:
using (SqlConnection cn = new SqlConnection(ConnectionString)) {
cn.Open();
using (SqlTransaction tr = cn.BeginTransaction(IsolationLevel.ReadUncommitted)) {
// command 1
// command 2
// ...
// command N
tr.Commit();
}
}
Would this achieve my desired outcome? Is it correct to commit a transaction, even though no data has been modified? Is there another approach?
Another approach might be to issue, against the connection:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
which achieves the same intent, without messing with a transaction. Or you could use the WITH(NOLOCK) hint on the tables in your query, which has the advantage of not changing the connection at all.
Importantly, note that (unusually): however it gets changed (transaction, transaction-scope, explicit SET, etc), the isolation level is not reset between uses of the same underlying connection when fetching it from the pool. This means that if your code changes the isolation level (directly or indirectly), then none of your code knows what the isolation level of a new connection is:
using(var conn = new SqlConnection(connectionString)) {
conn.Open();
// isolation level here could be **ANYTHING**; it could be the default
// if it is a brand new connection, or could be whatever the last
// connection was when it finished
}
Which makes the WITH(NOLOCK) quite tempting.
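Another way to cope with the unknown starting state, offered here as an assumption rather than something the answer proposes, is to set the isolation level explicitly right after every Open. The extension method name below is hypothetical; it is written against the provider-neutral DbConnection base class, which SqlConnection derives from.

```csharp
using System.Data.Common;

public static class ConnectionExtensions
{
    // Hypothetical helper: opens the connection and puts the session into a
    // known isolation level, whatever the pooled connection was left in.
    public static void OpenWithReadCommitted(this DbConnection connection)
    {
        connection.Open();

        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SET TRANSACTION ISOLATION LEVEL READ COMMITTED;";
            command.ExecuteNonQuery();
        }
    }
}
```

The cost is one extra round trip per Open, which is the trade-off against using a per-query hint.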
I agree with Marc, but alternatively you could use the NOLOCK query hint on the affected tables. This would give you the ability to control it on a table by table level.
The problem with running any queries without taking shared locks is that you leave yourself open to "non-deterministic" results, and business decisions should not be made on this data.
A better approach may be to investigate either the SNAPSHOT or READ_COMMITTED_SNAPSHOT isolation levels. These give you protection against transactional anomalies without taking locks. The trade-off is that they increase IO against TempDB. Either of these levels can be applied either to the session, as Marc suggested, or to the table, as I suggested.
Hope this helps
A while back I wrote an ORM layer for my .NET app where all database rows are represented by a subclass of DatabaseRecord. There are a number of methods like Load(), Save() etc. In my initial implementation I created a connection to the DB in the constructor of DatabaseRecord, e.g.
connection = new SqlConnection(
ConfigurationManager.ConnectionStrings["ConnectionName"].ConnectionString
);
I then call Open() and Close() on that SqlConnection at the beginning and end of my methods which access the database. This seemed to me (as someone who was familiar with programming but new to C# and .NET) to be the most efficient way to do things: have one connection and open/close it where necessary within the class.
I've just been doing some reading though and it appears that this pattern is recommended in a number of places:
using (var connection = new SqlConnection(...)) {
connection.Open();
// Stuff with the connection
connection.Close();
}
I can see why it's desirable - the connection is automatically Dispose()d even if the stuff you do in the middle causes an uncaught exception. I was just wondering what the overhead is for calling new SqlConnection() potentially many times like this.
Connection Pooling is on so I imagine the overhead is minimal and the second approach should be best practice but I just wanted to make sure my assumptions are right.
Yes, it is best practice. The using makes your call to Close() exception-safe.
And the overhead of creating a (any) object is indeed minimal, and smallest for short-lived objects (that stay in GC generation 0).
Note that you don't have to call Close() at the end of the using-block anymore, it is automatically done for you (Dispose==Close).
This is partially a matter of taste. As long as you employ connection pooling, the overhead of creating a new connection object (which recycles a pooled connection) will be minimal, so the generally recommended pattern is to create new connection objects as needed.
If you run several commands immediately after each other then I see no reason to create new connections for each of them, but you should avoid holding on to open connections for a long time.
Also, you should note that the Dispose method will close the connection for you. So there is no need to call both Close and Dispose. Since the using clause will call dispose when it ends there is normally no need to call Close.
If you're unsure about the cost of opening/closing connections, make the SqlConnection a member variable of your class, but make the class IDisposable and dispose of the SqlConnection when the class is disposed.