I'm trying to understand how EF Core transaction with a lot of inner commands use memory.
Let's say I have code like this:
using var reader = new MyReader(myStream);
using var context = new BloggingContext();
using var transaction = context.Database.BeginTransaction();
try
{
while (!reader.EndOfStream())
{
var myObj = reader.ReadNextObject();
context.Database.ExecuteSqlRaw("INSERT INTO [MyTable] ([Col1], [Col2]) VALUES ({0}, {1})",
myObj.prop1, myObj.prop2);
}
transaction.Commit();
}
catch (Exception)
{
// Exception handling
}
MyReader is reading a very large (millions of rows) collection of records streamed from some source.
My questions are:
Can objects referenced by myObj variable be garbage collected before we commit the transaction, or are we effectively loading all of them into memory?
Are all SQL commands we set to execute stored in memory until we commit the transaction, or are they sent to the database immediately?
Am I understanding correctly, that this would lock [MyTable] until we commit the transaction?
Can objects references by myObj variable be garbage collected before we commit the transaction
Yes. Once ExecuteSqlRaw completes, EF will not hold any references to the object that myObj refers to, and it can be collected.
Are all SQL commands we set to execute stored in memory until we commit the transaction, or are they sent to the database immediately?
They are sent to the database immediately by ExecuteSqlRaw, the row is inserted and transaction log records written to the in-memory log buffer. The transaction just prevents the statement from waiting until the log records are hardened to disk, and allows you to roll back the whole transaction. And even the log records are not kept in memory on the server; they will be hardened to the log files asynchronously, and read back from there in the case of a rollback.
Am I understanding correctly, that this would lock [MyTable] until we commit the transaction?
SQL Server will lock only the inserted rows, unless you inserted so many that it triggered lock escalation.
Related
I have many classes all associated to a specific table. On a higher level i have parent transactions that call various methods so the child methods must be able to work inside a transient transaction and often it is not known beforehand if a method at some stage will be part of a transaction or not since we try to keep them generic.
Then in some cases we need Dapper queries e.g. to turn identity on / off . I understood that Dapper requires passing a Transaction as parameter otherwise it will not be enlisted in the transaction (turns out i was wrong see below).
The DbContext(Pooling) is set per "component/dll" so since a connection is only enlisted when it its opened inside a transaction a scope is used of context to ensure it is opened for this transaction. Furthmore that helps when calling these same methods from e.g. HealthChecks who otherwise will complain about too many open connections when many of them call the same connections opened by services. Have this scope in methods helps also with calling these methods in parallel work so that they are more nicely run in parallel threads.
In other words in this way these methods can be called from these parents which can be parallel job parents or singletons requiring a service scope or parent transactions that require a transients hierarchy.
The problem was: For some reason transaction in the following transaction is always null.
try {
using TransactionScope scope = new TransactionScope(TransactionScopeOption.Required,
System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled);
using Context localcontext = new Context(new DbContextOptionsBuilder<Context>()
.UseSqlServer(_options.ConnectionString).Options);
// just for safety:
localcontext.Database.GetDbConnection().Open();
// the following line is only for dapper input:
IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
await localcontext.Database.GetDbConnection()
.ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON",
null, (System.Data.IDbTransaction)transaction);
}
(which i took from here: Pass current transaction to DbCommand and here https://github.com/zzzprojects/Dapper.Transaction )
UPDATE / SOLUTION:
Ok. So... when using transaction scope the transaction parameter does not have to be passed to Dapper to ensure it enlists in the transaction. That was the clue.
Remove this line
IDbContextTransaction transaction = localcontext.Database.CurrentTransaction;
If there's an active TrasnactionScope your SqlConnection will be automatically enlisted in it. The whole point of TransactionScope is that your data access methods can be completely free of transaction handling. Then in some outer business layer or controller method, the transaction is orchestrated.
The reason CurrentTransaction is null is that there are two different ways to handle transactions. If you want the current System.Transactions.Transaction, you get it with System.Transactions.Transaction.Current.
Stepping back, there are 3 separate ways to manage transactions with SqlConnection.
TSQL Transactions: You can use TSQL API directly issuing BEGIN TRAN, COMMIT TRAN, etc.
ADO.NET Transactions: SqlConnection.BeginTrasaction, IDbTransaction , SqlTransaction, etc. This is a wrapper over the TSQL API, and is a PITA because it introduces a useless requirement to pass the SqlTransaction to each SqlCommand that you want to enlist in the Transaction. But enlisting TSQL commands in the current transaction is not optional, and never has been. And that's a pain because methods that user SqlCommand may not know whether there is a transaction. Dapper and EF both wrap this API in their transaction handling methods.
System.Transactions Transactions: Partly because of this System.Transactions was introduced in .NET 2.0 as a new and unified way to handle transactions in .NET, and SqlClient added support for it. The main innovation of System.Transactions was adding "ambient" transactions. So code could be agnositc about whether there's a transaction and the right thing will just happen. When opening a SqlConnection if there is a current Transaction, the SqlConnection will be enlisted in it, and the changes made using the SqlConnection will not be committed until the Transaction is committed. And there is no need for your ADO.NET code to know about the Transaction. Dapper and EF are both built on top of ADO.NET and SqlClient, so this all just works.
It's easier to explain what's wrong by showing what the code should be:
using(var connection=new SqlConnection(_connectionString))
{
await connection.ExecuteAsync("SET IDENTITY_INSERT [dbo].[Whatever] ON");
}
Where ExecuteAsync comes from Dapper.
There's no reason to create a transaction, much less a transaction scope, to execute a single command.
There's no reason to create a DbContext just to open a connection to the database either, or to execute raw SQL commands. DbContext isn't a database connection, it's job is to Map Objects to Relational data. There are no objects involved here.
To execute multiple commands there's no reason to use multiple connections. Just execute the commands one after the other. If it's really necessary, use an explicit database transaction around those commands. Or create the connection inside a single transaction scope.
Let's say you have an array with those commands, eg something read from a script file :
string[] commands=new[]{...};
using(var connection=new SqlConnection(_connectionString))
{
await connection.OpenAsync();
using (var transaction = connection.BeginTransaction())
{
foreach(var sql in commands)
{
await connection.ExecuteAsync(sql,transaction:transaction);
}
transaction.Commit();
}
}
Doing the same thing using a TransactionScope only requires opening the connection inside the transaction scope.
string[] commands=new[]{...};
using( var scope = new TransactionScope(TransactionScopeOption.Required,
System.TimeSpan.FromMinutes(10), TransactionScopeAsyncFlowOption.Enabled)
using(var connection=new SqlConnection(_connectionString))
{
await connection.OpenAsync();
foreach(var sql in commands)
{
await connection.ExecuteAsync(sql);
}
scope.Complete();
}
I want to know what are the practical differences of executing a transaction in the same database context between these 3 ways:
1) Multiple operations with one single SaveChanges(), without explicitly using a sql transaction
using (TestDbContext db = new TestDbContext())
{
// first operation
// second operation
db.SaveChanges();
}
2) Multiple operations with one single SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
// operation 1
// operation 2
db.SaveChanges();
trans.commit();
}
3) Multiple operations with multiple SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.BeginTransaction())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
trans.commit();
}
In (2) and (3), if commit() is supposed to actually execute requested sql queries to database, is it really different, say, save changes for each operation or save changes for all operation at once?
And if (1) can also allow multiple operations to be safely executed in the same database context so what's the main use of manually starting a transaction? I'd say we can manually provide try/catch block to roll back the transaction if something bad happens, but AFAIK, SaveChanges() also covers it, automatically, at least with SQLServer.
** UPDATED: Another thing is: Should I make db context and transaction variables class-level or these should be local to containing methods only?
If you do not start a transaction, it is implicit. Meaning, all SaveChanges() you perform will be available in the database immediately after the call.
If you start a transaction, SaveChanges() still performs the updates, but the data is not available to other connections until a commit is called.
You can test this yourself by setting break points, creating new objects, adding them to the context, and performing a SaveChanges(). You will see the ID property will have a value after that call, but there will be no corresponding row in the database until you perform a commit on the transaction.
As far as your second question goes, it really depends on concurrency needs, what your class is doing and how much data you're working with. It's not so much a scoping issue as it is a code execution issue.
Contexts are not thread safe, so as long as you only have one thread in your application access the context, you can make it at a broader scope. But then, if other instances of the application are accessing the data, you're going to have to make sure you refresh the data to the latest model. You also should consider that the more of the model you have loaded into memory, the slower saves are going to be over time.
I tend to create my contexts as close to the operations that are to be performed as possible, and dispose them soon after.
Your question doesn't really seem to be about entity framework at all, and is more regarding sql transactions. A sql transaction is a single 'atomic' change. That is to say that either all the changes are committed, or none are committed.
You don't really have an example which covers the scenario, but if you added another example like:
using (TestDbContext db = new TestDbContext())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
}
...in this example, if your first operation saved successfully, but the second operation failed, you could have a situation where data committed at the first step is potentially invalid.
That's why you would use a sql transaction, to wrap both SaveChanges into a single operation that means either all data is committed, or none is committed.
I'm using Nhibernate for the first time and I 've noticed that when I call BeginTransaction method it lock my Database.
Instead, entity framework (ObjectContext or DbContext too) keeps all changes in memory and SaveChange method work perfectly if no error occurs without lock anything on db.
Has Nhibernate some feautres like EF?
If you are using optimistic concurrency, then you could do something like this:
MyEntity myEntity;
using(var scope = new TransactionScope(TransactionScopeOption.Suppress))
using(var session = sessionFactory.OpenSession())
{
myEntity = session.Get<MyEntity>(id);
scope.Complete();
}
// No longer in a transaction...
myEntity.Add(something);
myEntity.Update(somethingElse);
// Later, possibly in another request...
using(var scope = new TransactionScope(TransactionScopeOption.Required))
using(var session = sessionFactory.OpenSession())
{
session.Update(myEntity);
scope.Complete();
}
As long as a transaction is open (depdending on your isolation level as noted above) you will likely have shared locks on the tables and keys involved in the initial selects, which will block updates to those tables until the transaction completes. If you want to avoid having those locks, you can suppress the transaction for the read, perform modifications, and then attempt to update the object later. The version number on the entity should protect you from lost updates.
Note that you don't have to suppress the read transaction. If you want to block until all writes are committed, you can still require a transaction around the read as long as it is separate from the update transaction and is completed as quickly as possible.
I am executing several long-running SQL queries as part of a reporting module. These queries are constructed dynamically at run-time. Depending on the user's input, they may be single or multi-statement, have one or more parameters and operate on one or more database tables - in other words, their form cannot be easily anticipated.
Currently, I am just executing these statements on an ordinary SqlConnection, i.e.
using (SqlConnection cn = new SqlConnection(ConnectionString)) {
cn.Open();
// command 1
// command 2
// ...
// command N
}
Because these queries (really query batches) can take a while to execute, I am concerned about locks on tables holding up reads/writes for other users. It is not a problem if the data for these reports changes during the execution of the batch; the report queries should never take precedence over other operations on those tables, nor should they lock them.
For most long-running/multi-statement operations that involve modifying data, I would use transactions. The difference here is that these report queries are not modifying any data. Would I be correct in wrapping these report queries in an SqlTransaction in order to control their isolation level?
i.e:
using (SqlConnection cn = new SqlConnection(ConnectionString)) {
cn.Open();
using (SqlTransaction tr = cn.BeginTransaction(IsolationLevel.ReadUncommitted)) {
// command 1
// command 2
// ...
// command N
tr.Commit();
}
}
Would this achieve my desired outcome? Is it correct to commit a transaction, even though no data has been modified? Is there another approach?
Another approach might be to issue, against the connection:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
which achieves the same intent, without messing with a transaction. Or you could use the WITH(NOLOCK) hint on the tables in your query, which has the advantage of not changing the connection at all.
Importantly, note that (unusually): however it gets changed (transaction, transaction-scope, explicit SET, etc), the isolation level is not reset between uses of the same underlying connection when fetching it from the pool. This means that if your code changes the isolation level (directly or indirectly), then none of your code knows what the isolation level of a new connection is:
using(var conn = new SqlConnection(connectionString)) {
conn.Open();
// isolation level here could be **ANYTHING**; it could be the default
// if it is a brand new connection, or could be whatever the last
// connection was when it finished
}
Which makes the WITH(NOLOCK) quite tempting.
I agree with Marc, but alternatively you could use the NOLOCK query hint on the affected tables. This would give you the ability to control it on a table by table level.
The problem with running any queries without taking shared locks is that you leave yourself open to "non-deterministic" results, and business decisions should not be made on this data.
A better approach may be to investigate either SNAPSHOT or READ_COMMITED_SNAPSHOT isolation levels. These give you protection against transactional anommolies without taking locks. The trade off is that they increase IO against TempDB. Either of these levels can be applied either to the session as Marc suggested or the table as I suggested.
Hope this helps
I'm using TransactionScope to manage transactions in EF, i need a ReadCommited behavior but it doesn't works as expected :
using (var trans = new TransactionScope(TransactionScopeOption.Required,
new TransactionOptions()
{ IsolationLevel = IsolationLevel.ReadCommitted}))
{
var c1 = customerRepository.Get(1);
c1.FirstName = "Modified";
customerRepository.Save();
var c2 = customerRepository.Get(1);
Assert.AreNotEqual("Modified", c2.FirstName);
trans.Complete();
}
while i still didn't committed the transaction when getting the second instance, it's FirstName is already modified.
You're inside the same transaction. The transaction isolation level refers to different transactions.
You can't isolate a translation from itself, but of other different transactions.
Try opening two different transaction scopes (i.e. with two apps running at the same time) and you'll see the effect os isolation between them. You can do this debugging two different apps at the same time, and pausing them before commiting the scope.
Look SET TRANSACTION ISOLATION LEVEL (Transact-SQL)
As you can see, when each transaction isolation level is explained, it always refers to other transactions:
READ UNCOMMITTED
Specifies that statements can read rows that have been modified by other transactions but not yet committed.
READ COMMITTED
Specifies that statements cannot read data that has been modified but not committed by other transactions.
REPEATABLE READ
Specifies that statements cannot read data that has been modified but not yet committed by other transactions and ...
and so on.