What approach should one use when storing/updating data across multiple DbContexts?
A performance comparison, and in which cases to use which approach:
using (var scope = new TransactionScope())
{
    using (var context1 = new ItemContext(itemOptions))
    {
        context1.Items.Add(item);
        context1.SaveChanges();
    }

    using (var context2 = new OrderContext(orderOptions))
    {
        testOrder.ItemId = item.Id;
        context2.Orders.Add(testOrder);
        context2.SaveChanges();
    }

    if (testOrder.SunIsShining)
    {
        using (var context1 = new ItemContext(itemOptions))
        {
            item.SunIsShining = true;
            context1.Items.Update(item);
            context1.SaveChanges();
        }
    }

    scope.Complete();
}
A. The example above uses TransactionScope, adding/updating records across different contexts on the same physical server.
B. Use messages. For example, after saving a user to the DB, call a messaging service that inserts an info message into a queue (item info records).
An ItemOrderProcessingService would then read that queue, say every 10 seconds, taking all new records in it,
and would create orders in batches, perhaps also updating some item statuses in the item table, again in batches
(with logic to roll back changes if needed).
When should one use which approach, and what are the benefits/drawbacks? E.g., we have ~2k transactions a day, which is rather low.
At how many transactions per day should one switch to the queue-based approach B?
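For concreteness, here is a minimal, self-contained sketch of approach B (all names are hypothetical, and the queue is modeled as an in-memory ConcurrentQueue; in production it would be a durable broker such as MSMQ, RabbitMQ, or Azure Service Bus):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ItemOrderProcessingService
{
    // filled by the web tier after an item is saved
    public static readonly ConcurrentQueue<int> ItemQueue = new ConcurrentQueue<int>();

    private static Timer _timer;

    public static void Start()
    {
        // poll every 10 seconds, as described above
        _timer = new Timer(_ => ProcessBatch(), null, TimeSpan.Zero, TimeSpan.FromSeconds(10));
    }

    private static void ProcessBatch()
    {
        var itemIds = new List<int>();
        while (ItemQueue.TryDequeue(out var id))
            itemIds.Add(id);
        if (itemIds.Count == 0)
            return;

        // create orders (and update item statuses) in one batch here,
        // with compensating logic to undo partial work on failure
    }
}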
These are completely different approaches. With a transaction it is much simpler to roll back across all contexts. In a message-based system it takes some time until your messages have been handled by all services, and you also have to handle service failures and check data consistency.
A message-based design requires more effort from the team to maintain and support the microservices handling the messages; for a small application like yours, a monolith is quite OK.
Related
We have the following processes that can modify the same dataset:
Web site (an ASP.NET Web API that modifies some parent/child dataset)
Azure WebJob (C# code that modifies the same parent/child dataset)
What are the recommended ways to ensure data integrity when the parent/child datasets are modified by the two processes simultaneously? (A C# lock statement won't work, because the code runs in different processes.)
Currently they are using Entity Framework: a process loads the dataset into memory, works on the data, and then saves it. The problem is that the data may be changed by process A after it was initially read by process B.
The data is in a SQL Azure database.
Can I create a blocking transaction on the parent table record (Id = XXXX) so that other processes simply have to wait until the lock is released? How best to do that?
Otherwise, some other ideas off the top of my head: setting a "Locked" field on the parent record, or taking the MAX RowVersion across the parent UNIONed with each child table (for the parent Id) and checking that RowVersion before and after each change?
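For reference, a minimal sketch of the RowVersion idea using EF's built-in optimistic concurrency (the entity name is illustrative): a [Timestamp] byte[] property maps to a SQL Server rowversion column, and EF then includes the original value in the UPDATE's WHERE clause.

using System.ComponentModel.DataAnnotations;

public class Parent
{
    public int Id { get; set; }

    [Timestamp] // maps to a SQL Server rowversion column
    public byte[] RowVersion { get; set; }
}

// SaveChanges() then emits: UPDATE ... WHERE Id = @id AND RowVersion = @original.
// Zero rows affected raises DbUpdateConcurrencyException, which the losing
// process can catch to reload and retry instead of silently overwriting.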
The sp_getapplock stored procedure seems to be exactly what you need.
It is very useful when you want to synchronize processes that need to get exclusive access to a part of the database.
This should look something like the code below, where ctx is a DbContext and each of the processes you want to synchronize uses the same shared logical lock name, "MYLOCKNAME" (or anything you want).
using (var tran = ctx.Database.BeginTransaction())
{
    try
    {
        const string lockName = "MYLOCKNAME";
        var resourceParam = new SqlParameter("Resource", SqlDbType.NVarChar, 255) { Value = lockName };
        var lockModeParam = new SqlParameter("LockMode", SqlDbType.NVarChar, 32) { Value = "Transaction" };
        ctx.Database.ExecuteSqlCommand("EXEC sp_getapplock @Resource, @LockMode", resourceParam, lockModeParam);

        // Do stuff with ctx
        // ...

        tran.Commit(); // the app lock is released when the transaction ends
    }
    catch
    {
        tran.Rollback();
        throw; // don't swallow the failure silently
    }
}
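One hedged note: sp_getapplock reports success through its return value (0 or 1 means the lock was granted; negative values mean timeout, cancellation, deadlock victim, or error), so the EXEC line above could be replaced with something like the following to detect a failed acquisition instead of proceeding without the lock (the 5-second @LockTimeout is illustrative):

var returnValue = new SqlParameter("ReturnValue", SqlDbType.Int)
{
    Direction = ParameterDirection.Output
};
ctx.Database.ExecuteSqlCommand(
    "EXEC @ReturnValue = sp_getapplock @Resource, @LockMode, @LockTimeout = 5000",
    returnValue, resourceParam, lockModeParam);

if ((int)returnValue.Value < 0)
    throw new TimeoutException("Could not acquire app lock '" + lockName + "'.");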
There is a .NET 4.7 Web API application working with SQL Server using Entity Framework and hosting an NServiceBus endpoint with MSMQ transport.
The simplified workflow can be described by a controller action:
[HttpPost]
public async Task<IHttpActionResult> SendDebugCommand()
{
    var sample = new Sample
    {
        State = SampleState.Initial,
    };
    _dataContext.Set<Sample>().Add(sample);
    await _dataContext.SaveChangesAsync();

    sample.State = SampleState.Queueing;

    var options = new TransactionOptions
    {
        IsolationLevel = IsolationLevel.ReadCommitted,
    };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options, TransactionScopeAsyncFlowOption.Enabled))
    {
        await _dataContext.SaveChangesAsync();
        await _messageSession.Send(new DebugCommand { SampleId = sample.Id });
        scope.Complete();
    }

    _logger.OnCreated(sample);
    return Ok();
}
And the handler for DebugCommand, which is sent to the same NServiceBus endpoint:
public async Task Handle(DebugCommand message, IMessageHandlerContext context)
{
    var sample = await _dataContext.Set<Sample>().FindAsync(message.SampleId);
    if (sample == null)
    {
        _logger.OnNotFound(message.SampleId);
        return;
    }

    if (sample.State != SampleState.Queueing)
    {
        _logger.OnUnexpectedState(sample, SampleState.Queueing);
        return;
    }

    // Some work being done
    sample.State = SampleState.Processed;
    await _dataContext.SaveChangesAsync();

    _logger.OnHandled(sample);
}
Sometimes the message handler retrieves the Sample from the DB and its state is still Initial, not Queueing as expected. That means the distributed transaction initiated in the controller action is not yet fully complete. That is also confirmed by the timestamps in the log file.
The "sometimes" happens quite rarely, under heavier load; network latency probably plays a role. I couldn't reproduce the problem with a local DB, but it happens easily with a remote one.
I checked the DTC configuration, and I checked that there is definitely escalation to a distributed transaction. Also, if scope.Complete() is not called, then neither the DB update nor the message send happens.
When the transaction scope is completed and disposed, I intuitively expect both the DB and MSMQ to be settled before a single further instruction is executed.
I couldn't find definite answers to these questions:
Is this the way DTC works? Is it normal for both transaction parties to commit while completion has not yet been reported back to the coordinator?
If yes, does that mean I should accommodate such events by altering the program's logic?
Am I misusing transactions somehow? What would be the right way?
In addition to the comments mentioned by Evk in "Distributed transaction with MSMQ and SQL Server but sometimes getting dirty reads", here is an excerpt from the relevant documentation page about transactions:
A distributed transaction between the queueing system and the persistent storage guarantees atomic commits but guarantees only eventual consistency.
Two additional notes:
NServiceBus uses IsolationLevel.ReadCommitted by default for the transaction used to consume messages. This can be configured, although I'm not sure whether setting it to Serializable on the consumer would really solve the issue here.
In general, it's not advisable to share a database between services, as this greatly increases coupling and opens the door to issues like the one you're experiencing. Try to pass the relevant data as part of the message and keep the database as internal storage for a single service. Especially when using web servers, a common pattern is to put all the relevant data into a message and fire it while confirming success to the user (as the message won't be lost), while the receiving endpoint stores the data in its own database if necessary. Giving more specific recommendations would require more knowledge about your domain and use case. I can recommend the Particular discussion community for design/architecture questions like this.
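As a hedged illustration of that last point (property names and types are assumptions), the command itself can carry what the handler needs, so the handler does not depend on the sender's DB transaction being visible yet:

public class DebugCommand : ICommand // NServiceBus marker interface
{
    public int SampleId { get; set; }

    // Relevant data travels with the message instead of being re-read
    // from a database shared between sender and handler.
    public string SampleState { get; set; }
}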
There are many articles here on EF taking a long time to save, but I've looked through them and used their answers and still seem to get very slow results.
My code looks like so:
using (MarketingEntities1 db = new MarketingEntities1())
{
    //using (var trans = db.Database.BeginTransaction(IsolationLevel.ReadUncommitted))
    //{
    int count = 0;
    db.Configuration.AutoDetectChangesEnabled = false;
    db.Configuration.ValidateOnSaveEnabled = false;

    while (count < ranges.Count)
    {
        if (bgw != null)
        {
            bgw.ReportProgress(0, "Saving count: " + count.ToString());
        }

        db.Set<xGeoIPRanx>().AddRange(ranges.Skip(count).Take(BATCHCOUNT));
        db.SaveChanges();
        count += BATCHCOUNT;
    }
    //trans.Commit();
    //}
}
Each batch takes 30+ seconds to complete. BATCHCOUNT is 1000. I know EF isn't that slow. You can see that I've stopped using the transaction and turned change tracking off; none of it seemed to help.
Some more info:
xGeoIPRanx is an empty table with no PK (I'm not sure how much that would help). I'm trying to insert about 10 million ranges.
Edit:
I feel stupid, but I'm trying to use BulkInsert and I keep getting "this entity doesn't exist" errors. I'm looking at this code:
using (var ctx = GetContext())
{
    using (var transactionScope = new TransactionScope())
    {
        // some stuff in dbcontext
        ctx.BulkInsert(entities);
        ctx.SaveChanges();
        transactionScope.Complete();
    }
}
What is "entities" I tried a list of my entities, that doesnt work, what data type is that?
nvm it works as expected it was a strange error due to how i generated the edmx file
Pause the debugger 10 times under load and look at the stack, including external code. Where does it stop most often?
It's taking a long time in .SaveChanges(). From some quick tests, it's in the ADO.NET code.
That means network latency and server execution time are causing this. For inserts, server execution time is usually not that high. You cannot do anything about network latency with EF because it sends one round trip per insert. (Yes, this is a deficiency of the framework.)
Don't use EF for bulk work. Consider using table-valued parameters or SqlBulkCopy, or any other means of bulk inserting, such as Aducci's proposal from the comments.
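For instance, a minimal SqlBulkCopy sketch under the question's assumptions (the table name xGeoIPRanx comes from the question; the column layout and connection string are placeholders):

using System.Data;
using System.Data.SqlClient;

// 'table' is a DataTable (or an IDataReader) holding the ~10 million ranges,
// with columns matching dbo.xGeoIPRanx.
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.xGeoIPRanx";
    bulk.BatchSize = 10000;    // rows per round trip to the server
    bulk.BulkCopyTimeout = 0;  // no timeout for a long-running load
    bulk.WriteToServer(table);
}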
I have a fairly big database with tables created for different business modules.
We decided to create a separate edmx file for each module.
However, how can I prevent the use of MSDTC when implementing a TransactionScope for a logical action that writes to multiple tables through different edmx models? The underlying database is the same; I don't want to involve MSDTC in this scenario.
Is there any way to pass in an already opened SQL connection with an active transaction?
Thanks in advance for any help.
Regards,
William
TransactionScope enlists MSDTC when the databases are different and/or the connection strings are different.
Rick Strahl has a great article on this (his perspective is LINQ to SQL, but it's applicable to EF). The money paragraphs:
TransactionScope is a high-level transaction wrapper that makes it really easy to wrap any code into a transaction without having to track transactions manually. Traditionally TransactionScope was a .NET wrapper around the Distributed Transaction Coordinator (DTC), but its functionality has expanded somewhat. One concern is that the DTC is rather expensive in terms of resource usage, and it requires that the DTC service is actually running on the machine (yet another service, which is especially bothersome on a client installation).
However, recent updates to TransactionScope and the SQL Server client drivers make it possible to use the TransactionScope class, and the ease of use it provides, without requiring DTC, as long as you are running against a single database and with a single, consistent connection string. In the example above, since the transaction works with a single instance of a DataContext, the transaction actually works without involving DTC. This is in SQL Server 2008.
See also this SO question/answer where I found the link to Rick's blog.
So if you're connecting to the same database and are using the same connection string, the DTC should not be involved.
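As a hedged sketch of what that looks like in practice (context names are illustrative): two contexts inside one TransactionScope, built from the identical connection string and opened one after the other, should stay on a lightweight local transaction against SQL Server 2008+ instead of escalating to MSDTC:

using (var scope = new TransactionScope())
{
    using (var orders = new OrdersContext(connectionString))
    {
        // ... modify order tables ...
        orders.SaveChanges();
    }

    // Opened only after the first connection is disposed; with the same
    // connection string, pooling lets the scope reuse the local transaction.
    using (var items = new ItemsContext(connectionString))
    {
        // ... modify item tables ...
        items.SaveChanges();
    }

    scope.Complete();
}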
Thanks for all the replies above!
By the way, I just managed to find a solution, which is to use EntityConnection and EntityTransaction explicitly. A sample looks like this:
string theSqlConnStr = "data source=TheSource;initial catalog=TheCatalog;persist security info=True;user id=TheUserId;password=ThePassword";

EntityConnectionStringBuilder theEntyConnectionBuilder = new EntityConnectionStringBuilder();
theEntyConnectionBuilder.Provider = "System.Data.SqlClient";
theEntyConnectionBuilder.ProviderConnectionString = theSqlConnStr;
theEntyConnectionBuilder.Metadata = @"res://*/";

using (EntityConnection theConnection = new EntityConnection(theEntyConnectionBuilder.ToString()))
{
    theConnection.Open();

    EntityTransaction theET = null;
    try
    {
        theET = theConnection.BeginTransaction();

        DataEntities1 DE1 = new DataEntities1(theConnection);
        //DE1 do somethings...
        DataEntities2 DE2 = new DataEntities2(theConnection);
        //DE2 do somethings...
        DataEntities3 DE3 = new DataEntities3(theConnection);
        //DE3 do somethings...

        theET.Commit();
    }
    catch (Exception ex)
    {
        if (theET != null) { theET.Rollback(); }
    }
    finally
    {
        theConnection.Close();
    }
}
With the explicit use of EntityConnection and EntityTransaction, I can share a single connection and transaction across multiple ObjectContexts for a single database, without incurring the use of MSDTC.
Hope this info is helpful. Good luck!
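For DbContext-based models, a hedged sketch of the same idea using EF6's existing-connection constructor and Database.UseTransaction (context names are illustrative; this assumes your contexts expose the DbConnection-taking constructor):

using (var conn = new SqlConnection(theSqlConnStr))
{
    conn.Open();
    using (var tran = conn.BeginTransaction())
    {
        try
        {
            using (var ctx1 = new DataEntities1(conn, contextOwnsConnection: false))
            {
                ctx1.Database.UseTransaction(tran);
                // ... ctx1.SaveChanges();
            }
            using (var ctx2 = new DataEntities2(conn, contextOwnsConnection: false))
            {
                ctx2.Database.UseTransaction(tran);
                // ... ctx2.SaveChanges();
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}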
Do you have any good advice on using EF in a multithreaded program?
I have 2 layers:
an EF layer to read/write to my database
a multithreaded service which uses my entities (read/write) and does some computations (I use the Task Parallel Library in the framework)
How can I synchronize my object contexts across the threads?
Do you know a good pattern to make this work?
Good advice is: just don't :-) EF barely manages to survive one thread; it's the nature of the beast.
If you absolutely have to use it, make the lightest DTOs possible, close the ObjectContext as soon as you have the data, repack the data, spawn your threads to do the calculations and nothing else, wait till they are done, then create another ObjectContext and dump the data back into the DB, reconcile it, etc.
If another "main" thread (the one that spawns N calculation threads via TPL) needs to know when some other thread is done, don't fire an event; just set a flag in the other thread, and let its code check the flag in its loop and react by creating a new ObjectContext and then reconciling data if it has to.
If your situation is simpler you can adapt this; the key is that you only set a flag and let the other thread react when it's ready. That means it's in a stable state, has finished a round of whatever it was doing, and can act without risking race conditions. Reset the flag (an int) with interlocked operations, and keep some timing data to make sure your threads don't react again within some time T; otherwise they can spend their lifetime just querying the DB.
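A minimal sketch of that flag idea with the Interlocked class (type and member names are illustrative):

using System.Threading;

class CalculationWorker
{
    private int _dataChanged; // 0 = nothing pending, 1 = change pending

    // Called from the main thread to signal this worker.
    public void Signal() => Interlocked.Exchange(ref _dataChanged, 1);

    // Called at a stable point in the worker's loop; consumes the flag
    // exactly once per signal, so reactions cannot pile up.
    public bool ShouldReact() => Interlocked.CompareExchange(ref _dataChanged, 0, 1) == 1;
}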
This is how I implemented it in my scenario.
var processing = new ConcurrentQueue<int>();

// possible multithreaded enumeration; only process non-queued records
Parallel.ForEach(dataEnumeration, dataItem =>
{
    if (!processing.Contains(dataItem.Id))
    {
        processing.Enqueue(dataItem.Id);

        var myEntityResource = new EntityResource();
        myEntityResource.EntityRecords.Add(new EntityRecord
        {
            Field1 = "Value1",
            Field2 = "Value2"
        });

        Exception error;
        SaveContext(myEntityResource, out error); // signature shown below

        var itemIdProcessed = 0;
        processing.TryDequeue(out itemIdProcessed);
    }
});
public void RefreshContext(DbContext context)
{
    var modifiedEntries = context.ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Modified || e.State == EntityState.Deleted);

    foreach (var modifiedEntry in modifiedEntries)
    {
        modifiedEntry.Reload();
    }
}
public bool SaveContext(DbContext context, out Exception error, bool reloadContextFirst = true)
{
    error = null;
    var saved = false;
    try
    {
        if (reloadContextFirst)
            this.RefreshContext(context);

        context.SaveChanges();
        saved = true;
    }
    catch (OptimisticConcurrencyException)
    {
        // retry saving once on a concurrency error
        if (reloadContextFirst)
            this.RefreshContext(context);

        context.SaveChanges();
        saved = true;
    }
    catch (DbEntityValidationException dbValEx)
    {
        var outputLines = new StringBuilder();
        foreach (var eve in dbValEx.EntityValidationErrors)
        {
            outputLines.AppendFormat("{0}: Entity of type \"{1}\" in state \"{2}\" has the following validation errors:",
                DateTime.Now, eve.Entry.Entity.GetType().Name, eve.Entry.State);
            foreach (var ve in eve.ValidationErrors)
            {
                outputLines.AppendFormat("- Property: \"{0}\", Error: \"{1}\"", ve.PropertyName, ve.ErrorMessage);
            }
        }
        throw new DbEntityValidationException(string.Format("Validation errors\r\n{0}", outputLines.ToString()), dbValEx);
    }
    catch (Exception ex)
    {
        error = new Exception("Error saving changes to the database.", ex);
    }
    return saved;
}
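A short usage sketch that actually checks the helper's return value (variable names follow the block above):

Exception error;
if (!SaveContext(myEntityResource, out error))
{
    // failed even after the reload-and-retry path; log and decide what to do
    Console.WriteLine(error);
}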
I think Craig might be right about your application not needing threads, but you might look at using ConcurrencyCheck in your models to make sure you don't overwrite your changes.
I don't know how much of your application is actually number crunching. If speed is the motivation for using multithreading, then it might pay off to take a step back and gather data about where the bottleneck is.
In a lot of cases I have found that the limiting factor in applications using a database server is the speed of the I/O system for your storage. For example, the speed of the hard disk drive(s) and their configuration can have a huge impact. A single 7,200 RPM hard disk drive can handle about 60 transactions per second (a ballpark figure depending on many factors).
So my suggestion would be to first measure and find out where the bottleneck is. Chances are you don't even need threads. That would make the code substantially easier to maintain, and in all likelihood its quality would be much higher.
"How can I synchronize my object contexts in each thread ?"
This is going to be tough. First of all SP or the DB queries can have parallel execution plan. So if you also have parallelism on object context you have to manually make sure that you have sufficient isolation but just enough that you dont hold lock too long that you cause deadlocks.
So I would say dont need to do it .
But that might not be the answer you want. So Can you explain a bit more what you want to achieve using this mutithreading. Is it more compute bound or IO bound. If it is IO bound long running ops then look at APM by Jeff Richter.
I think your question is more about synchronization between threads, and EF is irrelevant here. If I understand correctly, you want to notify threads from one group when the main thread has performed some operation, in this case "SaveChanges()". The threads here are like client-server applications, where one thread is a server and the other threads are clients, and you want the client threads to react to server activity.
As someone noticed, you probably do not need threads, but let's leave it as it is.
There is no fear of deadlocks as long as you use a separate ObjectContext per thread.
I also assume that your client threads are long-running threads in some kind of loop. If you want your code to be executed on the client thread, you can't use C# events.
class ClientThread
{
    public volatile bool SomethingHasChanged;

    public void MainLoop()
    {
        while (true)
        {
            if (SomethingHasChanged)
            {
                Refresh(); // e.g. create a new ObjectContext and reload data
                SomethingHasChanged = false;
            }

            // your business logic here
        }
    }
}
Now the question is how you will set the flag in all your client threads. You could keep references to the client threads in your main thread, loop through them, and set all the flags to true.
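A one-loop sketch of that, assuming the main thread holds a List<ClientThread> (the collection name is illustrative):

foreach (var client in clientThreads)
    client.SomethingHasChanged = true; // each client picks this up on its next pass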
Back when I used EF, I simply had one ObjectContext to which I synchronized all access.
This isn't ideal; your database layer effectively becomes single-threaded. But it did keep it thread-safe in a multithreaded environment. In my case the heavy computation was not in the database code at all; this was a game server, so the game logic was of course the primary resource hog, and I didn't have any particular need for a multithreaded DB layer.