Getting DbUpdateConcurrencyException, but only in production. Cannot reproduce in dev - c#

I'm working on a Quartz.NET hosted job as part of a Blazor Server web application. This job downloads a list of products from a warehouse and caches them in a local database through a DbContext. Because of the large number of entries, I insert the new items in batches of 1000 (with DbSet.AddRange()) and submit each batch with database.SaveChangesAsync().
The following code is a simplified version of this process:
public async Task Execute(IJobExecutionContext context)
{
    // databaseFactory is injected as a dependency (IDbContextFactory<AppDatabase>)
    await using var database = await databaseFactory.CreateDbContextAsync();
    int page = 0;
    while (true)
    {
        List<WarehouseEntry> dbItems = warehouse.QueryPage(page, 1000); // Retrieves 1000 entries
        if (dbItems.Count == 0)
            break;

        database.ProductWarehouseCache.AddRange(dbItems.Select(p => new CachedWarehouseProduct
        {
            Sku = p.Name,
            Barcode = p.Barcode,
            Stock = p.Stock,
            Price = p.Price
        }));
        await database.SaveChangesAsync(); // <---- DbUpdateConcurrencyException here
        page++;
    }
}
Note that there is absolutely no concurrency in the code above. The IJob class has the [DisallowConcurrentExecution] attribute, meaning that even if I accidentally trigger this procedure multiple times simultaneously, only one instance will be executing at any given time; so despite the exception message, this is not a concurrency issue. It's also important to note that nothing else is updating or querying the database while this code is running.
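For reference, the job is declared roughly like this (the class name is a placeholder; only the attribute and the IJob interface come from the actual code):
[DisallowConcurrentExecution] // Quartz never runs two instances of this job at the same time
public class WarehouseCacheJob : IJob // hypothetical class name
{
    public async Task Execute(IJobExecutionContext context)
    {
        // ... the snippet shown above ...
    }
}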
This works as intended on my local development machine. However, when I deployed the application to a production server for the first time, I found that this specific part of the code fails with a DbUpdateConcurrencyException. Normally with an exception like this, I would look for concurrency issues, or for DbContexts that are used by multiple threads at the same time or aren't disposed properly. However, as I explained above, this is not the case here.
The following is the full exception message:
Microsoft.EntityFrameworkCore.DbUpdateConcurrencyException:
The database operation was expected to affect 1 row(s), but actually affected 0 row(s);
data may have been modified or deleted since entities were loaded.
See http://go.microsoft.com/fwlink/?LinkId=527962 for information on
understanding and handling optimistic concurrency exceptions.
What could be causing an exception like this when there is no concurrency whatsoever? And what could cause this to happen only on the production server, but never in the development workspace?
Additional information:
dotnet 6
EF Core 6.0.6
Local/Dev Database: MySQL 8.0.31
Local/Dev OS: Windows 11
Remote/Prod Database+OS: MySQL 8.0.30-0ubuntu0.20.04.2

I have fixed the issue. I was using DataGrip's built-in import/export tools to clone my database's DDL from the local dev DB to the remote prod DB. Apparently these tools don't replicate the DDL exactly as they should, which leads to EF Core throwing random unexpected errors such as this one.
To fix it, I rewrote my deployment pipeline to use dotnet ef migrations script --idempotent to generate a .sql file that applies any missing migrations to the production database. After switching to this dotnet tool, I am no longer getting the exception.
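For anyone doing the same, the command looks roughly like this (the output file name is just an example):
dotnet ef migrations script --idempotent --output migrate.sql
The generated script checks the migrations history table before applying each migration, so running it against the production database applies only the migrations that are missing there.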

Related

SSIS Package Created and Executed via Code Requires Service to be Restarted after Multiple Runs

We have existing C# code that dynamically creates SSIS packages which read from different files into a SQL Server 2016 database. It does what we need it to, but we stumbled into an issue that we remain unable to resolve: we cannot keep running the .Execute command without having to restart our custom Windows service.
Right now the limit is at 2 files. If we run our code that calls the Execute command a third time, it gets stuck at post-validate (based on what we're logging via the Component and Run event handlers), and we can't proceed until we restart the Windows service and run again. We cannot just always restart the service, because other processes could be disrupted if we go with that approach.
Things we've tried so far:
Extract the dtsx package that the code creates and attempt to run it using dtexec or BIDS locally. No errors / warnings raised, and we can keep re-running the same package over and over without fail.
Run the same code locally. Same result as #1.
Run a SQL trace with the help of our DBA. Confirmed that we have no queries locking the destination table and the database itself. We did observe some SPIDs being retained after the third run but all are in a sleeping state.
Modify our RunEventHandler and ComponentEventHandler to log which step the process is in. Also tried enabling logging via Event Viewer. No errors; it really just gets stuck at post-validate, as mentioned earlier, on the third run.
Explicitly call the SSIS dispose methods, even tried explicitly disposing the connection managers themselves.
Play around with the DelayValidation and ValidateExternalMetadata properties.
Any chance others have encountered this before and were able to figure out what's causing it?
To expound on the latest comment: we've found that the issue stems from the fact that we're creating a separate AppDomain to take care of running the job and, consequently, executing the package. This separate AppDomain is created via CreateInstanceAndUnwrap, using the following code:
private AppDomain CreateAppDomain(string applicationName)
{
    var uid = Guid.NewGuid();
    var fi = new FileInfo(Assembly.GetExecutingAssembly().Location);
    var info = new AppDomainSetup
    {
        ConfigurationFile = @"D:\Deploy\App\web.config",
        ApplicationBase = @"D:\Deploy\App",
        ApplicationName = applicationName,
        CachePath = fi.Directory.Parent.FullName + @"\Temp",
        ShadowCopyDirectories = @"D:\Deploy\App\bin",
        PrivateBinPath = "bin",
        ShadowCopyFiles = "true"
    };
    var domain = AppDomain.CreateDomain(uid.ToString(), null, info);
    return domain;
}
We were able to test this theory by calling AppDomain.Unload(domain) against the created domain, and while we get a DomainUnloadException, this does prevent the job from freezing after being run twice.
We still haven't determined exactly what within the domain is getting locked up and preventing us from running the job more than twice, and any guidance on how to learn more about that would be helpful. In the meantime, we're using this workaround of unloading the app domain.
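In case it helps anyone, the workaround is roughly the following (ExecutePackageInDomain is a placeholder for our actual proxy creation and Execute call):
var domain = CreateAppDomain("SsisPackageRunner"); // the method shown above
try
{
    // Placeholder: create the runner proxy via CreateInstanceAndUnwrap and call Execute on it.
    ExecutePackageInDomain(domain);
}
finally
{
    try
    {
        // Unloading the domain releases whatever state was being held between runs.
        AppDomain.Unload(domain);
    }
    catch (Exception)
    {
        // We still see the unload exception mentioned above, but the next run no longer freezes.
    }
}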

Can there be a stale EF model in an IIS application pool?

We made a very minor change to our Entity Framework Code First model by adding a column to an already existing Entity called "EntityLog". After deployment we verified that the data was going into the EntityLog along with the data for that new column.
We are using WebBackgrounder, which schedules a web based background job as part of the web application and runs every few minutes and detects if there is something to be recorded to the EntityLog and logs it. It has an Execute method which creates an async Task which in turn creates a repository to the EF model.
Around 10 hours after deployment the application started throwing exceptions when the WebBackgrounder scheduler would try to log to EntityLog saying the new column was invalid. The web application itself was fine, and we could trigger actions in it that also logged to Entity Log, including the new column. It's just the Background scheduler that kept failing. And it didn't consistently fail. It succeeded sometimes.
I couldn't really figure out what was going on except from the stack trace I could see what exactly it was trying to put in the Log. It stopped throwing exceptions around 15-16 hours after that and I could see that it had finally logged to the EntityLog.
The whole web application runs under a single application pool, so it's not as if the background job is running in another application pool. The weird part to me is that the background job had succeeded in putting other messages into the database earlier. My question is: could there have been a worker process in the application pool with a stale DbContext that kept thinking the new column in the table was invalid? Has anybody else experienced this?
EDIT: I was talking about async tasks provided by .NET's System.Threading.Tasks:
public override Task Execute()
{
    return new Task(() =>
    {
        try
        {
            _logger.Debug("Background processing started");
            using (var repository = RepositoryFactory.CreateRepository())
            {
                var cacheProvider = new CacheProvider(new MemoryCacheAdapter());
                using (var managerContainer = new ManagerContainer(repository, cacheProvider))
                {
                    try
                    {

AutomaticDataLossException thrown after initial migration runs in EF 6 in SQL Server but not MySQL

Background info:
I am about to start a new project written in C# (VS2015) which will be using Entity Framework 6 within its DAL. One of the core requirements is that the system must be able to run against SQL Server, Azure SQL, MySQL, or Oracle, with the user choosing simply by changing the connection string in the web.config.
So prior to the project kick-off I am trying to get more familiar with Entity Framework as I have not used it outside of a few tutorials before.
I have written a small proof-of-concept app, the purpose of which is to determine how I can use EF to quickly swap a web API to a different underlying DB. For the PoC I am using SQL Server Express v12 and MySQL v5.6 with C# and Entity Framework 6 via a code-first approach.
I have hit a roadblock with my PoC: I am seeing different behaviour when running against MySQL compared to when I am running against SQL Server. I am using the following EF context:
<context type="POC.Core.Api.DataAccessLayer.SchoolContext, POC.Core.Api" disableDatabaseInitialization="false">
  <databaseInitializer type="System.Data.Entity.MigrateDatabaseToLatestVersion`2[[POC.Core.Api.DataAccessLayer.SchoolContext, POC.Core.Api], [POC.Core.Api.Migrations.Configuration, POC.Core.Api]], EntityFramework"></databaseInitializer>
</context>
I have two migration scripts: one is the initial creation, and the other just adds a property to an object.
I have some test data being created in the Seed method in the Configuration class, which looks like this:
public Configuration()
{
    AutomaticMigrationsEnabled = true;
    this.SetSqlGenerator("MySql.Data.MySqlClient", new MySql.Data.Entity.MySqlMigrationSqlGenerator());
}

protected override void Seed(WeldOffice.Core.Api.DataAccessLayer.SchoolContext context)
{
    var students = new List<Student>
    {
        new Student { FirstMidName = "Carson", LastName = "Alexander",
            EnrollmentDate = DateTime.Parse("2010-09-01") },
        ...
    };
Neither my MySQL server nor my SQL Server instance has the database present yet. As I want the user to be able to specify the DB, I am relying on in-code methods to create the DB structure, etc.
Given this scenario, my understanding is that the following occurs when the db context is first hit when running the app:
1. Check for existence of the database - it is not present, so it will be created.
2. Check if the initial creation script has been run - it has not, so run it.
3. Check if the add-property script has been run - it has not, so run it.
4. Run the Seed method to ensure data is present and correct.
In MySQL this all works as expected, the database is created, the tables set up and the data populated. The two migration scripts are listed in the migration table.
In SQL Server, steps 1, 2, and 3 all work: the database is created, the tables are set up, and the two migration scripts are listed in the migration table. However, step 4 does not run; the Seed method is never hit and instead an AutomaticDataLossException is thrown.
If I set AutomaticMigrationDataLossAllowed = true then step 4 also works and everything is as expected. But I would rather not have AutomaticMigrationDataLossAllowed set to true, and I don't see why it is necessary in SQL Server but not in MySQL. Why is this the case? I feel I am missing something fundamental.
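For reference, if you do decide to allow it (probably not something you want outside a throwaway PoC), the flag sits next to AutomaticMigrationsEnabled in the migrations Configuration constructor shown above - just a sketch:
public Configuration()
{
    AutomaticMigrationsEnabled = true;
    // Suppresses the AutomaticDataLossException, but also allows automatic migrations
    // to apply changes that can drop data, so it is normally left false.
    AutomaticMigrationDataLossAllowed = true;
    this.SetSqlGenerator("MySql.Data.MySqlClient", new MySql.Data.Entity.MySqlMigrationSqlGenerator());
}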
Please keep in mind this is just a proof of concept; in production there will be no seed data anyway, and I would rather have a DBA create the DB via SQL scripts, etc.

OptimisticConcurrencyException: Multiple EF based applications using shared AppFabric cache and same database

I am running a web application and a Windows service on the same machine as AppFabric.
Both applications reuse the same DAL code (DLL), which is EF (Entity Framework) Code-First based and accesses the same cache in AppFabric. The code in the Windows service is implemented as a job as part of Quartz.NET.
The web application has to support multiple requests, of course, and the Windows service multiple threads (scheduler and events).
For both, the shared DAL DLL creates a DbContext object per HTTP session and thread context ID, or just the thread context ID for the latter. The DAL uses the EFCachingProviders from here. Also, my EF solution uses optimistic concurrency with timestamp columns and IsRowVersion in the mapping.
As stated here, the benefit of having a second-level cache is to have access to a representation of the original state across processes! But that does not seem to work for me; I get an 'OptimisticConcurrencyException' in my use case, as follows:
Restart cache cluster, restart Windows service, restart IIS -> clean slate :)
Using the web app (Firefox), I insert a new object A with a reference to an existing object B. I can see the new row in the database. All OK.
Using the web app in another browser (Chrome) = new session, I can see the new object.
Next, the Windows service tries to do some background processing and tries to update object B. This results in an 'OptimisticConcurrencyException'. Apparently the process in the Windows service is holding a version of object B with a dated rowversion.
If I restart the Windows service, it tries the same logic again and works with no exception...
So both applications are multithreaded, use the same DAL code, connect to the same database, and use the same cache cluster and cache. I would expect the update and insert to be in the AppFabric cache, and I would expect the EF context of the Windows service to use the newest information. Somehow, it seems, its first-level cache is holding on to old information...
or something else is going wrong.
Please advise...
Update
OK, after digging around, I fixed the update problem in my Windows service. Each Manager object that queries the DAL uses a DbContext bound to its process ID + thread ID. So in the Execute function of my Quartz job, all Managers (of different object types) share the same DbContext, which is created by the first Manager.
The problem was that after the function finished, the DbContext was not disposed (which happens automatically in the HTTP-session-based DbContext manager). So the next time the job was executed, the same DbContext was found and used, which by that time was already dated (old first-level cache???). The second-level cache should not be a problem, because that is shared and SHOULD contain the newest objects... if any.
So this part is fixed.
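A rough sketch of the fix (DbContextManager and its methods are placeholders for my own DAL helpers):
public void Execute(IJobExecutionContext context)
{
    try
    {
        // ... the Managers do their work here, sharing the DbContext bound to this process/thread ID ...
    }
    finally
    {
        // This was the missing piece: without it, the next run reused the same, stale DbContext.
        var dbContext = DbContextManager.GetContextForCurrentThread(); // placeholder helper
        dbContext.Dispose();
        DbContextManager.RemoveContextForCurrentThread();              // placeholder helper
    }
}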
New problem
So the web app creates a new object A and updates an existing object B; the Windows service now works and is able to update the existing (changed) object B with no problem.
Problem:
When I refresh the web app, it does not see the changes (made by the Windows service) to object B...
So if the web app changed a count to 5, and 10 minutes later the Windows service changes that count to 6, then when I open the web app in the same or a new window/browser, I still see 5, not 6!
A restart of the web app (IIS) does not help, and neither does an iisreset.
When I do Restart-CacheCluster... it works and shows 6...
So it looks like the item is in the cache. The Windows service updates it, but does not invalidate the item, which is old and used by the web app...
Or... although it is the same object, the web app has its own entry in the cache and the Windows service has its own entry (which does get invalidated)...
Which one?
Solution
I solved this myself. The EF wrapper seems to use the query string as a key to store items in the cache. So two different queries referencing the same data in the database (it does not matter whether they originate from two different applications sharing the same distributed cache or from the same application) will have different keys (different query strings) and therefore different places in the cache. Perhaps it's not quite this black-and-white, but it's something like this...
I don't think any algorithm is used internally to check whether a query touches existing cached objects.
This causes my problem: my Windows service does an update, and the web app still sees the old value from the cache, which could only be fixed by running the Restart-CacheCluster command.
So here is how I fixed it:
My Windows service is a batch job triggered by the Quartz scheduler. After it is done,
I clear the whole cache:
private void InvalidateCache()
{
    try
    {
        DataCache myCache = ...
        foreach (String region in myCache.GetSystemRegions())
        {
            myCache.ClearRegion(region);
        }
    }
    catch (Exception ex)
    {
        eventLog.WriteEntry("InvalidateCache exception : " + ex.Message);
    }
}
I don't have an answer, but I hope the thoughts below might point you in the right direction.
If this is only an issue on updates, I would go for reading a fresh instance of the record from the database on every update, and updating that. This would avoid optimistic concurrency errors. Note that the DbContext is not thread-safe - I don't know whether that is causing the issue, but reading fresh every time would address it.
If you are having this issue on reads, then you would have to track down where the various caches are and which one is not getting updated and why. I am guessing there are various configuration options for caching at each point of usage. Good luck with that.... :)
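As a sketch of the first suggestion (the context, entity, and property names here are made up), re-reading the row immediately before updating means the row version sent with the UPDATE is the current one:
using (var db = new MyDbContext()) // fresh context, so no stale first-level cache
{
    // Load the current state of object B, including its current row version.
    var objectB = db.ObjectBs.Single(b => b.Id == objectBId);
    objectB.Count = newCount;
    db.SaveChanges(); // no OptimisticConcurrencyException, because the row version is up to date
}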

Disparity between SqlReader execution times on same .NET project but different computers

I'm working on a team project that reads data from an MSSQL server. We are using an asynchronous call to fetch the data.
SqlConnection conn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["DefaultConnection"].ConnectionString);
SqlCommand cmdData = new SqlCommand("get_report", conn);
cmdData.CommandType = CommandType.StoredProcedure;
conn.Open();

IAsyncResult asyResult = cmdData.BeginExecuteReader();
try
{
    SqlDataReader drResult = cmdData.EndExecuteReader(asyResult);
    table.Load(drResult);
}
catch (SqlException ex)
{
    throw;
}
The project itself uses TFS source control with gated check-ins, and we have verified that both computers are running the exact same version of the project. We are also using the same user login and executing the stored procedure with the exact same parameters (which are not listed for brevity).
The stored procedure itself takes 1:54 to return 42000 rows under SQL Server Management Studio. While running on Windows 7 x86, the .NET code takes roughly the same amount of time to execute as on SSMS, and the code snippet above executes perfectly. However, on my computer running Windows 7 x64, the code above encounters an error at the EndExecuteReader line at the 0:40 mark. The error returned is "Invalid Operation. The connection has been closed."
Adding cmdData.CommandTimeout = 600 allows the execution to proceed, but the data takes over 4 minutes to be returned, and we are at a loss to explain what might be going on.
Some things we considered: my computer has the .NET 4.5 Framework installed, is running a 64-bit OS against 32-bit assemblies, and may be storing information in the local project file that isn't being synchronized to the TFS server. But we can't figure out what is actually causing the disparity in times.
Anyone have any ideas as to why this disparity exists or can give me suggestions of where to look to isolate the problem?
The Invalid Operation error is received when EndExecuteReader is called more than once for a single command execution, or when the End call is mismatched with the Begin method that started the execution.
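In other words, make sure each BeginExecuteReader is paired with exactly one EndExecuteReader on the same command; roughly like this (connectionString, and table as a DataTable, stand in for the values used in the question):
using (var conn = new SqlConnection(connectionString))
using (var cmdData = new SqlCommand("get_report", conn))
{
    cmdData.CommandType = CommandType.StoredProcedure;
    cmdData.CommandTimeout = 600; // generous timeout for the long-running report
    conn.Open();

    IAsyncResult asyResult = cmdData.BeginExecuteReader();
    // ... other work can happen here while the query runs ...
    using (SqlDataReader drResult = cmdData.EndExecuteReader(asyResult)) // called exactly once, on the same command
    {
        table.Load(drResult);
    }
}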
