Unique constraint violations on concurrent insert in EF Core

Unique constraint violations on concurrent insert in EF Core - c#

I'm writing an ETL/data ingestion process for an ASP.NET Core 2.0 application. The workflow is as follows:
CSV is uploaded to Amazon S3
Initial lambda function is triggered that "chunks" the csv into
many small files with 1000 records each. The reason for this is to
work around Lambda's 5 minute runtime limit.
Each of these "chunk" files trigger another lambda function that
actually processes the records and inserts/updates the Postgres
database. This function uses a Parallel.ForEach loop thus
increasing the concurrency even further.
The data being loaded basically amounts to user data for a business application, but spans several tables. The problem I'm running in to is that when two users with an identical new related entity are imported at essentially the same time, I end up with a unique constraint error because they both fail to find the related entity and attempt to create it, but of course one wins and then the second fails when it also attempts to create it.
I'm familiar with how to deal with concurrency when updating records using row-level locking and the like, but for inserts I'm not sure how best to handle the situation beyond maybe trying to catch the error and then look up the existing entity and attach it to the new user. In other languages and frameworks I've used CreateOrUpdate sort of functionality to deal with this, but I can't find anything similar in EF Core.

Related

Adding Entity Objects to LINQ-to-SQL Data Context at Runtime - SQL, C#(WPF)

I've hit a wall when it comes to adding a new entity object (a regular SQL table) to the Data Context using LINQ-to-SQL. This isn't regarding the drag-and-drop method that is cited regularly across many other threads. This method has worked repeatedly without issue.
The end goal is relatively simple. I need to find a way to add a table that gets created during runtime via stored procedure to the current Data Context of the LINQ-to-SQL dbml file. I'll then need to be able to use the regular LINQ query methods/extension methods (InsertOnSubmit(), DeleteOnSubmit(), Where(), Contains(), FirstOrDefault(), etc...) on this new table object through the existing Data Context. Essentially, I need to find a way to procedurally create the code that would otherwise be automatically generated when you do use the drag-and-drop method during development (when the application isn't running), but have it generate this same code while the application is running via command and/or event trigger.
More Detail
There's one table that gets used a lot and, over the course of an entire year, collects many thousands of rows. Each row contains a timestamp and this table needs to be divided into multiple tables based on the year that the row was added.
Current Solution (using one table)
Single table with tens of thousands of rows which are constantly queried against.
Table is added to Data Context during development using drag-and-drop, so there are no additional coding issues
Significant performance decrease over time
Goals (using multiple tables)
(Complete) While the application is running, use C# code to check if a table for the current year already exists. If it does, no action is taken. If not, a new table gets created using a stored procedure with the current year as a prefix on the table name (2017_TableName, 2018_TableName, 2019_TableName, and so on...).
(Incomplete) While the application is still running, add the newly created table to the active LINQ-to-SQL Data Context (the same code that would otherwise be added using drag-and-drop during development).
(Incomplete) Run regular LINQ queries against the newly added table.
Final Thoughts
Other than the above, my only other concern is how to write the C# code that references a table that may or may not already exist. Is it possible to use a variable in place of the standard 'DB_DataContext.2019_TableName' methodology in order to actually get the table's data into a UI control? Is there a way to simply create an Enumerable of all the tables where the name is prefixed with a year and then select the most current table?
From what I've read so far, the most likely solution seems to involve the use of a SQL add-on like SQLMetal or Huagati which (based solely from what I've read) will generate the code I need during runtime and update the corresponding dbml file. I have no experience using these types of add-ons, so any additional insight into these would be appreciated.
Lastly, I've seen some references to LINQ-to-Entities and/or LINQ-to-Objects. Would these be the components I'm looking for?
Thanks for reading through a rather lengthy first post. Any comments/criticisms are welcome.

The simplest way to achieve what you want is to redirect in SQL Server, and leave your client code alone. At design-time create your L2S Data Context, or EF DbContex referencing a database with only a single table. Then at run-time substitue a view or synonym for that table that points to the "current year" table.
HOWEVER this should not be necessary in the first place. SQL Server supports partitioning, so you can store all the data in a physically separate data structures, but have a single logical table. And SQL Server supports columnstore tables, which can compress and store many millions of rows with excellent performance.

Transactional Systems: archiving data from databases with Entity Framework

I've written a small tool for archiving data from my Entity Framework code-first driven database.
I'm testing it thoroughly and I'm coming to a point where I'm trying it with large amounts of data. Where it comes to some problems. For example I got timeouts or exceptions like this sometimes:
Deadlock found when trying to get lock; try restarting transaction.
I know what transactions are and I guess Entity Framework is making them for all of its changes in one DbContext so in case any of them or the entire thing fails when SaveChanges() is called nothing is actually changed (short side questions: can I then simply run SaveChanges() again?)
What I want to know is this: since I need to delete different batches of information throughout my database (after exporting it) I'm constantly creating dbcontext's for each of those batches.
Should I create transactions manually for every batch and commit them all at once at the very end?
I'm studying Informatics and learn about transactional information systems in one of my courses. How is it possible with Entity Framework to create a meta transaction for all my single transactions when deleting batches of data, so all the data that is spread throughout the database is only really deleted when everything worked well, like this:
Or is there a better way to solve the entire thing?

How to implement a C# Winforms database application with synchronisation?

Background
I am developing a C# winforms application - currently up to about 11000 LOC and the UI and logic is about 75% done but there is no persistence yet. There are hundreds of attributes on the forms. There are 23 entities/data classes.
Requirement
The data needs to be kept in an SQL database. Most of the users operate remotely and we cannot rely on them having a connection so we need a solution that maintains a database locally and keeps it in synch with the central database.
Edit: Most of the remote users will only require a subset of the database in their local copy. This is because if they don't have access permissions (as defined and stored in my application) to view other user's records, they will not receive copies of them during synchronisation.
How can I implement this?
Suggested Solution
I could use the Microsoft Entity Framework to create a database and the link between database and code. This would save a lot of manual work as there are hundreds of attributes. I am new to this technology but have done a "hello world" project in it.
For data synch, each entity would have an integer primary key ID. Additionally it would have a secondary ID column which relates to the central database. This secondary column would contain nulls in the central database but would be populated in the local databases.
For synchronisation, I would write code which copies the records and assigns the IDs accordingly. I would need to handle conflicts.
Can anyone foresee any stumbling blocks to doing this? Would I be better off using one of the recommended solutions for data sychronisation, and if so would these work with the entity framework?

Synching data between relational databases is a pain. Your best course of action is probably dependent on: how many users will there be? How probably are conflicts (i.e. that the users will work offline on the same data). Also possibly what kind of manpower do you have (do you have proper DBAs/Sql Server devs standing by to assist with the SQL part, or are you just .NET devs).
I don't envy you this task, it smells of trouble. I'd especially be worried about data corruption and spreading that corruption to all clients rapidly. I'd put extreme countermeasures in place before any data in the remote DB gets updated.
If you predict a lot of conflicts - the same chunk of data gets modified many times by multiple users - I'd probably at least consider creating an additional 'merge' layer to figure out, what is the correct order of operations to perform on the remote db.
One thought - it might be very wrong and crazy, but just the thing that popped in my mind - would be to use JSON Patch on the entities, be it actual domain objects or some configuration containers. All the changes the user makes are recorded as JSON Patch statements, then applied to the local db, and when the user is online - submitted - with timestamps! - to merge provider. The JSON Patch statements from different clients could be grouped by the entity id and sorted by timestamp, and user could get feedback on what other operations from different users are queued - and manually make amends to it. Those grouped statments could be even stored in a files in a git repo. Then at some pre-defined intervals, or triggered manually, the update would be performed on a server-side app and saved to the remote db. After this the users local copies would be refreshed from server.
It's just a rough idea, but I think that you need something with similar capability - it doesn't have to be JSON Patch + Git, you can do it in probably hundreds of ways. I don't thing though, that you will get away with just going through the local/remote db and making updates/merges. Imagine the scenario, where user updates some data (let's say, 20 fields) offline, another makes completely different updates to 20 fields, and 10 of those are common between the users. Now, what should the synch process do? Apply earlier and then latter changes? I'm fairly certain that both users would be furious, because their input was 'atomic' - either everything is changed, or nothing is. The latter 'commit' must be either rejected, or users should have an option to amend it in respect of the new data. That highly depends what your data is, and as I said - what will be number/behaviour of users. Duh, even time-zones become important here - if you have users all in one time-zone you might get away with having predefined times of day when system synchs - but no way you'll convince people with many different business hours that the 'synch session' will happen at e.g. 11 AM, when they are usually giving presentation to management or sth ;)

Continuous delivery and database schema changes with entity framework

We want to progress towards being able to do continuous delivery of of our application into production. We currently deploy to azure and use table/blob storage and have a azure sql database, which we access with the entity.
As the database schema changes we want to be able to automatically apply the schema changes to the production database, but as this will happen whilst the application is live and the code changes are being deployed to many nodes at the same time we are not sure what the correct approach is.
After some reading it seems (and this makes sense) that the application needs to be tolerant of the 2 different database schema versions, so that it doesn't matter if its an old version of the code or a new version of the code which sees the database, however I'm not sure what the best way to approach handling this in the application is, using the entity framework.
Should we have versioned instances of the EF generated classes in the code which know how to access a specific version of the schema? What happens when the schema is updated and an old version of the code is running against the database?
Our entity framework classes are mapped to views in specific schemas in the db and nothing is mapped to the underlying tables, so potentially this could allow us to create v1 views which the old code uses and v2 views which the new code uses, but maintaining this feels like it would be a bit of a nightmare (its already enough of a pain simply maintaining the EF mappings to views rather than tables)
So what are best practices in this area? What do others do to solve this problem?

Whether you use EF or not, maintaining the code's ability to work with 2 consecutive versions of the database is a good (and perhaps the only viable) approach here.
Here are some ways we handle specific types of migrations:
When adding a column, we can typically just add the column (with a default constraint if non-nullable) and not worry about the code. EF will never issue a "SELECT *", so it will be able to continue to function properly while ignoring the new column. Similarly, adding a table is easy.
When removing a column or table, simply keep that column around 1 version longer than you would have otherwise.
For more complex migrations (e. g. completely changing the structure for a table or segment of the data model), deploy the new model alongside backwards-compatibility views (or tables with triggers to keep them in-sync), which will live as long as does the code that references them. As you say, this can a lot of work depending on the complexity of the migration, but it sounds like you are already well-positioned to do this because your EF entities point to views anyway. On the other hand, the benefit of this work is that you have more time to do the code migration. If you have a large codebase, this could be really beneficial in allowing you to migrate the data model to fit the needs of new features while still supporting old features without major code changes.
As a side-note, the difficulty of data migration often makes us push developing a finalized data model as far back as possible in the development schedule. With EF, you can write and test a lot of code before the data model is finalized (we use code-first to generate a sample SQLExpress database in a unit tests, even though our production database is not maintained by code-first). That way, we make fewer incremental changes to the production data model once a new feature is released.

Entity Framework and ADO.NET with Unit of Work pattern

We have a system built using Entity Framework 5 for creating, editing and deleting data but the problem we have is that sometimes EF is too slow or it simply isn't possible to use entity framework (Views which build data for tables based on users participating in certain groups in database, etc) and we are having to use a stored procedure to update the data.
However we have gotten ourselves into a problem where we are having to save the changes to EF in order to have the data in the database and then call the stored procedures, we can't use ITransactionScope as it always elevates to a distributed transaction and/or locks the table(s) for selects during the transaction.
We are also trying to introduce a DomainEvents pattern which will queue events and raise them after the save changes so we have the data we need in the DB but then we may end up with the first part succeeding and the second part failing.
Are there any good ways to handle this or do we need to move away from EF entirely for this scenario?

I had similar scenario . Later I break the process into small ones and use EF only, and make each small process short. Even overall time is longer, but system is easier to maintain and scale. Also I minimized joins, only update entity itself, disable EF'S AutoDetectChangesEnabled and ValidateOnSaveEnabled.
Sometimes if you look your problem in different ways, you may have better solution.
Good luck!

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.