Transactional Systems: archiving data from databases with Entity Framework - c#

I've written a small tool for archiving data from my Entity Framework code-first driven database.
I'm testing it thoroughly and have reached the point where I'm trying it with large amounts of data, which is where I run into problems. For example, I sometimes get timeouts or exceptions like this:
Deadlock found when trying to get lock; try restarting transaction.
I know what transactions are, and I assume Entity Framework creates one for all of the changes tracked by a single DbContext, so that if any of them (or the whole thing) fails when SaveChanges() is called, nothing is actually changed (short side question: can I then simply run SaveChanges() again?).
What I want to know is this: since I need to delete different batches of information throughout my database (after exporting them), I'm constantly creating DbContexts for each of those batches.
Should I create transactions manually for every batch and commit them all at once at the very end?
I'm studying informatics and am learning about transactional information systems in one of my courses. How is it possible with Entity Framework to create a meta-transaction around all my single transactions when deleting batches of data, so that the data spread throughout the database is only really deleted once everything has worked?
Or is there a better way to solve the entire thing?
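For illustration, the kind of meta-transaction I have in mind would look something like the following with TransactionScope (a rough sketch only; ArchiveContext, Records, BatchId and batches are placeholder names, and whether multiple connections inside one scope escalate to a distributed transaction depends on the provider):

    using System;
    using System.Linq;
    using System.Transactions;

    // Sketch: one outer scope around all the per-batch contexts.
    using (var scope = new TransactionScope(TransactionScopeOption.Required,
        new TransactionOptions { Timeout = TimeSpan.FromMinutes(10) }))
    {
        foreach (var batch in batches)
        {
            using (var context = new ArchiveContext())
            {
                var rows = context.Records.Where(r => r.BatchId == batch.Id);
                context.Records.RemoveRange(rows);
                context.SaveChanges();   // enlists in the ambient scope, not yet committed
            }
        }

        scope.Complete();   // only now is anything really deleted
    }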

Related

Entity framework deadlock issue using SQL Server

I know there are many threads about this issue, but I need to ask what the most suitable solution is for my Entity Framework deadlock problem.
Application: .Net MVC with Entity Framework 6 on SQL Server 2019.
Problem: The application processes around 100,000 rows of a table on a daily basis, with most usage during the daytime. While Entity Framework is executing selects on the C# entities, inserts/updates (whether from the entities or from stored procs) run into deadlocks, and exceptions are thrown as the statements time out.
Possible solutions we have in mind to implement are:
Snapshot isolation on SQL Server: this is the easiest fix, but from the articles we've read it comes at a huge price. The current size of our database is around 2 TB, and if tempdb grows rapidly we might run into other side effects that would be too much to handle.
Wrapping each select of the C# entities in a TransactionScope with ReadCommitted isolation (a sketch of this is shown below). This is feasible but needs a lot of code changes, maybe around 600 places to be wrapped.
Converting all the select statements for the C# entities into stored procs and adding NOLOCK hints.
Along with the above points, can anybody please suggest or point us to a solution for this scenario?
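For reference, option 2 would look roughly like this per query (a sketch; OrdersContext, Orders and ProcessedOn are placeholder names):

    using System.Linq;
    using System.Transactions;

    var options = new TransactionOptions
    {
        IsolationLevel = IsolationLevel.ReadCommitted   // or Snapshot, once enabled on the server
    };

    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    using (var db = new OrdersContext())
    {
        var rows = db.Orders
                     .AsNoTracking()                     // read-only, no change tracking
                     .Where(o => o.ProcessedOn == null)
                     .ToList();

        scope.Complete();
    }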

Unique constraint violations on concurrent insert in EF Core

I'm writing an ETL/data ingestion process for an ASP.NET Core 2.0 application. The workflow is as follows:
A CSV is uploaded to Amazon S3.
An initial Lambda function is triggered that "chunks" the CSV into many small files with 1,000 records each. The reason for this is to work around Lambda's 5-minute runtime limit.
Each of these "chunk" files triggers another Lambda function that actually processes the records and inserts/updates the Postgres database. This function uses a Parallel.ForEach loop, thus increasing the concurrency even further.
The data being loaded basically amounts to user data for a business application, but it spans several tables. The problem I'm running into is that when two users with an identical new related entity are imported at essentially the same time, I end up with a unique constraint error, because both fail to find the related entity and attempt to create it; of course one wins and the second fails when it also attempts to create it.
I'm familiar with how to deal with concurrency when updating records using row-level locking and the like, but for inserts I'm not sure how best to handle the situation beyond maybe trying to catch the error and then look up the existing entity and attach it to the new user. In other languages and frameworks I've used CreateOrUpdate sort of functionality to deal with this, but I can't find anything similar in EF Core.
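The best I've come up with so far is a "find or create" helper along these lines (a rough sketch only; AppDbContext, Company and the check for Npgsql's 23505 unique_violation SQLSTATE are my assumptions):

    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;
    using Npgsql;

    // Sketch: look up first, try to insert, and on a unique-constraint violation
    // fall back to reading the row that the other worker just inserted.
    async Task<Company> GetOrCreateCompanyAsync(AppDbContext db, string name)
    {
        var existing = await db.Companies.SingleOrDefaultAsync(c => c.Name == name);
        if (existing != null)
            return existing;

        var created = new Company { Name = name };
        db.Companies.Add(created);
        try
        {
            await db.SaveChangesAsync();
            return created;
        }
        catch (DbUpdateException ex)
            when ((ex.InnerException as PostgresException)?.SqlState == "23505")
        {
            db.Entry(created).State = EntityState.Detached;   // discard the losing insert
            return await db.Companies.SingleAsync(c => c.Name == name);
        }
    }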

Application DAL design

Hello and thanks for looking.
I have a DAL question for an application I'm working on. The app is going to extract some data from 5-6 tables in a production RDBMS that serves a much more critical role in the org. What the app has to do is use the data in these tables, analyze it, apply some business logic/rules, and then present it.
The restrictions are that since the storage model is critical in nature to the org, I need to restrict how the app will request the data. Since the tables are relatively small, I created my data access to use DataTables to load the entirety of the db tables on a fixed interval using a timer.
My questions are really around my current design and the potential use of EF or LINQ to SQL.
Can EF/LINQ to SQL work within the restrictions of the RDBMS? In most tutorials I've seen, the storage exists solely for the application. Can access to the storage be controlled, and/or can EF use DataTables rather than an RDBMS?
Since the tables are going to be loaded in their entirety, is there a best practice for creating classes to consume the data in these tables? I will have to do in-memory joins and querying/logic to get at the actual data I need.
Sorry if I'm being generic. I'm more just looking for thoughts and opinions as opposed to a solution to my problem. Please don't hesitate to share your thoughts. Thanks.
For your first question: yes, Entity Framework can use an existing DB as its source; the term to search for when looking for Entity Framework tutorials on this topic is "Database First".
For your second question, let me first preface it with a warning: many ORMs are not designed for loading entire tables and doing bulk operations on them, especially if you will be modifying the result set and pushing the data back to the server in large quantities. The updates will be row-based, not set-based, because you did the modifications in C# code, not in a T-SQL query. Most ORMs are built around the expectation that you will be doing CRUD operations at the row level, not ETL operations or set-level CRUD operations (except for reads, which most ORMs will do as a set operation).
If you will not be updating the data, only pulling it out with Entity Framework and building reports and whatnot off of it, you should be fine. If you are bulk inserting into the database, things get more problematic. See this SO question for more information.
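As a rough illustration of the read-only case (ReportingContext, Orders and Customers are placeholder names for a Database First model):

    using System.Linq;

    using (var db = new ReportingContext())
    {
        // Load the small tables once, without change tracking, then work in memory.
        var orders = db.Orders.AsNoTracking().ToList();
        var customers = db.Customers.AsNoTracking().ToList();

        var report = from o in orders
                     join c in customers on o.CustomerId equals c.Id
                     select new { c.Name, o.Total };
    }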

Transaction across multiple databases (C#)

I'm working on an application which will be updating multiple databases (SQL Server 2008 and Oracle 11g). TransactionScope seemed like the logical way to ensure updates were always committed correctly, but it seems that installing MSDTC is not going to be an option. In the future, it's also possible this application could be using data sources which don't support distributed transactions.
I've spent many hours trying to come up with another solution but nothing seems like it will work. All searches point back to TransactionScope and distributed transactions.
The application is written in C# using Entity Framework. Does anyone have any suggestions that won't require escalating to distributed transactions? Here's a list of ideas I've had, all of which have gone nowhere.
+TransactionScope: Can't use MSDTC. Future data sources may not support distributed transactions.
+Manually track and roll back transactions: I haven't found a good way to do this within Entity Framework (a best-effort sketch follows this list).
+Queue/log failures so they can be re-committed by another process: Can't come up with a good way to store the failed commits generically. Also need to make sure the re-commit doesn't overwrite newer data.
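For the manual idea, the closest I've gotten is a best-effort pattern like the one below (assuming EF6's Database.BeginTransaction is available; SqlServerContext, OracleContext and the entities are placeholder names). It isn't truly atomic, since a crash between the two commits would still leave the stores out of sync:

    using (var sqlDb = new SqlServerContext())
    using (var oraDb = new OracleContext())
    using (var sqlTx = sqlDb.Database.BeginTransaction())
    using (var oraTx = oraDb.Database.BeginTransaction())
    {
        sqlDb.Orders.Add(newOrder);
        sqlDb.SaveChanges();

        oraDb.AuditEntries.Add(newAudit);
        oraDb.SaveChanges();

        // Commit both only after every SaveChanges succeeded; disposing an
        // uncommitted transaction rolls it back automatically.
        sqlTx.Commit();
        oraTx.Commit();
    }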
@ThinkJet: that related link is an interesting opinion. In my case a small failure, like the one described, would not be a huge deal. We currently have other stuff in place which tries to keep all these systems in sync (not always successfully). If one or two transactions did fail, it should be picked up by those processes.
After reading through these comments, I might try to have this library write the data to its own database and then sync those changes to the other sources so that the other applications can see them. It would cause a slight delay in some updates, but even that would be better than what we have now.

Entity Framework and ADO.NET with Unit of Work pattern

We have a system built using Entity Framework 5 for creating, editing and deleting data, but the problem is that sometimes EF is too slow, or it simply isn't possible to use Entity Framework at all (views which build data for tables based on users participating in certain groups in the database, etc.), and we have to use a stored procedure to update the data.
However, we have gotten ourselves into a situation where we have to save the changes through EF in order to have the data in the database and then call the stored procedures. We can't use TransactionScope, as it always escalates to a distributed transaction and/or locks the table(s) against selects for the duration of the transaction.
We are also trying to introduce a domain events pattern which queues events and raises them after SaveChanges, so that we have the data we need in the DB, but then we may end up with the first part succeeding and the second part failing.
Are there any good ways to handle this or do we need to move away from EF entirely for this scenario?
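One thing we are considering, assuming we can move to EF6 where DbContext.Database.BeginTransaction is available, is letting SaveChanges and the stored procedure share a single local (non-distributed) transaction; OurDbContext, Widgets and dbo.RebuildGroupTables below are placeholder names:

    using System.Data.SqlClient;

    using (var db = new OurDbContext())
    using (var tx = db.Database.BeginTransaction())
    {
        db.Widgets.Add(widget);
        db.SaveChanges();                       // runs on the same connection, inside tx

        db.Database.ExecuteSqlCommand(
            "EXEC dbo.RebuildGroupTables @userId",
            new SqlParameter("@userId", widget.OwnerId));

        tx.Commit();                            // both parts commit or roll back together
    }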
I had a similar scenario. In the end I broke the process up into small ones, used EF only, and kept each small process short. Even though the overall time is longer, the system is easier to maintain and scale. I also minimized joins, updated only the entity itself, and disabled EF's AutoDetectChangesEnabled and ValidateOnSaveEnabled (see the sketch below).
Sometimes, if you look at your problem in a different way, you may find a better solution.
Good luck!
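For reference, the configuration tweaks mentioned above look roughly like this (BatchContext and Items are placeholder names):

    using (var db = new BatchContext())
    {
        db.Configuration.AutoDetectChangesEnabled = false;   // skip change scanning on every Add
        db.Configuration.ValidateOnSaveEnabled = false;      // skip validation in SaveChanges

        foreach (var item in items)
            db.Items.Add(item);

        db.SaveChanges();
    }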
