What is a good way to archive data with the identity column using EF Core? - c#

Closed. This question is opinion-based. It is not currently accepting answers.
I'm using .NET 6 and EF Core 6.0.13. I have two databases Foo and FooArchive with identical schemas. I need to archive (migrate) data that are older than a year from Foo to FooArchive for 7 tables. What's the best way to do this with EF Core? I will describe below what I tried and the issues I'm running into.
NOTE: No foreign keys or other relationships are defined on any table in either DB, so there are no navigation properties either.
FooContext and FooArchiveContext both use the same entity models but different connections, and both are injected into the repository class.
Query, in a separate transaction, for the IDs of customers whose accounts are older than 365 days:
var customerIds = GetCustomerIds();
Loop through the customerIds collection and archive one customer at a time:
foreach (var customerId in customerIds)
{
    using (var fooTx = _fooContext.Database.BeginTransaction())
    using (var fooArchiveTx = _fooArchiveContext.Database.BeginTransaction())
    {
        // Series of left joins to get the data from 7 tables
        var recordsToArchive = (from cust in _fooContext.Customers
                                join ord in _fooContext.Orders on cust.Id equals ord.CustId into co
                                from ord in co.DefaultIfEmpty()
                                join odt in _fooContext.OrderDetails on ord.Id equals odt.OrderId into ordt
                                from odt in ordt.DefaultIfEmpty()
                                .
                                where cust.Id == customerId
                                select new
                                {
                                    Customer = cust,
                                    Order = ord,
                                    OrderDetail = odt,
                                    .
                                    .
                                }).ToList();
        var customer = recordsToArchive.Select(x => x.Customer).Distinct().First();
        var orders = recordsToArchive.Select(x => x.Order).Where(x => x != null).Distinct();
        var orderDetails = recordsToArchive.Select(x => x.OrderDetail).Where(x => x != null).Distinct();
        .
        .
        // Check if the record to be migrated is already in FooArchiveContext
        var existingRecord = _fooArchiveContext.Customers.FirstOrDefault(x => x.Id == customerId);
        if (existingRecord == null)
        {
            _fooArchiveContext.Customers.Add(customer);
            _fooArchiveContext.Orders.AddRange(orders);
            _fooArchiveContext.OrderDetails.AddRange(orderDetails);
            .
        }
        else
        {
            _fooArchiveContext.Customers.Update(customer);
            _fooArchiveContext.Orders.UpdateRange(orders);
            _fooArchiveContext.OrderDetails.UpdateRange(orderDetails);
            .
        }
        _fooArchiveContext.SaveChanges();
        // Remove the record from fooContext
        _fooContext.OrderDetails.RemoveRange(orderDetails);
        _fooContext.Orders.RemoveRange(orders);
        _fooContext.Customers.Remove(customer);
        .
        _fooContext.SaveChanges();
        fooArchiveTx.Commit();
        fooTx.Commit();
    }
}
Is what I'm doing the right approach? I think I may have to use AutoMapper to copy entities between the two contexts. It works with the InMemory provider but fails against the actual SQL Server instances with this error:
Cannot insert explicit value for identity column in table 'Orders' when IDENTITY_INSERT is set to OFF
I would like to keep the same Ids as in the original db instance (fooContext).
I guess I could clear the Id on each entity object and save, then query for the new Id and update the related entities, but that sounds hackier than the code I already have. I've seen SO answers that turn the IDENTITY_INSERT option on before calling SaveChanges() and off afterwards, like below, but I haven't tried it:
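// The transaction keeps one open connection, so SET IDENTITY_INSERT and SaveChanges() run in the same session
using var transaction = db.Database.BeginTransaction();
db.Users.Add(user);
db.Database.ExecuteSqlRaw("SET IDENTITY_INSERT MyDB.Users ON");
db.SaveChanges();
db.Database.ExecuteSqlRaw("SET IDENTITY_INSERT MyDB.Users OFF");
transaction.Commit();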
Thanks for your help.

If I understand your problem correctly, your approach of looping through each customer and archiving their records one at a time seems reasonable. However, there are a few areas where you can improve your implementation.
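Firstly, you should avoid fetching the same data multiple times. In your code, the chain of left joins repeats the customer and order columns on every row, which you then de-duplicate in memory. This can be improved by using the Include method to eagerly load the related entities along with the primary entity. Note that Include requires navigation properties (such as Customer.Orders), which the question says the models do not currently have, so they would have to be added first.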
Secondly, you should avoid duplicating code. Your code has two nearly identical blocks, one for adding and one for updating the entities in the archive database. You can reduce the duplication by attaching the entities to the archive context and then calling Update or Add depending on whether the entity already exists in the archive database.
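The Add-versus-Update decision can also be made for a whole set of rows at once: load the Ids that already exist in the archive with one query, then split the loaded entities in memory. A minimal, database-free sketch, assuming the existing Ids are already in a HashSet<int>; tuples stand in for the Customer entity, and Partition is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Customer rows loaded from FooContext
// and for the Ids already present in FooArchiveContext.
var sourceCustomers = new List<(int Id, string Name)>
{
    (1, "Ada"), (2, "Grace"), (3, "Linus")
};
var idsAlreadyArchived = new HashSet<int> { 2 };

var (toAdd, toUpdate) = Partition(sourceCustomers, idsAlreadyArchived);

// With EF Core these would become AddRange(toAdd) and UpdateRange(toUpdate).
Console.WriteLine($"Add: {string.Join(",", toAdd.Select(c => c.Id))}");
Console.WriteLine($"Update: {string.Join(",", toUpdate.Select(c => c.Id))}");

// Rows whose Id is not in the archive go to the insert list; the rest go to the update list.
static (List<(int Id, string Name)> ToAdd, List<(int Id, string Name)> ToUpdate) Partition(
    IEnumerable<(int Id, string Name)> rows, ISet<int> existingIds)
{
    var lookup = rows.ToLookup(r => existingIds.Contains(r.Id));
    return (lookup[false].ToList(), lookup[true].ToList());
}
```

One query for the archived Ids (for example, filtering on the current customer Ids and selecting c.Id) then replaces a separate existence check per customer.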
Thirdly, you should use a bulk insert/update operation instead of adding/updating the entities one at a time. EF Core does not have built-in support for bulk operations, but you can use third-party libraries like Entity Framework Extensions or Z.EntityFramework.Plus to perform bulk operations.
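Even without a third-party library, you can cut down the per-customer round-trips by archiving customers in fixed-size batches. A minimal sketch using Enumerable.Chunk, which is available from .NET 6 (the version the question uses). The Ids and the batch size of 3 are only for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical customer Ids returned by the "older than a year" query.
var customerIds = Enumerable.Range(1, 7).ToList();

// Enumerable.Chunk (.NET 6+) splits the sequence into arrays of at most 3 items;
// the last array holds whatever is left over.
var batches = customerIds.Chunk(3).ToList();

foreach (int[] batch in batches)
{
    // In the archive job, each batch would be loaded with one query
    // (e.g. filtering with batch.Contains(customer.Id)), copied to the archive,
    // removed from the source, and committed in its own pair of transactions.
    Console.WriteLine(string.Join(",", batch));
}
```

Larger batches mean fewer round-trips but bigger transactions and longer-held locks, so tune the size against the real data.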
Finally, you should avoid setting the identity column values explicitly. Instead, let the database generate the identity values for you. To do this, you can remove the identity column from your entity models or use the ValueGeneratedOnAdd() method in your entity configuration.
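If you let the archive database generate new identity values, the archived Orders end up with new Ids. Because the models have no navigation properties, EF Core cannot fix up the OrderDetail foreign keys for you, so they have to be repointed by hand (the approach the question calls tackier). A minimal, database-free sketch of that remapping step. It assumes a hypothetical old-to-new Id dictionary has already been collected after SaveChanges(); tuples stand in for the entities:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical map from the source Order.Id to the Id the archive database generated.
var orderIdMap = new Dictionary<int, int> { [10] = 501, [11] = 502 };

// OrderDetail rows still carrying the OrderId values from the source database.
var details = new List<(int DetailId, int OrderId)> { (1, 10), (2, 10), (3, 11) };

var remapped = RemapForeignKeys(details, orderIdMap);

foreach (var d in remapped)
    Console.WriteLine($"Detail {d.DetailId} -> Order {d.OrderId}");

// Rewrites each child's foreign key through the map. An unmapped key means the parent
// was never archived, so it throws rather than keeping a stale source Id.
static List<(int DetailId, int OrderId)> RemapForeignKeys(
    IEnumerable<(int DetailId, int OrderId)> rows, IReadOnlyDictionary<int, int> map)
{
    return rows
        .Select(r => map.TryGetValue(r.OrderId, out var newId)
            ? (DetailId: r.DetailId, OrderId: newId)
            : throw new KeyNotFoundException($"No archived Order for OrderId {r.OrderId}"))
        .ToList();
}
```

Failing on an unmapped key is deliberate. Silently keeping the source Id would link the detail to an unrelated archive order.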
With these improvements in mind, here's an example implementation of your code:
using (var fooTx = _fooContext.Database.BeginTransaction())
using (var fooArchiveTx = _fooArchiveContext.Database.BeginTransaction())
{
    var cutoffDate = DateTime.UtcNow.AddYears(-1);
    var customerIds = _fooContext.Customers
        .Where(c => c.CreatedAt < cutoffDate)
        .Select(c => c.Id)
        .ToList();
    foreach (var customerId in customerIds)
    {
        var customer = _fooContext.Customers
            .Include(c => c.Orders)
            .ThenInclude(o => o.OrderDetails)
            .FirstOrDefault(c => c.Id == customerId);
        if (customer != null)
        {
            if (_fooArchiveContext.Customers.Any(c => c.Id == customerId))
            {
                _fooArchiveContext.Attach(customer);
                _fooArchiveContext.Update(customer);
            }
            else
            {
                _fooArchiveContext.Add(customer);
            }
            _fooArchiveContext.SaveChanges();
            _fooArchiveContext.Orders.BulkInsert(customer.Orders);
            _fooArchiveContext.OrderDetails.BulkInsert(customer.Orders.SelectMany(o => o.OrderDetails));
            _fooContext.OrderDetails.RemoveRange(customer.Orders.SelectMany(o => o.OrderDetails));
            _fooContext.Orders.RemoveRange(customer.Orders);
            _fooContext.Customers.Remove(customer);
            _fooContext.SaveChanges();
        }
    }
    fooArchiveTx.Commit();
    fooTx.Commit();
}
In this code, we first get the list of customer IDs whose accounts are older than a year. We then loop through each customer and retrieve their orders and order details using the Include method. We then check if the customer already exists in the archive database and use the Attach and Update methods to update the existing customer, or the Add method to add a new customer.
We then use the BulkInsert method from the Entity Framework Extensions library to insert the orders and order details in bulk. We also remove the orders, order details, and customer from the source database using the RemoveRange method.
Finally, we call SaveChanges on the archive and source contexts and commit the transactions.
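The question defines old accounts as older than 365 days, while the code above uses DateTime.UtcNow.AddYears(-1) as the cutoff. The two differ by one day whenever a 29 February falls inside the window. A small sketch with a hypothetical fixed "now" makes the difference concrete:

```csharp
using System;
using System.Globalization;

// Hypothetical fixed "now" so the output is reproducible; the answer's code uses DateTime.UtcNow.
var now = new DateTime(2024, 3, 1, 0, 0, 0, DateTimeKind.Utc);

var oneCalendarYearAgo = now.AddYears(-1);  // the answer's cutoffDate
var exactly365DaysAgo = now.AddDays(-365);  // "older than 365 days", as the question puts it

// 29 February 2024 lies inside the window, so the two cutoffs are a day apart.
Console.WriteLine(oneCalendarYearAgo.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2023-03-01
Console.WriteLine(exactly365DaysAgo.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));  // 2023-03-02
```

Either cutoff is defensible. The point is to pick one deliberately and use it consistently wherever archivable customers are selected.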

Related

Bulk Update in Entity Framework Core

I pull a bunch of timesheet entries out of the database and use them to create an invoice. Once I save the invoice and have an Id I want to update the timesheet entries with the invoice Id. Is there a way to bulk update the entities without loading them one at a time?
void SaveInvoice(Invoice invoice, int[] timeEntryIds) {
    context.Invoices.Add(invoice);
    context.SaveChanges();
    // Is there anything like?
    context.TimeEntries
        .Where(te => timeEntryIds.Contains(te.Id))
        .Update(te => te.InvoiceId = invoice.Id);
}
Disclaimer: I'm the owner of the project Entity Framework Plus
Our library has a Batch Update feature which I believe is what you are looking for
This feature supports EF Core
// Is there anything like? YES!!!
context.TimeEntries
.Where(te => timeEntryIds.Contains(te.Id))
.Update(te => new TimeEntry() { InvoiceId = invoice.Id });
Wiki: EF Batch Update
EDIT: Answer comment
Does it support Contains as in your example? I think this comes from EF Core, where it is not a supported feature even in version 3.1.
EF Core 3.x supports Contains: https://dotnetfiddle.net/DAdIO2
EDIT: Answer comment
This is great, but it requires classes to have a public parameterless constructor, which is not great. Is there any way to get around this issue?
Anonymous types are supported starting with EF Core 3.x:
context.TimeEntries
.Where(te => timeEntryIds.Contains(te.Id))
.Update(te => new { InvoiceId = invoice.Id });
Online example: https://dotnetfiddle.net/MAnPvw
As of EF Core 7.0, there are built-in ExecuteUpdate() and ExecuteDelete() methods:
context.Customers.Where(...).ExecuteDelete();
context.Customers.Where(...).ExecuteUpdate(c => c.SetProperty(b => b.Age, b => b.Age + 1));
Are you after performance or simplified syntax?
I would suggest using a direct SQL query:
string query = "Update TimeEntries Set InvoiceId = <invoiceId> Where Id in (comma separated ids)";
await context.Database.ExecuteSqlRawAsync(query);
For the comma-separated ids you can use string.Join(',', timeEntryIds).
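Building the list with string.Join is safe here only because the Ids are integers. A sketch that keeps every value out of the SQL text instead, by generating one positional placeholder per Id for ExecuteSqlRaw, which converts {0}-style placeholders into DbParameters. BuildUpdateSql is a hypothetical helper name:

```csharp
using System;
using System.Linq;

// Hypothetical inputs; in the question they are invoice.Id and the timeEntryIds argument.
int invoiceId = 42;
int[] timeEntryIds = { 7, 8, 9 };

var (sql, parameters) = BuildUpdateSql(invoiceId, timeEntryIds);
Console.WriteLine(sql);
// The pair could then be executed with context.Database.ExecuteSqlRaw(sql, parameters).

// Emits one positional placeholder per Id ({1}, {2}, ...) after the invoice Id ({0}),
// so every value is sent as a parameter instead of being spliced into the SQL text.
static (string Sql, object[] Parameters) BuildUpdateSql(int newInvoiceId, int[] ids)
{
    if (ids.Length == 0)
        throw new ArgumentException("At least one Id is required.", nameof(ids));

    var placeholders = string.Join(", ", ids.Select((_, i) => $"{{{i + 1}}}"));
    var text = $"UPDATE TimeEntries SET InvoiceId = {{0}} WHERE Id IN ({placeholders})";
    var values = new object[] { newInvoiceId }.Concat(ids.Cast<object>()).ToArray();
    return (text, values);
}
```

SQL Server allows at most 2100 parameters per command, so a very long Id list would have to be split into batches.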
It depends on what you actually need. If you want to go with Linq, then you need to iterate through each object.
If TimeEntry has an association to Invoice (check the navigation properties), you can probably do something like this:
var timeEntries = context.TimeEntries.Where(te => timeEntryIds.Contains(te.Id)).ToArray();
foreach (var timeEntry in timeEntries)
    invoice.TimeEntries.Add(timeEntry);
context.Invoices.Add(invoice);
//save the entire context and takes care of the ids
context.SaveChanges();
The IQueryable.ToQueryString method introduced in Entity Framework Core 5.0 may help with this scenario. This method will generate SQL that can be included in a raw SQL query to perform a bulk update of records identified by that query.
For example:
void SaveInvoice(Invoice invoice, int[] timeEntryIds) {
    context.Invoices.Add(invoice);
    context.SaveChanges();
    var query = context.TimeEntries
        .Where(te => timeEntryIds.Contains(te.Id))
        .Select(te => te.Id);
    var sql = $"UPDATE TimeEntries SET InvoiceId = {{0}} WHERE Id IN ({query.ToQueryString()})";
    context.Database.ExecuteSqlRaw(sql, invoice.Id);
}
The major drawback of this approach is that you end up with raw SQL in your code. However, I don't know of any reasonable way to avoid that with current Entity Framework Core capabilities. You're stuck with this caveat, or with the caveats of the other answers posted here, such as:
Introducing a dependency on another library such as Entity Framework Plus or ELinq.
Using DbContext.SaveChanges() which will involve the execution of multiple SQL queries to retrieve and update records one at a time rather than doing a bulk update.
In Entity Framework Core, you can do this with the UpdateRange method. Some sample usages:
using (var context = new YourContext())
{
    context.UpdateRange(yourModifiedEntities);
    // or the following are also valid
    //context.UpdateRange(yourModifiedEntity1, yourModifiedEntity2, yourModifiedEntity3);
    //context.YourEntity.UpdateRange(yourModifiedEntities);
    //context.YourEntity.UpdateRange(yourModifiedEntity1, yourModifiedEntity2, yourModifiedEntity3);
    context.SaveChanges();
}
Bulk update is supported in EF Core 7:
context
    .TimeEntries
    .Where(te => timeEntryIds.Contains(te.Id))
    .ExecuteUpdate(s => s.SetProperty(
        te => te.InvoiceId,
        te => invoice.Id));
There is also an async version of this method, ExecuteUpdateAsync.
In EF Core 7, use ExecuteUpdate() (see "What's New in EF Core 7.0"):
var multipleRows = TableA.Where(t => t.Id < 99);
multipleRows.ExecuteUpdate(t =>
    t.SetProperty(
        r => r.Salary,
        r => r.Salary * 2));
// The SQL has already been sent to the database; do not run the line below
//SaveChanges();
SQL generated by EF:
UPDATE [t]
SET [t].[Salary] = [t].[Salary] * 2
FROM [TableA] AS [t]
WHERE [t].[ID] < 99

How to select specific fields to update in EF

I want to get all records from the database that match a Where condition, then update them. To do this, I have created a query like this:
public async Task MarkAllAsActive()
{
    var currentUserId = _userManager.GetCurrentUserId();
    await _workOrders.Where(row => row.Status == WorkOrderStatus.Draft).ForEachAsync(row =>
    {
        row.Status = WorkOrderStatus.Active;
        _uow.MarkAsChanged(row, currentUserId);
    });
}
But this query selects all fields from the database, which isn't good. To solve this, I tried to select just specific fields like Id and Status:
public async Task MarkAllAsActive()
{
    var currentUserId = _userManager.GetCurrentUserId();
    await _workOrders
        .Select(row => new WorkOrder { Id = row.Id, Status = row.Status })
        .Where(row => row.Status == WorkOrderStatus.Draft)
        .ForEachAsync(row =>
        {
            row.Status = WorkOrderStatus.Active;
            _uow.MarkAsChanged(row, currentUserId);
        });
}
But it returns this error:
The entity or complex type 'DataLayer.Context.WorkOrder' cannot be constructed in a LINQ to Entities query.
I've seen a similar post and the same error, but my problem is different because I want to update.
How can I do this?
Sadly you have to fetch the entire entity.
In order to update an entity with EF, the class type being edited has to be an entity mapped by the DbContext.
If you want to update without fetching the entities from the server, and without writing any SQL, you can use the Entity Framework Extended Library.
See the update section on the site.
Projecting into the same entity type will not work in your case, because you are selecting only some of the columns: you are constructing a WorkOrder entity inside a query over WorkOrder.
I would suggest using a DTO to load only the selected columns. That should work, but when you update, you will have to copy the values back onto the database entity.

How to perform delete rows on some condition with EF?

In ADO.NET I can use a DELETE statement to delete some rows from an SQL table.
What is the equivalent in Entity Framework?
How can I achieve the same result?
Updating with null objects isn't the same.
The replies here tell you that you first need to fetch the objects into memory (strictly speaking, the keys are enough, but then you need to do some work manually), mark them for deletion, and finally call SaveChanges. Although that's the "normal" approach, there are a bunch of extensions and helpers that let you do batch deletes, batch updates, and other helpful things.
You can check EntityFramework.Extended (also on GitHub) or Entity Framework Extensions (sources there as well).
You need to retrieve the object to be deleted first.
For example :
// Assuming ID is primary key in `Customer` entity
Customer cust = (from c in context.Customers where c.ID == "1" select c).FirstOrDefault();
Then delete the object using the entity set's DeleteObject method:
context.Customers.DeleteObject(cust);
context.SaveChanges();
First of all, create an instance of your database context, then select the object you want and delete it:
TestEntities db = new TestEntities();
Test ts = (from rows in db.Tests
           where rows.ID == 1
           select rows).FirstOrDefault();
if (ts != null)
{
    db.Tests.DeleteObject(ts);
    db.SaveChanges();
}
Update:
If your result set is a list, i.e. more than one record, you can use this solution:
List<Test> lst = (from rows in db.Tests select rows).ToList();
foreach (Test item in lst)
{
    db.Tests.DeleteObject(item);
}
db.SaveChanges();

What is the recommended practice to update or delete multiple entities in EntityFramework?

In SQL one might sometimes write something like
DELETE FROM table WHERE column IS NULL
or
UPDATE table SET column1=value WHERE column2 IS NULL
or any other criterion that might apply to multiple rows.
As far as I can tell, the best EntityFramework can do is something like
foreach (var entity in db.Table.Where(row => row.Column == null))
db.Table.Remove(entity); // or entity.Column2 = value;
db.SaveChanges();
But of course that will retrieve all the entities, and then run a separate DELETE query for each. Surely that must be much slower if there are many entities that satisfy the criterion.
So, cut a long story short, is there any support in EntityFramework for updating or deleting multiple entities in a single query?
EF doesn't have support for batch updates or deletes but you can simply do:
db.Database.ExecuteSqlCommand("DELETE FROM ...", someParameter);
Edit:
People who really want to stick with LINQ queries sometimes use a workaround where they first generate the SELECT SQL from the LINQ query:
string query = db.Table.Where(row => row.Column == null).ToString();
and after that find the first occurrence of FROM and replace the beginning of the query with DELETE and execute result with ExecuteSqlCommand. The problem with this approach is that it works only in basic scenarios. It will not work with entity splitting or some inheritance mapping where you need to delete two or more records per entity.
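A sketch of that string rewrite. One detail the description glosses over: EF6 typically aliases the table (as [Extent1]), and T-SQL rejects DELETE FROM [dbo].[Table] AS [Extent1], so the alias has to go right after DELETE instead. This sketch assumes the SELECT text and the alias are supplied by hand; SelectToDelete is a hypothetical helper:

```csharp
using System;

// Roughly the shape of the SELECT that EF6 returns from ToString() for
// db.Table.Where(row => row.Column == null); table and column names are hypothetical.
var selectSql =
    "SELECT [Extent1].[Id] AS [Id], [Extent1].[Column] AS [Column] " +
    "FROM [dbo].[Table] AS [Extent1] WHERE [Extent1].[Column] IS NULL";

var deleteSql = SelectToDelete(selectSql, "[Extent1]");
Console.WriteLine(deleteSql);

// Replaces everything before the first FROM with "DELETE <alias> ".
// Only valid for simple single-table queries, as the answer warns.
static string SelectToDelete(string sql, string alias)
{
    int fromIndex = sql.IndexOf("FROM", StringComparison.Ordinal);
    if (!sql.TrimStart().StartsWith("SELECT", StringComparison.Ordinal) || fromIndex < 0)
        throw new ArgumentException("Expected a simple SELECT ... FROM ... query.", nameof(sql));

    return $"DELETE {alias} " + sql.Substring(fromIndex);
}
```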
Take a look at Entity Framework Extensions (Multiple entity updates). This project allows set operations using lambda expressions. Samples from the docs:
this.Container.Devices.Delete(o => o.Id == 1);
this.Container.Devices.Update(
    o => new Device() {
        LastOrderRequest = DateTime.Now,
        Description = o.Description + "teste"
    },
    o => o.Id == 1);
Digging into the EFE project source code, you can see how it automates Ladislav Mrnka's second approach, also adding set operations:
public override string GetDmlCommand()
{
    //Recover Table Name
    StringBuilder updateCommand = new StringBuilder();
    updateCommand.Append("UPDATE ");
    updateCommand.Append(MetadataAccessor.GetTableNameByEdmType(
        typeof(T).Name));
    updateCommand.Append(" ");
    updateCommand.Append(setParser.ParseExpression());
    updateCommand.Append(whereParser.ParseExpression());
    return updateCommand.ToString();
}
Edited 3 years later:
Take a look at this great answer: https://stackoverflow.com/a/12751429
Entity Framework Extended Library helps to do this.
Delete
//delete all users where FirstName matches
context.Users.Delete(u => u.FirstName == "firstname");
Update
//update all tasks with status of 1 to status of 2
context.Tasks.Update(
t => t.StatusId == 1,
t2 => new Task {StatusId = 2});
//example of using an IQueryable as the filter for the update
var users = context.Users.Where(u => u.FirstName == "firstname");
context.Users.Update(users, u => new User {FirstName = "newfirstname"});
https://github.com/loresoft/EntityFramework.Extended

How to update related entries of different tables when using Entity Framework 4.0?

Is there any shorter way to do this update?
void Update(Table1 table1Entry, Table2[] table2entries)
{
    entities.Table1.Attach(table1Entry);
    var table2EntriesIds = table2entries.Select(a => a.Id);
    var updates = entities.Table2
        .Where(a => table2EntriesIds.Contains(a.Id));
    foreach (var update in updates)
    {
        entities.Table2.Attach(update);
    }
    var deletions = entities.Table2
        .Where(a => a.Table1Id == table1Entry.Id)
        .Where(a => !table2EntriesIds.Contains(a.Id));
    foreach (var deletion in deletions)
    {
        entities.DeleteObject(deletion);
    }
    var insertions = table2entries.Except(updates);
    foreach (var insertion in insertions)
    {
        entities.AddToTable2(insertion);
    }
    entities.SaveChanges();
}
where Table2 has a Table1Id foreign key.
The idea is correct. You can optimize it, for example so that you don't load the relations to update and the relations to delete separately, but you will still have to manually synchronize the current detached state of your entities with the state in the database. The only way to synchronize the state of an entity graph is to do it manually, per entity and per relation.
The question is whether your code works. I think it doesn't. It doesn't update any records, because it never changes the state of the records to modified. You also cannot re-attach a record that was loaded from the context. Finally, if Table1 and Table2 are related, I don't see any code that works with the relation itself (unless you use FK properties).
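The manual, per-relation synchronization described above starts by classifying rows by key. A database-free sketch of that classification step. It assumes new Table2 rows carry an Id that is not yet in the database (0 here) and that the Ids currently stored for the Table1 entry have been loaded with one query; PlanSync and the tuple shape are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: Ids of the Table2 rows currently stored for one Table1 entry,
// and the detached Table2 entries the caller wants to keep.
var idsInDatabase = new HashSet<int> { 1, 2, 3 };
var incoming = new List<(int Id, string Value)> { (2, "b2"), (3, "c2"), (0, "new") };

var plan = PlanSync(idsInDatabase, incoming);

Console.WriteLine($"update: {string.Join(",", plan.ToUpdate.Select(e => e.Id))}");
Console.WriteLine($"delete: {string.Join(",", plan.ToDelete)}");
Console.WriteLine($"insert: {plan.ToInsert.Count}");

// Present on both sides -> update; only in the database -> delete;
// not yet in the database -> insert.
static (List<(int Id, string Value)> ToUpdate, List<int> ToDelete, List<(int Id, string Value)> ToInsert)
    PlanSync(ISet<int> existingIds, IReadOnlyCollection<(int Id, string Value)> entries)
{
    var incomingIds = entries.Select(e => e.Id).ToHashSet();
    var toUpdate = entries.Where(e => existingIds.Contains(e.Id)).ToList();
    var toInsert = entries.Where(e => !existingIds.Contains(e.Id)).ToList();
    var toDelete = existingIds.Where(id => !incomingIds.Contains(id)).OrderBy(id => id).ToList();
    return (toUpdate, toDelete, toInsert);
}
```

The three lists would then drive the update, DeleteObject and insert steps in the question's code. As the answer points out, the updated rows still need their state set to modified.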
