I am deleting hundreds of rows from my table. Using ADO.NET code, i.e. Delete from table where somecolumn = somestring, takes less than a second, while using Entity Framework, i.e.
MyDbContext context = new MyDbContext();
context.SomeEntity.RemoveRange(context.SomeEntity.Where(i => i.somecolumn == somestring));
context.SaveChanges();
is taking 8-10 seconds.
Can anybody explain this, or am I doing something wrong?
In situations where you want to delete a large number of records (read: a thousand or more), the most efficient way would be a so-called "bulk delete". EFCore.BulkExtensions allows that. Something like the code below (note that BulkDelete takes a materialized list, not an IQueryable):
var recordsToRemove = context.SomeEntity.Where(i => i.somecolumn == somestring).ToList();
context.BulkDelete(recordsToRemove);
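If you would rather not load the entities at all, the same library also ships a BatchDelete extension for IQueryable. A hedged sketch (check your package version; newer EF Core releases offer a built-in ExecuteDelete that does the same thing):
using EFCore.BulkExtensions; // assumes the NuGet package is installed

// Translates to a single DELETE ... WHERE statement; no entities are materialized.
context.SomeEntity
    .Where(i => i.somecolumn == somestring)
    .BatchDelete();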
EF is designed around providing relational data mapped to an object model. It isn't ideally suited to large bulk operations. Still, you can facilitate one-off requirements like this a number of ways.
The first method would be to use stubs for the delete. For this you would want to ensure that the DbContext instance is "clean" of any tracked instances that might be deleted, so ideally a DbContext that is scoped to that method.
using (var context = new SomeDbContext())
{
    var stubs = context.SomeEntities
        .Where(x => x.SomeColumn == someString)
        .Select(x => x.Id)
        .ToList()
        .Select(x => new SomeEntity { Id = x })
        .ToList();

    // ... RemoveRange + SaveChanges (shown below) go here, inside the same scope.
}
Now you might be able to simplify that down to:
var stubs = context.SomeEntities
    .Where(x => x.SomeColumn == someString)
    .Select(x => new SomeEntity { Id = x.Id })
    .ToList();
However, you would probably want to test that to ensure that the resulting SQL selects just the ID (not the entire entity) and that context.SomeEntities.Local.Any() is still false... The first example guarantees that the query loads only the IDs, then builds stub entities from those IDs. This keeps the "selection" of our data as efficient as possible.
From here you should be able to use RemoveRange on the untracked stubs.
context.SomeEntities.RemoveRange(stubs);
context.SaveChanges();
The important detail is that the DbContext cannot be tracking any of these entities as this will temporarily attach these stubs to the DbContext. If the context was already tracking an instance with one of those ids then you would receive an error that one or more entities with the same ID was already being tracked. (Hence the locally scoped DbContext to avoid that)
The other way to perform this deletion would be to issue a direct SQL operation. If you have a DbContext that is scoped to a request or longer than this single operation then this should be done after dealing with any currently tracked instances.
Step 1. Deal with any tracked instances if you have an injected DbContext:
var trackedInstances = context.SomeEntities.Local
    .Where(x => x.SomeColumn == someString)
    .ToList();
if (trackedInstances.Any())
{
    context.SomeEntities.RemoveRange(trackedInstances);
    context.SaveChanges();
}
This will check the DbContext for any tracked instances without hitting the DB. We remove and save these first so that none of them are left marked as Modified or Deleted after the raw SQL removes the underlying rows, which would otherwise trigger an exception during a later SaveChanges call.
Step 2. Build and run a parameterized raw SQL statement to clean off all remaining rows in the DB.
context.Database.ExecuteSqlCommand(@"DELETE FROM dbo.SomeEntities
    WHERE SomeColumn = @someString", new SqlParameter("someString", someString));
Note that ExecuteSqlCommand executes immediately, so no SaveChanges call is needed for this statement.
The important detail here is to use a parameterized query. Do not execute raw SQL with the parameters embedded in the string as this leaves the door open to SQL injection attacks.
I.e. do not use anything like:
context.Database.ExecuteSqlCommand($@"DELETE FROM dbo.SomeEntities
    WHERE SomeColumn = '{someString}'");
// or
context.Database.ExecuteSqlCommand(@"DELETE FROM dbo.SomeEntities
    WHERE SomeColumn = '" + someString + "'");
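If you are on EF Core 3 or later, ExecuteSqlInterpolated gives you interpolation-style syntax that is still safe: the interpolated value is converted into a DbParameter rather than embedded in the SQL text. A minimal sketch:
// Safe despite the interpolation: {someString} becomes a DbParameter.
context.Database.ExecuteSqlInterpolated(
    $"DELETE FROM dbo.SomeEntities WHERE SomeColumn = {someString}");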
Related
I am trying to do some checks on my database with an automated process. On a schedule the process goes out to a service and checks all the entries in the database against a list.
I want to re-insert the records that may have been deleted and update the ones that are out of date:
foreach (Category x in CustomeClass)
{
Category exists = Context.SSActivewear_Category
.Where(b => b.CategoryID == x.CategoryID)
.FirstOrDefault();
if (exists == null)
Context.Add(x);
else
Context.Update(x);
}
Not sure why, but I keep getting messages about tracking an instance with the same key, etc. Can someone point me to a best practice for something like this?
Thanks!
This type of error is common when re-using the same instance of an EF DbContext, especially when loading the same entity more than once from the database using the same context.
If this is the case, then simply recreate the context (either new it up or use a context factory) and then try to update or modify your database data.
Creating a new context is cheap, so no worries there.
After updating, save changes and dispose of the context (or use it in a using statement to begin with).
If you are modifying the same entity multiple times using the same context, then do not load it from database multiple times.
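A minimal sketch of that pattern, using the entity names from your code (MyDbContext, someId and Name are placeholders):
// One short-lived context per unit of work.
using (var context = new MyDbContext()) // or contextFactory.CreateDbContext()
{
    var category = context.SSActivewear_Category
        .SingleOrDefault(c => c.CategoryID == someId); // load once
    if (category != null)
        category.Name = "Updated name"; // modify the single tracked instance
    context.SaveChanges();
} // disposed here, so no stale tracked instances leak into the next operation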
In your particular code example, I would check whether there is any duplication of Category objects in the CustomeClass collection:
Is there duplication of CategoryID?
Is CategoryID for sure auto-generated when saving data (entity configuration)?
Is code not trying to update multiple entities with same id?
etc.
Entity Framework works with references. Two instances of a class with the same data are still two different references, and only one instance per key can be tracked by a DbContext; otherwise you get errors like that.
As per your example below:
foreach (Category x in CustomeClass)
{
Category exists = Context.SSActivewear_Category
.Where(b => b.CategoryID == x.CategoryID)
.FirstOrDefault();
if (exists == null)
Context.Add(x);
else
Context.Update(x);
}
Category is assumed to be your Entity, so "CustomeClass" would be a collection of instances that are not associated with your Context. These are Detached instances.
When your "exists" comes back as #null, this will appear to work as the Category "x" gets added and tracked by the Context. However, when "exists" comes back as not #null, you now have two instances for the same entity. "exists" is tracked by the DbContext, while "x" is not. You cannot use Update() with "x", you must copy the values across.
The simplest way to do this would be AutoMapper, where you can create a map from Category to Category, then use Map to copy all values from "x" over into "exists":
var config = new MapperConfiguration(cfg => cfg.CreateMap<Category, Category>());
var mapper = config.CreateMapper();
mapper.Map(x, exists);
This is purely an example, you'll probably want to configure and inject a mapper that handles your entity copying. You can configure the CreateMap to exclude columns that shouldn't ever change. (Using ForMember, etc.)
Alternatively you can copy the values across manually:
// ...
else
{
    exists.Name = x.Name;
    exists.SomeValue = x.SomeValue;
// ...
}
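Another option is to let EF copy the scalar values for you via the change tracker; this API exists in both EF6 and EF Core. A minimal sketch:
// Copies all scalar property values from "x" onto the tracked "exists" instance;
// only properties whose values actually differ are marked Modified.
Context.Entry(exists).CurrentValues.SetValues(x);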
In general you should avoid using the Update method in EF, as this will result in an UPDATE statement that overwrites all columns in the table rather than updating just the column(s) that changed. (If no columns changed then no UPDATE SQL will actually be run.)
On another side note, when getting "exists", you should use SingleOrDefault() not FirstOrDefault() as you expect 0 or 1 row back. First* methods should be used in cases where you expect there can be multiple matches but only want the first match, and should always be used with an OrderBy*() method to ensure the results are predictable.
You can use Update by performing an Exists check query that doesn't load a tracked entity. Examples would be:
Category exists = Context.SSActivewear_Category
.AsNoTracking()
.Where(b => b.CategoryID == x.CategoryID)
.SingleOrDefault();
or better:
bool exists = Context.SSActivewear_Category
.Where(b => b.CategoryID == x.CategoryID)
.Any();
Then you could use Context.Update(x). AsNoTracking() tells EF to load an instance but not track it. This would really be a waste in this case as it's a round trip to the DB to return everything in the Category only to check if something was returned or not. The Any() call would be a round trip but just does an EXISTS db query to return true or false.
However, these are not fool-proof as there is no guarantee that the Context instance isn't already tracking an instance for that Category from some other operation. Something as trivial as having a Category appear in CustomeClass twice for any reason would be enough to trip the above examples up as once you call Context.Update(x) that instance is now tracked, so the loop iteration for the second instance would fail.
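One way to guard against that (a sketch, not the only option) is to consult the change tracker's Local collection before deciding between copying values and calling Update:
// If an instance with this key is already tracked, copy values onto it;
// otherwise it is safe to attach "x" via Update().
var tracked = Context.SSActivewear_Category.Local
    .FirstOrDefault(c => c.CategoryID == x.CategoryID);
if (tracked != null)
    Context.Entry(tracked).CurrentValues.SetValues(x);
else
    Context.Update(x);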
I have a query similar to this that I am running against our database:
var result = await _context.MyEntities
.Select(x => new SubEntityDto { Id = x.SubEntity.Id })
.Distinct()
.ToListAsync();
The same SubEntity entity can be set on multiple MyEntity entities.
Since I want a list of SubEntityDto with no duplicates, I run .Distinct() on the resulting IQueryable<SubEntityDto>. This is the SubEntityDto class:
public class SubEntityDto
{
public int Id { get; set; }
}
And this works, but I don't know how it manages to make the list distinct when working with the DTOs. Doesn't .Distinct() use the .Equals method in this scenario? And doesn't .Equals default to reference equality, which checks whether two instances are the same instance?
If I load the list from the database, and then do Distinct(), it doesn't work anymore, like this:
var subEntities = await _context.MyEntities
.Select(x => new SubEntityDto { Id = x.SubEntity.Id })
.ToListAsync();
var distinctSubEntities = subEntities.Distinct().ToList(); // Not distinct.
I'm thinking that it somehow manages to do this when creating the SQL query, but can anyone tell me what is happening? I'm puzzled by the fact that it seems to keep track of the entities after they're mapped to DTOs.
As mentioned in the comments to your question, when you create your query it gets translated into an SQL query by your provider. So in the first case it is Queryable.Distinct() that runs, and the DISTINCT is applied by the database. In the second case, however, you have already run your query against the database by calling ToListAsync(), and its result has already been materialized into DTO objects in memory. Therefore Enumerable.Distinct() runs, which falls back to reference equality for a class with no Equals override.
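If you want the in-memory Distinct() to behave the same way as the database DISTINCT, you can give the DTO value equality; a minimal sketch:
public class SubEntityDto : IEquatable<SubEntityDto>
{
    public int Id { get; set; }

    // Value equality based on Id, so Enumerable.Distinct() can deduplicate.
    public bool Equals(SubEntityDto other) => other != null && other.Id == Id;
    public override bool Equals(object obj) => Equals(obj as SubEntityDto);
    public override int GetHashCode() => Id;
}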
You can also inspect the query created by your provider with the debugger: put a breakpoint after the query is built, then drill into the query variable's DebugView → Query → Text Visualizer to see the generated SQL.
I apologise if this has been asked already; I am struggling greatly with the terminology of what I am trying to find out about, as it conflicts with functionality in Entity Framework.
What I am trying to do:
I would like to create an application that, on setup, lets the user start with one database as a "trial"/"startup" database, i.e. a non-production database. This would allow a user to trial the application, but it would have no backups etc.; in no way would this be a "production" database. This could be SQLite, for example.
When the user is then ready, they could click "convert to production" (or similar) and give it the target of the new database machine/database. This would be considered the "production" environment. This could be something like MySQL, SQL Server or... whatever else EF connects to these days.
The question:
Does EF support this type of migration/data transfer live? Would it need another app where you could configure the EF source and EF destination for it to then run through the process of conversion/seeding/population of the data source to another data source?
Why I have asked here:
I have tried to search for things around this topic, but transferring/migration brings up subjects totally non-related, so any help would be much appreciated.
From what you describe I don't think there is anything out of the box to support that. You can map a DbContext to either database, then it would be a matter of fetching and detaching entities from the evaluation DbContext and attaching them to the production one.
For a relatively simple schema / object graph this would be fairly straight-forward to implement.
ICollection<Customer> customers = new List<Customer>();
using (var context = new AppDbContext(evalConnectionString))
{
    customers = context.Customers.AsNoTracking().ToList();
}
using (var context = new AppDbContext(productionConnectionString))
{   // Assuming an empty database...
    context.Customers.AddRange(customers);
    context.SaveChanges();
}
Though for more complex models this could take some work, especially when dealing with things like existing lookups/references. Where you want to move objects that might share the same reference to another object you would need to query the destination DbContext for existing relatives and substitute them before saving the "parent" entity.
ICollection<Order> orders = new List<Order>();
using (var context = new AppDbContext(evalConnectionString))
{
    orders = context.Orders
        .Include(x => x.Customer)
        .AsNoTracking()
        .ToList();
}
using (var context = new AppDbContext(productionConnectionString))
{
    var customerIds = orders.Select(x => x.Customer.CustomerId)
        .Distinct().ToList();
    var existingCustomers = context.Customers
        .Where(x => customerIds.Contains(x.CustomerId))
        .ToList();
    foreach (var order in orders)
    {   // Assuming all customers were loaded
        var existingCustomer = existingCustomers.SingleOrDefault(x => x.CustomerId == order.Customer.CustomerId);
        if (existingCustomer != null)
            order.Customer = existingCustomer;
        else
            existingCustomers.Add(order.Customer);
        context.Orders.Add(order);
    }
    context.SaveChanges();
}
This is a very simple example to outline how to handle scenarios where you may be inserting data with references that may, or may not exist in the target DbContext. If we are copying across Orders and want to deal with their respective Customers we first need to check if any tracked customer reference exists and use that reference to avoid a duplicate row being inserted or throwing an exception.
Normally loading the orders and related references from one DbContext should ensure that multiple orders referencing the same Customer entity will all share the same entity reference. However, to use detached entities that we can associate with the new DbContext via AsNoTracking(), detached references to the same record will not be the same reference so we need to treat these with care.
For example where there are 2 orders for the same customer:
var ordersA = context.Orders.Include(x => x.Customer).ToList();
Assert.AreSame(ordersA[0].Customer, ordersA[1].Customer); // Passes

var ordersB = context.Orders.Include(x => x.Customer).AsNoTracking().ToList();
Assert.AreSame(ordersB[0].Customer, ordersB[1].Customer); // Fails
Even though in the 2nd example both orders are for the same customer, each will have a Customer reference with the same ID but two different references, because the DbContext is not tracking the references used. This is one of the several "gotchas" with detached entities and efforts to boost performance etc. Using tracked references isn't ideal since those entities will still think they are associated with another DbContext. We can detach them, but that means diving through the object graph and detaching all references. (Do-able, but messy compared to just loading them detached.)
Where it can also get complicated is when possibly migrating data in batches (disposing of a DbContext regularly to avoid performance pitfalls for larger data volumes) or synchronizing data over time. It is generally advisable to first check the destination DbContext for matching records and use those to avoid duplicate data being inserted. (or throwing exceptions)
So for simple data models this is fairly straightforward. For more complex ones, where there is more data to bring across and more relationships between that data, it's more complicated. For those systems I'd probably look at generating a database-to-database migration, such as creating INSERT statements for the desired target DB from the data in the source database. There it is just a matter of inserting the data in relational order to comply with the data constraints. (Either using a tool or rolling your own script generation.)
I'm working in a C# project using the NHibernate ORM. I'm quite new to this framework.
I have the following table mapping:
this.Table("FCT_CONNECTR_TRANSF_MAP_CONCEP");
this.LazyLoad();
this.Id(x => x.Id).GeneratedBy.TriggerIdentity().Column("ID_CONNECTR_TRANSF_MAP_CONCEP");
this.References(x => x.ConnectorTransformation).Not.Nullable().Column("ID_CONNECTOR_TRANSFORMATION");
this.References(x => x.LookUpConcept).Nullable().Column("ID_MAP_CONCEPT").Cascade.All();
this.References(x => x.MapCustomSource).Column("ID_MAP_CUSTOM_SOURCE").Cascade.All();
this.References(x => x.MapCustomTarget).Column("ID_MAP_CUSTOM_TARGET").Cascade.All();
this.References(x => x.CreatedBy).Nullable().Column("CREATED_BY");
this.References(x => x.ModifiedBy).Column("MODIFIED_BY");
this.Map(x => x.DtCreated).Nullable().Column("DT_CREATED");
this.Map(x => x.DtModified).Column("DT_MODIFIED");
this.Map(x => x.Description).Column("DSC_MAP_CONCEPT");
And in my C# code, I have the following code snippet.
foreach (var mapConcept in mapConcepts)
{
this.connectorTransformationMapConceptEntityRepository.Delete(mapConcept);
}
On the line this.connectorTransformationMapConceptEntityRepository.Delete(mapConcept); the ORM loads all dependencies (5000+ SELECT queries).
My question: why does NHibernate need to resolve all dependencies in order to delete the object?
There is a .Cascade.All() mapping applied to your many-to-one references. That cascade setting instructs NHibernate: "load the related data and delete it on delete".
This is the reason why NHibernate must load the related data. If cascading is a feature we NEED, we can use an optimization, batch fetching:
19.1.5. Using batch fetching
NHibernate can make efficient use of batch fetching, that is, NHibernate can load several uninitialized proxies if one proxy is accessed (or collections). Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can tune batch fetching: on the class and the collection level.
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation at runtime: You have 25 Cat instances loaded in an ISession, each Cat has a reference to its Owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call cat.Owner on each, NHibernate will by default execute 25 SELECT statements, to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
NHibernate will now execute only three queries, the pattern is 10, 10, 5.
So, we can extend mapping of our classes MapCustomSource, MapCustomTarget with batch size:
Table(...)
Id(...)
BatchSize(25)
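In Fluent NHibernate terms that might look like the sketch below; the class, table, and column names here are made up for illustration:
public class MapCustomSourceMap : ClassMap<MapCustomSource>
{
    public MapCustomSourceMap()
    {
        Table("FCT_MAP_CUSTOM_SOURCE");               // hypothetical table name
        Id(x => x.Id).Column("ID_MAP_CUSTOM_SOURCE"); // hypothetical column name
        BatchSize(25); // fetch up to 25 uninitialized proxies per SELECT
    }
}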
Or we can even change the approach and use some more efficient way how to delete more items without loading them at all:
13.3. DML-style operations
As already discussed, automatic and transparent object/relational mapping is concerned with the management of object state. This implies that the object state is available in memory, hence manipulating (using the SQL Data Manipulation Language (DML) statements: INSERT, UPDATE, DELETE) data directly in the database will not affect in-memory state. However, NHibernate provides methods for bulk SQL-style DML statement execution which are performed through the Hibernate Query Language (HQL).
An example of code DELETION without LOADING at all:
ISession session = sessionFactory.OpenSession();
ITransaction tx = session.BeginTransaction();

string hqlDelete = "delete Customer c where c.name = :oldName";
// or: string hqlDelete = "delete Customer where name = :oldName";
int deletedEntities = session.CreateQuery(hqlDelete)
    .SetString("oldName", oldName)
    .ExecuteUpdate();
tx.Commit();
session.Close();
The reason is the cascading: when you delete the parent, all children will be deleted, etc.
You may need to change from .Cascade.All() to .Cascade.SaveUpdate() if you don't want deletes to cascade.
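Applied to one of the references from the mapping in the question, that change would look something like this:
// Cascade saves/updates to the referenced entity, but not deletes.
this.References(x => x.MapCustomSource)
    .Column("ID_MAP_CUSTOM_SOURCE")
    .Cascade.SaveUpdate();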
You can find out more about cascade behavior at ayende.com
I have a model-first Entity Framework design like this (version 4.4).
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is SnapshotPart, Quote, QuoteType), except the DataInfo. Now looking into the SQL this appears to be because Quote has no FK to DataInfo because of the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current work around is this:
foreach (var quote in snapshot.ComponentQuotes)
{
var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic linq building blocks interact with Entity Framework.
take the following (pseudo)code:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This looks OK, but will throw a runtime error, because of an interface called IQueryable.
IQueryable implements IEnumerable and adds info for an expression and a provider. This basically allows it to build and execute sql statements against a database and not have to load whole tables when fetching data and iterating over them like you would over an IEnumerable.
Because linq defers execution of the expression until it's used, it compiles the IQueryable expression into SQL and executes the database query only right before it's needed. This speeds up things a lot, and allows for expression chaining without going to the database every time a Where() or Select() is executed. The side effect is if the object is used outside the scope of db, then the sql statement is executed after db has been disposed of.
To force LINQ to execute, you can use ToList, like this:
List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will force LINQ to execute the expression against db and get all addresses with a number greater than a thousand. This is all good if you only need to access a field within the addresses table, but since we want to get the name of a city (a 1..1 relationship similar to yours), we'll hit another bump before it can run: lazy loading.
Entity framework lazy loads entities by default, so nothing is fetched from the database until needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but has the problem of depending on the context being available.
You could set EF to eager load (in your model, go to properties and set 'Lazy Loading Enabled' to False), but that would bring in a lot of info you probably don't use.
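If you would rather switch lazy loading off per context instance instead of model-wide, DbContext-based models expose a flag for that. A sketch, assuming a DbContext-derived context (MyDbContext is a placeholder; in EF 4.1+ the property lives on DbContext.Configuration):
using (var db = new MyDbContext())
{
    db.Configuration.LazyLoadingEnabled = false; // only this instance is affected
    // queries here won't trigger lazy loads; use Include for related data
}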
The best fix for this problem is to execute everything inside db's scope:
IQueryable<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
    foreach (var addr in addresses)
        Console.WriteLine(addr.City.Name);
}
I know this is a really simple example, but in the real world you can use a DI container like Ninject to handle your dependencies and have your db available to you throughout the execution of the app.
This leaves us with Include. Include will make IQueryable include all specified relation paths when building the sql statement:
List<Address> addresses;
using (var db = new ObjectContext())
{
    addresses = db.Users.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do, is map multiple tables to a single entity. In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it.
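For reference, in code-first configuration that kind of mapping is called entity splitting, and it looks roughly like the sketch below; the property and table names are hypothetical, and in a model-first EDMX you would set up the equivalent in the designer instead:
// One entity whose properties are split across two tables sharing the same key.
// All names here are made up for illustration.
modelBuilder.Entity<Quote>()
    .Map(m =>
    {
        m.Properties(q => new { q.Id, q.Price });
        m.ToTable("Quotes");
    })
    .Map(m =>
    {
        m.Properties(q => new { q.Id, q.Source });
        m.ToTable("DataInfoes");
    });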