I'm working in a C# project using the NHibernate ORM. I'm quite new to this framework.
I have the following table mapping:
this.Table("FCT_CONNECTR_TRANSF_MAP_CONCEP");
this.LazyLoad();
this.Id(x => x.Id).GeneratedBy.TriggerIdentity().Column("ID_CONNECTR_TRANSF_MAP_CONCEP");
this.References(x => x.ConnectorTransformation).Not.Nullable().Column("ID_CONNECTOR_TRANSFORMATION");
this.References(x => x.LookUpConcept).Nullable().Column("ID_MAP_CONCEPT").Cascade.All();
this.References(x => x.MapCustomSource).Column("ID_MAP_CUSTOM_SOURCE").Cascade.All();
this.References(x => x.MapCustomTarget).Column("ID_MAP_CUSTOM_TARGET").Cascade.All();
this.References(x => x.CreatedBy).Nullable().Column("CREATED_BY");
this.References(x => x.ModifiedBy).Column("MODIFIED_BY");
this.Map(x => x.DtCreated).Nullable().Column("DT_CREATED");
this.Map(x => x.DtModified).Column("DT_MODIFIED");
this.Map(x => x.Description).Column("DSC_MAP_CONCEPT");
And in my C# code, I have the following code snippet:
foreach (var mapConcept in mapConcepts)
{
    this.connectorTransformationMapConceptEntityRepository.Delete(mapConcept);
}
On the line this.connectorTransformationMapConceptEntityRepository.Delete(mapConcept); the ORM loads all dependencies (5000+ select queries).
My question: why does NHibernate need to resolve all dependencies in order to delete the object?
There is a .Cascade.All() mapping applied to your many-to-one references. That cascade setting instructs NHibernate: "load the related data and delete it on delete".
That is the reason why NHibernate must load the related objects. If cascading is a feature we NEED, we can use an optimization: batch fetching.
19.1.5. Using batch fetching
NHibernate can make efficient use of batch fetching, that is, NHibernate can load several uninitialized proxies (or collections) if one proxy is accessed. Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can tune batch fetching: on the class level and on the collection level.
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation at runtime: You have 25 Cat instances loaded in an ISession, each Cat has a reference to its Owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call cat.Owner on each, NHibernate will by default execute 25 SELECT statements, to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
NHibernate will now execute only three queries, the pattern is 10, 10, 5.
So we can extend the mapping of our classes MapCustomSource and MapCustomTarget with a batch size:
Table(...);
Id(...);
BatchSize(25);
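Fleshed out, a minimal Fluent NHibernate sketch (the entity, table and column names here are assumptions; BatchSize sets the batch-size attribute on the class mapping):
// A hedged sketch, assuming a MapCustomSource entity mapped with
// Fluent NHibernate; the table and column names are invented.
public class MapCustomSourceMap : ClassMap<MapCustomSource>
{
    public MapCustomSourceMap()
    {
        this.Table("DIM_MAP_CUSTOM_SOURCE");
        this.Id(x => x.Id).Column("ID_MAP_CUSTOM_SOURCE");
        this.Map(x => x.Description).Column("DSC_MAP_CUSTOM_SOURCE");
        this.BatchSize(25); // initialize up to 25 pending proxies per SELECT
    }
}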
Or we can even change the approach and use a more efficient way to delete many items without loading them at all:
13.3. DML-style operations
As already discussed, automatic and transparent object/relational mapping is concerned with the management of object state. This implies that the object state is available in memory, hence manipulating (using the SQL Data Manipulation Language (DML) statements: INSERT, UPDATE, DELETE) data directly in the database will not affect in-memory state. However, NHibernate provides methods for bulk SQL-style DML statement execution which are performed through the Hibernate Query Language (HQL).
An example of deletion without loading anything at all:
ISession session = sessionFactory.OpenSession();
ITransaction tx = session.BeginTransaction();
string hqlDelete = "delete Customer c where c.name = :oldName";
// or: string hqlDelete = "delete Customer where name = :oldName";
int deletedEntities = session.CreateQuery(hqlDelete)
    .SetString("oldName", oldName)
    .ExecuteUpdate();
tx.Commit();
session.Close();
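Adapted to the question's scenario, a hedged sketch (the entity name and the way the ISession is obtained are assumptions; note that HQL bulk deletes bypass cascade mappings, so dependent rows must be handled separately, e.g. by FK constraints or additional deletes):
// A hedged adaptation; ConnectorTransformationMapConcept and
// connectorTransformationId are assumed names, not from the question.
int deleted = session.CreateQuery(
        "delete ConnectorTransformationMapConcept m " +
        "where m.ConnectorTransformation.Id = :ctId")
    .SetParameter("ctId", connectorTransformationId)
    .ExecuteUpdate();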
The reason is cascading: when you delete the parent, all children will be deleted, and so on.
You may need to change from .Cascade.All() to .Cascade.SaveUpdate() if you don't want deletes to cascade.
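For example, applied to one of the mappings from the question:
// Cascade saves and updates to the referenced entity, but not deletes.
this.References(x => x.LookUpConcept).Nullable().Column("ID_MAP_CONCEPT").Cascade.SaveUpdate();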
You can find out more about cascade behavior at ayende.com
I am deleting hundreds of rows from my table. Using ADO.NET code, i.e. Delete from table where somecolumn = somestring, takes less than a second, while using Entity Framework, i.e.
MyDbContext context = new MyDbContext();
context.SomeEntity.RemoveRange(context.SomeEntity.Where(i => i.somecolumn == somestring));
context.SaveChanges();
is taking 8-10 seconds.
Can anybody explain it, or am I doing something wrong?
In situations where you want to delete a large number of records (read: thousands or more), the most efficient way is a so-called "bulk delete". EFCore.BulkExtensions allows that. Something like the code below:
var recordsToRemove = context.SomeEntity
    .Where(i => i.somecolumn == somestring)
    .ToList();
context.BulkDelete(recordsToRemove);
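If you can use EF Core 7 or later, there is also a built-in bulk delete that needs no extension package; a sketch against the same assumed model:
// EF Core 7+: translated to a single DELETE ... WHERE statement;
// nothing is loaded into the change tracker.
int deleted = context.SomeEntity
    .Where(i => i.somecolumn == somestring)
    .ExecuteDelete();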
EF is designed around providing relational data mapped to an object model. It isn't ideally suited to large bulk operations. Still, you can facilitate one-off requirements like this in a number of ways.
The first method would be to use stubs for the delete. For this you would want to ensure that the DbContext instance is "clean" of any tracked instances that might be deleted, so ideally a DbContext that is scoped to that method.
using (var context = new SomeDbContext())
{
    var stubs = context.SomeEntities
        .Where(x => x.SomeColumn == someString)
        .Select(x => x.Id)
        .ToList()
        .Select(x => new SomeEntity { Id = x })
        .ToList();
}
Now you might be able to simplify that down to:
var stubs = context.SomeEntities
    .Where(x => x.SomeColumn == someString)
    .Select(x => new SomeEntity { Id = x.Id })
    .ToList();
However, you would probably want to test that to ensure the resulting SQL selects only the ID (not the entire entity) and that context.SomeEntities.Local.Any() is still false. The first example ensures the query loads only the IDs, then builds stub entities from those IDs. This makes the "selection" of our data as efficient as possible.
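One quick way to verify that is SQL logging; a sketch assuming EF6 (which the ExecuteSqlCommand usage further down also suggests; EF Core would use LogTo instead):
// EF6: write every generated SQL statement to the debug output window,
// so you can confirm only the Id column is being selected.
context.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);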
From here you should be able to use RemoveRange on the untracked stubs.
context.SomeEntities.RemoveRange(stubs);
context.SaveChanges();
The important detail is that the DbContext cannot already be tracking any of these entities, as RemoveRange will temporarily attach the stubs to the DbContext. If the context were already tracking an instance with one of those IDs, you would receive an error that one or more entities with the same ID is already being tracked. (Hence the locally scoped DbContext to avoid that.)
The other way to perform this deletion would be to issue a direct SQL operation. If you have a DbContext that is scoped to a request or longer than this single operation then this should be done after dealing with any currently tracked instances.
Step 1. Deal with any tracked instances if you have an injected DbContext:
var trackedInstances = context.SomeEntities.Local
    .Where(x => x.SomeColumn == someString)
    .ToList();
if (trackedInstances.Any())
    context.SomeEntities.RemoveRange(trackedInstances);
This will check the DbContext for any tracked instances without hitting the DB. We will want to remove these instances to avoid possibly having any of these marked as Modified and triggering an exception later during a SaveChanges call.
Step 2. Build and run a parameterized raw SQL statement to clean off all remaining rows in the DB.
context.Database.ExecuteSqlCommand(@"DELETE FROM dbo.SomeEntities
    WHERE SomeColumn = @someString", new SqlParameter("someString", someString));
context.SaveChanges();
The important detail here is to use a parameterized query. Do not execute raw SQL with the parameters embedded in the string as this leaves the door open to SQL injection attacks.
I.e. do not use anything like:
context.Database.ExecuteSqlCommand($"DELETE FROM dbo.SomeEntities
WHERE SomeColumn = '{someString}'");
// or
context.Database.ExecuteSqlCommand("DELETE FROM dbo.SomeEntities
WHERE SomeColumn = '" + someString + "'");
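For completeness, if you are on EF Core rather than EF6, ExecuteSqlInterpolated gives you interpolation convenience while still parameterizing the value (a sketch; the table and column names are carried over from above):
// EF Core: the interpolated value becomes a DbParameter under the hood,
// so this stays safe from SQL injection despite the interpolation syntax.
context.Database.ExecuteSqlInterpolated(
    $"DELETE FROM dbo.SomeEntities WHERE SomeColumn = {someString}");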
I apologise if this has been asked already; I am struggling greatly with the terminology of what I am trying to find out about, as it conflicts with functionality in Entity Framework.
What I am trying to do:
I would like to create an application that, on setup, gives the user the option to use one database as a "trial"/"startup" database, i.e. a non-production database. This would allow a user to trial the application, but it would not have backups etc.; in no way would this be a "production" database. This could be SQLite, for example.
When the user is then ready, they could click "convert to production" (or similar) and give it the target of the new database machine/database, which would be considered the "production" environment. This could be something like MySQL, SQL Server or whatever else EF connects to these days.
The question:
Does EF support this type of migration/data transfer live? Or would it need another app where you could configure the EF source and EF destination, and then run through the process of converting/seeding/populating one data source from another?
Why I have asked here:
I have tried to search for things around this topic, but transferring/migration brings up totally unrelated subjects, so any help would be much appreciated.
From what you describe, I don't think there is anything out of the box to support that. You can map a DbContext to either database; then it is a matter of fetching and detaching entities from the evaluation DbContext and attaching them to the production one.
For a relatively simple schema / object graph this would be fairly straight-forward to implement.
ICollection<Customer> customers = new List<Customer>();
using(var context = new AppDbContext(evalConnectionString))
{
customers = context.Customers.AsNoTracking().ToList();
}
using(var context = new AppDbContext(productionConnectionString))
{ // Assuming an empty database...
    context.Customers.AddRange(customers);
    context.SaveChanges();
}
Though for more complex models this could take some work, especially when dealing with things like existing lookups/references. Where you want to move objects that might share the same reference to another object you would need to query the destination DbContext for existing relatives and substitute them before saving the "parent" entity.
ICollection<Order> orders = new List<Order>();
using(var context = new AppDbContext(evalConnectionString))
{
orders = context.Orders
.Include(x => x.Customer)
.AsNoTracking()
.ToList();
}
using(var context = new AppDbContext(productionConnectionString))
{
var customerIds = orders.Select(x => x.Customer.CustomerId)
.Distinct().ToList();
var existingCustomers = context.Customers
.Where(x => customerIds.Contains(x.CustomerId))
.ToList();
foreach(var order in orders)
{ // Assuming all customers were loaded
var existingCustomer = existingCustomers.SingleOrDefault(x => x.CustomerId == order.Customer.CustomerId);
if(existingCustomer != null)
order.Customer = existingCustomer;
else
existingCustomers.Add(order.Customer);
context.Orders.Add(order);
}
    context.SaveChanges();
}
This is a very simple example to outline how to handle scenarios where you may be inserting data with references that may or may not exist in the target DbContext. If we are copying across Orders and want to deal with their respective Customers, we first need to check whether any tracked customer reference exists and use that reference, to avoid a duplicate row being inserted or an exception being thrown.
Normally, loading the orders and related references from one DbContext ensures that multiple orders referencing the same Customer entity all share the same entity reference. However, with detached entities loaded via AsNoTracking() so that we can associate them with the new DbContext, detached references to the same record will not be the same object reference, so we need to treat these with care.
For example, where there are two orders for the same customer:
var ordersA = context.Orders.Include(x => x.Customer).ToList();
Assert.AreSame(ordersA[0].Customer, ordersA[1].Customer); // Passes
var ordersB = context.Orders.Include(x => x.Customer).AsNoTracking().ToList();
Assert.AreSame(ordersB[0].Customer, ordersB[1].Customer); // Fails
Even though in the second example both orders are for the same customer, each will have a Customer reference with the same ID but a different object reference, because the DbContext is not tracking the references used. This is one of several "gotchas" with detached entities and efforts to boost performance. Using tracked references isn't ideal either, since those entities will still think they are associated with another DbContext. We can detach them, but that means diving through the object graph and detaching all references. (Doable, but messy compared to just loading them detached.)
Where it can also get complicated is when migrating data in batches (disposing of a DbContext regularly to avoid performance pitfalls with larger data volumes) or synchronizing data over time. It is generally advisable to first check the destination DbContext for matching records and use those, to avoid duplicate data being inserted (or exceptions being thrown).
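A hedged sketch of that batching idea (all names assumed; copying explicit key values into a database-generated identity column would need extra handling):
// Copy customers in batches; a fresh DbContext per batch keeps the
// change tracker small. Assumes an int CustomerId used for keyset paging
// and that CustomerId is not a database-generated identity.
const int batchSize = 500;
int lastId = 0;
while (true)
{
    List<Customer> batch;
    using (var source = new AppDbContext(evalConnectionString))
    {
        batch = source.Customers.AsNoTracking()
            .Where(c => c.CustomerId > lastId)
            .OrderBy(c => c.CustomerId)
            .Take(batchSize)
            .ToList();
    }
    if (batch.Count == 0)
        break;
    using (var target = new AppDbContext(productionConnectionString))
    {
        target.Customers.AddRange(batch);
        target.SaveChanges();
    }
    lastId = batch[batch.Count - 1].CustomerId;
}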
So for simple data models this is fairly straightforward. For more complex ones, where there is more data to bring across and more relationships between that data, it's more complicated. For those systems I'd probably look at generating a database-to-database migration, such as creating INSERT statements for the desired target DB from the data in the source database. There it is just a matter of inserting the data in relational order to comply with the data constraints. (Either using a tool or rolling your own script generation.)
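A minimal sketch of the roll-your-own idea, assuming the Customer/Order model from earlier (table and column names invented for illustration):
// Purely illustrative: emit INSERTs in relational order (parents first).
// A real generator should escape or parameterize every value; only the
// single quote is handled here.
var script = new StringBuilder();
foreach (var c in customers)
    script.AppendLine(
        $"INSERT INTO Customers (CustomerId, Name) " +
        $"VALUES ({c.CustomerId}, '{c.Name.Replace("'", "''")}');");
foreach (var o in orders)
    script.AppendLine(
        $"INSERT INTO Orders (OrderId, CustomerId) " +
        $"VALUES ({o.OrderId}, {o.Customer.CustomerId});");
// Review the script, then run it against the target database.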
I have this code to explicitly load navigation properties for an entity:
dbContext.StorageRequests.Add(storageRequest);
dbContext.SaveChanges();
// Here I want to explicitly load some navigation properties
dbContext.Entry(storageRequest).Reference(c => c.Manager).Load();
dbContext.Entry(storageRequest).Reference(c => c.Facility).Load();
dbContext.Entry(storageRequest).Collection(x => x.PhysicalObjects).Query().Include(x => x.Classification).Load();
My question has two parts:
First, how can I load them all together (I want to call Load() once)?
Second, does the above code send a separate query for each Load() call, each of which hits the database to load the related data?
I had a similar question with EF Core. Turning on SQL logging to the debug output window helped answer a lot of my questions as to what it was doing, and why (see the setup sketch after this list). In terms of your questions:
1) You can't, though you can eager load it with a series of dbContext.Collection.Include(otherCollection).ThenInclude(stuffRelatedToOtherCollection)-style chains.
2) Yes it does; even eager loading in one C# statement issues multiple queries. I presume this is because anything smarter than the naive multiple-SQL approach is too complex: it's hard for the framework to deal with Cartesian products when multiple tables are joined into one rectangular dataset. (A school has students and teachers; teacher:students is a many-to-many relationship, decomposed by class. If you wrote one query joining school, class, student and teachers, you'd get repeated data all over the place, and though it's conceptually possible to pick through it looking for unique school, class, teacher and student primary key values, you could be downloading tens of thousands of duplicated rows only to have to de-duplicate them all again.) EF tends to select the school, then school join class, then school join class join students, then school join class join teachers (if that's how you coded your school Include class ThenInclude students, then Include teachers). Changing your include strategy will change the queries that are run.
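A minimal sketch of the logging setup mentioned above, assuming EF Core 5+ (LogTo); put it in your DbContext subclass, or configure the equivalent via DI:
// EF Core 5+: send every generated SQL statement to the debug output window.
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    => optionsBuilder.LogTo(
        message => System.Diagnostics.Debug.WriteLine(message),
        Microsoft.Extensions.Logging.LogLevel.Information);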
Nice question! Let me answer differently, in reverse order, with new info.
2.)
Each Load() will cause a query to the database, according to the documentation (Querying and Finding Entities, 10/23/2016):
A query is executed against the database when:
It is enumerated by a foreach (C#) or For Each (Visual Basic) statement.
It is enumerated by a collection operation such as ToArray, ToDictionary, or ToList.
LINQ operators such as First or Any are specified in the outermost part of the query.
The following methods are called: the Load extension method on a DbSet, DbEntityEntry.Reload, and Database.ExecuteSqlCommand.
People often use eager loading with Include() to let EF optimize as much as possible:
in most cases, EF will combine the joins when generating SQL
// ef 6
using System.Data.Entity;
var storageRequests = dbContext.StorageRequests
.Include(r => r.PhysicalObjects.Select(p => p.Classification))
.Include(r => r.Manager)
.Include(r => r.Facility);
// evaluate "storageRequests" here by linq method or foreach
or:
// ef core
var storageRequests = dbContext.StorageRequests
.Include(r => r.PhysicalObjects)
.ThenInclude(p => p.Classification)
.Include(r => r.Manager)
.Include(r => r.Facility);
// evaluate "storageRequests" here by linq method or foreach
1.)
The only way I can imagine is using the code above and then calling storageRequests.Load().
You could inspect whether it:
generates single/multiple queries,
loads navigation property data along with the StorageRequest.
FYI: these query executions are also called network roundtrips in the Microsoft docs:
Multiple network roundtrips can degrade performance, especially where latency to the database is high (for example, cloud services).
Point of interest:
There is a relatively new option, Single vs. Split Queries (10/03/2019), in EF Core 5.
The default is a single query (the behavior described above). Alternatively, you can decide to request/load data per table instead, by adding .AsSplitQuery() to your LINQ query before the evaluation. Split queries increase the number of roundtrips but avoid loading duplicated data, which can help performance.
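Applied to the earlier query, a sketch:
// EF Core 5+: one SQL query per included collection instead of a single
// JOINed result set; avoids duplicating StorageRequest columns per row.
var storageRequests = dbContext.StorageRequests
    .Include(r => r.PhysicalObjects)
    .ThenInclude(p => p.Classification)
    .Include(r => r.Manager)
    .Include(r => r.Facility)
    .AsSplitQuery()
    .ToList();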
There is also .AsSingleQuery() if your global choice was:
.UseSqlServer(
connectionString,
o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
I am new to LINQ; I started writing this query:
var dProjects = Projects
.Select(p => new Models.Project {
ProjectID = p.ProjectID,
Status = p.Status,
ExpiresOn = p.ExpiresOn,
LatestComments = p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault(),
ProjectFileIDs = p.ProjectFiles
.Select(pf => pf.BinaryFileID)
.AsQueryable()
})
.AsQueryable<Models.Project>();
I already know this query will perform really slowly, because related entities like ProjectComments and ProjectFiles will create nested selects, though it works and gives me the right results.
How can I optimize this query and get the same results? One of my guesses would be using an inner join, but ProjectComments and ProjectFiles already have a relationship in the database through keys, so I'm not sure what we would achieve by setting up the relationship again.
Basically, I need to know which approach is best here from a performance perspective. One thing to note is that I am sorting ProjectComments and only taking the most recent one. Should I be using a combination of join and group by into? Help will be much appreciated. Thanks.
UPDATED:
Sorry if I wasn't clear enough on what I am trying to do. Basically, on the front end I have a grid which shows a list of projects with the latest project comments and a list of all the files associated with each project, so users can click on those links and actually open those documents. The query that I have above is working, and it does show the following in the grid:
Project ID (From Project table)
Status (From Project table)
ExpiresOn (From Project table)
LatestComments (latest entry from the ProjectComments table, which has Project ID as a foreign key)
ProjectFileIDs (list of file IDs from the ProjectFiles table, which has Project ID as a foreign key; I am using those file IDs to create links so users can open those files)
So everything is working and I have it all set up, but the query is a little slow. Right now we have very little data (only test data), but once this is launched I am expecting a lot of users/data, and thus I want to optimize this query as much as possible before it goes live. So the goal here is basically to optimize. I am pretty sure this is not the best approach, because it will create nested selects.
In Entity Framework, you can drastically improve the performance of queries by returning the objects back as an object graph instead of a projection. Entity Framework is extremely efficient at optimizing all but the most complex SQL queries, and can take advantage of "eager" loading vs. "lazy" loading (not loading related items from the db until they are actually accessed). This MSDN reference is a good place to start.
As far as your specific query is concerned, you could use this technique something like the following:
var dbProjects = yourContext.Projects
    .Include(p => p.ProjectComments)  // Include takes a navigation property path;
    .Include(p => p.ProjectFiles)     // ordering/filtering is applied after loading
    .AsQueryable();
note the .Include() being used to imply Eager Loading.
From the MSDN reference on Loading Related Objects:
Performance Considerations
When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.
Explicit and lazy loading enables you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded.
Do you get any boost in performance if you add Include statements before the Select?
Example:
var dProjects = Projects
    .Include(p => p.ProjectComments)
    .Include(p => p.ProjectFiles);
Include allows all matching ProjectComments and ProjectFiles to be eagerly loaded. See Loading Related Entities for more details.
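If you still need the flattened shape from the question (latest comment plus the file IDs), one option is to eager load and then project in memory; a hedged sketch, assuming Models.Project.ProjectFileIDs can hold a materialized list:
// Eager load the related collections in one go, then shape the result
// in memory; the latest-comment pick happens after materialization.
var dProjects = Projects
    .Include(p => p.ProjectComments)
    .Include(p => p.ProjectFiles)
    .ToList()
    .Select(p => new Models.Project {
        ProjectID = p.ProjectID,
        Status = p.Status,
        ExpiresOn = p.ExpiresOn,
        LatestComments = p.ProjectComments
            .OrderByDescending(pc => pc.CreatedOn)
            .Select(pc => pc.Comments)
            .FirstOrDefault(),
        ProjectFileIDs = p.ProjectFiles.Select(pf => pf.BinaryFileID).ToList()
    })
    .ToList();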
I wonder if there is a possibility to eager load related entities for a certain subclass of a given class.
Class structure is below
Order has a relation to many base suborder classes (SuborderBase). The MySubOrder class inherits from SuborderBase. I want to specify a path for Include() to load the MySubOrder-related entities (Customer) when loading an Order, but I get an error claiming that there is no relation between SuborderBase and Customer. The relation does exist, but between MySubOrder and Customer.
Below is the query that fails:
Context.Orders.Include("SubOrderBases").Include("SubOrderBases.Customers")
How can I specify that explicitly?
Update: entity schema (image not included here).
This is a solution which requires only a single roundtrip:
var orders = Context.Orders
.Select(o => new
{
Order = o,
SubOrderBases = o.SubOrderBases.Where(s => !(s is MyOrder)),
MyOrdersWithCustomers = o.SubOrderBases.OfType<MyOrder>()
.Select(m => new
{
MyOrder = m,
Customers = m.Customers
})
})
.ToList() // <- query is executed here, the rest happens in memory
.Select(a =>
{
a.Order.SubOrderBases = new List<SubOrderBase>(
a.SubOrderBases.Concat(
a.MyOrdersWithCustomers.Select(m =>
{
m.MyOrder.Customers = m.Customers;
return m.MyOrder;
})));
return a.Order;
})
.ToList();
It is basically a projection into an anonymous type collection. Afterwards, the query result is transformed into entities and navigation properties in memory. (It also works with tracking disabled.)
If you don't need entities you can omit the whole part after the first ToList() and work directly with the result in the anonymous objects.
If you must modify this object graph and need change tracking, I am not sure if this approach is safe, because the navigation properties are not completely set when the data is loaded. For example, MyOrder.Customers is null right after the projection, and setting relationship properties in memory afterwards could be detected as a modification (which it isn't) and cause trouble when you call SaveChanges.
Projections are made for read-only scenarios, not for modifications. If you need change tracking, the probably safer way is to load full entities in multiple roundtrips, as there is no way to use Include in a single roundtrip to load the whole object graph in your situation.
Suppose you loaded the orders list as lstOrders; try this:
foreach (Order order in lstOrders)
    order.SubOrderBases.Load();
and the same for the customers.