Is it possible to manually assign an existing object to an Entity Framework (db first) object's navigation property?
The context to the question is that I have a problem in trying to bring back a (heavily filtered) list of objects with all the children and descendants attached so that the full graph is available in memory after the context is disposed.
I tried doing this via .Include() statements using something like this:
using (var ctx = new MyEntities())
{
myParents = ctx.Parents
.Where(p => MyFilter(p))
.Include(p => p.Children)
.Include(p => p.Children.Select(c=>c.Grandchildren))
.Include(p => p.Children.Select(c=>c.Grandchildren.Select(g=>g.GreatGrandChildren)));
}
but the generated query runs too slowly because of the known performance problems with using nested include statements (as explained in lots of places including this blog).
I can pull back the parents, children and Grandchildren without performance issues - I only hit the troubles when I include the very last .Include() statement for the greatgrandchildren.
I can easily get the GreatGrandChildren objects back from the database with a second separate query by building a list of GrandChildrenIds from the GrandChildren already retrieved and doing something like:
greatGrandKids = ctx.GreatGrandChildren.Where(g=>ids.Contains(g.GrandChildId)).ToList();
but now, once I dispose of the context, I cannot do something like grandChildA.GreatGrandChildren without hitting the object context disposed exception.
I could have as many as a few thousand GrandChildren objects so I really want to avoid a round trip to the database to fetch the GreatGrandChildren for each one which rules out simply using .Load() on each GrandChild object, right?
I could feasibly work around this by either just looking up the required greatgrandchildren from greatGrandKids each time I needed them in my subsequent code or even by adding a new (non-mapped) Property such as .GreatGrandChildrenLocal to the GrandChild class and assigning them all up front but these both feel very kludgy & ugly. I'd MUCH prefer to find a way to just be able to access the existing .GreatGrandChildren navigation property on each GrandChild object.
Trying the obvious of assigning to the navigation property with something like this:
grandchild.GreatGrandChildren = greatGrandKids
.Where(g=>g.GrandChildId == grandChild.Id)
.ToList();
fails too when I then try to access grandchild.GreatGrandChildren (still giving the object disposed exception).
So my question is:
Is there a way I can assign the existing GreatGrandChdildren objects I have already retrieved from the database to the .GreatGrandChdildren navigation property on the GrandChild object in such a way as to make them available (only needed for read operations) after the context is disposed?
(Or indeed is there a different solution to the problem?)
If you disable proxy creation with:
ctx.Configuration.ProxyCreationEnabled = false;
then reading and writing from/to the navigation property works exactly as expected without trying to lazily load the entities and throwing the object disposed exception.
So we have something like:
using (var ctx = new MyEntities())
{
myParents = ctx.Parents
.Where(p => MyFilter(p))
.Include(p => p.Children)
.Include(p => p.Children.Select(c=>c.Grandchildren));
//skip the final GreatGrandChildren include statement
//get the associated grandchildren & their ids:
var grandKids = myParents.SelectMany(p=>p.Children)
.SelectMany(c=>c.Grandchildren)
.ToList();
var ids = grandKids.Select(g=>g.Id)).ToList();
//Get the great grandkids:
var greatGrandKids = ctx.GreatGrandChildren
.Where(g=>ids.Contains(g.GrandChildId)).ToList();
//Assign the greatgrandchildren to the grandchildren:
foreach (grandChild in grandKids)
{
grandChild.GreatGrandChildren = greatGrandKids
.Where(g=>g.GrandChildId == grandChild.Id)
.ToList();
}
}
and now we can access the the .GreatGrandChildren property outside the context without hitting the context disposed exception. Whilst this still feels a little messy, it works out MUCH cheaper than either using the original Include() statement or calling .Load() on each GrandChild.
N.B. As these objects are only used in read operations and I don't need Lazy Loading then there are no negative implications to turning off proxy creation in my circumstances. If write operations and/or lazy loading were also necessary then we would also need to consider the implications of turning this off for the given EF context.
Related
I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
if (data.Any())
{
foreach (var pstuu in data)
{
if (pstuu?.Id_user != null)
{
db.Entry(pstuu).Reference(q => q.Users).Load();
}
}
}
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are based on this method it's probably a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or.. in your LoadDependent changing doing something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is open. This is because your foreach call as-is would be iterating over the IQueryable meaning EF's data reader would be left open so further calls to db, which I'd assume is a module level variable for the DbContext that is injected, cannot be made.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. This just delegates the load call asynchronously, and without awaiting it, it too would fail, just not raise the exception on the continuation thread.
As mentioned in the comments to your question this is a very poor design choice to handle loading references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with it's associated User eager loaded:
var ques = db.Ques
.Include(x => x.AspNetUsers)
.Where(x => x.SomeCondition == someCondition)
.ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will execute with one query to the DB.
Select n+1 scenarios are bad because in the case where 1000 Ques are returned with a call to fetch dependencies you get:
var ques = db.Ques
.Where(x => x.SomeCondition == someCondition)
.ToList(); // 1 query.
foreach(var q in ques)
db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
Which then looks problematic where later code might want to offer pagination such as to take only 25 items where the total record count could run in the 10's of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils, as with lazy loading you know that AspNetUsers would only be selected if any returned Ques actually referenced it, and only for those Ques that actually reference it. So if the pagination only "touched" 25 rows, Lazy Loading would result in 26 queries. Lazy loading is a trap however as later code changes could inadvertently lead to performance issues appearing in seemingly unrelated areas as new referenences or code changes result in far more references being "touched" and kicking off a query.
If you are going to pursue a LoadDependent() type method then you need to ensure that it is called as late as possible, once you have a known set size to load because you will need to materialize the collection to load related entities with the same DbContext instance. (I.e. after pagination) Trying to work around it using detached instances (AsNoTracking()) or by using a completely new DbContext instance may give you some headway but will invariably lead to more problems later, as you will have a mix of tracked an untracked entities, or worse, entities tracked by different DbContexts depending on how these loaded entities are consumed.
An alternative teams pursue is rather than a LoadReference() type method would be an IncludeReference() type method. The goal here being to build .Include statements into the IQueryable. This can be done two ways, either by magic strings (property names) or by passing in expressions for the references to include. Again this can turn into a bit of a rabbit hole when handling more deeply nested references. (I.e. building .Include().ThenInclude() chains.) This avoids the Select n+1 issue by eager loading the required related data.
I have solved the problem by deletion the method Load and I have used Include() in my first query of data to show the reference data in navigation property
I would like to keep the lifespan of my DbContext as short as possible, so I take advantage of the using statement.
However, since Entity Framework tracks the entities, I cannot do something like this :
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons;
}
}
Because as soon as I use the list of Persons in my ViewModel, I will receive an InvalidOperationException (the context has been disposed before I use the object).
To bypass that, I convert the result of the query to a concrete List, by calling ToList()
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons.ToList();
}
}
Is this an okay thing to do ? AsNoTracking() doesn't have any effects because it still returns some entity that gets dropped if the context is disposed.
When it comes to Entities you have two options. One is simple, and the other looks even simpler but is a whole lot more complex.
The simple option: An entity should never be referenced beyond the scope of it's DBContext. This means in something like an MVC app, entities don't get sent to a view. They don't get serialized, they are consumed solely within the scope of their DbContext.
The key to this approach is using POCO view models and projection, (Select and Automapper's ProjectTo) and when dealing with various common layers and such, adopting something like a Unit of Work pattern to manage the lifetime scope of a DbContext.
The deceptively simple option: Detached entities. Simply ensuring the DbContext is alive when returning entities can be enough to solve your problem. A Unit of Work pattern can be leveraged for this approach as well. While it looks simple enough, it is loaded with pitfalls. Serializing an entity for instance to pass back to a view will trigger lazy load calls as the serializer traverses the entity(ies). Passing entities back to be persisted is also problematic as you have to be cautious of trying to re-attach entities where the Context may already be tracking leading to situational runtime errors, overwriting data with stale copies, and exposing your system to unintended tampering if overly trusting the passed in entity. From a performance standpoint, this is also the worst option as you will be serializing and transmitting entire entity graphs back and forth rather than just the data a view needs.
Fixing your problem with option 2 is generally as easy as Eager Loading anything the calling code might touch. For example, if an Person has a reference to an Address:
public IEnumerable<Person> GetPersons()
{
using (var db = new AppContext())
{
Logging.Log(Information, "Requested getPersons from service");
return db.Persons
.Include(x => x.Address)
.ToList();
}
}
However, here is where the complexity creeps in. This requires the method to know about how the Persons might be consumed. Person might have references to five or more other objects/sets, and those objects may have references to more. (I.e. Address having a reference to a Country entity, or AddressType, etc.) The catch-all fix ends up being to eager load everything, which results in a lot of unnecessary data and is prone to bugs as entities evolve and code grows to rely on different bits. Then when performance becomes a problem you start diving down a rabbit hole in trying to support the caller telling this method what to eager load. As the system grows and you want to support pagination, sorting, etc. the method becomes more and more complex or it becomes slower and slower.
The solution I advocate for is to ensure that entities never cross the boundary of their DbContext, and leveraging Linq/EF's IQueryable implementation in combination with a unit of work so that consumers of these repository or service methods can be responsible for the scope of the DbContext. For example:
private AppDbContext Context
{
get { return AmbientDbContextLocator.Get<AppDbContext>(); }
}
public IQueryable<Person> GetPersons()
{
Logging.Log(Information, "Requested getPersons from service");
return Context.Persons.AsQueryable();
}
Then in calling code:
var mapperConfig = new MapperConfiguration(cfg =>
{
cfg.CreateMap<Person, PersonViewModel>(); // Can include any relevant mapping to flatten needed fields...
});
using( var contextScope = ContextScopeFactory.Create())
{
var viewModels = PersonRepository.GetPersons()
.ProjectTo<PersonViewModel>(mapperConfig)
.ToList();
return View(viewModel);
}
Often you will have low-level rules such as defaulting to only returning Active rows. .AsQueryable() is only needed if you have no rules. Anything that results in a .Where or such inside the repository/service method returns IQueryable so it could return:
return Context.Persons.Where(x => x.IsActive);
Repositories can enforce low level filtering rules like IsActive, authorization checks, etc. leaving more variable filtering, sorting, etc. up to the consumers.
The advantage of this approach is that the consumer (Controller, etc.) has full control over how the data is consumed without introducing any complexity/business logic into the repository/service. If I want to support sorting and pagination:
var viewModels = PersonRepository.GetPersons()
.OrderBy(x => x.Age)
.ProjectTo<PersonViewModel>(mapperConfig)
.Skip(pageNumber * pageSize)
.Take(pageSize)
.ToList();
The key point is that the entity (Person) doesn't leave the scope of the unit of work / DbContext, it is projected into a ViewModel which holds no references to entities, only data. That model can be safely serialized and represents only the data that the view needs. For projection I've used Automappper's ProjectTo as an example. You can use Linq's Select as a more manual option. The unit of work pattern I use and have outlined above is Mehdime's DbContextScope.
If you just return db.Persons which is an IEnumerable<Person> to some caller and then dispose the db context before that caller can iterate the result, you will get the exception, due to the lazy nature by default of entity framework. ToList gets around this by forcing iteration immediately before the context is disposed, but still has overhead for object tracking, so combining ToList with AsNoTracking gets you the best performance.
AsNoTracking removes some internal tracking code inside the db context, it does not allow you to pass out the IEnumerable<Person> after disposing the context, you still need ToList for that.
Use the extension method AsNoTracking and then call ToList:
db.Persons.AsNoTracking().ToList();
Or you can disable change tracking on the change tracker property of the db context to make the whole thing read only, which avoids having to call AsNoTracking everywhere. You still need ToList with this approach.
/// <summary>
/// Disable all change tracking - place in db context sub class
/// </summary>
public void DisableChangeTracking()
{
ChangeTracker.AutoDetectChangesEnabled = false;
ChangeTracker.LazyLoadingEnabled = false;
ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
}
It is acceptable to do so. The problem with IEnumerable is that Entity Framework supports deferred execution, so they won't be populated until you fetch them to a concrete collection (i.e. .ToList()), which could happen after it's been disposed of.
Your 2 possibilities are:
Executing the work on the deferred object by fetching it to a concrete collection (.ToList()).
Using dependency injection with a scoped Lifetime to let it live until the request completes.
Both are correct, and which one to prefer depends on your use-case.
I have an entity hierarchy with several layers, one of which contains objects that can number in the 10s of thousands. There are occasions when I want only the top-level object but I'm finding that the Entity Framework is loading everything in the hierarchy.
I've even tried explicit lazy loading, to no avail.
using (var db = new MyEntities())
{
db.Configuration.ProxyCreationEnabled = true;
db.Configuration.LazyLoadingEnabled = true;
var daoDict = (from d in db.stt_dictionary
where d.id == dictionaryID && !d.deleted
select d).FirstOrDefault();
}
While debugging, if I step through this and then hover over daoDict I find that its collection properties (which are virtual) contain thousands of objects.
Why?
Fetching them using the debugger is going to load them. The debugger isn't doing anything any different than regular code would. It's calling the getter of the property, and doing that fetches the data.
Log the database queries that are actually executed (either through the context or through the database) to see what data is being pulled when in a way that doesn't actually change what queries are being executed.
I have a model-first, entity framework design like this (version 4.4)
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is SnapshotPart, Quote, QuoteType), except the DataInfo. Now looking into the SQL this appears to be because Quote has no FK to DataInfo because of the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current work around is this:
foreach (var quote in snapshot.ComponentQuotes)
{
var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic linq building blocks interact with Entity Framework.
take the following (pseudo)code:
IQueryable<Address> addresses;
Using (var db = new ObjectContext()) {
addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This looks OK, but will throw a runtime error, because of an interface called IQueryable.
IQueryable implements IEnumerable and adds info for an expression and a provider. This basically allows it to build and execute sql statements against a database and not have to load whole tables when fetching data and iterating over them like you would over an IEnumerable.
Because linq defers execution of the expression until it's used, it compiles the IQueryable expression into SQL and executes the database query only right before it's needed. This speeds up things a lot, and allows for expression chaining without going to the database every time a Where() or Select() is executed. The side effect is if the object is used outside the scope of db, then the sql statement is executed after db has been disposed of.
To force linq to execute, you can use ToList, like this:
IQueryable<Address> addresses;
Using (var db = new ObjectContext()) {
addresses = db.Users.Addresses.Where(addr => addr.Number > 1000).ToList();
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This will force linq to execute the expression against db and get all addresses with number greater than a thousand. this is all good if you need to access a field within the addresses table, but since we want to get the name of a city (a 1..1 relationship similar to yours), we'll hit another bump before it can run: lazy loading.
Entity framework lazy loads entities by default, so nothing is fetched from the database until needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but has the problem of depending on the context being available.
You could set EF to eager load (in your model, go to properties and set 'Lazy Loading Enabled' to False), but that would bring in a lot of info you probably don't use.
The best fix for this problem is to execute everything inside db's scope:
IQueryable<Address> addresses;
Using (var db = new ObjectContext()) {
addresses = db.Users.Addresses.Where(addr => addr.Number > 1000);
addresses.Select(addr => Console.WriteLine(addr.City.Name));
}
I know this is a really simple example but in the real world you can use a DI container like ninject to handle your dependencies and have your db available to you throughout execution of the app.
This leaves us with Include. Include will make IQueryable include all specified relation paths when building the sql statement:
IQueryable<Address> addresses;
Using (var db = new ObjectContext()) {
addresses = db.Users.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList;
}
addresses.Select(addr => Console.WriteLine(addr.City.Name));
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do, is map multiple tables to a single entity. In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it.
I wonder if there is a possibility to eager load related entities for certain subclass of given class.
Class structure is below
Order has relation to many base suborder classes (SuborderBase). MySubOrder class inherits from SuborderBase. I want to specify path for Include() to load MySubOrder related entities (Customer) when loading Order, but I got an error claiming that there is no relation between SuborderBase and Customer. But relation exists between MySubOrder and Customer.
Below is query that fails
Context.Orders.Include("SubOrderBases").Include("SubOrderBases.Customers")
How can I specify that explicitly?
Update. Entity scheme is below
This is a solution which requires only a single roundtrip:
var orders = Context.Orders
.Select(o => new
{
Order = o,
SubOrderBases = o.SubOrderBases.Where(s => !(s is MyOrder)),
MyOrdersWithCustomers = o.SubOrderBases.OfType<MyOrder>()
.Select(m => new
{
MyOrder = m,
Customers = m.Customers
})
})
.ToList() // <- query is executed here, the rest happens in memory
.Select(a =>
{
a.Order.SubOrderBases = new List<SubOrderBase>(
a.SubOrderBases.Concat(
a.MyOrdersWithCustomers.Select(m =>
{
m.MyOrder.Customers = m.Customers;
return m.MyOrder;
})));
return a.Order;
})
.ToList();
It is basically a projection into an anonymous type collection. Afterwards the query result is transformed into entities and navigation properties in memory. (It also works with disabled tracking.)
If you don't need entities you can omit the whole part after the first ToList() and work directly with the result in the anonymous objects.
If you must modify this object graph and need change tracking, I am not sure if this approach is safe because the navigation properties are not completely set when the data are loaded - for example MyOrder.Customers is null after the projection and then setting relationship properties in memory could be detected as a modification which it isn't and cause trouble when you call SaveChanges.
Projections are made for readonly scenarios, not for modifications. If you need change tracking the probably safer way is to load full entities in multiple roundtrips as there is no way to use Include in a single roundtrip to load the whole object graph in your situation.
Suppose u loaded the orders list as lstOrders, try this:
foreach (Orders order in lstOrders)
order.SubOrderBases.Load();
and the same for the customers..