Given an object graph like:
A { IEnum<B> }
B { IEnum<C>, IEnum<D>, IEnum<E>, ... }
C { IEnum<X> }
How can I eagerly load the entire object graph without N+1 issues?
Here is the pseudo code for the queries that I would ultimately like to execute:
var a = Session.Get<A>(1); // Query 1
var b_Ids = foreach(b in A.B's) => Select(b.Id); // Query 2
var c = Session.CreateQuery("from C where B in (b_Ids)").Future<C>(); // Query 3
var d = Session.CreateQuery("from D where B in (b_Ids)").Future<D>(); // Query 3
var e = Session.CreateQuery("from E where B in (b_Ids)").Future<E>(); // Query 3
// Iterate through c, d, e, ... find the correct 'B' parent, add to collection manually
The problem that I have with this approach is that when I go to add the instances of 'C', 'D', and 'E' to the corresponding collection of the parent 'B', the collection is still proxied, and when .Add() is called, the proxy initializes itself and executes more queries; I think NHibernate is not capable of seeing that I already have all of the data in first level cache, which is understandable.
I've tried to work around this problem by doing something like this in my Add method:
void Add(IEnum<C>)
{
_collection = new Collection<C>(); // replace the proxied instance to prevent initialization
foreach(c) => _collection.Add(c);
}
This gave me the optimum query strategy that I wanted, but caught up with me later when doing persistence (NHibernate tracks the original collection by-ref somewhere from what I can tell).
So my question is, how can I load a complex graph with children of children without N+1? The only thing I've come across to date is joining B-C, B-D, B-E which is not acceptable in my situation.
We are using NH 2.1.2 with FluentHN for mapping. An upgrade to v3 of NH or using hbm's/stored procs/whatever would not be off the table.
UPDATE:
One of the comments references a join approach, and I did come across a blog that demonstrates this approach. This work around is not acceptable in our situation, but it may help someone else: Eager fetch multiple child collections in 1 round trip with NHibernate
UPDATE 2:
Jordan's answer led me to the following posts that are related to my question: Similar Question and Ayende's blog. The pending question at this point is "how can you perform the subselects without a round trip per-path".
UPDATE 3:
I've accepted Jordan's answer even though the subselect solution is not optimal.
You can use SubSelect fetching which can be setup in the mapping files. This will avoid N+1 and cartesian product.
firstly- you can change your mappings to load these collections eagerly. see item #4 in this section.
secondly- I believe that the reason that your collection seems to be loading twice is that you first fetch it using a query, and then using the collection property.
meaning- nHibernate distinguishes between queries generated by the user (like the one you use) and queries it generates itself (like the one that occurs when you first read your 'C' collection). they do not mix.
so, when you first read your 'C' collection, nHib does not recognize that it actually once sent the exact same query to the DB (since it was a user query), and sends it again.
The way to avoid this is to retrieve your C collection via your B entity.
Related
I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
if (data.Any())
{
foreach (var pstuu in data)
{
if (pstuu?.Id_user != null)
{
db.Entry(pstuu).Reference(q => q.Users).Load();
}
}
}
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
LoadDependent(query);
Chances are based on this method it's probably a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead if you call:
// (outside code)
var query = db.SomeEntity.Wnere(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or.. in your LoadDependent changing doing something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
Then your LoadDependent() call works, but in the first example you get an error that a DataReader is open. This is because your foreach call as-is would be iterating over the IQueryable meaning EF's data reader would be left open so further calls to db, which I'd assume is a module level variable for the DbContext that is injected, cannot be made.
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work. This just delegates the load call asynchronously, and without awaiting it, it too would fail, just not raise the exception on the continuation thread.
As mentioned in the comments to your question this is a very poor design choice to handle loading references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with it's associated User eager loaded:
var ques = db.Ques
.Include(x => x.AspNetUsers)
.Where(x => x.SomeCondition == someCondition)
.ToList();
Whether "SomeCondition" results in 1 Ques returned or 1000 Ques returned, the data will execute with one query to the DB.
Select n+1 scenarios are bad because in the case where 1000 Ques are returned with a call to fetch dependencies you get:
var ques = db.Ques
.Where(x => x.SomeCondition == someCondition)
.ToList(); // 1 query.
foreach(var q in ques)
db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
Which then looks problematic where later code might want to offer pagination such as to take only 25 items where the total record count could run in the 10's of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils, as with lazy loading you know that AspNetUsers would only be selected if any returned Ques actually referenced it, and only for those Ques that actually reference it. So if the pagination only "touched" 25 rows, Lazy Loading would result in 26 queries. Lazy loading is a trap however as later code changes could inadvertently lead to performance issues appearing in seemingly unrelated areas as new referenences or code changes result in far more references being "touched" and kicking off a query.
If you are going to pursue a LoadDependent() type method then you need to ensure that it is called as late as possible, once you have a known set size to load because you will need to materialize the collection to load related entities with the same DbContext instance. (I.e. after pagination) Trying to work around it using detached instances (AsNoTracking()) or by using a completely new DbContext instance may give you some headway but will invariably lead to more problems later, as you will have a mix of tracked an untracked entities, or worse, entities tracked by different DbContexts depending on how these loaded entities are consumed.
An alternative teams pursue is rather than a LoadReference() type method would be an IncludeReference() type method. The goal here being to build .Include statements into the IQueryable. This can be done two ways, either by magic strings (property names) or by passing in expressions for the references to include. Again this can turn into a bit of a rabbit hole when handling more deeply nested references. (I.e. building .Include().ThenInclude() chains.) This avoids the Select n+1 issue by eager loading the required related data.
I have solved the problem by deletion the method Load and I have used Include() in my first query of data to show the reference data in navigation property
I'm trying to find documentation about a behavior in Entity Framework. It works but before relying on this behavior, I want to be sure it's a normal behavior of EF and not a unexpected side effect that would be "fixed" in a future version. Here's the situation :
I've a pretty deep object hierachy (which I will simplify here). The structure is a multi levels collection of objects (class A contains a collection of class B which contains a collection of class C, which contains ...) for 7 levels deep.
I've to filter elements on some properties of C and my first trials of doing it loading the complete hierachy produce a complicated LINQ query (which would be a maintenance nightmare) and the generated SQL query was far from efficient. To simplify all this, I decided to split the query in 2 steps : first, I load the collection of class C (and all its childs) filtered as I want and then, I load class A and B for all instances of B that contains an item of my filtered collection of C.
Here's the catch : using that technique, I expected to have to repopulate manually the collection of C in my B class but actually, the collection is already populated with the elements of the collection. I verified the SQL query in intellitrace and the data required to fill instances of C is not included in the second query so the only logical conclusion is that EF did this from the informations in the context. BTW, lazy loading is turned off for that context.
Is this behavior normal in EF ? It so, can you give me link to the documentation explaining how this works ?
Here's a snippet to illustrate this :
using(var context = new MyContext())
{
//Includes and where clauses are greatly simplified for the purpose of the sample
var filteredC = context.C.Include(x=>x.ListOfD).Include(x=>x.ListOfD.Select(y=>y.ListOfE)).Where(c=>c.Status==Status).ToList();
int[] bToLoad = filteredC.Select(c=>c.IDofB).Distinct().ToArray();
var listOfAAndB = context.A.Include(a=>a.ListOfB).Where(x=>x.ListOfB.Any(y=>bToLoad.Contains(y.ID))).ToList();
//At this step, I expected B.ListOfC to be empty but it's somehow populated
}
Thanks
This is standard behavior for a DbContext life cycle. To be honest, I can't link you to any documentation that documents this feature, but I can explain you how this works.
An EF Context is stateful, and keeps track of all the entities that have already been fetched. It also knows about the relations between entities in your DB and your entity model.
So if you fetch new objects that have a direct relation to that object (in your case, C has a foreign key to B), the navigation property is populated by the Context. This is a feature, and not a bug, as it tries to explicitly avoid Lazy loading queries to the DB for objects that have already been fetched.
TL;DR
I want to write a query in LINQ to Entities and tell it that I'll never load the child entities of an entity. How do I do that without projecting?
Eg,
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
I want the entity called Asset pulled into a on the anonymous type I'm defining, and when I call ToList(), I don't want the Assets' children, Status and Type, to lazy load.
EDIT: After some random Visual Studio autcomplete investigation, much of this can be accomplished by turning off lazy loading in the DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
Unfortunately, if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
That is to say, I want to explicitly load some children at query time and completely sever any relationship to other child entities.
Best would be some way to actively load some entities and to ignore the rest. That is, I could call ToList() and not have to worry about throwing off lots of db connections.
Context
I have a case where I'm hydrating a view model with the results of a LINQ to Entities query from an entity called Asset. The Asset table has a couple of child tables, Type and Status. Both Type and Status have Description fields, and my view model contains both descriptions in it. Let's pretend that's as complicated as this query gets.
So I'd like to pull everything from the Asset table joined to Type and Status in one database query, during which I pull the Type and Status descriptions. In other words, I don't want to lazy load that info.
WET (Woeful Entity reTranscription?)
What we're doing now, which does exactly what I want from a connection standpoint, is the usual .Select into the view model, with a tedious field matchup.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new AssetViewModel
{
AssetId = a.AssetId,
// *** LOTS of fields from Asset removed ***
AssetStatusDesc = ast.AssetStatusDesc,
AssetTypeDesc = at.AssetTypeDesc
}).ToList();
That's good in that the Status and Type child entities of Asset are never accessed, and there's no lazy load. The SQL is one join in one database hit for all the assets. Perfect.
The worry is all the repeated jive in // *** LOTS of fields from Asset removed ***. Currently, we've got that projection in every freakin query, which obviously isn't DRY. And it means that when the Asset table changes, it's rare that the new field is included in every projection (because humans), which stinks.
I don't see a quick way around the query, btw. If I want to do it in a single query, I have to have the joins. I could add wheres to it in separate methods, but I'm not sure how I'd skip the projection each time. Or I could add joins to the query in cascading methods, but then my projection is still "repository bound", which isn't best case if I'm using these sorts of queries elsewhere. But I'm betting I'm stupiding something here.
Dumb
When I tried adding a cast to my view model from asset and changing to something like this, which is beautiful from a code standpoint, though I get bitten by lazy loading -- two extra database hits per Asset, one for Status and one for Type.
return (from a in this.Db.Assets
select a).ToList().Select(asset => (AssetViewModel)asset).ToList();
Just as we would expect, since I'm using lines like...
AssetTypeDesc = a.AssetType.AssetTypeDesc,
... inside of the casting code. So that was dumb. Concise, reusable, but dumb. This is why we hate folks who use ORMs without checking the SQL. ;^)
Overly clever, sorta
But then I tried getting too clever, with a new constructor for the view model that took the asset entity & the two description values as strings, which ended up with the same lazy load issue (because, duh, the first ToList() before selecting the anonymous objects means we don't know how the Assets are going to be used, and we're stuck pulling back everything to be safe (I assume)).
//Use anon type to skirt "Only parameterless constructors
//and initializers are supported in LINQ to Entities,"
//issue.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
If only there was some way to say, "cast these anonymous objects to a List, but don't lazy load the Asset's children while you're doing it." <<< That's my question, natch.
I've read some about DataLoadOptions.LoadWith(), which probably provides an okay solution, and I might end up just doing that, but that's not precisely what I'm asking. I think that's a global-esque setting (? I think just for the life of the data context, which should be the single controller interaction), which I might not necessarily want to set. I may also want ObjectTrackingEnabled = false, but I'm not grokking yet.
I also don't want to use an automapper.
Painfully, after some random Visual Studio autcomplete investigation, this might be as easy as turning off lazy loading in your DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
The wacky thing is that if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
Better would be to be able to cherry pick what children are "lazy-loading eligible".
I may need to update the question to make it cover this variation of the original question.
Let's assume I have 2 tables, A and B with 1-0..1 relation. I use the Adapter approach. I need to load a collection of A entities in one place, and then load all related B entities for all A entities later. The reason to not use Prefetch is that in most cases I will not need to load B entities.
I use LINQ everywhere, so I would like to do it the same way.
The code I am trying to write looks like this:
var linqMetadata = new LinqMetaData(adapter)
{
ContextToUse = new Context()
};
var aCollection = linqMetadata.A.Where(/*some filter*/).ToList();
// ...
var bIds = aCollection.Select(x => x.BId);
var bCollection = linqMetadata.B.Where(x => bIds.Contains(x.BId)).ToList();
The problem is that bCollection and aCollection stay unlinked, i.e. all A entities have B = null and vice versa. I want these references to be set, and therefore the 2 graphs to be united into a single one.
I can join 2 collections using LINQ to Objects, but that's not elegant, and besides this might get much more complicated if both collections contain complex graph with interconnections that need to be established.
I can write a prefetch from B to A, but that's one extra DB query that is completely unnecessary.
Is there an easy way to get these 2 graphs merged?
Object relation graph is built automatically if you add related entities using assignment for master entity and AddRange for detail entities.
Here's a pseudo-code sample:
foreach aEntity in aCollection
aEntity.BEntity = bCollection.First(x=>x.Id == aEntity.bId);
OR
foreach bEntity in bCollection
bEntity.AEntity.AddRange(aCollection.Where(x=>x.bId == bEntity.Id));
After doing this your object graph is complete.
Since you are already using a Context, you can use PrefetchPath and context to load the PrefetchPath later, that way your entities are always linked. Then you can run a Linq2Objects query to load in-memory all B's.
var bs = aCollection.Select(x => x.B).ToList();
I inherited an application that uses llblgen 2.6. I have a PersonAppointmentType entity that has a AppointmentType property (n:1 relation). Now I want to sort a collection of PersonAppointmentTypes on the name of the AppointmentType. I tried this so far in the Page_Load:
if (!Page.IsPostBack)
{
var p = new PrefetchPath(EntityType.PersonAppointmentTypeEntity);
p.Add(PersonAppointmentTypeEntity.PrefetchPathAppointmentType);
dsItems.PrefetchPathToUse = p;
// dsItems.SorterToUse = new SortExpression(new SortClause(PersonAppointmentTypeFields.StartDate, SortOperator.Ascending)); // This works
dsItems.SorterToUse = new SortExpression(new SortClause(AppointmentTypeFields.Name, SortOperator.Ascending));
}
I'm probably just not getting it.
EDIT:
Phil put me on the right track, this works:
if (!Page.IsPostBack)
{
dsItems.RelationsToUse = new RelationCollection(PersonAppointmentTypeEntity.Relations.AppointmentTypeEntityUsingAppointmentTypeId);
dsItems.SorterToUse = new SortExpression(new SortClause(AppointmentTypeFields.Name, SortOperator.Ascending));
}
You'll need to share more code if you want an exact solution. You didn't post the code where you actually fetch the entity (or collection). This may not seem relevant but it (probably) is, as I'm guessing you are making a common mistake that people make with prefetch paths when they are first trying to sort or filter on a related entity.
You have a prefetch path from PersonAppointmentType (PAT) to AppointType (AT). This basically tells the framework to fetch PATs as one query, then after that query completes, to fetch ATs based on the results of the PAT query. LLBLGen takes care of all of this for you, and wires the objects together once the queries have completed.
What you are trying to do is sort the first query by the entity you are fetching in the second query. If you think in SQL terms, you need a join from PAT=>AT in the first query. To achieve this, you need to add a relation (join) via a RelationPredicateBucket and pass that as part of your fetch call.
It may seem counter-intuitive at first, but relations and prefetch paths are completely unrelated (although you can use them together). You may not even need the prefetch path at all; It may be that you ONLY need the relation and sort clause added to your fetch code (depending on whether you actually want the AT Entity in your graph, vs. the ability to sort by its fields).
There is a very good explanation of Prefetch Paths and how they were here:
http://www.llblgening.com/archive/2009/10/prefetchpaths-in-depth/
Post the remainder of your fetch code and I may be able to give you a more exact answer.