I'm trying to find documentation about a behavior in Entity Framework. It works but before relying on this behavior, I want to be sure it's a normal behavior of EF and not a unexpected side effect that would be "fixed" in a future version. Here's the situation :
I've a pretty deep object hierachy (which I will simplify here). The structure is a multi levels collection of objects (class A contains a collection of class B which contains a collection of class C, which contains ...) for 7 levels deep.
I've to filter elements on some properties of C and my first trials of doing it loading the complete hierachy produce a complicated LINQ query (which would be a maintenance nightmare) and the generated SQL query was far from efficient. To simplify all this, I decided to split the query in 2 steps : first, I load the collection of class C (and all its childs) filtered as I want and then, I load class A and B for all instances of B that contains an item of my filtered collection of C.
Here's the catch : using that technique, I expected to have to repopulate manually the collection of C in my B class but actually, the collection is already populated with the elements of the collection. I verified the SQL query in intellitrace and the data required to fill instances of C is not included in the second query so the only logical conclusion is that EF did this from the informations in the context. BTW, lazy loading is turned off for that context.
Is this behavior normal in EF ? It so, can you give me link to the documentation explaining how this works ?
Here's a snippet to illustrate this :
using(var context = new MyContext())
{
//Includes and where clauses are greatly simplified for the purpose of the sample
var filteredC = context.C.Include(x=>x.ListOfD).Include(x=>x.ListOfD.Select(y=>y.ListOfE)).Where(c=>c.Status==Status).ToList();
int[] bToLoad = filteredC.Select(c=>c.IDofB).Distinct().ToArray();
var listOfAAndB = context.A.Include(a=>a.ListOfB).Where(x=>x.ListOfB.Any(y=>bToLoad.Contains(y.ID))).ToList();
//At this step, I expected B.ListOfC to be empty but it's somehow populated
}
Thanks
This is standard behavior for a DbContext life cycle. To be honest, I can't link you to any documentation that documents this feature, but I can explain you how this works.
An EF Context is stateful, and keeps track of all the entities that have already been fetched. It also knows about the relations between entities in your DB and your entity model.
So if you fetch new objects that have a direct relation to that object (in your case, C has a foreign key to B), the navigation property is populated by the Context. This is a feature, and not a bug, as it tries to explicitly avoid Lazy loading queries to the DB for objects that have already been fetched.
Related
TL;DR
I want to write a query in LINQ to Entities and tell it that I'll never load the child entities of an entity. How do I do that without projecting?
Eg,
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
I want the entity called Asset pulled into a on the anonymous type I'm defining, and when I call ToList(), I don't want the Assets' children, Status and Type, to lazy load.
EDIT: After some random Visual Studio autcomplete investigation, much of this can be accomplished by turning off lazy loading in the DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
Unfortunately, if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
That is to say, I want to explicitly load some children at query time and completely sever any relationship to other child entities.
Best would be some way to actively load some entities and to ignore the rest. That is, I could call ToList() and not have to worry about throwing off lots of db connections.
Context
I have a case where I'm hydrating a view model with the results of a LINQ to Entities query from an entity called Asset. The Asset table has a couple of child tables, Type and Status. Both Type and Status have Description fields, and my view model contains both descriptions in it. Let's pretend that's as complicated as this query gets.
So I'd like to pull everything from the Asset table joined to Type and Status in one database query, during which I pull the Type and Status descriptions. In other words, I don't want to lazy load that info.
WET (Woeful Entity reTranscription?)
What we're doing now, which does exactly what I want from a connection standpoint, is the usual .Select into the view model, with a tedious field matchup.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new AssetViewModel
{
AssetId = a.AssetId,
// *** LOTS of fields from Asset removed ***
AssetStatusDesc = ast.AssetStatusDesc,
AssetTypeDesc = at.AssetTypeDesc
}).ToList();
That's good in that the Status and Type child entities of Asset are never accessed, and there's no lazy load. The SQL is one join in one database hit for all the assets. Perfect.
The worry is all the repeated jive in // *** LOTS of fields from Asset removed ***. Currently, we've got that projection in every freakin query, which obviously isn't DRY. And it means that when the Asset table changes, it's rare that the new field is included in every projection (because humans), which stinks.
I don't see a quick way around the query, btw. If I want to do it in a single query, I have to have the joins. I could add wheres to it in separate methods, but I'm not sure how I'd skip the projection each time. Or I could add joins to the query in cascading methods, but then my projection is still "repository bound", which isn't best case if I'm using these sorts of queries elsewhere. But I'm betting I'm stupiding something here.
Dumb
When I tried adding a cast to my view model from asset and changing to something like this, which is beautiful from a code standpoint, though I get bitten by lazy loading -- two extra database hits per Asset, one for Status and one for Type.
return (from a in this.Db.Assets
select a).ToList().Select(asset => (AssetViewModel)asset).ToList();
Just as we would expect, since I'm using lines like...
AssetTypeDesc = a.AssetType.AssetTypeDesc,
... inside of the casting code. So that was dumb. Concise, reusable, but dumb. This is why we hate folks who use ORMs without checking the SQL. ;^)
Overly clever, sorta
But then I tried getting too clever, with a new constructor for the view model that took the asset entity & the two description values as strings, which ended up with the same lazy load issue (because, duh, the first ToList() before selecting the anonymous objects means we don't know how the Assets are going to be used, and we're stuck pulling back everything to be safe (I assume)).
//Use anon type to skirt "Only parameterless constructors
//and initializers are supported in LINQ to Entities,"
//issue.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
If only there was some way to say, "cast these anonymous objects to a List, but don't lazy load the Asset's children while you're doing it." <<< That's my question, natch.
I've read some about DataLoadOptions.LoadWith(), which probably provides an okay solution, and I might end up just doing that, but that's not precisely what I'm asking. I think that's a global-esque setting (? I think just for the life of the data context, which should be the single controller interaction), which I might not necessarily want to set. I may also want ObjectTrackingEnabled = false, but I'm not grokking yet.
I also don't want to use an automapper.
Painfully, after some random Visual Studio autcomplete investigation, this might be as easy as turning off lazy loading in your DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
The wacky thing is that if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
Better would be to be able to cherry pick what children are "lazy-loading eligible".
I may need to update the question to make it cover this variation of the original question.
I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with the option AsNoTracking since they are all like static tables that cannot be changed by user.
Those tables also happen to be navigation property of others. Up till now I relied on the AutoMapping feature of the Entity Framework, and don't use the .Include() or LazyLoading functionality.
So instead of:
var result = from x in context.TestTable
.Include("ChildTestTable")
select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This is working smoothly because the application is so designed that the tables within the Database are very small, there won't be a table that exceeds 600 rows (and that's already pretty high value in my app).
Now my way of loading data, isn't working with .AsNoTracking().
Is there any way to make it working?
So I can write:
context.ChildTestTable.AsNoTracking().List();
var result = context.TestTable.AsNoTracking().List();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
.Include("ChildTestTable")
select x;
So basically, I want to have 1 or more tables loaded with AutoMapping feature on but without loading them into the Object State Manager, is that a possibility?
The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.
I want to cache a never-changing aggregate which would be accessible by a root object only (all other entities are accessible only by using Reference/HasMany properties on the root object)?
Should I use NHibernate (which we are already using) second-level-cache or is it better to build some sort of singleton that provides access to all entities in the aggregate?
I found a blog post about getting everything with MultiQuery but my database does not support it.
The 'old way' to do this would be to
Do a select * from all aggregate tables
Loop the entities and set the References and the Collections manually
Something like:
foreach (var e in Entities)
{
e.Parent = loadedParentEntities.SingleOrDefault(pe => e.ParentId = pe.Id);
}
But surely there is a way to tell NHibernate to do this for me?
Update
Currently I tried merely fetching everything from the db and hope NHibernate does all the reference setting. It does not however :(
var getRoot = Session.Query<RootObject>().ToList();
var getRoot_hasMany = Session.Query<RootObjectCollection>().ToList();
var getRoot_hasMany_ref = Session.Query<RootObjectCollectionReference>().ToList();
var getRoot_hasMany_hasMany = Session.Query<RootObjectCollectionCollection>().ToList();
Domain:
Root objects are getRoot. These have a collection property 'HasMany'. These HasMany have each a reference back to GetRoot, and a reference to another entity (getRoot_hasMany_ref), and a collection of their own (getRoot_hasMany_hasMany). If this doesn't make sense, I'll create an ERD but the actual structure is not really relevant for the question (I think).
This results in 4 queries being executed. (which is good)
However, when accessing properties like getRoot.First().HasMany.First().Ref or getRoot.First().HasMany.First().HasMany().First() it still results in extra queries being executed even altough everything should already be known to the ISession?
So how do I tell NHibernate to perform those 4 queries and then build the graphs without using any proxy properties, ... so that I have access to everything even after the ISession went out of scope?
I think there are several questions in one.
I stopped trying to trick NHibernate too much. I wouldn't access entities from multiple threads, because they are usually not thread safe. At least when using lazy loading. Caching lazy entities is therefore something evil.
I would avoid too many queries by the use of batch size, which is far the cleanest and easiest solution and in most cases "good enough". It's fully transparent to the business logic, which makes it so cool.
I would:
Consider not caching the entity at all. Use NH first level cache (say: always load it using session.Get()). Make use of lazy loading when only a small part of the data is used in a single transaction.
Is there is a proven need to cache the data, consider to turn off lazy loading at all (by making the entities non-lazy and setting all the collections to non lazy. Load the entity once and cache it. Still consider thread safety when accessing the data while it is still loaded.
Should the entities be lazy, because some instances of the same type are not in the cache, consider using a DTO-like structure as cache. Copy all data in a similar class structure which are not entities. This may sound like a lot of additional work, but at the end it will avoid many strange problems and safe you much time.
Usually, query time is less important as flush time. This time is used by NH to find which entities changed in a session. To avoid this, make entities read only if you can.
if the whole object tree never changes (config settings?) then just load them efficiently with all references/collections initialised
using(var Session = Sessionfactory = OpenSession())
{
var root = Session.Query<RootObject>().FetchMany(x => x.Collection).ToFutureValue();
Session.Query<RootObjectCollection>().Fetch(x => x.Ref).FetchMany(x => x.Collection).ToFuture();
// Do something with root.Value
}
I will start by saying I have already looked thoroughly in stack overflow, nhusers and the documentation for a possible solution to my issue.
I need to be able to query only the base class table in parts of my multi/future query when eagerly loading associations (although from the research I have done I don't think this is possible)
I have started to map an existing schema using fluent nhibernate as a proof of concept. I have mapped an inheritance hierarchy using table per sub class (The mappings all work perfectly fine so I won't paste them all in here). The hierarchy has around 15 sub classes and the base class has some additional associations. E.g.
Base
Dictionary<string, Attribute> Attributes
List<EntityChange> Changes
I need to eagerly load both of the collections as in the given scenario they are required for post processing and lazily loading them causes performance issues. I am eagerly loading them by a multi / future query:
var baseQuery = session.CreateCriteria<Base>("b")
.CreateCriteria("Nested", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2", JoinType.LeftOuterJoin)
.CreateCriteria("Nested2.AdditionalNested", JoinType.LeftOuterJoin);
var logsQuery = session.CreateCriteria<Base>("b").CreateAlias("Changes", "c", JoinType.LeftOuterJoin,
Expression.And(Expression.Ge("c.EntryDate", changesStartDate), Expression.Le("c.EntryDate", changesEndDate)))
.AddOrder(Order.Desc("c.EntryDate"));
var attributesQuery = session.CreateCriteria<Base>("t").SetFetchMode("Attributes", FetchMode.Join);
logsQuery.Future<Base>();
attributesQuery.Future<Base>();
var results = baseQuery.Future<Base>().ToList();
The queries execute and return the correct results. But just to eagerly load the associations in this manner means that the attribute and changes queries have to perform a polymorphic fetch (the addition of about 15 left outer joins per query that aren't required). I know this is required for polymorphic querying but the base query will return the hierarchy that I desire. The other parts of the multi query that are issuing a polymorphic query are redundant.
I haven't yet mapped the whole of the hierarchy so there will be additional unecessary joins being performed and there are also other associations that could be loaded up front. These two combined without the addition of an increase in volume will lead to performance issues. The performance currently of this query is about 6 seconds (which admittedly is better than the 20 it's currently taking) but by messing around a bit with the query and taking out the extra joins I can get it down to about 2 seconds (this is a common query so getting it as low as possible is beneficial not just pleasing to me. It will also be run from multiple distributed machine so I would rather not get into a discussion about caching / 2nd level caching).
I have tried
using the class modifier in the query 'class = base'. I initially done this blindly but believe this is for discriminator values. Even if it is for the case statement in the SQL this will not prevent the extra joins.
Doing everything in a single query. This is slower than splitting it up as above and gives the cartesian product
Using 'Polymorphism.Explicit();' in the fluent mappings. This has no effect as I am using ClassMap with SubclassMaps so it is ignored. I tried changing all the maps to ClassMaps and using Join but this didn't give the desired behaviour.
Tried to trick nhibernate into joining the base class table onto itself for loading associations (basically load a more concrete type to prevent the polymorphic query) - create a derived class 'BaseOnlyLoading' which uses the same table and primary key as the base class. This was obviously a hack but I was just trying to see what's possible. NHibernate doesn't allow the class and sub class to use the same table.
Define the BaseOnlyLoadingMap to be a classmap with the same assocations as the BaseMap with a join back onto the Base. This was hopeful as assocation collections are resolved in the context based on full type name.
Use an interceptor which modifies the SQL that before it's execute. I wouldn't use this in production and just tried it out of interest. I passed an interceptor into a local session. This caused issues and I didn't proceed.
The HQL 'Type' query operator as explained here although I am not sure this has been implemented in the .NET version and might behave similarly to 1.
There is comment on highest rated answer (How to perform a non-polymorphic HQL query in Hibernate?) which suggest overriding the IsExplicitPolymorphism on the persister. I had a quick look and from what I remember the persister was either global per entity or created in the SessionImpl from a static factory which would prevent doing this. Even if this was possible I am not sure what sort of side effects this would have.
I tried using some SQL to load everything but even if I use a stored proc I am not sure how nhibernate will piece the graph back together. Maybe I could specify all the entities and aliases?
Specifying explicit per query would be nice. Any suggestions?
Thanks in advance.
Given an object graph like:
A { IEnum<B> }
B { IEnum<C>, IEnum<D>, IEnum<E>, ... }
C { IEnum<X> }
How can I eagerly load the entire object graph without N+1 issues?
Here is the pseudo code for the queries that I would ultimately like to execute:
var a = Session.Get<A>(1); // Query 1
var b_Ids = foreach(b in A.B's) => Select(b.Id); // Query 2
var c = Session.CreateQuery("from C where B in (b_Ids)").Future<C>(); // Query 3
var d = Session.CreateQuery("from D where B in (b_Ids)").Future<D>(); // Query 3
var e = Session.CreateQuery("from E where B in (b_Ids)").Future<E>(); // Query 3
// Iterate through c, d, e, ... find the correct 'B' parent, add to collection manually
The problem that I have with this approach is that when I go to add the instances of 'C', 'D', and 'E' to the corresponding collection of the parent 'B', the collection is still proxied, and when .Add() is called, the proxy initializes itself and executes more queries; I think NHibernate is not capable of seeing that I already have all of the data in first level cache, which is understandable.
I've tried to work around this problem by doing something like this in my Add method:
void Add(IEnum<C>)
{
_collection = new Collection<C>(); // replace the proxied instance to prevent initialization
foreach(c) => _collection.Add(c);
}
This gave me the optimum query strategy that I wanted, but caught up with me later when doing persistence (NHibernate tracks the original collection by-ref somewhere from what I can tell).
So my question is, how can I load a complex graph with children of children without N+1? The only thing I've come across to date is joining B-C, B-D, B-E which is not acceptable in my situation.
We are using NH 2.1.2 with FluentHN for mapping. An upgrade to v3 of NH or using hbm's/stored procs/whatever would not be off the table.
UPDATE:
One of the comments references a join approach, and I did come across a blog that demonstrates this approach. This work around is not acceptable in our situation, but it may help someone else: Eager fetch multiple child collections in 1 round trip with NHibernate
UPDATE 2:
Jordan's answer led me to the following posts that are related to my question: Similar Question and Ayende's blog. The pending question at this point is "how can you perform the subselects without a round trip per-path".
UPDATE 3:
I've accepted Jordan's answer even though the subselect solution is not optimal.
You can use SubSelect fetching which can be setup in the mapping files. This will avoid N+1 and cartesian product.
firstly- you can change your mappings to load these collections eagerly. see item #4 in this section.
secondly- I believe that the reason that your collection seems to be loading twice is that you first fetch it using a query, and then using the collection property.
meaning- nHibernate distinguishes between queries generated by the user (like the one you use) and queries it generates itself (like the one that occurs when you first read your 'C' collection). they do not mix.
so, when you first read your 'C' collection, nHib does not recognize that it actually once sent the exact same query to the DB (since it was a user query), and sends it again.
The way to avoid this is to retrieve your C collection via your B entity.