How eager does LINQ and Entity Framework load by default? - c#

I'm new to LINQ and the Entity Framework. I've been fetching collections from the database using the following:
var Publications = from pubs in db.RecurringPublications
select pubs;
The Publications table is linked to other tables via foreign keys. I've been using this to reference properties like this:
Publications.Single().LinkedTable.LinkedTableColumn
and sometimes even further down the chain:
Publications.Single().LinkedTable.LinkedTable.LinkedLinkedTableColumn
I know you can specify lazy loading or eager loading, I was wondering how it's handled by default. Is there a maximum depth by default? Does it figure out how many joins to use at compile time?

It's only going to eager load what's in that specific table.
var Publications = from pubs in db.RecurringPublications
select pubs;
Will only get the data from your RecurringPublications table. You can specify if you want to load additional properties, but if you don't specify anything, it will only give you exactly what you ask for - nothing more.
Publications.Single().LinkedTable.LinkedTableColumn
Is lazy loading your LinkedTableColumn - now if your return is Queryable (and it is so far), it's going to do a join and return a single SQL query.
However, if the call has already been enumerated, it will make a second call.
Blog post to MSDN for info

Related

Mark some child entities as "Never Load" in LINQ to Entities query

TL;DR
I want to write a query in LINQ to Entities and tell it that I'll never load the child entities of an entity. How do I do that without projecting?
Eg,
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
I want the entity called Asset pulled into a on the anonymous type I'm defining, and when I call ToList(), I don't want the Assets' children, Status and Type, to lazy load.
EDIT: After some random Visual Studio autcomplete investigation, much of this can be accomplished by turning off lazy loading in the DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
Unfortunately, if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
That is to say, I want to explicitly load some children at query time and completely sever any relationship to other child entities.
Best would be some way to actively load some entities and to ignore the rest. That is, I could call ToList() and not have to worry about throwing off lots of db connections.
Context
I have a case where I'm hydrating a view model with the results of a LINQ to Entities query from an entity called Asset. The Asset table has a couple of child tables, Type and Status. Both Type and Status have Description fields, and my view model contains both descriptions in it. Let's pretend that's as complicated as this query gets.
So I'd like to pull everything from the Asset table joined to Type and Status in one database query, during which I pull the Type and Status descriptions. In other words, I don't want to lazy load that info.
WET (Woeful Entity reTranscription?)
What we're doing now, which does exactly what I want from a connection standpoint, is the usual .Select into the view model, with a tedious field matchup.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new AssetViewModel
{
AssetId = a.AssetId,
// *** LOTS of fields from Asset removed ***
AssetStatusDesc = ast.AssetStatusDesc,
AssetTypeDesc = at.AssetTypeDesc
}).ToList();
That's good in that the Status and Type child entities of Asset are never accessed, and there's no lazy load. The SQL is one join in one database hit for all the assets. Perfect.
The worry is all the repeated jive in // *** LOTS of fields from Asset removed ***. Currently, we've got that projection in every freakin query, which obviously isn't DRY. And it means that when the Asset table changes, it's rare that the new field is included in every projection (because humans), which stinks.
I don't see a quick way around the query, btw. If I want to do it in a single query, I have to have the joins. I could add wheres to it in separate methods, but I'm not sure how I'd skip the projection each time. Or I could add joins to the query in cascading methods, but then my projection is still "repository bound", which isn't best case if I'm using these sorts of queries elsewhere. But I'm betting I'm stupiding something here.
Dumb
When I tried adding a cast to my view model from asset and changing to something like this, which is beautiful from a code standpoint, though I get bitten by lazy loading -- two extra database hits per Asset, one for Status and one for Type.
return (from a in this.Db.Assets
select a).ToList().Select(asset => (AssetViewModel)asset).ToList();
Just as we would expect, since I'm using lines like...
AssetTypeDesc = a.AssetType.AssetTypeDesc,
... inside of the casting code. So that was dumb. Concise, reusable, but dumb. This is why we hate folks who use ORMs without checking the SQL. ;^)
Overly clever, sorta
But then I tried getting too clever, with a new constructor for the view model that took the asset entity & the two description values as strings, which ended up with the same lazy load issue (because, duh, the first ToList() before selecting the anonymous objects means we don't know how the Assets are going to be used, and we're stuck pulling back everything to be safe (I assume)).
//Use anon type to skirt "Only parameterless constructors
//and initializers are supported in LINQ to Entities,"
//issue.
return (from a in this.Db.Assets
join at in this.Db.AssetTypes on a.AssetTypeId equals at.AssetTypeId
join ast in this.Db.AssetStatuses on a.AssetStatusId equals ast.AssetStatusId
select new {
a = a,
typeDesc = at.AssetTypeDesc,
statusDesc = ast.AssetStatusDesc
}).ToList().Select(anon => new AssetViewModel(anon.a, anon.typeDesc, anon.statusDesc)).ToList();
If only there was some way to say, "cast these anonymous objects to a List, but don't lazy load the Asset's children while you're doing it." <<< That's my question, natch.
I've read some about DataLoadOptions.LoadWith(), which probably provides an okay solution, and I might end up just doing that, but that's not precisely what I'm asking. I think that's a global-esque setting (? I think just for the life of the data context, which should be the single controller interaction), which I might not necessarily want to set. I may also want ObjectTrackingEnabled = false, but I'm not grokking yet.
I also don't want to use an automapper.
Painfully, after some random Visual Studio autcomplete investigation, this might be as easy as turning off lazy loading in your DbContext:
this.Db.Configuration.LazyLoadingEnabled = false;
The wacky thing is that if your work with the query results does have a few child tables, even with LazyLoadingEnabled turned off, things may still "work" for some subset of them iff the data for those children has already been loaded earlier in this DbContext -- that is, if those children have already had their context cached -- which can make for some surprising and temporarily confusing results.
Better would be to be able to cherry pick what children are "lazy-loading eligible".
I may need to update the question to make it cover this variation of the original question.

Load Entities AsNoTracking() with navigation properties, without specifying includes

I would like to know if the following scenario is possible with Entity Framework:
I want to load several tables with the option AsNoTracking since they are all like static tables that cannot be changed by user.
Those tables also happen to be navigation property of others. Up till now I relied on the AutoMapping feature of the Entity Framework, and don't use the .Include() or LazyLoading functionality.
So instead of:
var result = from x in context.TestTable
.Include("ChildTestTable")
select x;
I am using it like this:
context.ChildTestTable.Load();
context.TestTable.Load();
var result = context.TestTable.Local;
This is working smoothly because the application is so designed that the tables within the Database are very small, there won't be a table that exceeds 600 rows (and that's already pretty high value in my app).
Now my way of loading data, isn't working with .AsNoTracking().
Is there any way to make it working?
So I can write:
context.ChildTestTable.AsNoTracking().List();
var result = context.TestTable.AsNoTracking().List();
Instead of:
var result = from x in context.TestTable.AsNoTracking()
.Include("ChildTestTable")
select x;
So basically, I want to have 1 or more tables loaded with AutoMapping feature on but without loading them into the Object State Manager, is that a possibility?
The simple answer is no. For normal tracking queries, the state manager is used for both identity resolution (finding a previously loaded instance of a given entity and using it instead of creating a new instance) and fixup (connecting navigation properties together). When you use a no-tracking query it means that the entities are not tracked in the state manager. This means that fixup between entities from different queries cannot happen because EF has no way of finding those entities.
If you were to use Include with your no-tracking query then EF would attempt to do some fixup between entities within the query, and this will work a lot of the time. However, some queries can result in referencing the same entity multiple times and in some of those cases EF has no way of knowing that it is the same entity being referenced and hence you may get duplicates.
I guess the thing you don't really say is why you want to use no-tracking. If your tables don't have a lot of data then you're unlikely to see significant perf improvements, although many factors can influence this. (As a digression, using the ObservableCollection returned by .Local could also impact perf and should not be necessary if the data never changes.) Generally speaking you should only use no-tracking if you have an explicit need to do so since otherwise it ends up adding complexity without benefit.

optimize linq query with related entities

i am new to linq, i started writing this query:
var dProjects = Projects
.Select(p => new Models.Project {
ProjectID = p.ProjectID,
Status = p.Status,
ExpiresOn = p.ExpiresOn,
LatestComments = p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault(),
ProjectFileIDs = p.ProjectFiles
.Select(pf => pf.BinaryFileID)
.AsQueryable()
})
.AsQueryable<Models.Project>();
I already know this query will perform really slow because related entities like ProjectComments and ProjectFiles will create nested selects, though it works and gives me right results that i need.
How can i optimize this query and get the same results? One of my guesses would be using inner join but ProjectComments and ProjectFiles already has a relationship in database through keys, so not sure what we can achieve by setting the relationship again.
Basically, need to know which is the best approach to take here from performance perspective. One thing to note is i am sorting ProjectComments and only taking the most recent one. Should i be using combination of join and group by into? Help will be much appreciated. Thanks.
UPDATED:
Sorry, if i wasn't clear enough on what i am trying to do. Basically, in front end, i have a grid, which shows list of projects with latest project comments and list of all the files associated to project, so users can click on those links and actually open those documents. So the query that i have above is working and it does show the following in the grid:
Project ID (From Project table)
Status (From Project table)
ExpiresOn (From Project table)
LatestComments (latest entry from ProjectComments table which has project ID as foreign key)
ProjectFileIDs (list of file ids from ProjectFiles table which has Project ID as foreign key - i am using those File IDs and creating links so users can open those files).
So everything is working, i have it all setup, but the query is little slow. Right now we have very little data (only test data), but once this is launched, i am expecting lot of users/data and thus i want to optimize this query to the best, before it goes live. So, the goal here is to basically optimize. I am pretty sure this is not the best approach, because this will create nested selects.
In Entity Framework, you can drastically improve the performance of the queries by returning the objects back as an object graph instead of a projection. Entity Framework is extremely efficient at optimizing all but the most complex SQL queries, and can take advantage of deferred "Eager" loading vs. "Lazy" Loading (not loading related items from the db until they are actually accessed). This MSDN reference is a good place to start.
As far as your specific query is concerned, you could use this technique something like the following:
var dbProjects = yourContext.Projects
.Include(p => p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault()
)
.Include(p => p.ProjectFileIDs)
.AsQueryable<Models.Project>();
note the .Include() being used to imply Eager Loading.
From the MDSN Reference on Loading Related Objects,
Performance Considerations
When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.
Explicit and lazy loading enables you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded.
Do you get any boost in performance if you add Include statements before the Select?
Example:
var dProjects = Projects
.Include(p => p.ProjectComments)
.Include(p => p.ProjectFiles)
Include allows all matching ProjectComments and ProjectFiles to be eagerly loaded. See Loading Related Entities for more details.

How to efficiently delete all but the latest x children of parent entity using NHibernate?

Please forgive me if I don't explain this very well. First off, I am using NHibernate 2.0 with .NET 3.5. Basically I have an entity "EntityA" (for simplicity) with one or more children of type EntityB. Each EntityB has a number indicating how recently it was created. I would like to delete all but the x most recent EntityB. This forms part of a purge operation.
I am struggling to see an efficient way of doing this, The problem is that the EntityB instances are actually quite complex and could have hundreds of child objects themselves. The list of EntityB on EntityA is lazily loaded and I would ideally like to avoid loading it in memory if possible.
I tried passing an HQL query to Session.Delete. Unfortunately HQL doesn't seem to support the top statement so I cannot do a subselect to choose which ones not to delete.
The cascades are set up in NHibernate and not in the database. I'm not sure but I wonder if NHibernate will load the whole object graph even if the delete is done via HQL.
Any advice would be appreciated.
[Edit]Unfortunately any query must be HQL not SQL since it needs to be database independent[/Edit]
Cheers,
James
From my experience with nHibernate I don't think you're going to find a clean solution for this. But, it doesn't matter. Stepping outside the framework is only a bad thing if the framework offers a viable alternative. Use a parameterized SQL statement, make sure it's clear in the code where and why you're doing this and it'll be a great solution.
EDIT:
I'm fairly certain you could come up with a database independent SQL query for this but anyway... Instead of trying to use a top statement try using MAX() (assuming there is such a function in HQL) to grab the top item id and then structure a delete statement with a conditional that the id is not the MAX id.
You should be able to use HQL delete Child c where c in (:list) where list is a copy of the list of children with only the to-be-removed elements included. list may be obtained through an HQL query such as from Child c where c.Parent = :parent - HQL queries apparently do not obey the mapping's fetch strategy (lazy vs eager) and will only fetch children eagerly when instructed to fetch children eagerly (so just don't put in the left join fetch c.SubChildren) - and then filtered to include only the elments to be removed.
If nothing else you could set up the cascades in the database and just do
Session.CreateSQLQuery([SqlStatement]).SetParameter([ParameterStuff]).ExecuteUpdate();
I always try to keep cascades, default values, etc set up in both the database and nhibernate for cases like this when you need to do something directly to the database.

Eager load a self-referencing table

I have a standard self referencing table of Categories. In my entity model I have made associations Children and Parent. Is it possible to load the whole Category object without lazy loading?
if I use the code below, it loads only to the second level.
db.Categories.MergeOption = System.Data.Objects.MergeOption.NoTracking;
var query = from c in db.Categories.Include("Children")
where c.IsVisible == true
orderby c.SortOrder, c.Id
select c;
Is it possible to load references if I have all the category objects already loaded?
One method to load it is to add the Children property multiple times
db.Categories.Include("Children.Children.Children.Children.Children")
but this generates a very long insane T-SQL code and also it doesn't do what I want.
No, it isn't possible. Consider: All LINQ to Entities queries are translated into SQL. Which SQL statement includes an unlimited depth in a self-referencing hierarchy? In standard SQL, there isn't one. If there's an extension for this in T-SQL, I don't know what it is, and I don't think EF providers do, either.
Ok, you might consider using Load method.
if (!category.Children.IsLoaded)
category.Children.Load();
Of course, category entity need to be tracked by ObjectContext.
There is better explanation here how-does-entity-framework-work-with-recursive-hierarchies-include-seems-not-to.
One way I have used to implement that if I have several entities I want to get all children in self-referencing table is to use recursive cte and stored procedure to get their ids using Entity FrameWork :
Here is the code using Entity Framework and code first approach

Categories

Resources