Why are the entities being eager-loaded? - c#

I have an entity hierarchy with several layers, one of which contains objects that can number in the tens of thousands. There are occasions when I want only the top-level object, but I'm finding that Entity Framework is loading everything in the hierarchy.
I've even tried explicitly enabling lazy loading, to no avail.
using (var db = new MyEntities())
{
    db.Configuration.ProxyCreationEnabled = true;
    db.Configuration.LazyLoadingEnabled = true;
    var daoDict = (from d in db.stt_dictionary
                   where d.id == dictionaryID && !d.deleted
                   select d).FirstOrDefault();
}
While debugging, if I step through this and then hover over daoDict, I find that its collection properties (which are virtual) contain thousands of objects.
Why?

Fetching them in the debugger is what loads them. The debugger isn't doing anything different from what regular code would do: it calls the property's getter, and calling that getter fetches the data.
Log the database queries that are actually executed (either through the context or through the database) to see what data is being pulled, and when, in a way that doesn't itself change which queries are executed.
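To make the mechanism concrete, here is a minimal sketch that does not use Entity Framework at all: a hypothetical FakeDictionary class whose child collection is only fetched the first time its getter runs, roughly how a lazy-loading proxy behaves. Hovering over the property in the debugger is equivalent to the first Entries read in the usage below. (On EF6 specifically, db.Database.Log = Console.Write; is one way to watch the real queries.)

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a lazy-loading proxy (not EF code): the child
// collection is fetched only the first time the Entries getter runs.
public class FakeDictionary
{
    private readonly Func<List<string>> _loadEntries;
    private List<string> _entries;

    public FakeDictionary(Func<List<string>> loadEntries)
    {
        _loadEntries = loadEntries;
    }

    // Number of simulated round trips to the database.
    public int QueriesExecuted { get; private set; }

    public List<string> Entries
    {
        get
        {
            if (_entries == null)
            {
                QueriesExecuted++;          // the simulated "SELECT ... FROM children" happens here
                _entries = _loadEntries();
            }
            return _entries;
        }
    }
}

public static class LazyLoadDemo
{
    // Loading the parent does not touch the child collection.
    public static FakeDictionary LoadTopLevelOnly() =>
        new FakeDictionary(() => new List<string> { "a", "b", "c" });
}
```

Checking QueriesExecuted before and after the first Entries access shows 0 and then 1: that single query is the one a debugger tooltip would trigger.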

Related

There is already an open DataReader associated with this Command without ToList()

I have the method below to load dependent data from navigation property. However, it generates an error. I can remove the error by adding ToList() or ToArray(), but I'd rather not do that for performance reasons. I also cannot set the MARS property in my web.config file because it causes a problem for other classes of the connection.
How can I solve this without using extension methods or editing my web.config?
public override void Load(IEnumerable<Ques> data)
{
    if (data.Any())
    {
        foreach (var pstuu in data)
        {
            if (pstuu?.Id_user != null)
            {
                db.Entry(pstuu).Reference(q => q.Users).Load();
            }
        }
    }
}
I take it from this question you've got a situation something like:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
LoadDependent(query);
Based on this method, chances are it sits at the end of a call stack of various methods that build search expressions and such, but ultimately what gets passed into LoadDependent() is an IQueryable<TEntity>.
Instead, if you call:
// (outside code)
var query = db.SomeEntity.Where(x => x.SomeCondition == someCondition);
var data = query.ToList();
LoadDependent(data);
Or, inside your LoadDependent(), do something like:
base.LoadDependent(data);
data = data.ToList();
or better,
foreach (Ques qst in data.ToList())
With either of these, your LoadDependent() call works, whereas in the first example you get an error that a DataReader is already open. This is because your foreach, as written, iterates over the IQueryable, which keeps EF's data reader open for the whole loop. Further calls through db (which I'd assume is a module-level variable holding an injected DbContext) therefore cannot be made until that reader is closed.
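The reader problem can be reproduced without a database. This is a minimal sketch with a hypothetical FakeConnection that, like a SQL Server connection without MARS, allows only one open reader at a time; the inner Query(...).ToList() stands in for Reference(...).Load(). It is a model of the behaviour, not EF's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical connection allowing a single open "data reader" at a time.
public class FakeConnection
{
    private bool _readerOpen;

    // Streams rows lazily; the reader stays open until enumeration finishes or is abandoned.
    public IEnumerable<int> Query(IEnumerable<int> rows)
    {
        if (_readerOpen)
            throw new InvalidOperationException(
                "There is already an open DataReader associated with this Command.");
        _readerOpen = true;
        try
        {
            foreach (var row in rows)
                yield return row;
        }
        finally
        {
            _readerOpen = false;   // reader closed
        }
    }
}

public static class ReaderDemo
{
    private static readonly int[] Ques = { 1, 2, 3 };
    private static readonly int[] Users = { 10, 20, 30 };

    // Mirrors iterating the IQueryable directly: the outer reader is still open
    // when the per-row "Load" issues its own query.
    public static bool StreamingLoopThrows()
    {
        var conn = new FakeConnection();
        try
        {
            foreach (var q in conn.Query(Ques))
                conn.Query(Users).ToList();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Materializing first closes the outer reader before the loop body runs.
    public static int MaterializedLoopLoads()
    {
        var conn = new FakeConnection();
        int loads = 0;
        foreach (var q in conn.Query(Ques).ToList())
        {
            conn.Query(Users).ToList();
            loads++;
        }
        return loads;
    }
}
```

StreamingLoopThrows() returns true and MaterializedLoopLoads() returns 3, which is the same difference ToList() makes in the code above.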
Replacing this:
db.Entry(qst).Reference(q => q.AspNetUsers).Load();
with this:
db.Entry(qst).Reference(q => q.AspNetUsers).LoadAsync();
... does not actually work either. This just starts the load asynchronously, and without awaiting it, it too would fail; it just wouldn't raise the exception in your loop.
As mentioned in the comments to your question, this is a very poor design choice for handling the loading of references. You are far, far better off enabling lazy loading and taking the Select n+1 hit if/when a reference is actually needed, if you aren't going to implement the initial fetch properly with either eager loading or projection.
Code like this forces a Select n+1 pattern throughout your code.
A good example of loading a "Ques" with its associated User eager loaded:
var ques = db.Ques
    .Include(x => x.AspNetUsers)
    .Where(x => x.SomeCondition == someCondition)
    .ToList();
Whether "SomeCondition" results in 1 Ques or 1000 Ques being returned, the data will be fetched with one query to the DB.
Select n+1 scenarios are bad because, in the case where 1000 Ques are returned, a call to fetch dependencies gives you:
var ques = db.Ques
    .Where(x => x.SomeCondition == someCondition)
    .ToList(); // 1 query.
foreach (var q in ques)
    db.Entry(q).Reference(x => x.AspNetUsers).Load(); // 1 query x 1000
1001 queries run. This compounds with each reference you want to load.
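The arithmetic can be checked with a small simulation. This is a sketch using a hypothetical in-memory CountingDb (not EF) that counts round trips: loading 1000 rows and then each reference one at a time costs 1001 trips, while a single joined query, which is roughly what Include() produces, costs one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory "database" that counts round trips.
public class CountingDb
{
    public int RoundTrips { get; private set; }
    private readonly Dictionary<int, string> _users;
    private readonly List<(int QuesId, int UserId)> _ques;

    public CountingDb(int quesCount)
    {
        _users = Enumerable.Range(1, quesCount).ToDictionary(i => i, i => "user" + i);
        _ques = Enumerable.Range(1, quesCount).Select(i => (i, i)).ToList();
    }

    public List<(int QuesId, int UserId)> GetQues() { RoundTrips++; return _ques.ToList(); }

    public string GetUser(int userId) { RoundTrips++; return _users[userId]; }

    // Roughly what Include() does: one query that joins both tables.
    public List<(int QuesId, string User)> GetQuesWithUsers()
    {
        RoundTrips++;
        return _ques.Select(q => (q.QuesId, _users[q.UserId])).ToList();
    }
}

public static class NPlusOneDemo
{
    public static int PerRowLoad(int n)
    {
        var db = new CountingDb(n);
        foreach (var q in db.GetQues())
            db.GetUser(q.UserId);        // one extra query per row
        return db.RoundTrips;
    }

    public static int EagerLoad(int n)
    {
        var db = new CountingDb(n);
        db.GetQuesWithUsers();
        return db.RoundTrips;
    }
}
```

PerRowLoad(1000) returns 1001 and EagerLoad(1000) returns 1.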
This becomes even more problematic when later code wants to offer pagination, such as taking only 25 items where the total record count could run into the tens of thousands or more. This is where lazy loading would be the lesser of two Select n+1 evils: with lazy loading, AspNetUsers is only selected for those returned Ques that actually reference it, and only when that reference is actually touched. So if the pagination only "touched" 25 rows, lazy loading would result in 26 queries. Lazy loading is a trap, however, as later code changes can inadvertently cause performance issues in seemingly unrelated areas when new references or code changes result in far more references being "touched", each kicking off a query.
If you are going to pursue a LoadDependent()-type method, then you need to ensure that it is called as late as possible, once you have a known set size to load (i.e. after pagination), because you will need to materialize the collection to load related entities with the same DbContext instance. Trying to work around this using detached instances (AsNoTracking()) or a completely new DbContext instance may give you some headway, but will invariably lead to more problems later, as you will have a mix of tracked and untracked entities, or worse, entities tracked by different DbContexts, depending on how these loaded entities are consumed.
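A sketch of the "as late as possible" advice, again with hypothetical in-memory tables rather than EF: take one page first, then fetch the references for just that page in a single batched query (the WHERE Id IN (...) shape EF generates for ids.Contains(x.Id)). Batching is a swapped-in refinement of the per-row Load(); for a 25-row page it costs 2 round trips however many rows the table holds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tables plus a round-trip counter.
public class PagedDb
{
    public int RoundTrips { get; private set; }
    private readonly List<int> _quesUserIds;          // index = Ques id, value = User id
    private readonly Dictionary<int, string> _users;

    public PagedDb(int rows)
    {
        _quesUserIds = Enumerable.Range(0, rows).Select(i => i % 100).ToList();
        _users = Enumerable.Range(0, 100).ToDictionary(i => i, i => "user" + i);
    }

    // One query returning a single page of user ids.
    public List<int> GetPage(int skip, int take)
    {
        RoundTrips++;
        return _quesUserIds.Skip(skip).Take(take).ToList();
    }

    // One query for all requested users (think WHERE Id IN (...)).
    public Dictionary<int, string> GetUsers(IEnumerable<int> ids)
    {
        RoundTrips++;
        var wanted = new HashSet<int>(ids);
        return _users.Where(u => wanted.Contains(u.Key)).ToDictionary(u => u.Key, u => u.Value);
    }
}

public static class LoadAfterPagingDemo
{
    // Materialize one page, then fetch all of its references in a single batch.
    public static (int RoundTrips, int UsersLoaded) LoadPage(int rows, int pageSize)
    {
        var db = new PagedDb(rows);
        var page = db.GetPage(0, pageSize);
        var users = db.GetUsers(page.Distinct());
        return (db.RoundTrips, users.Count);
    }
}
```

LoadPage(10000, 25) reports 2 round trips and 25 users loaded.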
An alternative some teams pursue is an IncludeReference()-type method rather than a LoadReference()-type method. The goal here is to build .Include() statements into the IQueryable. This can be done two ways: either with magic strings (property names) or by passing in expressions for the references to include. Again, this can turn into a bit of a rabbit hole when handling more deeply nested references (i.e. building .Include().ThenInclude() chains). This approach avoids the Select n+1 issue by eager loading the required related data.
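A minimal sketch of the expression-based variant, assuming EF6-style dotted include paths: a hypothetical IncludePaths.ToPath helper that flattens x => x.Children.Select(c => c.Grandchildren) into "Children.Grandchildren", the form EF6's Include(string) overload accepts. The entity classes are made up, the final call to Include is left out because it needs a real context, and EF Core ThenInclude chains are not attempted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entities used only to build sample expressions.
public class Grandchild { public int Id { get; set; } }
public class Child { public List<Grandchild> Grandchildren { get; set; } }
public class Parent { public List<Child> Children { get; set; } }

public static class IncludePaths
{
    // x => x.A -> "A"; x => x.A.B -> "A.B"; x => x.A.Select(a => a.B) -> "A.B"
    public static string ToPath<T>(Expression<Func<T, object>> selector) => Build(selector.Body);

    private static string Build(Expression e)
    {
        switch (e)
        {
            case LambdaExpression lambda:
                return Build(lambda.Body);
            case UnaryExpression u when u.NodeType == ExpressionType.Convert || u.NodeType == ExpressionType.Quote:
                return Build(u.Operand);
            case MemberExpression m when m.Expression is ParameterExpression:
                return m.Member.Name;
            case MemberExpression m:
                return Build(m.Expression) + "." + m.Member.Name;
            case MethodCallExpression call when call.Method.Name == "Select" && call.Arguments.Count == 2:
                // collection.Select(c => c.Next) steps one level into a collection
                return Build(call.Arguments[0]) + "." + Build(call.Arguments[1]);
            default:
                throw new ArgumentException("Unsupported include expression: " + e);
        }
    }

    public static string[] SamplePaths() => new[]
    {
        ToPath<Parent>(p => p.Children),
        ToPath<Parent>(p => p.Children.Select(c => c.Grandchildren)),
    };
}
```

SamplePaths() yields "Children" and "Children.Grandchildren". The design choice here is to keep the strongly typed expressions at the call site and only fall back to magic strings at the last step.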
I solved the problem by deleting the Load method and using Include() in my first query of the data, so that the reference data is populated in the navigation property.

Manually assign existing object to Entity Framework navigation property

Is it possible to manually assign an existing object to an Entity Framework (db first) object's navigation property?
The context to the question is that I have a problem in trying to bring back a (heavily filtered) list of objects with all the children and descendants attached so that the full graph is available in memory after the context is disposed.
I tried doing this via .Include() statements using something like this:
using (var ctx = new MyEntities())
{
    myParents = ctx.Parents
        .Where(p => MyFilter(p))
        .Include(p => p.Children)
        .Include(p => p.Children.Select(c => c.Grandchildren))
        .Include(p => p.Children.Select(c => c.Grandchildren.Select(g => g.GreatGrandChildren)))
        .ToList();
}
but the generated query runs too slowly because of the known performance problems with using nested include statements (as explained in lots of places including this blog).
I can pull back the parents, children and Grandchildren without performance issues - I only hit the troubles when I include the very last .Include() statement for the greatgrandchildren.
I can easily get the GreatGrandChildren objects back from the database with a second separate query by building a list of GrandChildrenIds from the GrandChildren already retrieved and doing something like:
greatGrandKids = ctx.GreatGrandChildren.Where(g=>ids.Contains(g.GrandChildId)).ToList();
but now, once I dispose of the context, I cannot do something like grandChildA.GreatGrandChildren without hitting the object context disposed exception.
I could have as many as a few thousand GrandChildren objects so I really want to avoid a round trip to the database to fetch the GreatGrandChildren for each one which rules out simply using .Load() on each GrandChild object, right?
I could feasibly work around this by either just looking up the required greatgrandchildren from greatGrandKids each time I needed them in my subsequent code or even by adding a new (non-mapped) Property such as .GreatGrandChildrenLocal to the GrandChild class and assigning them all up front but these both feel very kludgy & ugly. I'd MUCH prefer to find a way to just be able to access the existing .GreatGrandChildren navigation property on each GrandChild object.
Trying the obvious of assigning to the navigation property with something like this:
grandchild.GreatGrandChildren = greatGrandKids
.Where(g=>g.GrandChildId == grandChild.Id)
.ToList();
fails too when I then try to access grandchild.GreatGrandChildren (still giving the object disposed exception).
So my question is:
Is there a way I can assign the existing GreatGrandChildren objects I have already retrieved from the database to the .GreatGrandChildren navigation property on the GrandChild object in such a way as to make them available (only needed for read operations) after the context is disposed?
(Or indeed is there a different solution to the problem?)
If you disable proxy creation with:
ctx.Configuration.ProxyCreationEnabled = false;
then reading and writing from/to the navigation property works exactly as expected without trying to lazily load the entities and throwing the object disposed exception.
So we have something like:
using (var ctx = new MyEntities())
{
    // plain (non-proxy) entities: navigation properties behave like ordinary properties
    ctx.Configuration.ProxyCreationEnabled = false;

    myParents = ctx.Parents
        .Where(p => MyFilter(p))
        .Include(p => p.Children)
        .Include(p => p.Children.Select(c => c.Grandchildren))
        .ToList();
    // skip the final GreatGrandChildren include statement

    // get the associated grandchildren & their ids:
    var grandKids = myParents.SelectMany(p => p.Children)
        .SelectMany(c => c.Grandchildren)
        .ToList();
    var ids = grandKids.Select(g => g.Id).ToList();

    // get the great-grandkids in a single query:
    var greatGrandKids = ctx.GreatGrandChildren
        .Where(g => ids.Contains(g.GrandChildId))
        .ToList();

    // assign the great-grandchildren to the grandchildren:
    foreach (var grandChild in grandKids)
    {
        grandChild.GreatGrandChildren = greatGrandKids
            .Where(g => g.GrandChildId == grandChild.Id)
            .ToList();
    }
}
and now we can access the .GreatGrandChildren property outside the context without hitting the context disposed exception. Whilst this still feels a little messy, it works out MUCH cheaper than either using the original Include() statement or calling .Load() on each GrandChild.
N.B. As these objects are only used in read operations and I don't need Lazy Loading then there are no negative implications to turning off proxy creation in my circumstances. If write operations and/or lazy loading were also necessary then we would also need to consider the implications of turning this off for the given EF context.
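The assignment loop above filters the whole greatGrandKids list once per grandchild, so the in-memory work grows with grandchildren × great-grandchildren. A minimal sketch of an alternative, assuming plain non-proxy POCOs like the hypothetical GrandChild/GreatGrandChild classes below: group the great-grandchildren once with ToLookup and hand each grandchild its group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCOs standing in for the non-proxy entities.
public class GrandChild
{
    public int Id { get; set; }
    public ICollection<GreatGrandChild> GreatGrandChildren { get; set; } = new List<GreatGrandChild>();
}

public class GreatGrandChild
{
    public int Id { get; set; }
    public int GrandChildId { get; set; }
}

public static class Stitcher
{
    // Groups the great-grandchildren once by foreign key, then assigns each group,
    // instead of re-filtering the whole list for every grandchild.
    public static void Attach(IEnumerable<GrandChild> grandKids, IEnumerable<GreatGrandChild> greatGrandKids)
    {
        var byParent = greatGrandKids.ToLookup(g => g.GrandChildId);
        foreach (var grandChild in grandKids)
            grandChild.GreatGrandChildren = byParent[grandChild.Id].ToList();   // empty list when there are none
    }
}
```

A lookup returns an empty sequence for a missing key, so grandchildren without descendants end up with an empty collection rather than null.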

Getting weird behavior when retrieving data from Microsoft CRM using LINQ

I'm having this problem accessing the Contact entity using LINQ.
I have the 2 functions below.
If I run the first function and then call the second one, I seem to be missing a lot of fields in the second query: firstname and lastname, for example, don't show up; they just come back as null values. If I run the second function on its own, I get the right data. The only fields that show up correctly in both runs are Id, ContactId and new_username.
Any ideas what I am doing wrong?
Thanks a lot
Here are the 2 functions:
public List<String> GetContactsUsernameOnly()
{
    IQueryable<String> _records = from _contactSet in _flinsafeContext.ContactSet
                                  where _contactSet.new_FAN == "username"
                                  orderby _contactSet.new_username
                                  select _contactSet.new_username;
    return _records.ToList();
}

public List<Contact> GetContacts()
{
    IQueryable<Contact> _records = from _contactSet in _flinsafeContext.ContactSet
                                   where _contactSet.new_FAN == "my-username-here"
                                   orderby _contactSet.new_username
                                   select _contactSet;
    return _records.ToList();
}
It is because you are reusing the same CRM context (in your case _flinsafeContext) when you call both methods.
What the context does is cache records, so the first method returns your contact but only brings back the new_username field.
The second method wants to return the whole record, but when it is called after the first one, the record already exists in the context, so the context just returns that, despite it only having the one field populated. It is not clever enough to lazy load the fields that have not been populated. If this method is called first, the record doesn't exist in the context yet, so the whole record is returned.
There are 2 ways to get around this:
1) Don't reuse CRM contexts. Instead, create a new one in each method based on a singleton IOrganizationService.
2) There is a ClearChanges() method on your context that will mean the next time you do a query it will go back to CRM and get the fields you have selected. This will also clear any unsaved creates/updates/deletes etc., so you have to be careful about what state the context is in.
As an aside, creating a new CRM context isn't an intensive operation, so it's not often worthwhile passing contexts around and reusing them. Creating the underlying OrganizationService is the slowest bit.
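To illustrate the caching behaviour described above, here is a toy model, not the CRM SDK: a hypothetical CachingContext caches whatever columns the first query selected, so a second query through the same instance gets the partial record back, while a fresh context per method goes back to the "server". The real OrganizationServiceContext is more involved; this only mirrors the symptom in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record-caching context (NOT the CRM SDK): the first query that
// sees a record caches whatever columns it selected, and later queries for the
// same id are answered from that cache.
public class CachingContext
{
    private readonly Dictionary<int, Dictionary<string, string>> _server;
    private readonly Dictionary<int, Dictionary<string, string>> _cache =
        new Dictionary<int, Dictionary<string, string>>();

    public CachingContext(Dictionary<int, Dictionary<string, string>> server)
    {
        _server = server;
    }

    public Dictionary<string, string> Retrieve(int id, params string[] columns)
    {
        if (_cache.TryGetValue(id, out var cached))
            return cached;                      // no new columns are fetched
        var row = new Dictionary<string, string>();
        foreach (var column in columns)
            row[column] = _server[id][column];
        _cache[id] = row;
        return row;
    }
}

public static class ContextReuseDemo
{
    private static Dictionary<int, Dictionary<string, string>> Server() =>
        new Dictionary<int, Dictionary<string, string>>
        {
            [1] = new Dictionary<string, string> { ["new_username"] = "jdoe", ["firstname"] = "Jane" }
        };

    private static string FirstName(Dictionary<string, string> row) =>
        row.TryGetValue("firstname", out var value) ? value : null;

    // One context for both "methods": the second query gets the cached, partial record.
    public static string SharedContext()
    {
        var ctx = new CachingContext(Server());
        ctx.Retrieve(1, "new_username");
        return FirstName(ctx.Retrieve(1, "new_username", "firstname"));
    }

    // A fresh context per method: the second query goes back to the server.
    public static string ContextPerMethod()
    {
        new CachingContext(Server()).Retrieve(1, "new_username");
        return FirstName(new CachingContext(Server()).Retrieve(1, "new_username", "firstname"));
    }
}
```

SharedContext() returns null for firstname, as in the question, while ContextPerMethod() returns "Jane".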
This behaviour can be painful, because returning the entire record is horribly inefficient and slow, so you WANT to be selecting only the fields you need for each query.
And here's how you return just the fields you want:
IEnumerable<ptl_billpayerapportionment> bpas = context.ptl_billpayerapportionmentSet
    .Where(bm => bm.ptl_bill.Id == billId)
    .Select(bm => new ptl_billpayerapportionment()
    {
        Id = bm.Id,
        ptl_contact = bm.ptl_contact
    });
This ensures a much smaller query is executed against CRM, as Id and ptl_contact are the only two fields being returned. But as Ben says above, further retrievals against the same entity in the same context will return nulls for fields not included in the initial select (as per the OP's question).
For bonus points, declaring the result as IEnumerable and creating a new, lightweight entity gives you access to the usual LINQ methods, e.g. .Any(), .Sum() etc., which then run in memory. The CRM SDK doesn't like having them applied to the query itself (which is what you get with var, as it infers IQueryable), apparently.
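Why the declared type matters can be shown without CRM: a minimal sketch using an in-memory AsQueryable() source. When the variable is typed (or inferred by var) as IQueryable<T>, Queryable.Where is bound and a provider would have to translate it; once it is typed as IEnumerable<T>, Enumerable.Where is bound at compile time and runs in memory over whatever the provider returned. Which operators the CRM provider actually rejects is not modelled here.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StaticTypeDemo
{
    private static IQueryable<int> Source() => new List<int> { 1, 2, 3 }.AsQueryable();

    // Typed as IQueryable<T>: Queryable.Where is chosen, so the result is still
    // a query that the provider would have to translate.
    public static bool QueryableStaysQueryable()
    {
        IQueryable<int> q = Source();
        object filtered = q.Where(x => x > 1);
        return filtered is IQueryable<int>;
    }

    // Typed as IEnumerable<T>: Enumerable.Where is chosen at compile time,
    // so the filter runs in memory (LINQ to Objects).
    public static bool EnumerableRunsInMemory()
    {
        IEnumerable<int> e = Source();
        object filtered = e.Where(x => x > 1);
        return !(filtered is IQueryable<int>);
    }
}
```

Both methods return true: the same source behaves differently purely because of the variable's static type.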

Entity Framework - Reference not loading

I have a model-first Entity Framework design like this (version 4.4):
When I load it using code like this:
PriceSnapshotSummary snapshot = db.PriceSnapshotSummaries.FirstOrDefault(pss => pss.Id == snapshotId);
the snapshot has loaded everything (that is SnapshotPart, Quote, QuoteType) except the DataInfo. Looking at the SQL, this appears to be because Quote has no FK to DataInfo, due to the 0..1 relationship.
However, I would have expected that the navigation property 'DataInfo' on Quote would still go off to the database to fetch it.
My current work around is this:
foreach (var quote in snapshot.ComponentQuotes)
{
    var dataInfo = db.DataInfoes.FirstOrDefault(di => di.Quote.Id == quote.InstrumentQuote.Id);
    quote.InstrumentQuote.DataInfo = dataInfo;
}
Is there a better way to achieve this? I thought EF would automatically load the reference?
This problem has to do with how the basic LINQ building blocks interact with Entity Framework.
Take the following (pseudo)code:
IQueryable<Address> addresses;
using (var db = new MyEntities()) // your ObjectContext-derived context
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000);
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This looks OK, but will throw a runtime error because of an interface called IQueryable.
IQueryable extends IEnumerable and adds an expression and a provider. This basically allows it to build and execute SQL statements against a database without having to load whole tables when fetching data and iterating over them, as you would with an IEnumerable.
Because LINQ defers execution of the expression until it's used, it compiles the IQueryable expression into SQL and executes the database query only right before it's needed. This speeds things up a lot and allows for expression chaining without going to the database every time a Where() or Select() is executed. The side effect is that if the query is used outside the scope of db, the SQL statement is executed after db has been disposed of.
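The deferred-execution trap can be reproduced without a database. This minimal sketch uses a hypothetical FakeContext whose query is a C# iterator, so nothing runs until the sequence is enumerated; enumerating after Dispose() fails, here with ObjectDisposedException (the exact exception EF throws may differ by version and context type).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical context: queries are deferred, and running one after Dispose() fails.
public sealed class FakeContext : IDisposable
{
    private bool _disposed;
    private readonly int[] _numbers = { 500, 1500, 2500 };

    public IEnumerable<int> AddressNumbers()
    {
        // Iterator method: this check only runs once enumeration starts.
        if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
        foreach (var n in _numbers)
            yield return n;
    }

    public void Dispose() => _disposed = true;
}

public static class DeferredDemo
{
    public static bool EnumeratingAfterDisposeThrows()
    {
        IEnumerable<int> addresses;
        using (var db = new FakeContext())
        {
            addresses = db.AddressNumbers().Where(n => n > 1000);   // only builds the query
        }
        try { addresses.ToList(); return false; }                    // runs it, too late
        catch (ObjectDisposedException) { return true; }
    }

    public static int MaterializedInsideUsing()
    {
        List<int> addresses;
        using (var db = new FakeContext())
        {
            addresses = db.AddressNumbers().Where(n => n > 1000).ToList();   // runs now
        }
        return addresses.Count;
    }
}
```

EnumeratingAfterDisposeThrows() returns true, while MaterializedInsideUsing() returns 2 because ToList() ran the query while the context was still alive.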
To force LINQ to execute, you can use ToList, like this:
List<Address> addresses;
using (var db = new MyEntities())
{
    addresses = db.Addresses.Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will force LINQ to execute the expression against db and get all addresses with a number greater than a thousand. This is all good if you only need fields within the addresses table, but since we want to get the name of a city (a 1..1 relationship similar to yours), we'll hit another bump before it can run: lazy loading.
Entity Framework lazy loads related entities by default, so nothing is fetched from the database until it's needed. Again, this speeds things up considerably, since without it every call to the database could potentially bring the whole database into memory; but it has the drawback of depending on the context being available.
You could switch to eager loading everywhere, but that would bring in a lot of info you probably don't use. (Note that setting 'Lazy Loading Enabled' to False in your model's properties only turns lazy loading off; it does not eager load anything by itself, so City would simply not be loaded.)
The best fix for this problem is to execute everything inside db's scope:
using (var db = new MyEntities())
{
    var addresses = db.Addresses.Where(addr => addr.Number > 1000).ToList();
    foreach (var addr in addresses)
        Console.WriteLine(addr.City.Name);   // City is lazy loaded while db is still alive
}
I know this is a really simple example, but in the real world you can use a DI container like Ninject to handle your dependencies and have your db available to you throughout the execution of the app.
This leaves us with Include. Include will make IQueryable include all specified relation paths when building the SQL statement:
List<Address> addresses;
using (var db = new MyEntities())
{
    addresses = db.Addresses.Include("City").Where(addr => addr.Number > 1000).ToList();
}
foreach (var addr in addresses)
    Console.WriteLine(addr.City.Name);
This will work, and it's a nice compromise between having to load the whole database and having to refactor an entire project to support DI.
Another thing you can do is map multiple tables to a single entity. In your case, since the relationship is 1-0..1, you shouldn't have a problem doing it.

How do I make Entity Framework not join tables?

Ok so I am testing out EF once again for performance and I just want a simple result back from my database.
Example
var jobsList = from j in mf.Jobs
where j.UserID == 1001 select new { Job = j };
This unfortunately joins my User object to this list, which I don't want EF to do. How do I tell EF not to join just because there is a relationship? Basically, I just want a simple row from that table.
Or do I need to use a different type of retrieval? I am still using the basic type of database retrieval below, and I feel there are better ways to handle db work by now.
SqlConnection myconnection = new SqlConnection();
Edit
To put what I am saying more clearly: instead of getting only the following
Job.JobID
Job.UserID
//Extra properties
I get
Job.JobID
Job.UserID
Job.User
//Extra properties
That User object easily consumes way more memory than is needed, plus I don't need it.
My Solution
So I still don't believe in EF too much, and here is why. I tried with lazy loading both off and on and didn't notice much of a performance difference. I then compared the amount of data my SqlConnection-type method uses with my EF method.
I get back the exact same result set and here are the performance differences.
For my Entity Framework method I get back a list of jobs.
MyDataEntities mf = new MyDataEntities(); // 4MB for the connection...really?
mf.ContextOptions.LazyLoadingEnabled = false;
// 9MB for the list below
var test = from j in mf.Jobs
           where j.UserID == 1031
           select j;
foreach (Job job in test)
{
    Console.WriteLine(job.JobID);
}
For my SqlConnection method that executes a Stored Proc and returns a result set.
//356 KB for the connection and the EXACT same list.
List<MyCustomDataSource.Jobs> myJobs = MyCustomDataSource.Jobs.GetJobs(1031);
I fully understand that Entity Framework is doing way more than a standard SqlConnection, but why all this hype if it is going to take at least 25x more memory for a result set? It just doesn't seem worth it.
My solution is not to go with EF after all.
The User property is part of the Job class but won't be loaded until you access it (lazy loading). So it is not actually "joined".
If you only want the two specified columns, you can write:
var jobsList = from j in mf.Jobs
               where j.UserID == 1001
               select new {
                   j.JobID,
                   j.UserID
               };
The most probable reason for this behavior is that you have LazyLoadingEnabled property set to true.
If this is the case, the User isn't retrieved in the original query. But if you try to access this property, even if you do it through an inspection while debugging, it will be loaded from the database. But only if you try to access it.
You can check this by opening SQL Server Profiler and seeing what commands are being sent to the DB.
Your code is not using eager loading or explicit loading. So this must be the reason.
I think that EF doesn't know that you want one result only. Try something like this:
Job jobsItem = mf.Jobs.Single(j => j.UserID == 1001);
If you don't want to use lambdas...
Job jobItem = (from j in mf.Jobs where j.UserID == 1001 select j).Single();
I don't have a compiler nearby right now, so I hope the syntax is right. You can use var instead of Job for your variable if you prefer. It has no effect, but I think Job is more readable in this case.
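One caveat with Single(), sketched here with LINQ to Objects and made-up job rows: it throws InvalidOperationException when more than one element matches, and EF's Single() behaves the same way. If a user can own several jobs, filtering on UserID with Single() is only safe when that filter is known to be unique; otherwise First()/FirstOrDefault() or the full list is the safer choice.

```csharp
using System;
using System.Linq;

public static class SingleVsFirstDemo
{
    // Hypothetical job rows: (JobID, UserID). User 1001 owns two jobs.
    private static readonly (int JobID, int UserID)[] Jobs = { (1, 1001), (2, 1001), (3, 1002) };

    public static bool SingleThrowsOnMultipleMatches()
    {
        try { Jobs.Single(j => j.UserID == 1001); return false; }
        catch (InvalidOperationException) { return true; }
    }

    public static int FirstJobFor(int userId) => Jobs.First(j => j.UserID == userId).JobID;

    public static int JobCountFor(int userId) => Jobs.Count(j => j.UserID == userId);
}
```

Here Single() throws for user 1001, while First() and Count() behave as expected.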
User is actually not attached to the context until you access the User property of Job. Turn off lazy loading if you want User to be null.
Entity Framework does not support lazy loading of properties. However, it has table splitting.
(The emphasis is on properties, i.e. individual columns. Of course, Entity Framework supports lazy loading of rows.)
