Nested Select Statements in LINQ to Entity Without Unnecessary Loads

Nested Select Statements in LINQ to Entity Without Unnecessary Loads - c#

I want to perform a nested select statement in linq the form of:
select *
from table_a
where w in (select w
from table_b
where x in (select x
from table_c
where z in (select z
from table_d
where z = z)))
The problem is, the only way I can figure out how to do that is by loading the results from table_b and table_c, which adds an unnecessary expense. For example, say I am attempting to load all of a customer's orderdetaildetails. The following code will load ALL of MyCustomer's orders and ALL of each order's orderdetails and, only then, all of each orderdetail's orderdetaildetails:
customer MyCustomer; //Entity customer already loaded.
var query = MyCustomer.orders.SelectMany(order => order.orderdetails).SelectMany(od => od.orderdetaildetails);
Another approach is to use the .Include function. However, this also loads each level:
var query = MyCustomer.orders.CreateSourceQuery().Include("orderdetails.orderdetaildetails");
Both of these functions load unnecessary data. The first, SelectMany(), actually makes separate roundtrips to the database for each navigation level and then for each returned entity (save entities on the last navigation level). Include() makes one trip to the database and does one giant join statement. This is a little better, but still unseemly.
Is there a way to reach the ordetaildetails level (from customer) WITHOUT loading orders and orderdetails into memory AND in one trip to the database only?
Thanks guys - Lax

This should get you the orderdetaildetails for a given customer without unnecessary loading.
customer MyCustomer; // Entity customer already loaded
var orderDetailsDetails = context.OrderDetailsDetails
.Where(odd => odd.OrderDetail.Order.Customer.CustomerPK == customer.CustomerPK);
It looks like you have lazy loading enabled which means that as soon as you access the customers orders EF goes off to the database to get them for you. It's the same when ever you access the orders orderdetails. An alternative method similar to what you used would be.
var query = context.Customers.Where(c => c.CustomerPK == customer.CustomerPK)
.SelectMany(c => c.orders)
.SelectMany(order => order.orderdetails)
.SelectMany(od => od.orderdetaildetails);

Related

Linq to SQL/Entities: Greatest N-Per group problem/performance increase

Allright, So I have too encountered what I believe is the Greatest-N-Per problem, whereas this question has been answered before I do not think it has been solved well yet with Linq. I have a table with a few million entries, so therefore queries take a lot of time. I would like these queries to take less than a second, whereas currently they spend about 10 seconds to infinity.
var query =
from MD in _context.MeasureDevice
where MD.DbdistributorMap.DbcustomerId == 6 // Filter the devices based on customer
select new
{
DbMeasureDeviceId = MD.DbMeasureDeviceId,
// includes measurements and alarms which have 1-m and 1-m relations
Measure = _context.Measures.Include(e=> e.MeasureAlarms)
.FirstOrDefault(e => e.DbMeasureDeviceId == MD.DbMeasureDeviceId && e.MeasureTimeStamp == _context.Measures
.Where(x => x.DbMeasureDeviceId == MD.DbMeasureDeviceId)
.Max(e=> e.MeasureTimeStamp)),
Address = MD.Dbaddress // includes address 1-1 relation
};
In this query I'm selecting data from 4 different tables. Firstly the MeasureDevice table which is the primary entity im after. Secondly I want the latest measurement from the measures table, which should also include alarms from another table if any exist. Lastly I need the address of the device, which is located in its own table.
There are a few thousand devices, but they have between themselves several thousands of measures which amount to several million rows in the measurement table.
I wonder if anyone has any knowledge as to either improve the performance of Linq queries using EF5, or any better method for solving the Greatest-N-Per problem. I've analyzed the query using Microsoft SQL Server Manager and the most time is spent fetching the measurements.
Query generated as requested:
SELECT [w].[DBMeasureDeviceID], [t].[DBMeasureID], [t].[AlarmDBAlarmID], [t].[batteryValue], [t].[DBMeasureDeviceID], [t].[MeasureTimeStamp], [t].[Stand], [t].[Temperature], [t].[c], [a].[DBAddressID], [a].[AmountAvtalenummere],
[a].[DBOwnerID], [a].[Gate], [a].[HouseCharacter], [a].[HouseNumber], [a].[Latitude], [a].[Longitude], [d].[DBDistributorMapID], [m1].[DBMeasureID], [m1].[DBAlarmID], [m1].[AlarmDBAlarmID], [m1].[MeasureDBMeasureID]
FROM [MeasureDevice] AS [w]
INNER JOIN [DistribrutorMap] AS [d] ON [w].[DBDistributorMapID] = [d].[DBDistributorMapID]
LEFT JOIN [Address] AS [a] ON [w].[DBAddressID] = [a].[DBAddressID]
OUTER APPLY (
SELECT TOP(1) [m].[DBMeasureID], [m].[AlarmDBAlarmID], [m].[batteryValue], [m].[DBMeasureDeviceID], [m].[MeasureTimeStamp], [m].[Stand], [m].[Temperature], 1 AS [c]
FROM [Measure] AS [m]
WHERE ([m].[MeasureTimeStamp] = (
SELECT MAX([m0].[MeasureTimeStamp])
FROM [Measure] AS [m0]
WHERE [m0].[DBMeasureDeviceID] = [w].[DBMeasureDeviceID])) AND ([w].[DBMeasureDeviceID] = [m].[DBMeasureDeviceID])
) AS [t]
LEFT JOIN [MeasureAlarm] AS [m1] ON [t].[DBMeasureID] = [m1].[MeasureDBMeasureID]
WHERE [d].[DBCustomerID] = 6
ORDER BY [w].[DBMeasureDeviceID], [d].[DBDistributorMapID], [a].[DBAddressID], [t].[DBMeasureID], [m1].[DBMeasureID], [m1].[DBAlarmID]
Entity Relations

You have navigation properties defined, so it stands that MeasureDevice should have a reference to it's Measures:
var query = _context.MeasureDevice
.Include(md => md.Measures.Select(m => m.MeasureAlarms)
.Where(md => md.DbDistributorMap.DbCustomerId == 6)
.Select(md => new
{
DbMeasureDeviceId = md.DbMeasureDeviceId,
Measure = md.Measures.OrderByDescending(m => m.MeasureTimeStamp).FirstOrDefault(),
Address = md.Address
});
The possible bugbear here is including the MeasureAlarms with the required Measure. AFAIK you cannot put an .Include() within a .Select() (Where we might have tried Measure = md.Measures.Include(m => m.MeasureAlarms)...
Caveat: It has been quite a while since I have used EF 5 (Unless you are referring to EF Core 5) If you are using the (very old) EF5 in your project I would recommend arguing for the upgrade to EF6 given EF6 did bring a number of performance and capability improvements to EF5. If you are instead using EF Core 5, the Include statement above would be slightly different:
.Include(md => md.Measures).ThenInclude(m => m.MeasureAlarms)
Rather than returning entities, my go-to advice is to use Projection to select precisely the data we need. That way we don't need to worry about eager or lazy loading. If there are details about the Measure and MeasureAlarms we need:
var query = _context.MeasureDevice
.Where(md => md.DbDistributorMap.DbCustomerId == 6)
.Select(md => new
{
md.DbMeasureDeviceId,
Measure = md.Measures
.Select(m => new
{
m.MeasureId,
m.MeasureTimestamp,
// any additional needed fields from Measure
Address = m.Address.Select(a => new
{
// Assuming Address is an entity, any needed fields from Address.
}),
Alarms = m.MeasureAlarms.Select(ma => new
{
ma.MeasureAlarmId,
ma.Label // etc. Whatever fields needed from Alarm...
}).ToList()
}).OrderByDescending(m => m.MeasureTimestamp)
.FirstOrDefault()
});
This example selects anonymous types, alternatively you can define DTOs/ViewModels and can leverage libraries like Automapper to map the fields to the respective entity values to replace all of that with something like ProjectTo<LatestMeasureSummaryDTO> where Automapper has rules to map a MeasureDevice to resolve the latest Measure and extract the needed fields.
The benefits of projection are handling otherwise complex/clumsy eager loading, building optimized payloads with only the fields a consumer needs, and resilience in a changing system where new relationships don't accidentally introduce lazy loading performance issues. For example if Measure currently only has MeasureAlarm to eager load, everything works. But down the road if a new relationship is added to Measure or MeasureAlarm and your payload containing those entities are serialized, that serialization call will now "trip" lazy loading on the new relationship unless you revisit all queries retrieving these entities and add more eager loads, or start worrying about disabling lazy loading entirely. Projections remain the same until only if and when the fields they need to return actually need to change.
Beyond that, the next thing you can investigate is to run the resulting query through an analyzer, such as within SQL Management Studio to return the execution plan and identify whether the query could benefit from indexing changes.

Entity Framework gets progressively slow with extra join added even though SQL generated is fast

We have 18 table join which is typical for ERP systems. The join is done via LINQ over Entity Framework.
The join gets progressively slower as more joins are added. The return result set is small(15 records). The LINQ generated query is captured via SQL Profiler and when we run this via Microsoft Management Console it is very fast : 10ms. When we run it via our C# LINQ-over-EntityFramework it takes 4 seconds.
What i guess is happening:
The time it takes to compile expression tree into SQL is 2 seconds out of total 4 seconds, and another 2 seconds i guess is spent internally to convert SQL result set into actually C# classes. Also it is not connected to initialization of entity framework because we run some queries before and repetitive calls to this join produce same 4 seconds.
Is there a way to speed this up. Otherwise we are considering abandoning Entity Framework for being absolutely inefficient...

In case it helps, I had a nasty performance issue, whereby a simple query that took 1-2 seconds in raw SQL took about 11 seconds via EF.
I went from using...
List<GeographicalLocation> geographicalLocations = new SalesTrackerCRMEntities()
.CreateObjectSet<GeographicalLocation>()
.Where(g => g.Active)
.ToList();
which took about 11 seconds via EF, to using...
var geographicalLocations = getContext().CreateObjectSet<GeographicalLocation>()
.AsNoTracking()
.Where(g => g.Active).ToList();
which took less than 200 milliseconds.
The disadvantage to this is that it won't load related entities, so you have to load them manually afterwards, but it gives such an immense performance boost that it was well worth it (in this case at least).
You would have to assess each case individually to see if the extra speed is worth the extra code.

You correctly identified bottlenecks.
If you have quite complex queries, I would suggest you to use compiled queries to overcome expression tree to sql query conversion.
You can refer Compiled Queries in EF from here.
Fo second part if EF is using two much time materialize your object graph then I would suggest to use some other means to retrieve data apart from EF.
One option can be Dapper.NET, You can have your concise sql query and you can directly retrieve its result in concrete model objects using Dapper (or any other tiny ORM)

I suspect your query takes so long to generate becuase you are treating Entity Framework like it is a SQL Query, which is not correct. You have many joins and akward calls in your linq syntax. Generally, your syntax should be similar to the following fictitious modeling query:
var result = (from appointment in appointments
from operation in appointment.Operations
where appointment.Id == 12
select new Model {
Id = appointment.Id,
Name = appointment.Name,
// etc, etc
}).ToList();
There is no use of joins above, the navigation property between Appointment and Operations takes care of the neccessary plumbing. Remember, this is an ORM, there is no concept of a join, only a concept of relationships.
The call to Distinct at the end, also indicates the structure of the db schema may be problematic if it returns too many duplicate results.
If after refactoring the entity model and correctly constructing the query still leaves with underperformance, it is advisable to use a stored procedure and map the result with EF's built in methods for doing so.

It is hard to tell what is going wrong here without seeing how you are using linq, but I suspect this will fix your problem:
var myResult = dataContext.table.Where(x == "Your joins and otherstuff").ToList();
//after converting it to a list use it how you need, but not before.
If this does not help please post your code.

The problem is that you are probably passing it to a data source that is running all sorts of additional queries based on you open result set.
Try this instead:
IEnumerable<SigmaTEK.Web.Models.SchedulerGridModel> tasks = (from appointment in _appointmentRep.Value.Get().Where(a => (a.Start < DbContext.MaxTime && DbContext.MinTime < a.Expiration))
join timeApplink in _timelineAppointmentLink.Value.Get().Where(a => a.AppointmentId != Guid.Empty)
on appointment.Id equals timeApplink.AppointmentId
join timeline in timelineRep.Value.Get().Where(i => timelines.Contains(i.Id))
on timeApplink.TimelineId equals timeline.Id
join repeater in _appointmentRepeaterRep.Value.Get().Where(repeater => (repeater.Start < DbContext.MaxTime && DbContext.MinTime < repeater.Expiration))
on appointment.Id equals repeater.Appointment
into repeaters
from repeater in repeaters.DefaultIfEmpty()
join aInstance in _appointmentInstanceRep.Value.Get()
on appointment.Id equals aInstance.Appointment
into instances
from instance in instances.DefaultIfEmpty()
join opRes in opResRep.Get()
on instance.ResourceOwner equals opRes.Id
into opResources
from op in opResources.DefaultIfEmpty()
join itemResource in _opDocItemResourcelinkRep.Value.Get()
on op.Id equals itemResource.Resource
into itemsResources
from itemresource in itemsResources.DefaultIfEmpty()
join opDocItem in opDocItemRep.Get()
on itemresource.OpDocItem equals opDocItem.Id
into opDocItems
from opdocitem in opDocItems.DefaultIfEmpty()
join opDocSection in opDocOpSecRep.Get()
on opdocitem.SectionId equals opDocSection.Id
into sections
from section in sections.DefaultIfEmpty()
join opDoc in opDocRep.Get()
on section.InternalOperationalDocument equals opDoc.Id
into opdocs
from opdocitem2 in opDocItems.DefaultIfEmpty()
join opDocItemLink in opDocItemStrRep.Get()
on opdocitem2.Id equals opDocItemLink.Parent
into opDocItemLinks
from link in opDocItemLinks.DefaultIfEmpty()
join finItem in finItemsRep.Get()
on link.Child equals finItem.Id
into temp1
from rd1 in temp1.DefaultIfEmpty()
join sec in finSectionRep.Get()
on rd1.SectionId equals sec.Id
into opdocsections
from finopdocsec in opdocsections.DefaultIfEmpty()
join finopdoc in opDocRep.Get().Where(i => i.DocumentType == "Quote")
on finopdocsec.InternalOperationalDocument equals finopdoc.Id
into finOpdocs
from finOpDoc in finOpdocs.DefaultIfEmpty()
join entry in entryRep.Get()
on rd1.Transaction equals entry.Transaction
into entries
from entry2 in entries.DefaultIfEmpty()
join resproduct in resprosductRep.Get()
on entry2.Id equals resproduct.Entry
into resproductlinks
from resprlink in resproductlinks.DefaultIfEmpty()
join res in resRep.Get()
on resprlink.Resource equals res.Id
into rootResource
from finopdoc in finOpdocs.DefaultIfEmpty()
join rel in orgDocIndRep.Get().Where(i => (i.Relationship == "OrderedBy"))
on finopdoc.Id equals rel.OperationalDocument
into orgDocIndLinks
from orgopdoclink in orgDocIndLinks.DefaultIfEmpty()
join org in orgRep.Get()
on orgopdoclink.Organization equals org.Id
into toorgs
from opdoc in opdocs.DefaultIfEmpty()
from rootresource in rootResource.DefaultIfEmpty()
from toorg in toorgs.DefaultIfEmpty()
select new SigmaTEK.Web.Models.SchedulerGridModel()
{
Id = appointment.Id,
Description = appointment.Description,
End = appointment.Expiration,
Start = appointment.Start,
OperationDisplayId = op.DisplayId,
OperationName = op.Name,
AppContextId = _appContext.Id,
TimelineId = timeline.Id,
AssemblyDisplayId = rootresource.DisplayId,
//Duration = SigmaTEK.Models.App.Utils.StringHelpers.TimeSpanToString((appointment.Expiration - appointment.Start)),
WorkOrder = opdoc.DisplayId,
Organization = toorg.Name
}).Distinct().ToList();
//In your UI
MyGrid.DataSource = tasks;
MyGrid.DataBind();
//Do not use an ObjectDataSource! It makes too many extra calls

Entity Framework COUNT is doing a SELECT of all records

Profiling my code because it is taking a long time to execute, it is generating a SELECT instead of a COUNT and as there are 20,000 records it is very very slow.
This is the code:
var catViewModel= new CatViewModel();
var catContext = new CatEntities();
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
catViewModel.NumberOfCats = catAccount.Cats.Count();
It is straightforward stuff, but the code that the profiler is showing is:
exec sp_executesql N'SELECT
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy,
[Extent1].xxxxx AS yyyyy // You get the idea
FROM [dbo].[Cats] AS [Extent1]
WHERE Cats.[AccountId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=7
I've never seen this behaviour before, any ideas?
Edit: It is fixed if I simply do this instead:
catViewModel.NumberOfRecords = catContext.Cats.Where(c => c.AccountId == accountId).Count();
I'd still like to know why the former didn't work though.

So you have 2 completely separate queries going on here and I think I can explain why you get different results. Let's look at the first one
// pull a single account record
var catAccount = catContext.Account.Single(c => c.AccountId == accountId);
// count all the associated Cat records against said account
catViewModel.NumberOfCats = catAccount.Cats.Count();
Going on the assumption that Cats has a 0..* relationship with Account and assuming you are leveraging the frameworks ability to lazily load foreign tables then your first call to catAccounts.Cats is going to result in a SELECT for all the associated Cat records for that particular account. This results in the table being brought into memory therefore the call to Count() would result in an internal check of the Count property of the in-memory collection (hence no COUNT SQL generated).
The second query
catViewModel.NumberOfRecords =
catContext.Cats.Where(c => c.AccountId == accountId).Count();
Is directly against the Cats table (which would be IQueryable<T>) therefore the only operations performed against the table are Where/Count, and both of these will be evaluated on the DB-side before execution so it's obviously a lot more efficient than the first.
However, if you need both Account and Cats then I would recommend you eager load the data on the fetch, that way you take the hit upfront once
var catAccount = catContext.Account.Include(a => a.Cats).Single(...);

Most times, when somebody accesses a sub-collection of an entity, it is because there are a limited number of records, and it is acceptable to populate the collection. Thus, when you access:
catAccount.Cats
(regardless of what you do next), it is filling that collection. Your .Count() is then operating on the local in-memory collection. The problem is that you don't want that. Now you have two options:
check whether your provider offer some mechanism to make that a query rather than a collection
build the query dynamically
access the core data-model instead
I'm pretty confident that if you did:
catViewModel.NumberOfRecords =
catContext.Cats.Count(c => c.AccountId == accountId);
it will work just fine. Less convenient? Sure. But "works" is better than "convenient".

How do you override lazy loading on a single LINQ query using Fluent NHibernate

By default, I have lazy loading enabled on all my models, and that's the way I want to keep things. However, sometimes I want to eagerly fetch all the data up front on an individual query. From everything I've read, I should be using FetchMany() for this purpose. However, if I do:
var dbRecipe = (from r in session.Query<Models.Recipes>().FetchMany(x => x.Ingredients)
where r.RecipeId == recipeId
select r).FirstOrDefault();
Then dbRecipe.Ingredients.Count() returns 1. In other words, it only returns the first ingredient of that recipe. However, if I do:
var dbRecipe = (from r in session.Query<Models.Recipes>()
where r.RecipeId == recipeId
select r).FirstOrDefault();
Then dbRecipe.Ingredients.Count() returns 12, which is correct, however it performs a second query to load the ingredients for that recipe.
How do I make FetchMany fetch all 12 records up front? I was assuming that was the difference between Fetch and FetchMany. I'm clearly doing something wrong.

You can work around this by not running FirstOrDefault as the last statement. This will cause nh to run a top(1) query which yields wrong results...
Instead use .ToList().FirstOrDefault().
Or you use QueryOver<> which works fine
session.QueryOver<Models.Recipes>()
.Fetch(prop => prop.Ingredients)
.Eager
.Where(p => p.RecipeId == recipeId)
.SingleOrDefault();

Child Entities not loaded when query joins tables

I have a many-to-many relationship where Content has ContentTags which point to a Tag. I've put the relevant [Include] attributes on my entities to create the properties.
If I write enumerate ObjectContext.Contents.Include("ContentTags.Tag") then I get the ContentTags and Tags included as expected. When I use a join however the ContentTags are missing from my Content entity:
var contentsForTag =
from c in ObjectContext.Contents.Include("ContentTags.Tag")
join ct in ObjectContext.ContentTags on c.Id equals ct.ContentId
join t in ObjectContext.Tags on ct.TagId equals t.Id
where t.Name.ToLower().Contains(lowerTag)
select c;
Any ideas what's going on here?

I'm not sure why this is happening but I think it is because of a contradiction.
The join says that EF should load only the tags that containes lowerTag but the Include says that all tags should be loaded. I would guess EF can't resolve this and that is why none are included. You should be able to write your query without the join though
var contentsForTag =
from c in ObjectContext.Contents.Include("ContentTags.Tag")
where c.ContentTags.Any(ct => ct.Tag.Name.ToLower().Contains(lowerTag))
select c;

Try the following:
var anonType =
from c in ObjectContext.Contents
join ct in ObjectContext.ContentTags on c.Id equals ct.ContentId
join t in ObjectContext.Tags on ct.TagId equals t.Id
where t.Name.ToLower().Contains(lowerTag)
select new { Contents = c, ContentTags = ct, Tags = t }).AsEnumerable();
IList<Contents> contentsForTag = anonType.Select(c => c.Contents).ToList();
If you drop all relevant tables into an anonymous type EF will understand that you in fact need all of that info and will bring it back. The best part is that EF will also take care of the auto-fixup, meaning all relationships will be maintained. The last line of the sample simply extracts the desired objects from the anonymous type into a strongly typed list, however the rest of the graph is still alive and well.

Sounds like a "lazy-load" vs "eager-load" difference. The collection of Tags for a Content class is stored in a child table. Many ORMs including EF try to "lazy-load" collections and other many-to-one references, because it doesn't know if you'll need them and it would be a waste of bandwidth if you didn't. However, this means your tags aren't available in retrieved instances. To tell L2E that, yes, you really do need the tags, you specify that the child reference should be "eagerly" traversed when building the context.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Nested Select Statements in LINQ to Entity Without Unnecessary Loads - c#

Related

Linq to SQL/Entities: Greatest N-Per group problem/performance increase

Entity Framework gets progressively slow with extra join added even though SQL generated is fast

Entity Framework COUNT is doing a SELECT of all records

How do you override lazy loading on a single LINQ query using Fluent NHibernate

Child Entities not loaded when query joins tables

Categories

Resources