Sorting Primary Entities on Properties of Doubly-Nested Entities

Sorting Primary Entities on Properties of Doubly-Nested Entities - c#

I'm using Entity Framework Core with ASP.Net MVC. My business object model consists, in part, of Jobs (the primary entities), each of which contains one or more Projects, and each Project has zero or more Schedules (which link to sets of Departments, but that's not important here). Each schedule has a StartDate and and EndDate.
Here is a (simplified) class diagram (which reflects the database schema as you would expect):
I want to sort the Jobs list by the earliest StartDateTime value in the Schedule entity. I haven't been able to come up with a LINQ chain that accomplishes this. For the time being, I have cobbed the functionality I want by using ADO.Net directly in my controller to assemble the Jobs list based on the following SQL Statement:
#"SELECT
Jobs.JobId,
Jobs.JobName,
Jobs.jobNumber,
MIN(ProjectSchedules.StartDateTime) AS StartDateTime
FROM
Jobs INNER JOIN Projects ON Jobs.JobID = Projects.JobID
LEFT JOIN ProjectSchedules ON Projects.ProjectID = ProjectSchedules.ProjectID
GROUP BY Jobs.JobId, Jobs.JobName, Jobs.JobNumber
ORDER BY StartDateTime"
I would prefer to use EF Core properly, rather than to do an end-run around it. What would be the LINQ statements to generate this SQL statement (or an equivalent one)?

You need a query that "collects" all StartDateTimes in schedules of projects per job, takes their lowest value then sorts by that value:
context.Jobs
.OrderBy(j => j.Projects
.SelectMany(p => p.Schedules)
.Select(s => s.StartDate).OrderBy(d => d).First())
.Select(j => new { ... })
As you see, there's not even a Min function in there. The function could be used, but it may perform worse, because it has to evaluate all StartDate values, while the ordering could make use of an ordered index.
Either way, for comparison, this is with Min():
context.Jobs
.OrderBy(j => j.Projects
.SelectMany(p => p.Schedules)
.Select(s => s.StartDate).Min())

First we need to retrieve all the entities we can do this with the Include statement, and we use theninclude to retrieve entities one further down.
dbcontext.Jobs.Include(j => j.Projects).ThenInclude(p => p.Schedules)
Now that we have all the entities you can do all the sorting, grouping or whatever else you wish to do.
To me it sounds like you want to Orderby on schedule.startdatetime.

Related

How can I query for multiple results while matching on multiple sets of parameters?

First off, I'm using Entity Framework Core 5 with SQL Server provider and a repository pattern to do any work related to the infrastructure.
My problem is that I'm trying to query for a list of CardSale entities while matching on a list of parameters.
The CardSale entity has no key currently, it uses a composite clustered index for uniqueness and organization. we went for this kind of index for several reasons including GroupBy query optimization.
So if I want to query for a single record I can just use something simple like this:
return await _appDbContext.CardSales
.Where(c =>
c.PurchaseCurrency == currency &&
c.Date == date &&
c.SubcategoryId == subcategoryId &&
c.CardId == cardId &&
c.RegionId == regionId &&
c.AccountId == accountId)
.Include(c => c.Card)
.Include(c => c.Region)
.Include(c => c.Account)
.FirstOrDefaultAsync();
Now the problem lies in the next step, a portion of the code requires the ability to pass in a list of Notification objects to a query and receive a list of matching CardSale object on the properties mentioned above.
Before the current change in the code base we had a really big string Key property which was just the properties mentioned above interpolated into a single string (I know, it was really bad) so that meant I could pass in a List<string> ids and use something like
return await _appDbContext.CardSales
.Where(c => ids.Contains(c.id)
and it works just fine.
But now my problem is that this big string id doesn't exist anymore, I only have the individual properties and the composite clustered index.
How can I query for multiple CardSale entities while using parameters from multiple Notification objects?
Extra notes:
All ids are Guid. Date is a date component of a DateTime object.
The query for a specific CardSale has to contain ALL the mentioned properties in the first query which all also exist in the Notification object.

Linq to SQL/Entities: Greatest N-Per group problem/performance increase

Allright, So I have too encountered what I believe is the Greatest-N-Per problem, whereas this question has been answered before I do not think it has been solved well yet with Linq. I have a table with a few million entries, so therefore queries take a lot of time. I would like these queries to take less than a second, whereas currently they spend about 10 seconds to infinity.
var query =
from MD in _context.MeasureDevice
where MD.DbdistributorMap.DbcustomerId == 6 // Filter the devices based on customer
select new
{
DbMeasureDeviceId = MD.DbMeasureDeviceId,
// includes measurements and alarms which have 1-m and 1-m relations
Measure = _context.Measures.Include(e=> e.MeasureAlarms)
.FirstOrDefault(e => e.DbMeasureDeviceId == MD.DbMeasureDeviceId && e.MeasureTimeStamp == _context.Measures
.Where(x => x.DbMeasureDeviceId == MD.DbMeasureDeviceId)
.Max(e=> e.MeasureTimeStamp)),
Address = MD.Dbaddress // includes address 1-1 relation
};
In this query I'm selecting data from 4 different tables. Firstly the MeasureDevice table which is the primary entity im after. Secondly I want the latest measurement from the measures table, which should also include alarms from another table if any exist. Lastly I need the address of the device, which is located in its own table.
There are a few thousand devices, but they have between themselves several thousands of measures which amount to several million rows in the measurement table.
I wonder if anyone has any knowledge as to either improve the performance of Linq queries using EF5, or any better method for solving the Greatest-N-Per problem. I've analyzed the query using Microsoft SQL Server Manager and the most time is spent fetching the measurements.
Query generated as requested:
SELECT [w].[DBMeasureDeviceID], [t].[DBMeasureID], [t].[AlarmDBAlarmID], [t].[batteryValue], [t].[DBMeasureDeviceID], [t].[MeasureTimeStamp], [t].[Stand], [t].[Temperature], [t].[c], [a].[DBAddressID], [a].[AmountAvtalenummere],
[a].[DBOwnerID], [a].[Gate], [a].[HouseCharacter], [a].[HouseNumber], [a].[Latitude], [a].[Longitude], [d].[DBDistributorMapID], [m1].[DBMeasureID], [m1].[DBAlarmID], [m1].[AlarmDBAlarmID], [m1].[MeasureDBMeasureID]
FROM [MeasureDevice] AS [w]
INNER JOIN [DistribrutorMap] AS [d] ON [w].[DBDistributorMapID] = [d].[DBDistributorMapID]
LEFT JOIN [Address] AS [a] ON [w].[DBAddressID] = [a].[DBAddressID]
OUTER APPLY (
SELECT TOP(1) [m].[DBMeasureID], [m].[AlarmDBAlarmID], [m].[batteryValue], [m].[DBMeasureDeviceID], [m].[MeasureTimeStamp], [m].[Stand], [m].[Temperature], 1 AS [c]
FROM [Measure] AS [m]
WHERE ([m].[MeasureTimeStamp] = (
SELECT MAX([m0].[MeasureTimeStamp])
FROM [Measure] AS [m0]
WHERE [m0].[DBMeasureDeviceID] = [w].[DBMeasureDeviceID])) AND ([w].[DBMeasureDeviceID] = [m].[DBMeasureDeviceID])
) AS [t]
LEFT JOIN [MeasureAlarm] AS [m1] ON [t].[DBMeasureID] = [m1].[MeasureDBMeasureID]
WHERE [d].[DBCustomerID] = 6
ORDER BY [w].[DBMeasureDeviceID], [d].[DBDistributorMapID], [a].[DBAddressID], [t].[DBMeasureID], [m1].[DBMeasureID], [m1].[DBAlarmID]
Entity Relations

You have navigation properties defined, so it stands that MeasureDevice should have a reference to it's Measures:
var query = _context.MeasureDevice
.Include(md => md.Measures.Select(m => m.MeasureAlarms)
.Where(md => md.DbDistributorMap.DbCustomerId == 6)
.Select(md => new
{
DbMeasureDeviceId = md.DbMeasureDeviceId,
Measure = md.Measures.OrderByDescending(m => m.MeasureTimeStamp).FirstOrDefault(),
Address = md.Address
});
The possible bugbear here is including the MeasureAlarms with the required Measure. AFAIK you cannot put an .Include() within a .Select() (Where we might have tried Measure = md.Measures.Include(m => m.MeasureAlarms)...
Caveat: It has been quite a while since I have used EF 5 (Unless you are referring to EF Core 5) If you are using the (very old) EF5 in your project I would recommend arguing for the upgrade to EF6 given EF6 did bring a number of performance and capability improvements to EF5. If you are instead using EF Core 5, the Include statement above would be slightly different:
.Include(md => md.Measures).ThenInclude(m => m.MeasureAlarms)
Rather than returning entities, my go-to advice is to use Projection to select precisely the data we need. That way we don't need to worry about eager or lazy loading. If there are details about the Measure and MeasureAlarms we need:
var query = _context.MeasureDevice
.Where(md => md.DbDistributorMap.DbCustomerId == 6)
.Select(md => new
{
md.DbMeasureDeviceId,
Measure = md.Measures
.Select(m => new
{
m.MeasureId,
m.MeasureTimestamp,
// any additional needed fields from Measure
Address = m.Address.Select(a => new
{
// Assuming Address is an entity, any needed fields from Address.
}),
Alarms = m.MeasureAlarms.Select(ma => new
{
ma.MeasureAlarmId,
ma.Label // etc. Whatever fields needed from Alarm...
}).ToList()
}).OrderByDescending(m => m.MeasureTimestamp)
.FirstOrDefault()
});
This example selects anonymous types, alternatively you can define DTOs/ViewModels and can leverage libraries like Automapper to map the fields to the respective entity values to replace all of that with something like ProjectTo<LatestMeasureSummaryDTO> where Automapper has rules to map a MeasureDevice to resolve the latest Measure and extract the needed fields.
The benefits of projection are handling otherwise complex/clumsy eager loading, building optimized payloads with only the fields a consumer needs, and resilience in a changing system where new relationships don't accidentally introduce lazy loading performance issues. For example if Measure currently only has MeasureAlarm to eager load, everything works. But down the road if a new relationship is added to Measure or MeasureAlarm and your payload containing those entities are serialized, that serialization call will now "trip" lazy loading on the new relationship unless you revisit all queries retrieving these entities and add more eager loads, or start worrying about disabling lazy loading entirely. Projections remain the same until only if and when the fields they need to return actually need to change.
Beyond that, the next thing you can investigate is to run the resulting query through an analyzer, such as within SQL Management Studio to return the execution plan and identify whether the query could benefit from indexing changes.

Linq query timing out, how to streamline query

Our front end UI has a filtering system that, in the back end, operates over millions of rows. It uses a an IQueryable that is built up over the course of the logic, then executed all at once. Each individual UI component is ANDed together (for example, Dropdown1 and Dropdown2 will only return rows that have both of what is selected in common). This is not a problem. However, Dropdown3 has has two types of data in it, and the checked items need to be ORd together, then ANDed with the rest of the query.
Due to the large amount of rows it is operating over, it keeps timing out. Since there are some additional joins that need to happen, it is somewhat tricky. Here is my code, with the table names replaced:
//The end list has driver ids in it--but the data comes from two different places. Build a list of all the driver ids.
driverIds = db.CarDriversManyToManyTable.Where(
cd =>
filter.CarIds.Contains(cd.CarId) && //get driver IDs for each car ID listed in filter object
).Select(cd => cd.DriverId).Distinct().ToList();
driverIds = driverIds.Concat(
db.DriverShopManyToManyTable.Where(ds => filter.ShopIds.Contains(ds.ShopId)) //Get driver IDs for each Shop listed in filter object
.Select(ds => ds.DriverId)
.Distinct()).Distinct().ToList();
//Now we have a list solely of driver IDs
//The query operates over the Driver table. The query is built up like this for each item in the UI. Changing from Linq is not an option.
query = query.Where(d => driverIds.Contains(d.Id));
How can I streamline this query so that I don't have to retrieve thousands and thousands of IDs into memory, then feed them back into SQL?

There are several ways to produce a single SQL query. All they require to keep the parts of the query of type IQueryable<T>, i.e. do not use ToList, ToArray, AsEnumerable etc. methods that force them to be executed and evaluated in memory.
One way is to create Union query containing the filtered Ids (which will be unique by definition) and use join operator to apply it on the main query:
var driverIdFilter1 = db.CarDriversManyToManyTable
.Where(cd => filter.CarIds.Contains(cd.CarId))
.Select(cd => cd.DriverId);
var driverIdFilter2 = db.DriverShopManyToManyTable
.Where(ds => filter.ShopIds.Contains(ds.ShopId))
.Select(ds => ds.DriverId);
var driverIdFilter = driverIdFilter1.Union(driverIdFilter2);
query = query.Join(driverIdFilter, d => d.Id, id => id, (d, id) => d);
Another way could be using two OR-ed Any based conditions, which would translate to EXISTS(...) OR EXISTS(...) SQL query filter:
query = query.Where(d =>
db.CarDriversManyToManyTable.Any(cd => d.Id == cd.DriverId && filter.CarIds.Contains(cd.CarId))
||
db.DriverShopManyToManyTable.Any(ds => d.Id == ds.DriverId && filter.ShopIds.Contains(ds.ShopId))
);
You could try and see which one performs better.

The answer to this question is complex and has many facets that, individually, may or may not help in your particular case.
First of all, consider using pagination. .Skip(PageNum * PageSize).Take(PageSize) I doubt your user needs to see millions of rows at once in the front end. Show them only 100, or whatever other smaller number seems reasonable to you.
You've mentioned that you need to use joins to get the data you need. These joins can be done while forming your IQueryable (entity framework), rather than in-memory (linq to objects). Read up on join syntax in linq.
HOWEVER - performing explicit joins in LINQ is not the best practice, especially if you are designing the database yourself. If you are doing database first generation of your entities, consider placing foreign-key constraints on your tables. This will allow database-first entity generation to pick those up and provide you with Navigation Properties which will greatly simplify your code.
If you do not have any control or influence over the database design, however, then I recommend you construct your query in SQL first to see how it performs. Optimize it there until you get the desired performance, and then translate it into an entity framework linq query that uses explicit joins as a last resort.
To speed such queries up, you will likely need to perform indexing on all of the "key" columns that you are joining on. The best way to figure out what indexes you need to improve performance, take the SQL query generated by your EF linq and bring it on over to SQL Server Management Studio. From there, update the generated SQL to provide some predefined values for your #p parameters just to make an example. Once you've done this, right click on the query and either use display estimated execution plan or include actual execution plan. If indexing can improve your query performance, there is a pretty good chance that this feature will tell you about it and even provide you with scripts to create the indexes you need.

It looks to me that using the instance versions of the LINQ extensions is creating several collections before you're done. using the from statement versions should cut that down quite a bit:
driveIds = (from var record in db.CarDriversManyToManyTable
where filter.CarIds.Contains(record.CarId)
select record.DriverId).Concat
(from var record in db.DriverShopManyToManyTable
where filter.ShopIds.Contains(record.ShopId)
select record.DriverId).Distinct()
Also using the groupby extension would give better performance than querying each driver Id.

Optimising Linq to Entities

I have a set of related entities. I'm using linq to group a collection of an entity type by a property on a related entity and then doing a sum calculation on a property of another related entity:
Vehicles.GroupBy(v => v.Mechanics.Engine.Size)
.Select(g => g.Sum(s => s.Passengers.Count));
I'm trying to do as much as possible via linq to entities because there is a large number of records in the db. However, the generated sql includes 9 select statements and an outer apply which takes more than 5 times as long to execute as writing the simplified sql code to achieve the same in one select statement.
How do I improve the generated sql?

You're in fact counting the number of passengers per engine size. So, the navigation properties permitting, you could also do:
Passengers.GroupBy(p => p.Vehicle.Mechanics.Engine.Size)
.Select(g => g.Count())
This will probably generate more joins and less subqueries. And only one aggregating statement in stead of two in the original query, of which one (Count) is repeated for each size.

Perhaps try the query like this:
Vehicles
.Select(x => new
{
EngineSize = x.Mechanics.Engine.Size,
PassengersCount = xs.Passengers.Count,
})
.ToArray()
.GroupBy(v => v.EngineSize)
.Select(g => g.Sum(s => s.PassengersCount));
This will execute in a single query, but may pull back too much data to make it faster. It's worth timing and profiling to see which is better.

You could also consider a hybrid approach whereby you bypass LINQ query generation yet use EF to project results into strong types like this:
public List<Vechicles> GetVehcileInformation(string VehicleType){
var QueryString = Resources.Queries.AllVehicles;
var parms = new List<SqlParameters>();
parms.Add(new SqlParameter("VehicleType", VehicleType );
try{
using (var db = new MyEntities()){
var stuff= db.SqlQuery<Vehicles>(QueryString, parms.ToArray());
return stuff.ToList();
}
}catch(exception iox){Log.ErrorMessage(iox);}
}
The idea is that the group by is done at DB layer which gives you more control than in LINQ. You get the speed of direct SQL Queries but get back strongly typed results! The query string itself is stored in a resources file as a string with Parameter place holders like this:
Select * from Table Where FieldName = #VehicleType...

Data Loading Strategy/Syntax in EF4

Long time lurker, first time posting, and newly learning EF4 and MVC3.
I need help making sure I'm using the correct data loading strategy in this case as well as some help finalizing some details of the query. I'm currently using the eager loading approach outlined here for somewhat of a "dashboard" view that requires a small amount of data from about 10 tables (all have FK relationships).
var query = from l in db.Leagues
.Include("Sport")
.Include("LeagueContacts")
.Include("LeagueContacts.User")
.Include("LeagueContacts.User.UserContactDatas")
.Include("LeagueEvents")
.Include("LeagueEvents.Event")
.Include("Seasons")
.Include("Seasons.Divisions")
.Include("Seasons.Divisions.Teams")
.Where(l => l.URLPart.Equals(leagueName))
select (l);
model = (Models.League) query.First();
However, I need to do some additional filtering, sorting, and shaping of the data that I haven't been able to work out. Here are my chief needs/concerns from this point:
Several child objects still need additional filtering but I haven't been able to figure out the syntax or best approach yet. Example: "TOP 3 LeagueEvents.Event WHERE StartDate >= getdate() ORDER BY LeagueEvents.Event.StartDate"
I need to sort some of the fields. Examples: ORDERBY Seasons.StartDate, LeagueEvents.Event.StartDate, and LeagueContacts.User.SortOrder, etc.
I'm already very concerned about the overall size of the SQL generated by this query and the number of joins and am thinking that I may need a different data loading approach alltogether.(Explicit loading? Multiple QueryObjects? POCO?)
Any input, direction, or advice on how to resolve these remaining needs as well as ensuring the best performance is greatly appreciated.

Your concern about size of the query and size of the result set are tangible.
As #BrokenGlass mentioned EF doesn't allow you doing filtering or ordering on includes. If you want to order or filter relations you must use projection either to anonymous type or custom (non mapped) type:
var query = db.Leagues
.Where(l => l.URLPart.Equals(leagueName))
.Select(l => new
{
League = l,
Events = l.LeagueEvents.Where(...)
.OrderBy(...)
.Take(3)
.Select(e => e.Event)
...
});

Unfortunately EF doesn't allow to selectively load related entities using its navigation properties, it will always load all Foos if you specify Include("Foo").
You will have to do a join on each of the related entities using your Where() clauses as filters where they apply.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.