I'm new to EF. I have a database in which one table (ProductIdentifiers) holds the keys of a number of other tables; it is the central table from which I navigate to the others.
The input to the query is not the id but the Name, which is not defined as a key of any kind.
Here's my Entity Framework query:
public ProductIdentifier? GetFullCedMed(string v)
=> db.ProductIdentifiers.Where(a => a.Name == v)
.Include(ced => ced.Project)
.Include(ced => ced.LimitValues).ThenInclude(l => l.Parameter)
.Include(ced => ced.LimitValues).ThenInclude(l => l.Bin)
.Include(ced => ced.LimitValues).ThenInclude(l => l.Stage)
.Include(ced => ced.LimitValues).ThenInclude(l => l.TestTypeNavigation)
.Include(ced => ced.ConfigValues).ThenInclude(c => c.Parameter)
.Include(ced => ced.ConfigValues).ThenInclude(c => c.Bin)
.Include(ced => ced.ConfigValues).ThenInclude(c => c.Stage)
.Include(ced => ced.ConfigValues).ThenInclude(c => c.TestTypeNavigation)
.Include(ced => ced.FatherCedmed)
.ToList().FirstOrDefault();
The code is converted to a SQL query that looks like this:
exec sp_executesql N'SELECT [p].[id], [p].[FatherCedmedId], [p].[Name], [p].[ProjectId], [m].[id], [m].[IsActive], [m].[Name], [m].[ProductLineName], [p0].[id], [t].[id], [t].[BinId], [t].[CED_MED], [t].[LSL], [t].[ParameterID], [t].[StageID], [t].[TestType], [t].[USL], [t].[id0], [t].[Enabled], [t].[FORMAT], [t].[IsLimit], [t].[ParamID], [t].[Parameter_Name], [t].[Print], [t].[Unit], [t].[id1], [t].[BinDescription], [t].[BinDescriptionOverride], [t].[BinNumber], [t].[BinNumberOverride], [t].[GroupID], [t].[ParamID0], [t].[StageID0], [t].[id2], [t].[Name], [t].[StageNumber], [t].[id3], [t].[LevelId], [t].[Name0], [t].[OrderingId], [t0].[id], [t0].[BinId], [t0].[CED_MED], [t0].[ParameterID], [t0].[StageID], [t0].[TestType], [t0].[Value], [t0].[id0], [t0].[Enabled], [t0].[FORMAT], [t0].[IsLimit], [t0].[ParamID], [t0].[Parameter_Name], [t0].[Print], [t0].[Unit], [t0].[id1], [t0].[BinDescription], [t0].[BinDescriptionOverride], [t0].[BinNumber], [t0].[BinNumberOverride], [t0].[GroupID], [t0].[ParamID0], [t0].[StageID0], [t0].[id2], [t0].[Name], [t0].[StageNumber], [t0].[id3], [t0].[LevelId], [t0].[Name0], [t0].[OrderingId], [p0].[FatherCedmedId], [p0].[Name], [p0].[ProjectId]
FROM [ProductIdentifiers] AS [p]
LEFT JOIN [main_Projects] AS [m] ON [p].[ProjectId] = [m].[id]
LEFT JOIN [ProductIdentifiers] AS [p0] ON [p].[FatherCedmedId] = [p0].[id]
LEFT JOIN (
SELECT [l].[id], [l].[BinId], [l].[CED_MED], [l].[LSL], [l].[ParameterID], [l].[StageID], [l].[TestType], [l].[USL], [p1].[id] AS [id0], [p1].[Enabled], [p1].[FORMAT], [p1].[IsLimit], [p1].[ParamID], [p1].[Parameter_Name], [p1].[Print], [p1].[Unit], [b].[id] AS [id1], [b].[BinDescription], [b].[BinDescriptionOverride], [b].[BinNumber], [b].[BinNumberOverride], [b].[GroupID], [b].[ParamID] AS [ParamID0], [b].[StageID] AS [StageID0], [s].[id] AS [id2], [s].[Name], [s].[StageNumber], [m0].[id] AS [id3], [m0].[LevelId], [m0].[Name] AS [Name0], [m0].[OrderingId]
FROM [LimitValues] AS [l]
INNER JOIN [Parameters] AS [p1] ON [l].[ParameterID] = [p1].[id]
INNER JOIN [Bins] AS [b] ON [l].[BinId] = [b].[id]
LEFT JOIN [Stages] AS [s] ON [l].[StageID] = [s].[id]
LEFT JOIN [main_TestTypes] AS [m0] ON [l].[TestType] = [m0].[id]
) AS [t] ON [p].[id] = [t].[CED_MED]
LEFT JOIN (
SELECT [c].[id], [c].[BinId], [c].[CED_MED], [c].[ParameterID], [c].[StageID], [c].[TestType], [c].[Value], [p2].[id] AS [id0], [p2].[Enabled], [p2].[FORMAT], [p2].[IsLimit], [p2].[ParamID], [p2].[Parameter_Name], [p2].[Print], [p2].[Unit], [b0].[id] AS [id1], [b0].[BinDescription], [b0].[BinDescriptionOverride], [b0].[BinNumber], [b0].[BinNumberOverride], [b0].[GroupID], [b0].[ParamID] AS [ParamID0], [b0].[StageID] AS [StageID0], [s0].[id] AS [id2], [s0].[Name], [s0].[StageNumber], [m1].[id] AS [id3], [m1].[LevelId], [m1].[Name] AS [Name0], [m1].[OrderingId]
FROM [ConfigValues] AS [c]
INNER JOIN [Parameters] AS [p2] ON [c].[ParameterID] = [p2].[id]
INNER JOIN [Bins] AS [b0] ON [c].[BinId] = [b0].[id]
LEFT JOIN [Stages] AS [s0] ON [c].[StageID] = [s0].[id]
LEFT JOIN [main_TestTypes] AS [m1] ON [c].[TestType] = [m1].[id]
) AS [t0] ON [p].[id] = [t0].[CED_MED]
WHERE [p].[Name] = @__v_0
ORDER BY [p].[id], [m].[id], [p0].[id], [t].[id], [t].[id0], [t].[id1], [t].[id2], [t].[id3], [t0].[id], [t0].[id0], [t0].[id1], [t0].[id2]',N'@__v_0 varchar(255)',@__v_0='NAME_OF_RECORD_FROM_ProductIdentifier_TABLE'
Pay attention to the input in the ORDER BY line.
My question is: how can the query be improved?
The tables contain a lot of data and the query takes a lot of time.
Will retrieval by ID make it faster? Note that all of the data is relevant to me, so all of the tables do need to be joined.
Thanks for any other advice.
To split this into separate queries and get better performance, use .AsSplitQuery().
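A minimal sketch of that suggestion applied to the query from the question, assuming EF Core 5.0 or later (where AsSplitQuery is available) and the same db context and entity classes as in the question. It also drops the ToList() before FirstOrDefault(), on the assumption that only one row is wanted, so the limit is applied in the database instead of in memory:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Assumes the `db` context and the entity classes shown in the question.
public ProductIdentifier? GetFullCedMed(string v)
    => db.ProductIdentifiers
        .Where(a => a.Name == v)
        .Include(ced => ced.Project)
        .Include(ced => ced.LimitValues).ThenInclude(l => l.Parameter)
        .Include(ced => ced.LimitValues).ThenInclude(l => l.Bin)
        .Include(ced => ced.LimitValues).ThenInclude(l => l.Stage)
        .Include(ced => ced.LimitValues).ThenInclude(l => l.TestTypeNavigation)
        .Include(ced => ced.ConfigValues).ThenInclude(c => c.Parameter)
        .Include(ced => ced.ConfigValues).ThenInclude(c => c.Bin)
        .Include(ced => ced.ConfigValues).ThenInclude(c => c.Stage)
        .Include(ced => ced.ConfigValues).ThenInclude(c => c.TestTypeNavigation)
        .Include(ced => ced.FatherCedmed)
        .AsSplitQuery()     // load each included collection with its own SQL statement
        .FirstOrDefault();  // let the database return at most one ProductIdentifier
```

With a split query, LimitValues and ConfigValues are loaded by separate statements, so their rows are no longer multiplied against each other in one large join. If Name is not unique, it is probably worth adding an OrderBy on the key before FirstOrDefault() so that every statement targets the same row.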
I'm having a really hard time tuning one of the Entity Framework generated queries in my application. It is a very basic query, but for some reason EF uses multiple inner subqueries, which seem to perform horribly in the DB, instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates a really complex query with multiple subqueries, which performs awfully:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ, but no matter what I try (using LINQPad for testing) I can't write a LINQ/EF query that performs as well as the one I write by hand. I've tried querying both the SubValue table and the Project table, but the outcome is mostly the same: multiple inefficient nested subqueries instead of a single join doing the work.
How can I write a query that imitates the handwritten SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use joins when I want them to, instead of nested subqueries?
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue. You can do this by creating a join (which is also what you do in your handcrafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP (10) are applied.
The generated SQL still contains subqueries, but I would expect it to be more efficient than the SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= @p__linq__0) AND ([Extent2].[Created] <= @p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
UPDATE
Thanks to @usr I have got this down to ~3 seconds simply by changing
.Select(
log => log.OrderByDescending(
d => d.DateTimeUtc
).FirstOrDefault()
)
to
.Select(
log => log.OrderByDescending(
d => d.Id
).FirstOrDefault()
)
I have a database with two tables - Logs and Collectors - which I am using Entity Framework to read. There are 86 collector records and each one has 50000+ corresponding Log records.
I want to get the most recent log record for each collector, which is easily done with this SQL:
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue,
CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC,
CollectorSettingsModels.Target, CollectorSettingsModels.TypeName
FROM
(SELECT CollectorId, MAX(Id) AS Id
FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs
INNER JOIN CollectorLogModels AS CollectorLogModels_1
ON RecentLogs.Id = CollectorLogModels_1.Id
INNER JOIN CollectorSettingsModels
ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
This takes ~2 seconds to execute.
The closest I have been able to get with LINQ is the following:
var logs = context.Logs.Include(co => co.Collector)
.GroupBy(
log => log.CollectorId, log => log
)
.Select(
log => log.OrderByDescending(
d => d.DateTimeUtc
).FirstOrDefault()
)
.Join(
context.Collectors,
(l => l.CollectorId),
(c => c.Id),
(l, c) => new
{
c.Target,
DateTimeUTC = l.DateTimeUtc,
l.Status,
l.StringValue,
CollectorName = c.TypeName
}
).OrderBy(
o => o.Target
).ThenBy(
o => o.CollectorName
)
;
This produces the results I want but takes ~35 seconds to execute.
This becomes the following SQL
SELECT
[Distinct1].[CollectorId] AS [CollectorId],
[Extent3].[Target] AS [Target],
[Limit1].[DateTimeUtc] AS [DateTimeUtc],
[Limit1].[Status] AS [Status],
[Limit1].[StringValue] AS [StringValue],
[Extent3].[TypeName] AS [TypeName]
FROM (SELECT DISTINCT
[Extent1].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId]
FROM ( SELECT
[Extent2].[Status] AS [Status],
[Extent2].[StringValue] AS [StringValue],
[Extent2].[DateTimeUtc] AS [DateTimeUtc],
[Extent2].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent2]
WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId]
) AS [Project2]
ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1]
INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id]
ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I get performance closer to what is achievable with SQL alone?
Your original SQL takes the row with MAX(Id) per collector, so the DateTimeUTC it returns can come from a different row than the one with the latest DateTimeUTC. That's probably a bug. The EF query does not have that problem. The two are not semantically identical; the EF version is a harder query.
If you rewrite the EF query to be structurally the same as the SQL query you'll get identical performance. I see nothing here that EF would not support.
Compute the max(id) with EF as well and join on that.
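One way that suggestion could be written, as a sketch rather than a tested EF query: the Log, Collector and LatestLogRow classes below are hypothetical stand-ins for the question's entities, and the method takes plain IQueryable sources so the same shape can be tried against in-memory lists first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
public class Collector
{
    public int Id { get; set; }
    public string Target { get; set; } = "";
    public string TypeName { get; set; } = "";
}

public class Log
{
    public int Id { get; set; }
    public int CollectorId { get; set; }
    public string Status { get; set; } = "";
    public string StringValue { get; set; } = "";
    public DateTime DateTimeUtc { get; set; }
}

// Hypothetical result type; property initializers keep the projection simple.
public class LatestLogRow
{
    public string Target { get; set; } = "";
    public string CollectorName { get; set; } = "";
    public DateTime DateTimeUtc { get; set; }
    public string Status { get; set; } = "";
    public string StringValue { get; set; } = "";
}

public static class LatestLogQuery
{
    public static IQueryable<LatestLogRow> Build(
        IQueryable<Log> logs, IQueryable<Collector> collectors)
    {
        // One MAX(Id) per collector, like the RecentLogs subquery in the SQL.
        var latestIds = logs
            .GroupBy(l => l.CollectorId)
            .Select(g => g.Max(l => l.Id));

        // Join the ids back to the log rows, then to the collectors.
        return from id in latestIds
               join l in logs on id equals l.Id
               join c in collectors on l.CollectorId equals c.Id
               orderby c.Target, c.TypeName
               select new LatestLogRow
               {
                   Target = c.Target,
                   CollectorName = c.TypeName,
                   DateTimeUtc = l.DateTimeUtc,
                   Status = l.Status,
                   StringValue = l.StringValue
               };
    }
}
```

Against a real context this would be called as LatestLogQuery.Build(context.Logs, context.Collectors). Whether the provider emits exactly the handwritten SQL is not guaranteed, so it is worth inspecting the generated statement.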
I had the exact same issue; I solved it by adding indexes.
A query of mine took 45 seconds to complete, and I managed to get it to complete in less than a second.
In my SQL Server database I have the following hierarchy
Inventory > Datasets > Resources > Renditions > Conformities
where each is a one to many relationship. I wanted to get the id of the three datasets with the most recently updated conformity. Conformity doesn't have its own date but takes the modified date of the parent rendition. I therefore created the following query:
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.OrderByDescending(conformity => conformity.Rendition.Modified)
.Select(conformity => conformity.Rendition.Resource.DatasetID)
.Distinct()
.Take(3);
GetConformitiesIncludeAncestors is simply returning the conformities with includes as follows:
return _context.Conformities.Include(conformity => conformity.Rendition.Resource.Dataset.Inventory);
but the SQL statement shown when stepping through the code doesn't have an ORDER BY clause.
SELECT
[Limit1].[DatasetID] AS [DatasetID]
FROM ( SELECT DISTINCT TOP (3)
[Extent3].[DatasetID] AS [DatasetID]
FROM [dbo].[Conformity] AS [Extent1]
INNER JOIN [dbo].[Rendition] AS [Extent2] ON [Extent1].[RenditionID] = [Extent2].[ID]
INNER JOIN [dbo].[Resource] AS [Extent3] ON [Extent2].[ResourceID] = [Extent3].[ID]
) AS [Limit1]
Why is OrderByDescending being ignored? Entity Framework version is 6.0.1.
EDIT: I have a workaround that does the trick, but by querying in a different way. I'm still interested in why the OrderByDescending had no effect so will leave open.
My workaround using GroupBy
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.GroupBy(conformity => conformity.Rendition.Resource.DatasetID)
.OrderByDescending(group => group.Max(conformity => conformity.Rendition.Modified))
.Take(3)
.Select(group => group.Key);
If you remove the Distinct, you should get a result similar to this:
var datasets = inventoryRepository
.GetConformitiesIncludeAncestors()
.OrderByDescending(comformity => comformity.Rendition.Modified)
.Select(comformity => comformity.Rendition.Resource.DatasetId)
//.Distinct()
.Take(3);
SELECT TOP (3)
[Extent3].[DatasetId] AS [DatasetId]
FROM [dbo].[Comformities] AS [Extent1]
INNER JOIN [dbo].[Renditions] AS [Extent2] ON [Extent1].[RenditionId] = [Extent2].[Id]
INNER JOIN [dbo].[Resources] AS [Extent3] ON [Extent2].[ResourceId] = [Extent3].[Id]
ORDER BY [Extent2].[Modified] DESC
But once you add the Distinct, the ordering is no longer guaranteed; check the documentation:
The expected behavior is that it returns an unordered sequence of the
unique items in source.
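Because Distinct returns an unordered sequence, the ordering has to be applied after the de-duplication step. A small in-memory sketch of that idea, using plain LINQ to Objects and a hypothetical flattened (DatasetId, Modified) row shape instead of the question's entities:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecentDatasets
{
    // rows: one (DatasetId, Modified) pair per conformity (hypothetical shape).
    public static List<int> Top(
        IEnumerable<(int DatasetId, DateTime Modified)> rows, int count)
        => rows
            .GroupBy(r => r.DatasetId)                       // de-duplicate by key
            .OrderByDescending(g => g.Max(r => r.Modified))  // order the unique keys
            .Take(count)
            .Select(g => g.Key)
            .ToList();
}
```

Grouping keeps the Modified values available per dataset, so each unique id can still be ranked by its most recent modification before Take is applied.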