In my SQL Server database I have the following hierarchy
Inventory > Datasets > Resources > Renditions > Conformities
where each is a one-to-many relationship. I want to get the IDs of the three datasets with the most recently updated conformity. Conformity doesn't have its own date but takes the modified date of its parent rendition, so I wrote the following query:
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.OrderByDescending(conformity => conformity.Rendition.Modified)
.Select(conformity => conformity.Rendition.Resource.DatasetID)
.Distinct()
.Take(3);
GetConformitiesIncludeAncestors is simply returning the conformities with includes as follows:
return _context.Conformities.Include(conformity => conformity.Rendition.Resource.Dataset.Inventory);
but the SQL statement shown when stepping through the code doesn't have an ORDER BY clause.
SELECT
[Limit1].[DatasetID] AS [DatasetID]
FROM ( SELECT DISTINCT TOP (3)
[Extent3].[DatasetID] AS [DatasetID]
FROM [dbo].[Conformity] AS [Extent1]
INNER JOIN [dbo].[Rendition] AS [Extent2] ON [Extent1].[RenditionID] = [Extent2].[ID]
INNER JOIN [dbo].[Resource] AS [Extent3] ON [Extent2].[ResourceID] = [Extent3].[ID]
) AS [Limit1]
Why is OrderByDescending being ignored? Entity Framework version is 6.0.1.
EDIT: I have a workaround that does the trick, but by querying in a different way. I'm still interested in why the OrderByDescending had no effect so will leave open.
My workaround using GroupBy
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.GroupBy(conformity => conformity.Rendition.Resource.DatasetID)
.OrderByDescending(group => group.Max(conformity => conformity.Rendition.Modified))
.Take(3)
.Select(group => group.Key);
If you remove the Distinct, you should get a result similar to this:
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.OrderByDescending(conformity => conformity.Rendition.Modified)
.Select(conformity => conformity.Rendition.Resource.DatasetId)
//.Distinct()
.Take(3);
SELECT TOP (3)
[Extent3].[DatasetId] AS [DatasetId]
FROM [dbo].[Comformities] AS [Extent1]
INNER JOIN [dbo].[Renditions] AS [Extent2] ON [Extent1].[RenditionId] = [Extent2].[Id]
INNER JOIN [dbo].[Resources] AS [Extent3] ON [Extent2].[ResourceId] = [Extent3].[Id]
ORDER BY [Extent2].[Modified] DESC
Once you add Distinct, however, the ordering is no longer guaranteed. The documentation for Distinct states:
The expected behavior is that it returns an unordered sequence of the
unique items in source.
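A minimal sketch of an ordering-safe alternative, run against in-memory data rather than the real EF context. The tuple rows and their values are hypothetical stand-ins for conformity.Rendition.Resource.DatasetID and conformity.Rendition.Modified. It deduplicates first (one group per dataset) and only then orders, so the result does not depend on what Distinct does to the order:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows: one (DatasetID, Modified) pair per conformity.
var rows = new[]
{
    (DatasetID: 1, Modified: new DateTime(2014, 1, 1)),
    (DatasetID: 2, Modified: new DateTime(2014, 3, 1)),
    (DatasetID: 1, Modified: new DateTime(2014, 5, 1)),
    (DatasetID: 3, Modified: new DateTime(2014, 2, 1)),
    (DatasetID: 4, Modified: new DateTime(2013, 12, 1)),
};

// Group per dataset (this is the dedupe step), then order the groups by
// their most recent Modified value and keep the first three keys.
var topThree = rows
    .GroupBy(r => r.DatasetID)
    .OrderByDescending(g => g.Max(r => r.Modified))
    .Take(3)
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", topThree)); // prints "1, 2, 3"
```

This is the same shape as the GroupBy workaround in the question. The sketch only shows the LINQ semantics; it makes no claim about the SQL that EF generates for it.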
I'm trying to create a LINQ query that returns the number column from a master table together with the count of its detail records. My problem is that LINQ emits a query without an OUTER APPLY, which makes the query take 15 seconds. If I write the SQL myself using an OUTER APPLY, the same query takes less than a second.
TekMas.Select(x => new {x.TekNr,Cnt = x.TekRev.Count})
This creates the following SQL:
SELECT
[Extent1].[TekMasID] AS [TekMasID],
[Extent1].[TekNr] AS [TekNr],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[TekRev] AS [Extent2]
WHERE [Extent1].[TekMasID] = [Extent2].[TekMasID]) AS [C1]
FROM [dbo].[TekMas] AS [Extent1]
I'm trying to create the following SQL using linq
SELECT TekMas.TekNr, RevCnt.Cnt
FROM TekMas
OUTER APPLY ( SELECT COUNT (TekRevID) AS Cnt
FROM TekRev
WHERE TekRev.TekMasID = TekMas.TekMasID) RevCnt;
I know that I can create an outer apply by using only the first detail record like this
TekMas.Select(x => new { x.TekNr, Cnt = x.TekRev.FirstOrDefault() })
.Select(x => new { x.TekNr, x.Cnt.TekRevID, x.Cnt.TekRevInf })
This LINQ query produces the following SQL:
SELECT
[Extent1].[TekMasID] AS [TekMasID],
[Extent1].[TekNr] AS [TekNr],
[Limit1].[TekRevID] AS [TekRevID],
[Limit1].[TekRevInf] AS [TekRevInf]
FROM [dbo].[TekMas] AS [Extent1]
OUTER APPLY (SELECT TOP (1)
[Extent2].[TekRevID] AS [TekRevID],
[Extent2].[TekRevInf] AS [TekRevInf]
FROM [dbo].[TekRev] AS [Extent2]
WHERE [Extent1].[TekMasID] = [Extent2].[TekMasID] ) AS [Limit1]
Is there a way to force LINQ to create an OUTER APPLY when using Count, just as it does for FirstOrDefault() in the last example?
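One shape that may be worth trying is to aggregate the detail rows once and left join the masters onto that result, instead of counting inside the projection. The sketch below uses hypothetical in-memory arrays in place of the TekMas and TekRev sets, so it only demonstrates that the query returns the right counts. Whether EF translates it into an OUTER APPLY, a join, or something else is an open assumption that would need checking against the generated SQL:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the TekMas and TekRev tables.
var tekMas = new[]
{
    new { TekMasID = 1, TekNr = "A-100" },
    new { TekMasID = 2, TekNr = "A-200" },
    new { TekMasID = 3, TekNr = "A-300" },
};
var tekRev = new[]
{
    new { TekRevID = 10, TekMasID = 1 },
    new { TekRevID = 11, TekMasID = 1 },
    new { TekRevID = 12, TekMasID = 3 },
};

// Aggregate the detail rows once, keyed by the master id.
var revCounts = tekRev
    .GroupBy(r => r.TekMasID)
    .Select(g => new { TekMasID = g.Key, Cnt = g.Count() });

// Left join the masters onto the pre-aggregated counts; a master without
// any detail rows gets a count of 0.
var result =
    (from m in tekMas
     join c in revCounts on m.TekMasID equals c.TekMasID into joined
     from c in joined.DefaultIfEmpty()
     select new { m.TekNr, Cnt = c == null ? 0 : c.Cnt })
    .ToList();

foreach (var row in result)
    Console.WriteLine($"{row.TekNr}: {row.Cnt}");
```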
Thanks
Stephen
To set the context a little: I'm trying to write MySQL queries that use late row lookups, as described in this article:
https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
That's a story for another day, but the idea is that you do a key search on the table and then join the result back onto the whole table to force a late row lookup. The problem comes from my LINQ queries when they are joined together.
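As a rough illustration of the two-step idea (page over keys only, then join back for the full rows), here is a sketch against hypothetical in-memory data. The invoice shape, the paging values and the status filter are made up for the example, and it says nothing about the SQL a MySQL provider would emit:

```csharp
using System;
using System.Linq;

// Hypothetical invoices; in the real code these would be IQueryable sources.
var invoices = Enumerable.Range(1, 10)
    .Select(i => new
    {
        ID = i,
        StatusID = i % 2 == 0 ? 3 : 1,
        InvoiceDate = new DateTime(2015, 1, 1).AddDays(10 - i),
        Notes = $"invoice {i}"
    })
    .ToList();

int startRow = 1, pageSize = 2;

// Step 1: key search. Filter, order and page, projecting only the key.
var keys = invoices
    .Where(x => x.StatusID == 3)
    .OrderBy(x => x.InvoiceDate)
    .Skip(startRow)
    .Take(pageSize)
    .Select(x => x.ID);

// Step 2: late row lookup. Join the small key set back to the full rows.
var page = keys
    .Join(invoices, key => key, row => row.ID, (key, row) => row)
    .OrderBy(row => row.InvoiceDate)
    .ToList();

Console.WriteLine(string.Join(", ", page.Select(r => r.ID)));
```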
-- Key search query --
Calling Code
IQueryable<int> keySearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), whereStatement, clientID).OrderBy(orderBy).Skip(startRow).Take(pageSize).Select(x => x.ID);
Resulting Query
SELECT
`Extent1`.`Sys_InvoiceID`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`FK_StatusID`
ORDER BY
`Extent1`.`InvoiceDate` ASC LIMIT 0,430
-- Full Table Search --
Calling Code
IQueryable<InvoiceDTOModel> tableSearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), null, clientID, true).OrderBy(orderBy);
Resulting Query
SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1`,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent1` INNER JOIN `tbl2` AS `Extent2` ON `Extent1`.`SID` = `Extent2`.`SID` INNER JOIN `tbl3` AS `Extent3` ON `Extent1`.`EID` = `Extent3`.`EID` LEFT OUTER JOIN `tbl4` AS `Extent4` ON `Extent1`.`ID` = `Extent4`.`DID`
ORDER BY
`Extent1`.`C4` ASC
-- Joining the Two Together --
Calling Code
keySearch.Join(tableSearch, key => key, table => table.ID, (key, table) => table).OrderBy(orderBy).ToListAsync();
Resulting Query
SELECT
`Join3`.`ID`,
`Join3`.`C1`,
`Join3`.`C2`,
`Join3`.`C3`,
`Join3`.`C4`,
`Join3`.`C5`,
`Join3`.`C6`,
`Join3`.`SID`,
`Join3`.`S1`,
`Join3`.`S2`,
`Join3`.`S3`,
`Join3`.`EID`,
`Join3`.`E1`,
`Join3`.`DID`,
`Join3`.`D1`,
`Join3`.`D2`,
`Join3`.`D3`,
`Join3`.`D4`,
`Join3`.`D5`
FROM (SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1` INNER JOIN (SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1`,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent2` INNER JOIN `tbl2` AS `Extent3` ON `Extent2`.`SID` = `Extent3`.`SID` INNER JOIN `tblstatus` AS `Extent4` ON `Extent2`.`EID` = `Extent4`.`EID` LEFT OUTER JOIN `tbl3` AS `Extent5` ON `Extent2`.`ID` = `Extent5`.`DID`) AS `Join3` ON `Limit1`.`ID` = `Join3`.`ID`
ORDER BY
`Join3`.`C4` ASC
Basically the inner select brings back
FROM (SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
Instead of
FROM (SELECT
`Extent1`.`ID`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
--Note--
The actual query selects around 15 columns; I've just shortened it for this example. The extra columns affect the search as the dataset grows in size. The subquery shouldn't be selecting all of the fields, and I suspect there's an error in my join.
Any help is much appreciated.
UPDATE
Thanks to @usr, I have got this down to ~3 seconds simply by changing
.Select(
log => log.OrderByDescending(
d => d.DateTimeUTC
).FirstOrDefault()
)
to
.Select(
log => log.OrderByDescending(
d => d.Id
).FirstOrDefault()
)
I have a database with two tables - Logs and Collectors - which I am using Entity Framework to read. There are 86 collector records and each one has 50000+ corresponding Log records.
I want to get the most recent log record for each collector which is easily done with this SQL
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue,
CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC,
CollectorSettingsModels.Target, CollectorSettingsModels.TypeName
FROM
(SELECT CollectorId, MAX(Id) AS Id
FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs
INNER JOIN CollectorLogModels AS CollectorLogModels_1
ON RecentLogs.Id = CollectorLogModels_1.Id
INNER JOIN CollectorSettingsModels
ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
This takes ~2 seconds to execute.
The closest I have been able to get with LINQ is the following:
var logs = context.Logs.Include(co => co.Collector)
.GroupBy(
log => log.CollectorId, log => log
)
.Select(
log => log.OrderByDescending(
d => d.DateTimeUtc
).FirstOrDefault()
)
.Join(
context.Collectors,
(l => l.CollectorId),
(c => c.Id),
(l, c) => new
{
c.Target,
DateTimeUTC = l.DateTimeUtc,
l.Status,
l.StringValue,
CollectorName = c.TypeName
}
).OrderBy(
o => o.Target
).ThenBy(
o => o.CollectorName
)
;
This produces the results I want but takes ~35 seconds to execute.
This becomes the following SQL
SELECT
[Distinct1].[CollectorId] AS [CollectorId],
[Extent3].[Target] AS [Target],
[Limit1].[DateTimeUtc] AS [DateTimeUtc],
[Limit1].[Status] AS [Status],
[Limit1].[StringValue] AS [StringValue],
[Extent3].[TypeName] AS [TypeName]
FROM (SELECT DISTINCT
[Extent1].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId]
FROM ( SELECT
[Extent2].[Status] AS [Status],
[Extent2].[StringValue] AS [StringValue],
[Extent2].[DateTimeUtc] AS [DateTimeUtc],
[Extent2].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent2]
WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId]
) AS [Project2]
ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1]
INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id]
ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I get performance closer to what is achievable with SQL alone?
Your original SQL can return a DateTimeUTC from a different row than the one with the latest timestamp, because it selects by MAX(Id). That's probably a bug. The EF query does not have that problem, but it is not semantically identical: it is a harder query.
If you rewrite the EF query to be structurally the same as the SQL query you'll get identical performance. I see nothing here that EF would not support.
Compute the max(id) with EF as well and join on that.
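A sketch of what "compute the max(id) and join on that" could look like, written against hypothetical in-memory arrays rather than context.Logs and context.Collectors. The property names follow the question; the sample values are invented, and the sketch assumes (as the SQL does) that the highest Id is the most recent log:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for context.Collectors and context.Logs.
var collectors = new[]
{
    new { Id = 1, Target = "srv-b", TypeName = "Ping" },
    new { Id = 2, Target = "srv-a", TypeName = "Disk" },
};
var logs = new[]
{
    new { Id = 100, CollectorId = 1, Status = "OK" },
    new { Id = 101, CollectorId = 2, Status = "OK" },
    new { Id = 102, CollectorId = 1, Status = "Fail" },
    new { Id = 103, CollectorId = 2, Status = "Warn" },
};

// Same shape as the SQL subquery: MAX(Id) per CollectorId.
var recentIds = logs
    .GroupBy(l => l.CollectorId)
    .Select(g => g.Max(l => l.Id));

// Join the max ids back to the logs, then to the collectors.
var latest =
    (from id in recentIds
     join l in logs on id equals l.Id
     join c in collectors on l.CollectorId equals c.Id
     orderby c.Target, c.TypeName
     select new { c.Target, CollectorName = c.TypeName, l.Status })
    .ToList();

foreach (var row in latest)
    Console.WriteLine($"{row.Target} {row.CollectorName} {row.Status}");
```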
I had the exact same issue; I solved it by adding indexes.
A query of mine that took 45 seconds to complete now finishes in less than a second.
I am using below LINQ query:
CreateObjectSet<ClientCustomFieldValue>()
.Include(scf => scf.ClientCustomField.CustomField)
.Where(str => str.PassengerTripID == passengerTripID).ToList();
The SQL corresponding to this query (as captured by SQL Profiler) is:
exec sp_executesql
N'SELECT
[Extent1].[ClientCustomFieldValueID] AS [ClientCustomFieldValueID],
[Extent1].[ClientCustomFieldID] AS [ClientCustomFieldID],
[Extent1].[PassengerTripID] AS [PassengerTripID],
[Extent1].[DataValue] AS [DataValue],
[Extent1].[RowVersion] AS [RowVersion],
[Extent1].[LastChangeSecSessionID] AS [LastChangeSecSessionID],
[Extent1].[LastChangeTimeUTC] AS [LastChangeTimeUTC],
[Extent2].[ClientCustomFieldID] AS [ClientCustomFieldID1],
[Extent2].[ClientID] AS [ClientID],
[Extent2].[CustomFieldID] AS [CustomFieldID],
[Extent2].[CustomFieldSourceEnumID] AS [CustomFieldSourceEnumID],
[Extent2].[RequiredFlag] AS [RequiredFlag],
[Extent2].[ValidationRegex] AS [ValidationRegex],
[Extent2].[RowVersion] AS [RowVersion1],
[Extent2].[PassengerTripStopTypeEnumID] AS [PassengerTripStopTypeEnumID],
[Extent2].[LastChangeSecSessionID] AS [LastChangeSecSessionID1],
[Extent2].[LastChangeTimeUTC] AS [LastChangeTimeUTC1],
[Extent4].[CustomFieldID] AS [CustomFieldID1],
[Extent4].[CustomFieldCode] AS [CustomFieldCode],
[Extent4].[Description] AS [Description],
[Extent4].[RowVersion] AS [RowVersion2],
[Extent4].[LastChangeSecSessionID] AS [LastChangeSecSessionID2],
[Extent4].[LastChangeTimeUTC] AS [LastChangeTimeUTC2]
FROM [dbo].[ClientCustomFieldValue] AS [Extent1]
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent2]
ON ([Extent2].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent2].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent3]
ON ([Extent3].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent3].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[CustomField] AS [Extent4]
ON ([Extent4].[DeleteFlag] = 0)
AND ([Extent3].[CustomFieldID] = [Extent4].[CustomFieldID])
WHERE ([Extent1].[DeleteFlag] = 0)
AND ([Extent1].[PassengerTripID] = @p__linq__0)
',N'@p__linq__0 int',@p__linq__0=96
I would like to know why there are two left joins with the 'ClientCustomField' table. Kindly help me understand this.
Here is an assumption.
The first left join, aliased Extent2, serves the SELECT clause: it retrieves all the necessary fields from the ClientCustomField table. It would be present in the query anyway, whether or not Include is called.
The second left join, aliased Extent3, is there to retrieve the CustomField table's fields. As you can see, it is not used anywhere except in the last left join clause, which joins everything to CustomField. That join is produced by the Include call.
Apparently LINQ does not check which tables were already joined in the query; it processes each part of the query separately, and so it generated a left join for each of them.
I have a navigation property, say Items, of an object, say Order. If I want to get a list of Orders, including the Items, I can do something like:
var orders = dbContext.Order.Include(o => o.Items);
This works great, but now I'd like to get only 3 items for each order and I am wondering the best way to accomplish this.
One way is to perform the following:
var orders =
(from o in dbContext.Order
join i in dbContext.Items on o.Id equals i.OrderId
select new { o.Id, i })
.GroupBy(o => o.Id)
.SelectMany(i => i.Take(3));
This works well, although the generated SQL is a bit complex, so I am wondering if there is a more direct (or more performant) way.
Thanks,
Eric
var orders = dbContext.Order
.Select(o => new
{
Order = o,
Items = o.Items.Take(3)
})
.AsEnumerable()
.Select(a => a.Order)
.ToList();
This will fill the Order.Items collection with the top 3 items automatically, provided that:
- you don't disable change tracking (not the case in the query above)
- the relationship between Order and Item is not many-to-many (probably not the case, because orders and items usually have a one-to-many relationship)
Edit
The generated SQL query is:
SELECT
[Project2].[Id] AS [Id],
[Project2].[C1] AS [C1],
[Project2].[Id1] AS [Id1],
[Project2].[OrderId] AS [OrderId]
FROM (SELECT
[Extent1].[Id] AS [Id],
[Limit1].[Id] AS [Id1],
[Limit1].[OrderId] AS [OrderId],
CASE WHEN ([Limit1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Orders] AS [Extent1]
OUTER APPLY (SELECT TOP (3)
[Extent2].[Id] AS [Id],
[Extent2].[OrderId] AS [OrderId]
FROM [dbo].[Items] AS [Extent2]
WHERE [Extent1].[Id] = [Extent2].[OrderId] ) AS [Limit1]
) AS [Project2]
ORDER BY [Project2].[Id] ASC, [Project2].[C1] ASC
How bad is the performance? If it's tolerable then I'd leave it alone. There's not a straight-forward way to do this in SQL either. Usually you end up with a sub-query that computes the ROW_NUMBER partitioned by your grouping, then returning rows where the row number is less than n.
Since there's not a direct translation of that mechanism to Linq, I'd keep the Linq understandable and not worry about the complexity of the generated SQL unless it's a SIGNIFICANT performance problem.
You could also compare it to the performance of returning ALL items then filtering using Linq-to-Objects.
Another option would be to code this as a stored procedure instead of trying to do it in Linq.
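For the Linq-to-Objects comparison mentioned above, the in-memory filtering step could look roughly like this. The item rows are hypothetical and stand in for the result of loading all items from the database; ordering by Id before Take(3) is an assumption added here so that "the first three" is well defined:

```csharp
using System;
using System.Linq;

// Hypothetical items as they might look after loading ALL of them into memory.
var items = new[]
{
    new { Id = 1, OrderId = 1 }, new { Id = 2, OrderId = 1 },
    new { Id = 3, OrderId = 1 }, new { Id = 4, OrderId = 1 },
    new { Id = 5, OrderId = 2 },
};

// Group per order in memory and keep at most three item ids per group.
var topPerOrder = items
    .GroupBy(i => i.OrderId)
    .ToDictionary(
        g => g.Key,
        g => g.OrderBy(i => i.Id).Take(3).Select(i => i.Id).ToList());

foreach (var pair in topPerOrder)
    Console.WriteLine($"Order {pair.Key}: {string.Join(", ", pair.Value)}");
```

Timing this against the database-side query on realistic data volumes is the only way to tell which variant is cheaper; loading every item can easily cost more than the complex SQL it avoids.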
This should generate simple SQL in the form of an OUTER APPLY with top 3 statement to Items. We then have to do some grouping using linq-to-objects, but only the data we need have been brought from the server.
var orders =
(from o in dbContext.Order
from i in (from x in dbContext.Items
where o.Id == x.OrderId
select x).Take(3).DefaultIfEmpty()
select new
{
Order = o,
Item = i
}).AsEnumerable()
.GroupBy(x => x.Order)
.Select(x => new { Order = x.Key, Items = x.Select (y => y.Item ) });
And if you only want the top 3 items per order, without the order entity, the following will generate a CROSS APPLY with a TOP statement against Items in SQL:
var items =
from o in dbContext.Order
from i in (from x in dbContext.Items
where o.Id == x.OrderId
select x).Take(3)
select i;