How can I improve the performance of this LINQ query? - C#

UPDATE
Thanks to @usr, I got this down to ~3 seconds simply by changing
.Select(
    log => log.OrderByDescending(
        d => d.DateTimeUtc
    ).FirstOrDefault()
)
to
.Select(
    log => log.OrderByDescending(
        d => d.Id
    ).FirstOrDefault()
)
I have a database with two tables, Logs and Collectors, which I read using Entity Framework. There are 86 collector records, and each one has 50,000+ corresponding log records.
I want to get the most recent log record for each collector, which is easily done with this SQL:
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue,
CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC,
CollectorSettingsModels.Target, CollectorSettingsModels.TypeName
FROM
(SELECT CollectorId, MAX(Id) AS Id
FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs
INNER JOIN CollectorLogModels AS CollectorLogModels_1
ON RecentLogs.Id = CollectorLogModels_1.Id
INNER JOIN CollectorSettingsModels
ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
This takes ~2 seconds to execute.
The closest I have been able to get with LINQ is the following:
var logs = context.Logs.Include(co => co.Collector)
    .GroupBy(log => log.CollectorId, log => log)
    .Select(log => log.OrderByDescending(d => d.DateTimeUtc).FirstOrDefault())
    .Join(
        context.Collectors,
        l => l.CollectorId,
        c => c.Id,
        (l, c) => new
        {
            c.Target,
            DateTimeUTC = l.DateTimeUtc,
            l.Status,
            l.StringValue,
            CollectorName = c.TypeName
        })
    .OrderBy(o => o.Target)
    .ThenBy(o => o.CollectorName);
This produces the results I want but takes ~35 seconds to execute.
This becomes the following SQL
SELECT
[Distinct1].[CollectorId] AS [CollectorId],
[Extent3].[Target] AS [Target],
[Limit1].[DateTimeUtc] AS [DateTimeUtc],
[Limit1].[Status] AS [Status],
[Limit1].[StringValue] AS [StringValue],
[Extent3].[TypeName] AS [TypeName]
FROM (SELECT DISTINCT
[Extent1].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId]
FROM ( SELECT
[Extent2].[Status] AS [Status],
[Extent2].[StringValue] AS [StringValue],
[Extent2].[DateTimeUtc] AS [DateTimeUtc],
[Extent2].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent2]
WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId]
) AS [Project2]
ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1]
INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id]
ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I get performance closer to what is achievable with SQL alone?

Your original SQL returns the DateTimeUTC of the row with MAX(Id), which may not be the row with the latest DateTimeUTC. That's probably a bug. The EF query does not have that problem, but that means the two queries are not semantically identical: the EF version is a harder query.
If you rewrite the EF query to be structurally the same as the SQL query you'll get identical performance. I see nothing here that EF would not support.
Compute the max(id) with EF as well and join on that.
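As a rough sketch of that suggestion, the query below computes the maximum Id per collector and joins back on it, mirroring the RecentLogs subquery in the SQL. The record types and the LatestLogQuery class are hypothetical stand-ins for the question's entities, and the data is held in memory so the sketch runs without a database. Against EF, the same query shape would be written over context.Logs and context.Collectors; it would be expected to translate to a GROUP BY / MAX subquery plus joins, but the generated SQL is worth checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (assumed shapes, not the real model).
public record Collector(int Id, string Target, string TypeName);
public record Log(int Id, int CollectorId, string Status, string StringValue, DateTime DateTimeUtc);

public static class LatestLogQuery
{
    public static List<(string Target, string CollectorName, string Status, DateTime DateTimeUtc)> Run(
        IQueryable<Log> logs, IQueryable<Collector> collectors)
    {
        // Step 1: one value per collector, the highest log Id
        // (the equivalent of the RecentLogs subquery in the SQL).
        var recentIds = logs
            .GroupBy(l => l.CollectorId)
            .Select(g => g.Max(l => l.Id));

        // Step 2: join back to the logs on Id, then to the collectors.
        var query =
            from id in recentIds
            join l in logs on id equals l.Id
            join c in collectors on l.CollectorId equals c.Id
            orderby c.Target, c.TypeName
            select new { c.Target, CollectorName = c.TypeName, l.Status, l.DateTimeUtc };

        // Materialize first, then convert to tuples in memory.
        return query.AsEnumerable()
            .Select(x => (x.Target, x.CollectorName, x.Status, x.DateTimeUtc))
            .ToList();
    }
}
```

Note that this picks the row with the highest Id, not the latest DateTimeUtc, so it shares the semantics (and the caveat) of the original SQL.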

I had the exact same issue; I solved it by adding indexes.
One of my queries took 45 seconds to complete, and with indexes it now completes in less than a second.

Related

Entity Framework v6.1 query compilation performance

I am confused about how EF LINQ queries are compiled and executed. When I run the same piece of code in LINQPad a couple of times, I get varied performance results (the same query takes a different amount of time on each run). My test environment is described below.
Tools used: EF v6.1 & LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "' deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this I get the result, and LINQPad reports an execution time anywhere from 3.515 sec down to 0.004 sec (depending on how long I wait between runs).
If I take the generated SQL query and execute it directly, it always runs in between 0.001 sec and 0.015 sec.
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I modified the first query to execute immediately (by calling ToList before the second query). This time the generated SQL is very simple, as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took a varying amount of time, but the difference was not as large as with the first query set.
My application is going to have a lot of rows, so I prefer the first query set to the second one, but I am confused.
Please guide.
(Note: I have only a little SQL Server knowledge, so I am using LINQPad to fine-tune queries based on their performance.)
Thanks

Entity Framework ignoring OrderByDescending

In my SQL Server database I have the following hierarchy
Inventory > Datasets > Resources > Renditions > Conformities
where each is a one to many relationship. I wanted to get the id of the three datasets with the most recently updated conformity. Conformity doesn't have its own date but takes the modified date of the parent rendition. I therefore created the following query:
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.OrderByDescending(conformity => conformity.Rendition.Modified)
.Select(conformity => conformity.Rendition.Resource.DatasetID)
.Distinct()
.Take(3);
GetConformitiesIncludeAncestors is simply returning the conformities with includes as follows:
return _context.Conformities.Include(conformity => conformity.Rendition.Resource.Dataset.Inventory);
but the SQL statement shown when stepping through the code doesn't have an ORDER BY clause.
SELECT
[Limit1].[DatasetID] AS [DatasetID]
FROM ( SELECT DISTINCT TOP (3)
[Extent3].[DatasetID] AS [DatasetID]
FROM [dbo].[Conformity] AS [Extent1]
INNER JOIN [dbo].[Rendition] AS [Extent2] ON [Extent1].[RenditionID] = [Extent2].[ID]
INNER JOIN [dbo].[Resource] AS [Extent3] ON [Extent2].[ResourceID] = [Extent3].[ID]
) AS [Limit1]
Why is OrderByDescending being ignored? Entity Framework version is 6.0.1.
EDIT: I have a workaround that does the trick, but by querying in a different way. I'm still interested in why the OrderByDescending had no effect so will leave open.
My workaround using GroupBy
var datasets = _inventoryRepository
.GetConformitiesIncludeAncestors()
.GroupBy(conformity => conformity.Rendition.Resource.DatasetID)
.OrderByDescending(group => group.Max(conformity => conformity.Rendition.Modified))
.Take(3)
.Select(group => group.Key);
If you remove the Distinct, you should get a result similar to this:
var datasets = inventoryRepository
    .GetConformitiesIncludeAncestors()
    .OrderByDescending(comformity => comformity.Rendition.Modified)
    .Select(comformity => comformity.Rendition.Resource.DatasetId)
    //.Distinct()
    .Take(3);
SELECT TOP (3)
[Extent3].[DatasetId] AS [DatasetId]
FROM [dbo].[Comformities] AS [Extent1]
INNER JOIN [dbo].[Renditions] AS [Extent2] ON [Extent1].[RenditionId] = [Extent2].[Id]
INNER JOIN [dbo].[Resources] AS [Extent3] ON [Extent2].[ResourceId] = [Extent3].[Id]
ORDER BY [Extent2].[Modified] DESC
But once you add the Distinct, the ordering is no longer guaranteed. The documentation for Distinct says:
"The expected behavior is that it returns an unordered sequence of the unique items in source."

Linq to SQL | Top 5 Distinct Order by Date

I have a SQL query that I want to run through LINQ to SQL in an ASP.NET application.
SELECT TOP 5 *
FROM (SELECT SongId,
DateInserted,
ROW_NUMBER()
OVER(
PARTITION BY SongId
ORDER BY DateInserted DESC) rn
FROM DownloadHistory) t
WHERE t.rn = 1
ORDER BY DateInserted DESC
I don't know whether this is possible through LINQ to SQL; if not, please suggest another way to do it.
I think you'd have to change the SQL partition to a Linq group-by. (Effectively all the partition does is group by song, and select the newest row for each group.) So something like this:
IEnumerable<DownloadHistory> top5Results = DownloadHistory
// group by SongId
.GroupBy(row => row.SongId)
// for each group, select the newest row
.Select(grp =>
grp.OrderByDescending(historyItem => historyItem.DateInserted)
.FirstOrDefault()
)
// get the newest 5 from the results of the newest-1-per-song partition
.OrderByDescending(historyItem => historyItem.DateInserted)
.Take(5);
Although McGarnagle's answer solves the problem, when I compared the execution plans of the two queries I was surprised to see how much slower the LINQ to SQL query was than the native SQL query. Here is the query generated for the LINQ to SQL code above:
-- Accounted for 99% of the combined cost of the two queries
SELECT TOP (5) [t3].[SongId], [t3].[DateInserted]
FROM (
SELECT [t0].[SongId]
FROM [dbo].[DownloadHistory] AS [t0]
GROUP BY [t0].[SongId]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[SongId], [t2].[DateInserted]
FROM [dbo].[DownloadHistory] AS [t2]
WHERE [t1].[SongId] = [t2].[SongId]
ORDER BY [t2].[DateInserted] DESC
) AS [t3]
ORDER BY [t3].[DateInserted] DESC
-- Accounted for 1% of the combined cost of the two queries
SELECT TOP 5 t.SongId,t.DateInserted
FROM (SELECT SongId,
DateInserted,
ROW_NUMBER()
OVER(
PARTITION BY SongId
ORDER BY DateInserted DESC) rn
FROM DownloadHistory) t
WHERE t.rn = 1
ORDER BY DateInserted DESC

Using LINQ, what are the ways to get the top x rows of a navigation property?

I have a navigation property, say Items, of an object, say Order. If I want to get a list of Orders, including the Items, I can do something like:
var orders = dbContext.Order.Include(o => o.Items);
This works great, but now I'd like to get only 3 items for each order and I am wondering the best way to accomplish this.
One way is to perform the following:
var orders =
(from o in dbContext.Order
join i in dbContext.Items on o.Id equals i.OrderId
select new { o.Id, i })
.GroupBy(o => o.Id)
.SelectMany(i => i.Take(3));
This works well, although the generated SQL is a bit complex, and I am wondering if there is a more direct (or more performant) way.
Thanks,
Eric
var orders = dbContext.Order
.Select(o => new
{
Order = o,
Items = o.Items.Take(3)
})
.AsEnumerable()
.Select(a => a.Order)
.ToList();
This will fill the Order.Items collection with the top 3 items automatically if:
1) you don't disable change tracking (not the case in the query above), and
2) the relationship between Order and Item is not many-to-many (probably not the case, because orders and items usually have a one-to-many relationship).
Edit
The generated SQL query is:
SELECT
[Project2].[Id] AS [Id],
[Project2].[C1] AS [C1],
[Project2].[Id1] AS [Id1],
[Project2].[OrderId] AS [OrderId]
FROM (SELECT
[Extent1].[Id] AS [Id],
[Limit1].[Id] AS [Id1],
[Limit1].[OrderId] AS [OrderId],
CASE WHEN ([Limit1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Orders] AS [Extent1]
OUTER APPLY (SELECT TOP (3)
[Extent2].[Id] AS [Id],
[Extent2].[OrderId] AS [OrderId]
FROM [dbo].[Items] AS [Extent2]
WHERE [Extent1].[Id] = [Extent2].[OrderId] ) AS [Limit1]
) AS [Project2]
ORDER BY [Project2].[Id] ASC, [Project2].[C1] ASC
How bad is the performance? If it's tolerable then I'd leave it alone. There's not a straightforward way to do this in SQL either. Usually you end up with a sub-query that computes ROW_NUMBER partitioned by your grouping, and then return the rows where the row number is at most n.
Since there's not a direct translation of that mechanism to Linq, I'd keep the Linq understandable and not worry about the complexity of the generated SQL unless it's a SIGNIFICANT performance problem.
You could also compare it to the performance of returning ALL items then filtering using Linq-to-Objects.
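For that comparison, a minimal in-memory sketch might look like the following. The Item record and the TopItemsInMemory class are hypothetical, and ordering by Id is an assumption, since the original Take(3) does not specify which three items to keep.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Item shape; only the two columns used in this thread.
public record Item(int Id, int OrderId);

public static class TopItemsInMemory
{
    // Groups already-loaded items by order and keeps the first n of each group.
    // Ordering by Id is an assumption; the original Take(3) specifies no order.
    public static Dictionary<int, List<Item>> TopPerOrder(IEnumerable<Item> allItems, int n) =>
        allItems
            .GroupBy(i => i.OrderId)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(i => i.Id).Take(n).ToList());
}
```

With EF this would be called on something like dbContext.Items.ToList(), which pulls every item across the wire before filtering; that transfer is the cost being measured. Orders with no items do not appear in the resulting dictionary.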
Another option would be to code this as a stored procedure instead of trying to do it in Linq.
This should generate simple SQL in the form of an OUTER APPLY with a TOP 3 statement against Items. We then have to do some grouping using LINQ to Objects, but only the data we need has been brought from the server.
var orders =
(from o in dbContext.Order
from i in (from x in dbContext.Items
where o.Id == x.OrderId
select x).Take(3).DefaultIfEmpty()
select new
{
Order = o,
Item = i
}).AsEnumerable()
.GroupBy(x => x.Order)
.Select(x => new { Order = x.Key, Items = x.Select (y => y.Item ) });
If you only want the top 3 items per order, without the order entity, the following will generate a CROSS APPLY with a TOP statement against Items in SQL:
var items =
from o in dbContext.Order
from i in (from x in dbContext.Items
where o.Id == x.OrderId
select x).Take(3)
select i;

How can I optimise this Linq query to remove the unnecessary SELECT Count(*)

I have three tables, Entity, Period and Result. There is a 1:1 mapping between Entity and Period and a 1:Many between Period and Result.
This is the linq query:
int id = 100;
DateTime start = DateTime.Now;
var query = from p in db.Periods
            where p.Entity.ObjectId == id && p.Start == start
            select new { Period = p, Results = p.Results };
These are the relevant parts of the generated SQL:
SELECT [t0].[EntityId], [t2].[PeriodId], [t2].[Value], (
SELECT COUNT(*)
FROM [dbo].[Result] AS [t3]
WHERE [t3].[PeriodId] = [t0].[Id]
) AS [value2]
FROM [dbo].[Period] AS [t0]
INNER JOIN [dbo].[Entity] AS [t1] ON [t1].[Id] = [t0].[EntityId]
LEFT OUTER JOIN [dbo].[Result] AS [t2] ON [t2].[PeriodId] = [t0].[Id]
WHERE ([t1].[ObjectId] = 100) AND ([t0].[Start] = '2010-02-01 00:00:00')
Where is the SELECT Count(*) coming from and how can I get rid of it? I don't need a count of the "Results" for each "Period" and it slows the query down by an order of magnitude.
Consider using the Context.LoadOptions and specifying for Period to LoadWith(p => p.Results) to eager load the period with results without needing to project into an anonymous type.
