LINQ to Entities Query Optimization - Poor Performance - c#

I've got some very poor performing queries all over the place in my EF6 implemented app. Here is one query that is taking nearly 3000 MS to be performed. (localhost to external sql server)
dash.UserActivities = db.Activities.Include(a => a.Customer).Include(a => a.ActivityType).Where(a => a.AssignedUserId == userId)
.Where(a => a.IsComplete == false).OrderBy(a => a.DueDateTime).Take(10).Select(
a => new ActivityViewModel()
{
Id = a.Id,
CustomerFirstName = a.Customer.FirstName,
CustomerLastName = a.Customer.LastName,
ActivityType = a.ActivityType.Name,
DueDateTime = a.DueDateTime,
}
).ToList();
Clearly something doesn't feel right about this, it is probably something obvious. But I have no clue what it is!
UPDATE
The SQL being generated from this is:
SELECT TOP (10)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id],
[Project1].[FirstName] AS [FirstName],
[Project1].[LastName] AS [LastName],
[Project1].[Name] AS [Name],
[Project1].[DueDateTime] AS [DueDateTime]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[DueDateTime] AS [DueDateTime],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
[Extent3].[Name] AS [Name],
1 AS [C1]
FROM [dbo].[Activities] AS [Extent1]
INNER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[Id]
INNER JOIN [dbo].[ActivityTypes] AS [Extent3] ON [Extent1].[ActivityTypeId] = [Extent3].[Id]
WHERE (0 = [Extent1].[IsComplete]) AND ([Extent1].[AssignedUserId] = 037da3f4-99cc-4338-8b36-491ca0fcfcb1 /* #p__linq__0 */)
) AS [Project1]
ORDER BY [Project1].[DueDateTime] ASC

As #MarcinJuraszek suggested in the comments, we needed to perform these queries locally and audit the performance here. Using tuning adviser, we found many opportunities for improvement.

First thing, turn of LazyLoading in the context if enabled... Performance are bad, really bad. And the include will be disregarded if enabled.
In your example, as you are using some real class in your query, the Select is not done on the SQL Server side, but via C#. That means it will do the join operation on all table, do a "select *" as he does not know what fields he will need, and then map it. Slow, really slow.
dash.UserActivities = db.Activities..Where(a => a.AssignedUserId == userId && a => !a.IsComplete).OrderBy(a => a.DueDateTime).Select( a => new {
Id = a.Id,
CustomerFirstName = a.Customer.FirstName,
CustomerLastName = a.Customer.LastName,
ActivityType = a.ActivityType.Name,
DueDateTime = a.DueDateTime
}).Take(10).Select(
a => new ActivityViewModel()
{
Id = a.Id,
CustomerFirstName = a.CustomerFirstName,
CustomerLastName = a.CustomerLastName ,
ActivityType = a.ActivityType
DueDateTime = a.DueDateTime,
}
).ToList();

Related

C# Entity Framework Group By Tie Breaker

I have a class called Person defined like so:
public class Person {
string Name; // "Joe", "Alex", etc.
string State; // "Minnesota", "Texas", etc.
int Age; // "12", "23", etc.
}
I'm using PostgresQL as my backing database w/ EF Core to bind the person table to the Person class.
I would like to perform a query that returns a List<Person> of the oldest person in each state. For example, if I had:
Person p1 = new{
Name = "Joe",
State = "Minnesota",
Age = 34
};
Person p2 = new{
Name = "Alex",
State = "Minnesota",
Age = 55
};
Person p3 = new{
Name = "George",
State = "Texas",
Age = 62
}
I would like to perform a query that returns p2 and p3 (the order of p2 and p3 within the resultant list doesn't matter).
Right now, I'm using the following query:
List<Person> oldestByState = await _context.Person.GroupBy(x => x.State).Select(x => x.OrderByDescending(y => y.Age).First()).ToListAsync();
The issue with this is that C# doesn't know how to translate it into SQL. I get an error that x.OrderByDescending(y => y.Age) can't be translated.
I know that a quick fix would be to do the grouping on the client, but I'd prefer to run the query on the database by fixing the Linq query.
What query should I use to return a list of the oldest people in each state that is compatible with Linq to PostgresQL translation?
EF Core: 5.0.7
Npgsql: 5.0.6
I tested this with SQL Server and the error received from EF is that "First can only be used as a final operator, try FirstOrDefault." Using FirstOrDefault worked returning the expected oldest row.
await _context.Person
.GroupBy(x => x.State)
.Select(x => x.OrderByDescending(y => y.Age).FirstOrDefault())
.ToListAsync();
If it still doesn't work with PostgreSQL then it may be a limitation of that provider.
Edit:
One possible work-around, pending a fix from the npgsql team, would be to use a raw SQL query. For my test schema which is a bit different, EF for SQL Server produced:
SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age],
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]
Then executed by:
await _context.Person.FromSqlRaw(#" SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age],
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]").ToListAsync();
Ugly as &*^, and may not match your schema but if you can write a query that returns the data then provided it returns the entity you want, EF should be able to populate it. This would be a lot more complicated if you are trying to return a more complex object graph. (I.e. Person with many columns and related entities) In those cases I would recommend running two queries, one that does the grouping and selection that just returns PersonId, then use those PersonIds to fetch the relevant Persons with their related data by ID.

How to improve LINQ statement to use INNER JOIN in resulting SQL statement?

Assuming the following code that applies filtering logic to a passed on collection.
private IQueryable<Customer> ApplyCustomerFilter(CustomerFilter filter, IQueryable<Customer> customers)
{
...
if (filter.HasProductInBackOrder == true)
{
customers = customers.Where(c => c.Orders.Any(o => o.Products.Any(p => p.Status == ProductStatus.BackOrder)))
}
....
return customers;
}
Results in this SQL statement:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
WHERE
(
EXISTS
(
SELECT 1 AS [C1]
FROM
(
SELECT [Extent3].[OrderId] AS [OrderId]
FROM [Orders] AS [Extent3]
WHERE [Extent1].[CustomerId] = [Extent3].[CustomerId]
) AS [Project1]
WHERE EXISTS
(
SELECT 1 AS [C1]
FROM [Products] AS [Extent4]
WHERE ([Project1].[OrderId] = [Extent4].[OrderId])
AND ([Extent4].[Status] = #p__linq__6)
)
)
)
However, I would like to optimize this by forcing to use INNER JOINS so that the result will be similar to this:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
INNER JOIN [Orders] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[CustomerId]
INNER JOIN [Products] AS [Extent3] ON [Extent2].[OrderId] = [Extent3].[OrderId]
WHERE [Extent3].[Status] = #p__linq__6
I've tried multiple approaches, but I was unable to accomplish the desired result. Any suggestions on how to force the correct joins and avoiding subselects?

Entity Framework 6: Skip() & Take() do not generate SQL, instead the result set is filtered after loading into memory. Or am I doing something wrong?

I have the following code that should gets some book, and retrieve the first 2 tags (Tag entities) from that book (Book entity).
So Tags is a navigation property of the Book entity.
using (var context = new FakeEndavaBookLibraryEntities())
{
Book firstBook = context.Set<Book>().Take(1).First();
var firstTwoTags = firstBook.Tags.OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
}
I expect obtaining the following SQL query that has to be generated by EF.
SELECT TOP(2)
[Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Instead, the EF Profiler shows me that the EF is generating unbounded result set (like SELECT * FROM ...)
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Here is a scheme fragment if you need it
I also tried to append the .AsQueryable() to firstBook.Tags property and/or remove .Skip(0) method as is shown below, but this didn't help as well.
var firstTwoTags = firstBook.Tags.AsQueryable().OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
The same undesired behavior:
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Have you ever encountered the same problem when working with Entity Framework 6?
Are there any workarounds to overcome this problem or I've designed the query in a wrong way...?
Thanks for any tip!
firstBook.Tags is a lazily-loaded IEnumerable<Tag>. On the first access, all tags are loaded, and subsequent attempts to turn it into an IQueryable<Tag> do not work, since you did not start from something from which you could sensibly query.
Instead, start from a known good IQueryable<Tag>. Something along the lines of
Tag firstTag = context.Set<Tag>()
.Where(tag => tag.Books.Contains(firstBook))
.OrderBy(tag => tag.Id).Skip(0).Take(1).SingleOrDefault();
should work. You might need minor tweaking to turn the filter condition into something EF understands.
As #hvd pointed out, I had to work with IQueryable<Tag>, whereas firstBook.Tags navigation property is just a lazy-loaded IEnumerable<Tag>.
So here is the solution of my problem, based on the #hvd's answer.
Tag firstTag = context.Set<Tag>() // or even context.Tags
.Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
.OrderBy(tag => tag.Id)
.Skip(0).Take(1)
.SingleOrDefault();
So the minor changes of #hvd's solution are: replacing the
.Where(tag => tag.Books.Contains(firstBook)) with
Something that EF understands
1) .Where(tag => tag.Books.Any(book => book.Id == firstBook.Id)).
or
2) .Where(tag => tag.Books.Select(book => book.Id).Contains(firstBook.Id))
Any sequence of code (1) or (2) generates the following SQL query, which is definitely no longer an unbounded result set.
SELECT [Project2].[Id] AS [Id],
[Project2].[Version] AS [Version],
[Project2].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Version] AS [Version],
[Extent1].[Name] AS [Name]
FROM [Literature].[Tag] AS [Extent1]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Literature].[BookTagRelation] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[TagId])
AND ([Extent2].[BookId] = 1 /* #p__linq__0 */))) AS [Project2]
ORDER BY [Project2].[Id] ASC
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY

LINQ and Entity Framework - Avoiding subqueries

I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates really complex query with multiple subqueries which has awful query performance like this:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= #p__linq__0) AND ([Extent2].[Created] <= #p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC

Linq Join with a Group By

Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I exprected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent toSQL Server none of the fields that I was expecting to be return are being included in the Select clause. What am I doing wrong?
UPDATE
It has been a very long day, I re-ran the code again and now this is the SQL that is being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing results when I try to loop through the record set
foreach (var emps in query)
{
}
Not sure why the query does not return what it should return, but it occurred to me that since you only query the group key and not any grouped results you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee equals e.EmployeeNumber
select new
{
IncEmpNum = i.IncidentEmployee
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
But how did your code compile? As xeondev noticed: there should be an equals in stead of an = in a join statement. My compiler (:D) does not swallow it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable and you should either use i.IncidentEmployee.Value or e.EmployeeNumber.Value or both.

Categories

Resources