I have a bunch of Services, which reference Students (many to one), which in turn reference StudentEnrollments (one to many).
When I query these services, EF generates SQL that contains two blocks that look the same, which slows down performance. I cannot for the life of me figure out why.
Here is my C# code (narrowed down):
IQueryable<StudentServiceDm> query = GetListQuery();
List<int> schoolIds = // from front-end: in this case: 20, 21, 22, 23, 89, 90, 93, 95
query = query.Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)));
IQueryable<StudentServiceDto> dtoQuery = query.Select(m => new StudentServiceDto
{
Id = m.Id,
Name = m.Name,
ParentParticipationCount = m.ParentCount,
StudentFirstName = m.Student.FirstName,
StudentLastName = m.Student.LastName,
StudentId = m.StudentId.Value,
StudentServiceType = m.StudentServiceType.Name,
StudentServiceSubType = m.StudentServiceSubType.Name,
Date = m.Date,
DurationInMinutes = m.DurationInMinutes
});
return dtoQuery;
Here is the generated SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[ParentCount] AS [ParentCount],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
[Extent1].[StudentId] AS [StudentId],
[Extent3].[Name] AS [Name1],
[Extent4].[Name] AS [Name2],
[Extent1].[Date] AS [Date],
[Extent1].[DurationInMinutes] AS [DurationInMinutes]
FROM [dbo].[StudentService] AS [Extent1]
LEFT OUTER JOIN [dbo].[Student] AS [Extent2] ON ([Extent2].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent2].[Id])
LEFT OUTER JOIN [dbo].[StudentServiceType] AS [Extent3] ON ([Extent3].[Deleted] = 0) AND ([Extent1].[StudentServiceTypeId] = [Extent3].[Id])
LEFT OUTER JOIN [dbo].[StudentServiceSubType] AS [Extent4] ON ([Extent4].[Deleted] = 0) AND ([Extent1].[StudentServiceSubTypeId] = [Extent4].[Id])
WHERE ([Extent1].[Deleted] = 0) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[StudentEnrollment] AS [Extent5]
INNER JOIN [dbo].[Student] AS [Extent6] ON [Extent6].[Id] = [Extent5].[StudentId]
WHERE ([Extent6].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent6].[Id]) AND ([Extent5].[Deleted] = 0) AND ([Extent5].[SchoolId] IN (20, 21, 22, 23, 89, 90, 93, 95))
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[StudentEnrollment] AS [Extent7]
INNER JOIN [dbo].[Student] AS [Extent8] ON [Extent8].[Id] = [Extent7].[StudentId]
WHERE ([Extent8].[Deleted] = 0) AND ([Extent1].[StudentId] = [Extent8].[Id]) AND ([Extent7].[Deleted] = 0) AND ([Extent7].[SchoolId] IN (20, 21, 22, 23, 89, 90, 93, 95))
))
ORDER BY [Extent1].[Date] DESC, [Extent1].[Id] ASC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
As you can see, the SQL contains two boolean blocks (A AND B) where A and B look exactly the same (apart from the [Extent] suffixes, of course). I think my query is simple enough that it should not confuse LINQ into generating such a query. Can anyone tell me why this is happening, or how I can write my query another way?
Entity Framework makes little attempt to optimize the SQL being generated - quite the opposite in practice. It's meant to be convenient rather than fast.
LINQ and Entity Framework are free, but Windows Azure charges by the second for database access. The slower the queries are, the more money Microsoft makes.
So I'm sure Microsoft are working really, really hard to speed it up for you.
If you need speed but cannot get it from EF, there are options:
Write a SQL stored procedure or SQL view - both can be called from Entity Framework.
Write your own query in SQL and execute it using ADO.NET
Fiddle around with the LINQ query until it speeds up by itself
I changed
query = query.Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)));
to
query = query.Where(a => schoolIds.Any(b => a.Student.StudentEnrollments.Select(c => c.SchoolId.Value).Contains(b)));
I flipped the logic, and it generates a query that performs better. It is longer and not ideal, but at least it is "correct". For some reason the first LINQ query produces those two duplicated blocks, which really kills performance in this case.
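As a sanity check on the rewrite, the two predicates can be compared in memory. The sketch below uses hypothetical stand-in classes (Service, Student, Enrollment) rather than the real entities and runs on LINQ to Objects, so it only suggests that both forms select the same rows. It says nothing about the SQL that EF generates for either one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
public class Enrollment { public int? SchoolId { get; set; } }
public class Student { public List<Enrollment> StudentEnrollments { get; set; } = new List<Enrollment>(); }
public class Service { public int Id { get; set; } public Student Student { get; set; } }

public static class PredicateCheck
{
    // Original form: some enrollment of the student has a school in the list.
    public static List<int> Original(IEnumerable<Service> services, List<int> schoolIds) =>
        services.Where(m => m.Student.StudentEnrollments.Any(s => schoolIds.Contains(s.SchoolId.Value)))
                .Select(m => m.Id)
                .ToList();

    // Flipped form: some school in the list appears among the student's enrollments.
    public static List<int> Flipped(IEnumerable<Service> services, List<int> schoolIds) =>
        services.Where(a => schoolIds.Any(b => a.Student.StudentEnrollments.Select(c => c.SchoolId.Value).Contains(b)))
                .Select(a => a.Id)
                .ToList();

    // Made-up sample rows: one match, one non-match, one student without enrollments.
    public static List<Service> SampleData() => new List<Service>
    {
        new Service { Id = 1, Student = new Student { StudentEnrollments = { new Enrollment { SchoolId = 20 }, new Enrollment { SchoolId = 99 } } } },
        new Service { Id = 2, Student = new Student { StudentEnrollments = { new Enrollment { SchoolId = 50 } } } },
        new Service { Id = 3, Student = new Student() }
    };

    public static void Main()
    {
        var schoolIds = new List<int> { 20, 21 };
        var data = SampleData();
        // Both forms should pick the same service ids for the same input.
        Console.WriteLine(Original(data, schoolIds).SequenceEqual(Flipped(data, schoolIds)));
    }
}
```

Since the two forms are logically equivalent, any difference in speed comes from how EF translates them, so it is worth comparing the generated SQL for both.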
I have a class called Person defined like so:
public class Person {
string Name; // "Joe", "Alex", etc.
string State; // "Minnesota", "Texas", etc.
int Age; // "12", "23", etc.
}
I'm using PostgreSQL as my backing database with EF Core to bind the person table to the Person class.
I would like to perform a query that returns a List<Person> of the oldest person in each state. For example, if I had:
Person p1 = new Person {
Name = "Joe",
State = "Minnesota",
Age = 34
};
Person p2 = new Person {
Name = "Alex",
State = "Minnesota",
Age = 55
};
Person p3 = new Person {
Name = "George",
State = "Texas",
Age = 62
};
I would like to perform a query that returns p2 and p3 (the order of p2 and p3 within the resultant list doesn't matter).
Right now, I'm using the following query:
List<Person> oldestByState = await _context.Person.GroupBy(x => x.State).Select(x => x.OrderByDescending(y => y.Age).First()).ToListAsync();
The issue with this is that EF Core doesn't know how to translate it into SQL. I get an error that x.OrderByDescending(y => y.Age) can't be translated.
I know that a quick fix would be to do the grouping on the client, but I'd prefer to run the query on the database by fixing the LINQ query.
What query should I use to return a list of the oldest people in each state that is compatible with LINQ-to-PostgreSQL translation?
EF Core: 5.0.7
Npgsql: 5.0.6
I tested this with SQL Server and the error received from EF is that "First can only be used as a final operator, try FirstOrDefault." Using FirstOrDefault worked returning the expected oldest row.
await _context.Person
.GroupBy(x => x.State)
.Select(x => x.OrderByDescending(y => y.Age).FirstOrDefault())
.ToListAsync();
If it still doesn't work with PostgreSQL then it may be a limitation of that provider.
Edit:
One possible workaround, pending a fix from the Npgsql team, would be to use a raw SQL query. For my test schema, which is a bit different, EF for SQL Server produced:
SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age]
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]
Then executed by:
await _context.Person.FromSqlRaw(@" SELECT
[Limit1].[PersonId] AS [PersonId],
[Limit1].[Name] AS [Name],
[Limit1].[State] AS [State],
[Limit1].[Age] AS [Age]
FROM (SELECT DISTINCT
[Extent1].[State] AS [State]
FROM [dbo].[Persons] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[PersonId] AS [PersonId], [Project2].[Name] AS [Name], [Project2].[Age] AS [Age], [Project2].[State] AS [State]
FROM ( SELECT
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Name] AS [Name],
[Extent2].[Age] AS [Age],
[Extent2].[State] AS [State]
FROM [dbo].[Children] AS [Extent2]
WHERE [Distinct1].[State] = [Extent2].[State]
) AS [Project2]
ORDER BY [Project2].[Age] DESC ) AS [Limit1]").ToListAsync();
Ugly as &*^, and it may not match your schema, but if you can write a query that returns the data, then provided it returns the entity you want, EF should be able to populate it. This would be a lot more complicated if you are trying to return a more complex object graph (i.e. Person with many columns and related entities). In those cases I would recommend running two queries: one that does the grouping and selection and returns just the PersonId, and a second that uses those PersonIds to fetch the relevant Persons with their related data by ID.
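The two-query idea can be sketched as follows. This is an in-memory illustration on LINQ to Objects, using a hypothetical Person with a PersonId key (the class in the question has none). Against a real context both steps would start from _context.Person, and whether a given provider translates them has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: the PersonId key is an assumption for illustration.
public class Person
{
    public int PersonId { get; set; }
    public string Name { get; set; }
    public string State { get; set; }
    public int Age { get; set; }
}

public static class OldestByState
{
    // Query 1: group to find the maximum age per state, then join back to get only the ids.
    // Note: people tied for the maximum age in a state are all returned.
    public static List<int> OldestIds(IEnumerable<Person> people)
    {
        var maxAges = people
            .GroupBy(p => p.State)
            .Select(g => new { State = g.Key, MaxAge = g.Max(p => p.Age) });

        return people
            .Join(maxAges,
                  p => new { p.State, p.Age },
                  m => new { m.State, Age = m.MaxAge },
                  (p, m) => p.PersonId)
            .ToList();
    }

    // Query 2: fetch the full rows (and, with EF, any related data) by id.
    public static List<Person> FetchByIds(IEnumerable<Person> people, List<int> ids) =>
        people.Where(p => ids.Contains(p.PersonId)).ToList();

    public static List<Person> SampleData() => new List<Person>
    {
        new Person { PersonId = 1, Name = "Joe", State = "Minnesota", Age = 34 },
        new Person { PersonId = 2, Name = "Alex", State = "Minnesota", Age = 55 },
        new Person { PersonId = 3, Name = "George", State = "Texas", Age = 62 }
    };

    public static void Main()
    {
        var people = SampleData();
        var oldest = FetchByIds(people, OldestIds(people));
        Console.WriteLine(string.Join(", ", oldest.Select(p => p.Name)));
    }
}
```

The first step uses only a grouping with a simple aggregate and a join, which are shapes LINQ providers generally handle better than an ordered First inside a group. Treat that as a design hint rather than a guarantee for Npgsql.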
I have created a TPH hierarchy for a project and configured it, and everything works nicely with EF 6. However, I noticed the query that is generated when I filter by type:
var existingTransaction = ctx.Transactions.Where(tr => tr.BalanceId == 1 && ids.Contains(tr.ParticipationId) && tr is DepositTransaction);
This generates the following query:
SELECT
[Extent1].[TransactionType] AS [TransactionType],
[Extent1].[Id] AS [Id],
[Extent1].[BalanceId] AS [BalanceId],
[Extent1].[ParticipationId] AS [ParticipationId],
[Extent1].[Name] AS [Name],
[Extent1].[TransactionAmount] AS [TransactionAmount]
FROM [dbo].[Transactions] AS [Extent1]
WHERE ([Extent1].[TransactionType] IN (1,2,4,3)) AND (1 = [Extent1].[BalanceId]) AND ([Extent1].[ParticipationId] IN (1, 2, 3, 4, 5)) AND ([Extent1].[TransactionType] = 2)
Notice the part with the transaction type: it lists all of the configured transaction types. The query is semantically correct, but it is strange.
When I remove the additional filters and leave only the filter by type:
var existingTransaction = ctx.Transactions.Where(tr => tr is DepositTransaction);
the query looks as expected:
SELECT
'0X0X' AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[BalanceId] AS [BalanceId],
[Extent1].[ParticipationId] AS [ParticipationId],
[Extent1].[Name] AS [Name],
[Extent1].[TransactionAmount] AS [TransactionAmount]
FROM [dbo].[Transactions] AS [Extent1]
WHERE [Extent1].[TransactionType] = 2
So why is the first one like that?
I have the following code, which should get a book (Book entity) and retrieve the first 2 tags (Tag entities) from that book.
So Tags is a navigation property of the Book entity.
using (var context = new FakeEndavaBookLibraryEntities())
{
Book firstBook = context.Set<Book>().Take(1).First();
var firstTwoTags = firstBook.Tags.OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
}
I expect EF to generate the following SQL query.
SELECT TOP(2)
[Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
Instead, the EF Profiler shows me that EF is generating an unbounded result set (like SELECT * FROM ...):
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
I also tried appending .AsQueryable() to the firstBook.Tags property and/or removing the .Skip(0) call, as shown below, but this didn't help either.
var firstTwoTags = firstBook.Tags.AsQueryable().OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
The same undesired behavior:
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* @EntityKeyValue1 - [BookId] */
Have you ever encountered the same problem when working with Entity Framework 6?
Are there any workarounds for this problem, or have I designed the query the wrong way?
Thanks for any tip!
firstBook.Tags is a lazily-loaded IEnumerable<Tag>. On the first access, all tags are loaded, and subsequent attempts to turn it into an IQueryable<Tag> do not work, since you did not start from something from which you could sensibly query.
Instead, start from a known good IQueryable<Tag>. Something along the lines of
Tag firstTag = context.Set<Tag>()
.Where(tag => tag.Books.Contains(firstBook))
.OrderBy(tag => tag.Id).Skip(0).Take(1).SingleOrDefault();
should work. You might need minor tweaking to turn the filter condition into something EF understands.
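To see why calling AsQueryable() on an already-loaded collection does not help, here is a minimal sketch using plain strings as a stand-in for the loaded tags (no EF involved). The resulting IQueryable is backed by the in-memory LINQ provider (EnumerableQuery), so OrderBy and Take run in memory and are never sent to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsQueryableDemo
{
    // True when the query is served by the in-memory LINQ to Objects provider.
    public static bool IsInMemoryQueryable<T>(IQueryable<T> query) => query is EnumerableQuery;

    // Stand-in for a navigation collection that has already been loaded.
    public static IQueryable<string> FirstTwo(ICollection<string> loadedTags) =>
        loadedTags.AsQueryable().OrderBy(t => t).Take(2);

    public static void Main()
    {
        ICollection<string> loadedTags = new List<string> { "c", "a", "b" };
        IQueryable<string> query = FirstTwo(loadedTags);

        // The wrapper is still an in-memory query over the list.
        Console.WriteLine(IsInMemoryQueryable(query));
        Console.WriteLine(string.Join(",", query));
    }
}
```

As an alternative to starting from context.Set<Tag>(), EF 6 also exposes context.Entry(firstBook).Collection(b => b.Tags).Query(), which returns an IQueryable<Tag> for the collection without loading it. That route is not tested here.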
As @hvd pointed out, I had to work with IQueryable<Tag>, whereas the firstBook.Tags navigation property is just a lazy-loaded IEnumerable<Tag>.
So here is the solution to my problem, based on @hvd's answer.
Tag firstTag = context.Set<Tag>() // or even context.Tags
.Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
.OrderBy(tag => tag.Id)
.Skip(0).Take(1)
.SingleOrDefault();
So the minor change to @hvd's solution is replacing
.Where(tag => tag.Books.Contains(firstBook))
with something that EF understands:
1) .Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
or
2) .Where(tag => tag.Books.Select(book => book.Id).Contains(firstBook.Id))
Either (1) or (2) generates the following SQL query, which is definitely no longer an unbounded result set.
SELECT [Project2].[Id] AS [Id],
[Project2].[Version] AS [Version],
[Project2].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Version] AS [Version],
[Extent1].[Name] AS [Name]
FROM [Literature].[Tag] AS [Extent1]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Literature].[BookTagRelation] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[TagId])
AND ([Extent2].[BookId] = 1 /* @p__linq__0 */))) AS [Project2]
ORDER BY [Project2].[Id] ASC
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
I'm having a really hard time tuning one of the Entity Framework generated queries in my application. It is a very basic query, but for some reason EF uses multiple inner subqueries, which seem to perform horribly in the database, instead of joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates a really complex query with multiple subqueries, and it performs awfully:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ, but no matter what I try (using LINQPad for testing), I just can't write a LINQ/EF query that performs as well as the one I write by hand. I've tried querying both the SubValue table and the Project table, but the outcome is mostly the same: multiple inefficient nested subqueries instead of a single join doing the work.
How can I write a query that imitates the hand-written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use joins when I want them to, instead of nested subqueries?
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL, you need to avoid navigating from project to subvalue before doing all the operations on subvalue. You can do this by creating a join (which is also what you do in your hand-crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries, but I would expect it to be more efficient than the SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= @p__linq__0) AND ([Extent2].[Created] <= @p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
I've got some very poorly performing queries all over the place in my EF6 app. Here is one query that takes nearly 3000 ms to run (localhost to an external SQL Server):
dash.UserActivities = db.Activities.Include(a => a.Customer).Include(a => a.ActivityType).Where(a => a.AssignedUserId == userId)
.Where(a => a.IsComplete == false).OrderBy(a => a.DueDateTime).Take(10).Select(
a => new ActivityViewModel()
{
Id = a.Id,
CustomerFirstName = a.Customer.FirstName,
CustomerLastName = a.Customer.LastName,
ActivityType = a.ActivityType.Name,
DueDateTime = a.DueDateTime,
}
).ToList();
Clearly something doesn't feel right about this, it is probably something obvious. But I have no clue what it is!
UPDATE
The SQL being generated from this is:
SELECT TOP (10)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id],
[Project1].[FirstName] AS [FirstName],
[Project1].[LastName] AS [LastName],
[Project1].[Name] AS [Name],
[Project1].[DueDateTime] AS [DueDateTime]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[DueDateTime] AS [DueDateTime],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
[Extent3].[Name] AS [Name],
1 AS [C1]
FROM [dbo].[Activities] AS [Extent1]
INNER JOIN [dbo].[Customers] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[Id]
INNER JOIN [dbo].[ActivityTypes] AS [Extent3] ON [Extent1].[ActivityTypeId] = [Extent3].[Id]
WHERE (0 = [Extent1].[IsComplete]) AND ([Extent1].[AssignedUserId] = 037da3f4-99cc-4338-8b36-491ca0fcfcb1 /* @p__linq__0 */)
) AS [Project1]
ORDER BY [Project1].[DueDateTime] ASC
As @MarcinJuraszek suggested in the comments, we needed to run these queries locally and audit the performance there. Using the tuning advisor, we found many opportunities for improvement.
First, turn off lazy loading in the context if it is enabled. Performance with it is bad, really bad, and the Include will be disregarded if it is enabled.
In your example you project into a concrete class. If EF could not translate that projection, the Select would run in C# rather than on SQL Server: EF would join all the tables, do a "select *" because it could not know which fields are needed, and then map the result. Slow, really slow. (The SQL posted in the question selects only the needed columns, so check the generated SQL before assuming this is the cause.)
dash.UserActivities = db.Activities
.Where(a => a.AssignedUserId == userId && !a.IsComplete)
.OrderBy(a => a.DueDateTime)
.Select(a => new
{
Id = a.Id,
CustomerFirstName = a.Customer.FirstName,
CustomerLastName = a.Customer.LastName,
ActivityType = a.ActivityType.Name,
DueDateTime = a.DueDateTime
})
.Take(10)
.Select(a => new ActivityViewModel()
{
Id = a.Id,
CustomerFirstName = a.CustomerFirstName,
CustomerLastName = a.CustomerLastName,
ActivityType = a.ActivityType,
DueDateTime = a.DueDateTime
})
.ToList();