I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates really complex query with multiple subqueries which has awful query performance like this:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= #p__linq__0) AND ([Extent2].[Created] <= #p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
Related
I want to get a list of customer IDs and the number of orders placed by that customer. Filter conditions are:
Orders with a total of $10 or less will not be counted.
Customers who did not place at least 3 orders (each with a total of $10 or more) will not be listed.
So, I would do the following in SQL:
SELECT customerID, COUNT(*)
FROM Orders
WHERE orderTotal > 10
GROUP BY customerID
HAVING COUNT(*) > 2
And in EF, I think this could be expressed as:
dbContext.Order
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Where(g => g.Count() > 2)
.ToList();
But this produces the following SQL that uses a derived table and a join rather than simply using a HAVING clause. I think this would be far from optimal in terms of performance. Is there a better way to formulate the case in EF so that the translated query will use the HAVING clause as it should?
SELECT
[Project1].[C1] AS [C1],
[Project1].[customerID] AS [customerID],
[Project1].[C2] AS [C2],
[Project1].[ID] AS [ID],
FROM ( SELECT
[GroupBy1].[K1] AS [customerID],
1 AS [C1],
[Extent2].[ID] AS [ID],
CASE WHEN ([Extent2].[storeID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID] ) AS [GroupBy1]
LEFT OUTER JOIN [dbo].[Orders] AS [Extent2] ON ([Extent2].[orderTotal] > cast(10 as decimal(18))) AND (([GroupBy1].[K1] = [Extent2].[customerID]) OR (([GroupBy1].[K1] IS NULL) AND ([Extent2].[customerID] IS NULL)))
WHERE [GroupBy1].[A1] > 2
) AS [Project1]
ORDER BY [Project1].[customerID] ASC, [Project1].[C2] ASC
Well, the LINQ to Entities query is not equivalent of the SQL query because it returns a list of groupings (pair of key and matching elements) which has no SQL equivalent at all.
If you return just the customerId and the Count as in the SQL query:
db.Orders
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Select(g => new { customerId = g.Key, orderCount = g.Count() })
.Where(g => g.Count > 2)
.ToList();
then the SQL generated by EF would be pretty much the same as (or functionally equivalent to) the expected:
SELECT
[GroupBy1].[K1] AS [customerID],
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID]
) AS [GroupBy1]
WHERE [GroupBy1].[A1] > 2
I have the following code that should gets some book, and retrieve the first 2 tags (Tag entities) from that book (Book entity).
So Tags is a navigation property of the Book entity.
using (var context = new FakeEndavaBookLibraryEntities())
{
Book firstBook = context.Set<Book>().Take(1).First();
var firstTwoTags = firstBook.Tags.OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
}
I expect obtaining the following SQL query that has to be generated by EF.
SELECT TOP(2)
[Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Instead, the EF Profiler shows me that the EF is generating unbounded result set (like SELECT * FROM ...)
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Here is a scheme fragment if you need it
I also tried to append the .AsQueryable() to firstBook.Tags property and/or remove .Skip(0) method as is shown below, but this didn't help as well.
var firstTwoTags = firstBook.Tags.AsQueryable().OrderBy(tag => tag.Id).Skip(0).Take(2).ToList();
The same undesired behavior:
SELECT [Extent2].[Id] AS [Id],
[Extent2].[Version] AS [Version],
[Extent2].[Name] AS [Name]
FROM [Literature].[BookTagRelation] AS [Extent1]
INNER JOIN [Literature].[Tag] AS [Extent2]
ON [Extent1].[TagId] = [Extent2].[Id]
WHERE [Extent1].[BookId] = 1 /* #EntityKeyValue1 - [BookId] */
Have you ever encountered the same problem when working with Entity Framework 6?
Are there any workarounds to overcome this problem or I've designed the query in a wrong way...?
Thanks for any tip!
firstBook.Tags is a lazily-loaded IEnumerable<Tag>. On the first access, all tags are loaded, and subsequent attempts to turn it into an IQueryable<Tag> do not work, since you did not start from something from which you could sensibly query.
Instead, start from a known good IQueryable<Tag>. Something along the lines of
Tag firstTag = context.Set<Tag>()
.Where(tag => tag.Books.Contains(firstBook))
.OrderBy(tag => tag.Id).Skip(0).Take(1).SingleOrDefault();
should work. You might need minor tweaking to turn the filter condition into something EF understands.
As #hvd pointed out, I had to work with IQueryable<Tag>, whereas firstBook.Tags navigation property is just a lazy-loaded IEnumerable<Tag>.
So here is the solution of my problem, based on the #hvd's answer.
Tag firstTag = context.Set<Tag>() // or even context.Tags
.Where(tag => tag.Books.Any(book => book.Id == firstBook.Id))
.OrderBy(tag => tag.Id)
.Skip(0).Take(1)
.SingleOrDefault();
So the minor changes of #hvd's solution are: replacing the
.Where(tag => tag.Books.Contains(firstBook)) with
Something that EF understands
1) .Where(tag => tag.Books.Any(book => book.Id == firstBook.Id)).
or
2) .Where(tag => tag.Books.Select(book => book.Id).Contains(firstBook.Id))
Any sequence of code (1) or (2) generates the following SQL query, which is definitely no longer an unbounded result set.
SELECT [Project2].[Id] AS [Id],
[Project2].[Version] AS [Version],
[Project2].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[Version] AS [Version],
[Extent1].[Name] AS [Name]
FROM [Literature].[Tag] AS [Extent1]
WHERE EXISTS (SELECT 1 AS [C1]
FROM [Literature].[BookTagRelation] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[TagId])
AND ([Extent2].[BookId] = 1 /* #p__linq__0 */))) AS [Project2]
ORDER BY [Project2].[Id] ASC
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
I have a list of StudentGrades which contains TermNumber, I want to group them by TermNumber. However, when I look at the results, the ones with null TermNumbers are not returned/grouped.
IQueryable<StudentGradeDm> query = GetListQuery().Where(m => m.StudentId == studentId);
IQueryable<StudentGradeDto> groupedQuery = query.GroupBy(m => m.TermNumber)
.Select(m => new StudentGradeDto
{
TermNumber = m.Key,
StudentGrades = m.ToList()
});
return groupedQuery;
As you can see from the lovely screenshot i've taken here .. there are 3 groups, however, the first group is null because their TermNumber is null. But theoretically, who cares if its null? The StudentGrades with null TermNumber should still be a group.
I understand that one fix to this would be to call
query.ToList().GroupBy ......
But this is not an option for me, as the streamlined application will take a query such as the one above, and feed it into a generic pagination and sort function, in which only a subset of records will be fetched from the database to improve performance.
Any expert inputs would be greatly appreciated!
Update: Here is the generated SQL
{SELECT
[Project2].[C2] AS [C1],
[Project2].[C1] AS [C2],
[Project2].[C3] AS [C3],
[Project2].[Id] AS [Id],
[Project2].[StudentId] AS [StudentId],
[Project2].[Year] AS [Year],
[Project2].[TermTypeId] AS [TermTypeId],
[Project2].[TermNumber] AS [TermNumber],
[Project2].[ClassId] AS [ClassId],
[Project2].[GradeId] AS [GradeId],
[Project2].[FileId] AS [FileId],
[Project2].[CreatedById] AS [CreatedById],
[Project2].[CreatedDate] AS [CreatedDate],
[Project2].[ModifiedById] AS [ModifiedById],
[Project2].[ModifiedDate] AS [ModifiedDate],
[Project2].[Deleted] AS [Deleted]
FROM ( SELECT
[Distinct1].[C1] AS [C1],
1 AS [C2],
[Extent2].[Id] AS [Id],
[Extent2].[StudentId] AS [StudentId],
[Extent2].[Year] AS [Year],
[Extent2].[TermTypeId] AS [TermTypeId],
[Extent2].[TermNumber] AS [TermNumber],
[Extent2].[ClassId] AS [ClassId],
[Extent2].[GradeId] AS [GradeId],
[Extent2].[FileId] AS [FileId],
[Extent2].[CreatedById] AS [CreatedById],
[Extent2].[CreatedDate] AS [CreatedDate],
[Extent2].[ModifiedById] AS [ModifiedById],
[Extent2].[ModifiedDate] AS [ModifiedDate],
[Extent2].[Deleted] AS [Deleted],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3]
FROM (SELECT DISTINCT
[Extent1].[TermNumber] AS [C1]
FROM [dbo].[StudentGrade] AS [Extent1]
WHERE ([Extent1].[Deleted] <> 1) AND ([Extent1].[StudentId] = #p__linq__0) ) AS [Distinct1]
LEFT OUTER JOIN [dbo].[StudentGrade] AS [Extent2] ON ([Extent2].[Deleted] <> 1) AND ([Extent2].[StudentId] = #p__linq__0) AND ([Distinct1].[C1] = [Extent2].[TermNumber])
) AS [Project2]
ORDER BY [Project2].[C1] ASC, [Project2].[C3] ASC}
I'm trying to write a simple SQL query in LinQ, and no matter how hard I try, I always get a complex query.
Here is the SQL I am trying to achieve (this is not what I'm getting):
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
Here is the code:
from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
ID = clearingAccount.ID,
Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}
The generated query gets the data from the respective table for each sum, so four times. I'm not sure if it's even possible to optimize this?
EDIT Here the is generated query:
SELECT
[Project5].[ID] AS [ID],
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4]
FROM ( SELECT
[Project4].[ID] AS [ID],
[Project4].[C1] AS [C1],
[Project4].[C2] AS [C2],
[Project4].[C3] AS [C3],
(SELECT
SUM([Filter5].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent5]
WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
) AS [Filter5]) AS [C4]
FROM ( SELECT
[Project3].[ID] AS [ID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2],
(SELECT
SUM([Filter4].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent4]
WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
) AS [Filter4]) AS [C3]
FROM ( SELECT
[Project2].[ID] AS [ID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter3].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent3]
WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
) AS [Filter3]) AS [C2]
FROM ( SELECT
[Project1].[ID] AS [ID],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent2]
WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
) AS [Filter2]) AS [C1]
FROM ( SELECT
[Extent1].[ID] AS [ID]
FROM [dbo].[ClearingAccounts] AS [Extent1]
WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]
Edit
Note that as per #usr's comment, that your original Sql Query is broken. By LEFT OUTER joining on two independent tables, and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed colums (and vice versa). e.g. If a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better Sql Query would be:
WITH ctePayments AS
(
SELECT
ClearingAccounts.ID,
-- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
FROM ClearingAccounts
INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
FROM ClearingAccounts
INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2,
COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
ClearingAccounts ca
LEFT OUTER JOIN ctePayments p
ON ca.ID = p.ID
LEFT OUTER JOIN cteDirectDebits d
ON ca.ID = d.ID;
-- GROUP BY not required, since we have already guaranteed at most one row
-- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;
You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.
Old Answer(s)
The Sql construct:
SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something
when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched) . It isn't really a sum at all, its a 'matched' count.
With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:
var result2 = Db.ClearingAccounts
.Include(c => c.Payments)
.Include(c => c.DirectDebits)
.GroupBy(c => c.Id)
.ToList() // or any other means to force materialization here.
.ToDictionary(
grp => grp.Key,
grp => new
{
PaymentsByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
DirectDebitByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
})
.Select(ca => new
{
ID = ca.Key,
Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
});
However, personally, I would leave this pivot projection directly in Sql, and then use something like SqlQuery to then deserialize the result back from Sql
directly into the final Entity type.
1)
Add AsNoTracking in EF to avoid tracking changes.
Check that you have indexes on the columns you are using for the JOINs. Especially the column that you are using to group by. Profile the query and optimize it. EF has also overhead over a stored procedure.
or
2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query will be faster.
In my project I use Entity Framework 4.4.0.0 and I have the following dilemma. I have to check if an user is activated. My query looks like:
Any()
_context.Users.Any(u => u.Id == userId && u.IsActivated);
The generated sql is:
SELECT CASE
WHEN ( EXISTS (SELECT 1 AS [C1]
FROM [dbo].[Users] AS [Extent1]
WHERE ( [Extent1].[Id] = #p__linq__0 )
AND ( [Extent1].[IsActivated] = 1 )) ) THEN cast(1 AS BIT)
WHEN ( NOT EXISTS (SELECT 1 AS [C1]
FROM [dbo].[Users] AS [Extent2]
WHERE ( [Extent2].[Id] = #p__linq__0 )
AND ( [Extent2].[IsActivated] = 1 )) ) THEN cast(0 AS BIT)
END AS [C1]
FROM (SELECT 1 AS X) AS [SingleRowTable1]
For Count() I get this query:
SELECT [GroupBy1].[A1] AS [C1]
FROM (SELECT COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent1]
WHERE ( [Extent1].[Id] = #p__linq__0 )
AND ( [Extent1].[IsActivated] = 1 )) AS [GroupBy1]
Does this looks right? I am not very good as sql ... but it looks not very efficient to me. Am I missing something?
Is 'select count(*) from dbo.Users where id=#id and IsActivated=1' less efficient?
It depends.
The EXISTS implementation isn't that great either. It will perform the check twice if there are 0 rows. In that case the COUNT one will be better as it only has to search for the non existent row and count it once.
You may find that checking
_context.Users
.Where(u => u.Id == userId && u.IsActivated)
.Select(u=> true)
.FirstOrDefault();
gives a better plan than both (amended following Luke's suggestion). Testing on EF4 the query generated is along the lines of
SELECT TOP (1) cast(1 AS BIT) AS [C1]
FROM Users
WHERE userId = #userId
AND IsActivated = 1
Meaning it doesn't process unnecessary additional rows if more than one exists and only performs the search for rows matching the WHERE once.
Yes it is.
When you perform a count you will select all the entries that match your clause and count then. Using Any() your query will return at a first sign of a registry that match the clause.
I'm my opnion it's always better to use Any() than count(), except when you really need that number