Entity Framework SUM CASE not optimized - c#

I'm trying to write a simple SQL query in LinQ, and no matter how hard I try, I always get a complex query.
Here is the SQL I am trying to achieve (this is not what I'm getting):
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
Here is the code:
from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
ID = clearingAccount.ID,
Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}
The generated query gets the data from the respective table for each sum, so four times. I'm not sure if it's even possible to optimize this?
EDIT Here the is generated query:
SELECT
[Project5].[ID] AS [ID],
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4]
FROM ( SELECT
[Project4].[ID] AS [ID],
[Project4].[C1] AS [C1],
[Project4].[C2] AS [C2],
[Project4].[C3] AS [C3],
(SELECT
SUM([Filter5].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent5]
WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
) AS [Filter5]) AS [C4]
FROM ( SELECT
[Project3].[ID] AS [ID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2],
(SELECT
SUM([Filter4].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent4]
WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
) AS [Filter4]) AS [C3]
FROM ( SELECT
[Project2].[ID] AS [ID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter3].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent3]
WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
) AS [Filter3]) AS [C2]
FROM ( SELECT
[Project1].[ID] AS [ID],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent2]
WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
) AS [Filter2]) AS [C1]
FROM ( SELECT
[Extent1].[ID] AS [ID]
FROM [dbo].[ClearingAccounts] AS [Extent1]
WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]

Edit
Note that as per #usr's comment, that your original Sql Query is broken. By LEFT OUTER joining on two independent tables, and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed colums (and vice versa). e.g. If a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better Sql Query would be:
WITH ctePayments AS
(
SELECT
ClearingAccounts.ID,
-- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
FROM ClearingAccounts
INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
FROM ClearingAccounts
INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2,
COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
ClearingAccounts ca
LEFT OUTER JOIN ctePayments p
ON ca.ID = p.ID
LEFT OUTER JOIN cteDirectDebits d
ON ca.ID = d.ID;
-- GROUP BY not required, since we have already guaranteed at most one row
-- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;
You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.
Old Answer(s)
The Sql construct:
SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something
when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched) . It isn't really a sum at all, its a 'matched' count.
With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:
var result2 = Db.ClearingAccounts
.Include(c => c.Payments)
.Include(c => c.DirectDebits)
.GroupBy(c => c.Id)
.ToList() // or any other means to force materialization here.
.ToDictionary(
grp => grp.Key,
grp => new
{
PaymentsByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
DirectDebitByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
})
.Select(ca => new
{
ID = ca.Key,
Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
});
However, personally, I would leave this pivot projection directly in Sql, and then use something like SqlQuery to then deserialize the result back from Sql
directly into the final Entity type.

1)
Add AsNoTracking in EF to avoid tracking changes.
Check that you have indexes on the columns you are using for the JOINs. Especially the column that you are using to group by. Profile the query and optimize it. EF has also overhead over a stored procedure.
or
2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query will be faster.

Related

HAVING clause in EF Linq

I want to get a list of customer IDs and the number of orders placed by that customer. Filter conditions are:
Orders with a total of $10 or less will not be counted.
Customers who did not place at least 3 orders (each with a total of $10 or more) will not be listed.
So, I would do the following in SQL:
SELECT customerID, COUNT(*)
FROM Orders
WHERE orderTotal > 10
GROUP BY customerID
HAVING COUNT(*) > 2
And in EF, I think this could be expressed as:
dbContext.Order
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Where(g => g.Count() > 2)
.ToList();
But this produces the following SQL that uses a derived table and a join rather than simply using a HAVING clause. I think this would be far from optimal in terms of performance. Is there a better way to formulate the case in EF so that the translated query will use the HAVING clause as it should?
SELECT
[Project1].[C1] AS [C1],
[Project1].[customerID] AS [customerID],
[Project1].[C2] AS [C2],
[Project1].[ID] AS [ID],
FROM ( SELECT
[GroupBy1].[K1] AS [customerID],
1 AS [C1],
[Extent2].[ID] AS [ID],
CASE WHEN ([Extent2].[storeID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID] ) AS [GroupBy1]
LEFT OUTER JOIN [dbo].[Orders] AS [Extent2] ON ([Extent2].[orderTotal] > cast(10 as decimal(18))) AND (([GroupBy1].[K1] = [Extent2].[customerID]) OR (([GroupBy1].[K1] IS NULL) AND ([Extent2].[customerID] IS NULL)))
WHERE [GroupBy1].[A1] > 2
) AS [Project1]
ORDER BY [Project1].[customerID] ASC, [Project1].[C2] ASC
Well, the LINQ to Entities query is not equivalent of the SQL query because it returns a list of groupings (pair of key and matching elements) which has no SQL equivalent at all.
If you return just the customerId and the Count as in the SQL query:
db.Orders
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Select(g => new { customerId = g.Key, orderCount = g.Count() })
.Where(g => g.Count > 2)
.ToList();
then the SQL generated by EF would be pretty much the same as (or functionally equivalent to) the expected:
SELECT
[GroupBy1].[K1] AS [customerID],
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID]
) AS [GroupBy1]
WHERE [GroupBy1].[A1] > 2

Improving SQL query generated by LINQ to EF

I have student table and I want to write a Linq query to get multiple counts. The SQL query that Linq generates is too complicated and un-optimized.
Following is definition of my table:
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](100) NULL,
[Age] [int] NULL,
I need to get one count with students with name = test and one count for students with age > 10.
This is one of the query I have tried:
var sql = from st in school.Students
group st by 1 into grp
select new
{
NameCount = grp.Count(k => k.Name == "Test"),
AgeCount = grp.Count(k => k.Age > 5)
};
The SQL query that is generated is:
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[C2] AS [C2],
[Limit1].[C3] AS [C3]
FROM ( SELECT TOP (1)
[Project2].[C1] AS [C1],
[Project2].[C2] AS [C2],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Student] AS [Extent3]
WHERE ([Project2].[C1] = 1) AND ([Extent3].[Age] > 5)) AS [C3]
FROM ( SELECT
[Distinct1].[C1] AS [C1],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Student] AS [Extent2]
WHERE ([Distinct1].[C1] = 1) AND (N'Test' = [Extent2].[Name])) AS [C2]
FROM ( SELECT DISTINCT
1 AS [C1]
FROM [dbo].[Student] AS [Extent1]
) AS [Distinct1]
) AS [Project2]
) AS [Limit1]
For me this seems to be complex. This can be achieved by following simple query:
select COUNT(CASE WHEN st.Name = 'Test' THEN 1 ELSE 0 END) NameCount,
COUNT(CASE WHEN st.Age > 5 THEN 1 ELSE 0 END) AgeCount from Student st
Is there a way in LINQ with which the SQL query that gets generated will have both the aggregation rather than having it two separate queries joined with nested queries?
From my experience with EF6, conditional Sum (i.e. Sum(condition ? 1 : 0)) is translated much better to SQL than Count with predicate (i.e. Count(condition)):
var query =
from st in school.Students
group st by 1 into grp
select new
{
NameCount = grp.Sum(k => k.Name == "Test" ? 1 : 0),
AgeCount = grp.Sum(k => k.Age > 5 ? 1 : 0)
};
Btw, your SQL example should be using SUM as well. In order to utilize the SQL COUNT which excludes NULLs, it should be ELSE NULL or no ELSE:
select COUNT(CASE WHEN st.Name = 'Test' THEN 1 END) NameCount,
COUNT(CASE WHEN st.Age > 5 THEN 1 END) AgeCount
from Student st
But there is no equivalent LINQ construct for this, hence no way to let EF6 generate such translation. But IMO the SUM is good enough equivalent.
A much simpler query results from not using the unnecessary group by and just querying the table twice in the select:
var sql = from st in school.Students.Take(1)
select new {
NameCount = school.Students.Count(k => k.Name == "Test"),
AgeCount = school.Students.Count(k => k.Age > 5)
};

Linq to Entities complexity

I'm starting with Linq to entities and maybe someone could shed some light.
I have two tables - Vizite (parent table) and AngajatiVizite (child table). I'm using Database first, so I have created the relationship between them using Vizite.Id and AngajatiVizite.IdVizita.
I need to get the rows from Vizite and one more bit field which must be 0 if the DataStart or DataEnd fields are null or count of child records from AngajatiVizite is zero. That's it, if the Vizite has zero subordinate records or any of those Data#### fields is null, the calculated field is 0.
So far so good, the linq I'm using works properly. The syntax I have used is this one:
var list = ctx.Vizite
.OrderBy(p => p.DataEnd != null && p.DataStart != null && p.AngajatiVizite.Count > 0)
.ThenBy(p => p.Data)
.Select(p => new
{
p.Id,
p.Numar,
p.Data,
p.DataStart,
p.DataEnd,
Programat = p.DataEnd != null && p.DataStart != null && p.AngajatiVizite.Count > 0
})
.ToList();
The sql command generated by Linq is extremely complex and I don't understand why it has to be that complex and what's the difference.
What I'm getting from linq is this:
SELECT
[Project6].[Numar] AS [Numar],
[Project6].[Id] AS [Id],
[Project6].[Data] AS [Data],
[Project6].[DataStart] AS [DataStart],
[Project6].[DataEnd] AS [DataEnd],
[Project6].[C2] AS [C1]
FROM ( SELECT
[Project5].[C1] AS [C1],
[Project5].[Id] AS [Id],
[Project5].[Numar] AS [Numar],
[Project5].[Data] AS [Data],
[Project5].[DataStart] AS [DataStart],
[Project5].[DataEnd] AS [DataEnd],
CASE WHEN ([Project5].[C2] > 0) THEN cast(1 as bit) WHEN ( NOT ([Project5].[C3] > 0)) THEN cast(0 as bit) END AS [C2]
FROM ( SELECT
[Project4].[C1] AS [C1],
[Project4].[Id] AS [Id],
[Project4].[Numar] AS [Numar],
[Project4].[Data] AS [Data],
[Project4].[DataStart] AS [DataStart],
[Project4].[DataEnd] AS [DataEnd],
[Project4].[C2] AS [C2],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[AngajatiVizite] AS [Extent5]
WHERE [Project4].[Id] = [Extent5].[IdVizita]) AS [C3]
FROM ( SELECT
[Project3].[C1] AS [C1],
[Project3].[Id] AS [Id],
[Project3].[Numar] AS [Numar],
[Project3].[Data] AS [Data],
[Project3].[DataStart] AS [DataStart],
[Project3].[DataEnd] AS [DataEnd],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[AngajatiVizite] AS [Extent4]
WHERE [Project3].[Id] = [Extent4].[IdVizita]) AS [C2]
FROM ( SELECT
CASE WHEN ([Project2].[C1] > 0) THEN cast(1 as bit) WHEN ( NOT ([Project2].[C2] > 0)) THEN cast(0 as bit) END AS [C1],
[Project2].[Id] AS [Id],
[Project2].[Numar] AS [Numar],
[Project2].[Data] AS [Data],
[Project2].[DataStart] AS [DataStart],
[Project2].[DataEnd] AS [DataEnd]
FROM ( SELECT
[Project1].[Id] AS [Id],
[Project1].[Numar] AS [Numar],
[Project1].[Data] AS [Data],
[Project1].[DataStart] AS [DataStart],
[Project1].[DataEnd] AS [DataEnd],
[Project1].[C1] AS [C1],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[AngajatiVizite] AS [Extent3]
WHERE [Project1].[Id] = [Extent3].[IdVizita]) AS [C2]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Numar] AS [Numar],
[Extent1].[Data] AS [Data],
[Extent1].[DataStart] AS [DataStart],
[Extent1].[DataEnd] AS [DataEnd],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[AngajatiVizite] AS [Extent2]
WHERE [Extent1].[Id] = [Extent2].[IdVizita]) AS [C1]
FROM [dbo].[Vizite] AS [Extent1]
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]
) AS [Project6]
when all I needed was actually this:
Select
Vizite.Id
, Vizite.Numar
, Vizite.Data
, Vizite.DataStart
, Vizite.DataEnd
, Case
When DataStart != Null And DataEnd != Null And (Select Count(Id) From AngajatiVizite Where Vizite.Id = AngajatiVizite.IdVizita) > 0 Then 1
Else 0
End As Programat
From Vizite
Order By Programat, Data
Can anyone please explain to me why the generated SQL is that complex that's even almost impossible to figure it out by simply reading the sql syntax?
Thank you
Entity Framework doesn't build handsome queries, that's a fact; and it's a nuisance sometimes, because it can be really hard to trace SQL logging back to LINQ statements.
It would be a problem though, if the query plan optimizer wouldn't know how to handle them. Fortunately, when Sql Server is concerned, the EF team has managed to make SQL better optimizable in each release since EF5. So generally you shouldn't worry about it too much and only start looking into it when performance is worse than can reasonably be expected.
There are some rules of the thumb though. One of them is to calculate computed values only once. This is where the let keyword comes in handy:
var list = (from p in ctx.Vizite
let Programat = p.DataEnd != null && p.DataStart != null
&& p.AngajatiVizite.Count > 0
order by Programat, p.Data
select new
{
p.Id,
p.Numar,
p.Data,
p.DataStart,
p.DataEnd,
Programat
}).ToList();
This works well in LINQ query syntax. In fluent (method) syntax you can do the exact same thing, but that requires two subsequent Select statements.
What happens to the complexity of your SQL statement if you do the following?
var list = ctx.Vizite
.Select(p => new
{
p.Id,
p.Numar,
p.Data,
p.DataStart,
p.DataEnd,
Programat =
p.DataEnd != null && p.DataStart != null && p.AngajatiVizite.Count > 0
}
.OrderBy(p => p.Programat)
.ThenBy(p => p.Data)
.ToList();
My hope is that by not repeating p.DataEnd != null && p.DataStart != null && p.AngajatiVizite.Count > 0 and moving the OrderBy and ThenBy after the select, you'll get a simpler query.
Edit
To potentially simplify the SQL even further, you could opt to do some of the work after obtaining the raw data from the database:
var list = ctx.Vizite
.Select(p => new
{
p.Id,
p.Numar,
p.Data,
p.DataStart,
p.DataEnd,
AngajatiViziteCount = p.AngajatiVizite.Count
}
.AsEnumerable() // do the rest of the work using LINQ to objects
.OrderBy(p => p.DataEnd != null && p.DataStart != null && p.AngajatiViziteCount > 0)
.ThenBy(p => p.Data)
.ToList();

LINQ and Entity Framework - Avoiding subqueries

I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates really complex query with multiple subqueries which has awful query performance like this:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= #p__linq__0) AND ([Extent2].[Created] <= #p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC

Limit results from multiple individual tables in a single LINQ-to-Entities query. Resultant T-SQL is wrong

I need to query multiple tables with one query, and I need to limit the results from each table individually.
An example ...
I have a ContentItem, Retailer, and Product table.
ContentItem has a Type (int) field that corresponds to an enum of content types like "Retailer" and "Product." I am filtering ContentItem using this field for each sub-subquery.
ContentItem has an Id (pkey) field.
Retailer and Product have an Id (pkey) field. Id is also an FK to ContentItem.Id.
I can select from all three tables with a LEFT JOIN query. From there, I can then limit the total number of rows returned, let's say 6 rows total.
What I want to do is limit the number of rows returned from Retailer and Product individually. This way, I will have 12 rows (max) total: 6 from Retailer, and 6 from Product.
I can already accomplish this with SQL, but I am having a difficult time getting LINQ-to-Entities to "do the right thing."
Here's my SQL
SELECT * From
(
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 0 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Retailers)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 1 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Brands)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 2 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Products)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 3 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Certifications)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 4 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Claims)
) as ContentItem
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id);
Here's one of my many iterations of LINQ queries (which is not returning the desired result).
var queryRetailers = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Retailer)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryBrands = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Brand)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryProducts = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Product)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryCertifications = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Certification)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryClaims = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Claim)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var query = from contentItem in
queryRetailers
.Concat(queryBrands)
.Concat(queryProducts)
.Concat(queryCertifications)
.Concat(queryClaims)
join item in context.Retailer on contentItem.Id equals item.Id into retailerGroup
from retailer in retailerGroup.DefaultIfEmpty(null)
join item in context.Brand on contentItem.Id equals item.Id into brandGroup
from brand in brandGroup.DefaultIfEmpty(null)
join item in context.Product on contentItem.Id equals item.Id into productGroup
from product in productGroup.DefaultIfEmpty(null)
join item in context.Certification on contentItem.Id equals item.Id into certificationGroup
from certification in certificationGroup.DefaultIfEmpty(null)
join item in context.Claim on contentItem.Id equals item.Id into claimGroup
from claim in claimGroup.DefaultIfEmpty(null)
select new
{
contentItem,
retailer,
brand,
product,
certification,
claim
};
var results = query.ToList();
This query returns SQL that essentially "nests" my UNION ALL statements, and the server returns all rows from the database.
SELECT
[Distinct4].[C1] AS [C1],
[Distinct4].[C2] AS [C2],
[Extent6].[Id] AS [Id],
[Extent6].[RowVersion] AS [RowVersion],
[Extent6].[ctime] AS [ctime],
[Extent6].[mtime] AS [mtime],
[Extent7].[Id] AS [Id1],
[Extent7].[Recommended] AS [Recommended],
[Extent7].[RowVersion] AS [RowVersion1],
[Extent7].[ctime] AS [ctime1],
[Extent7].[mtime] AS [mtime1],
[Extent8].[Id] AS [Id2],
[Extent8].[OverrideGrade] AS [OverrideGrade],
[Extent8].[PlantBased] AS [PlantBased],
[Extent8].[Recommended] AS [Recommended1],
[Extent8].[RowVersion] AS [RowVersion2],
[Extent8].[ctime] AS [ctime2],
[Extent8].[mtime] AS [mtime2],
[Extent8].[Brand_Id] AS [Brand_Id],
[Extent8].[Grade_Name] AS [Grade_Name],
[Extent8].[Grade_Value] AS [Grade_Value],
[Extent9].[Id] AS [Id3],
[Extent9].[RowVersion] AS [RowVersion3],
[Extent9].[ctime] AS [ctime3],
[Extent9].[mtime] AS [mtime3],
[Extent9].[Grade_Name] AS [Grade_Name1],
[Extent9].[Grade_Value] AS [Grade_Value1],
[Extent10].[Id] AS [Id4],
[Extent10].[RowVersion] AS [RowVersion4],
[Extent10].[ctime] AS [ctime4],
[Extent10].[mtime] AS [mtime4],
[Extent10].[Grade_Name] AS [Grade_Name2],
[Extent10].[Grade_Value] AS [Grade_Value2]
FROM (SELECT DISTINCT
[UnionAll4].[C1] AS [C1],
[UnionAll4].[C2] AS [C2]
FROM (SELECT
[Distinct3].[C1] AS [C1],
[Distinct3].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll3].[C1] AS [C1],
[UnionAll3].[C2] AS [C2]
FROM (SELECT
[Distinct2].[C1] AS [C1],
[Distinct2].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll2].[C1] AS [C1],
[UnionAll2].[C2] AS [C2]
FROM (SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll1].[C1] AS [C1],
[UnionAll1].[Id] AS [C2]
FROM (SELECT TOP (1000)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id]
FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[mtime] AS [mtime], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent1]
WHERE 0 = CAST( [Extent1].[Type] AS int)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[mtime] DESC
UNION ALL
SELECT TOP (1000)
[Project3].[C1] AS [C1],
[Project3].[Id] AS [Id]
FROM ( SELECT [Project3].[Id] AS [Id], [Project3].[mtime] AS [mtime], [Project3].[C1] AS [C1], row_number() OVER (ORDER BY [Project3].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent2]
WHERE 1 = CAST( [Extent2].[Type] AS int)
) AS [Project3]
) AS [Project3]
WHERE [Project3].[row_number] > 0
ORDER BY [Project3].[mtime] DESC) AS [UnionAll1]
) AS [Distinct1]
UNION ALL
SELECT TOP (1000)
[Project7].[C1] AS [C1],
[Project7].[Id] AS [Id]
FROM ( SELECT [Project7].[Id] AS [Id], [Project7].[mtime] AS [mtime], [Project7].[C1] AS [C1], row_number() OVER (ORDER BY [Project7].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent3].[Id] AS [Id],
[Extent3].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent3]
WHERE 2 = CAST( [Extent3].[Type] AS int)
) AS [Project7]
) AS [Project7]
WHERE [Project7].[row_number] > 0
ORDER BY [Project7].[mtime] DESC) AS [UnionAll2]
) AS [Distinct2]
UNION ALL
SELECT TOP (1000)
[Project11].[C1] AS [C1],
[Project11].[Id] AS [Id]
FROM ( SELECT [Project11].[Id] AS [Id], [Project11].[mtime] AS [mtime], [Project11].[C1] AS [C1], row_number() OVER (ORDER BY [Project11].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent4].[Id] AS [Id],
[Extent4].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent4]
WHERE 3 = CAST( [Extent4].[Type] AS int)
) AS [Project11]
) AS [Project11]
WHERE [Project11].[row_number] > 0
ORDER BY [Project11].[mtime] DESC) AS [UnionAll3]
) AS [Distinct3]
UNION ALL
SELECT TOP (1000)
[Project15].[C1] AS [C1],
[Project15].[Id] AS [Id]
FROM ( SELECT [Project15].[Id] AS [Id], [Project15].[mtime] AS [mtime], [Project15].[C1] AS [C1], row_number() OVER (ORDER BY [Project15].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent5].[Id] AS [Id],
[Extent5].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent5]
WHERE 4 = CAST( [Extent5].[Type] AS int)
) AS [Project15]
) AS [Project15]
WHERE [Project15].[row_number] > 0
ORDER BY [Project15].[mtime] DESC) AS [UnionAll4] ) AS [Distinct4]
LEFT OUTER JOIN [dbo].[Retailer] AS [Extent6] ON [Distinct4].[C2] = [Extent6].[Id]
LEFT OUTER JOIN [dbo].[Brand] AS [Extent7] ON [Distinct4].[C2] = [Extent7].[Id]
LEFT OUTER JOIN [dbo].[Product] AS [Extent8] ON [Distinct4].[C2] = [Extent8].[Id]
LEFT OUTER JOIN [dbo].[Certification] AS [Extent9] ON [Distinct4].[C2] = [Extent9].[Id]
LEFT OUTER JOIN [dbo].[Claim] AS [Extent10] ON [Distinct4].[C2] = [Extent10].[Id]
So my overall questions are:
1) Is there a simpler SQL query I can execute to get the same results? I know that T-SQL doesn't support offsets per table in a subquery, hence the subquery wrapping.
2) If there isn't, what am I doing wrong in my LINQ query? Is this even possible with LINQ?
I wanted to add the SQL from #radar here all nice and formatted. It at least appears to be an elegant solution to avoid the sub-subqueries, and still accomplishes the offset/fetch.
SELECT *
FROM (SELECT
[ContentItem].*,
row_number() OVER ( PARTITION BY Type ORDER BY ContentItem.mtime ) as rn
FROM [dbo].[ContentItem]
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id)
) as x
WHERE x.rn >= a AND x.rn <= b;
a is the lower threshold (offset) and b is the upper threshold (fetch-ish). The only catch is that b now equals fetch + a instead of just fetch. The first set of results would be WHERE x.rn >= 0 AND x.rn <= 6, the second set WHERE x.rn >= 6 AND x.rn <= 12, third WHERE x.rn >= 12 AND x.rn <= 18, and so on.
As you are looking simpler SQL, you can use row_number analytic function, which would be faster
You need to try and see as there are many left joins and also proper index need to exists in these tables.
select *
from (
select *, row_number() over ( partition by Type order by ContentItem.mtime ) as rn
from [dbo].[ContentItem]
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id);
)
where rn <= 6
Well, it appears that I'm an idiot. That TOP(1000) call should have tipped me off. I assumed that my take variable was set to 6 but it was, in fact, set to 1000. Turns out my giant LINQ query works as expected, but the nested UNION ALL statements threw me off.
Still, I'm going to investigate #radar's answer further. It's hard to argue with better performance.

Categories

Resources