I want to get a list of customer IDs and the number of orders placed by that customer. Filter conditions are:
Orders with a total of $10 or less will not be counted.
Customers who did not place at least 3 orders (each with a total of $10 or more) will not be listed.
So, I would do the following in SQL:
SELECT customerID, COUNT(*)
FROM Orders
WHERE orderTotal > 10
GROUP BY customerID
HAVING COUNT(*) > 2
And in EF, I think this could be expressed as:
dbContext.Order
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Where(g => g.Count() > 2)
.ToList();
But this produces the following SQL that uses a derived table and a join rather than simply using a HAVING clause. I think this would be far from optimal in terms of performance. Is there a better way to formulate the case in EF so that the translated query will use the HAVING clause as it should?
SELECT
[Project1].[C1] AS [C1],
[Project1].[customerID] AS [customerID],
[Project1].[C2] AS [C2],
[Project1].[ID] AS [ID],
FROM ( SELECT
[GroupBy1].[K1] AS [customerID],
1 AS [C1],
[Extent2].[ID] AS [ID],
CASE WHEN ([Extent2].[storeID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM (SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID] ) AS [GroupBy1]
LEFT OUTER JOIN [dbo].[Orders] AS [Extent2] ON ([Extent2].[orderTotal] > cast(10 as decimal(18))) AND (([GroupBy1].[K1] = [Extent2].[customerID]) OR (([GroupBy1].[K1] IS NULL) AND ([Extent2].[customerID] IS NULL)))
WHERE [GroupBy1].[A1] > 2
) AS [Project1]
ORDER BY [Project1].[customerID] ASC, [Project1].[C2] ASC
Well, the LINQ to Entities query is not equivalent of the SQL query because it returns a list of groupings (pair of key and matching elements) which has no SQL equivalent at all.
If you return just the customerId and the Count as in the SQL query:
db.Orders
.Where(o => o.orderTotal > 10)
.GroupBy(o => o.customerID)
.Select(g => new { customerId = g.Key, orderCount = g.Count() })
.Where(g => g.Count > 2)
.ToList();
then the SQL generated by EF would be pretty much the same as (or functionally equivalent to) the expected:
SELECT
[GroupBy1].[K1] AS [customerID],
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
[Extent1].[customerID] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[orderTotal] > cast(10 as decimal(18))
GROUP BY [Extent1].[customerID]
) AS [GroupBy1]
WHERE [GroupBy1].[A1] > 2
Related
Assuming the following code that applies filtering logic to a passed on collection.
private IQueryable<Customer> ApplyCustomerFilter(CustomerFilter filter, IQueryable<Customer> customers)
{
...
if (filter.HasProductInBackOrder == true)
{
customers = customers.Where(c => c.Orders.Any(o => o.Products.Any(p => p.Status == ProductStatus.BackOrder)))
}
....
return customers;
}
Results in this SQL statement:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
WHERE
(
EXISTS
(
SELECT 1 AS [C1]
FROM
(
SELECT [Extent3].[OrderId] AS [OrderId]
FROM [Orders] AS [Extent3]
WHERE [Extent1].[CustomerId] = [Extent3].[CustomerId]
) AS [Project1]
WHERE EXISTS
(
SELECT 1 AS [C1]
FROM [Products] AS [Extent4]
WHERE ([Project1].[OrderId] = [Extent4].[OrderId])
AND ([Extent4].[Status] = #p__linq__6)
)
)
)
However, I would like to optimize this by forcing to use INNER JOINS so that the result will be similar to this:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
INNER JOIN [Orders] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[CustomerId]
INNER JOIN [Products] AS [Extent3] ON [Extent2].[OrderId] = [Extent3].[OrderId]
WHERE [Extent3].[Status] = #p__linq__6
I've tried multiple approaches, but I was unable to accomplish the desired result. Any suggestions on how to force the correct joins and avoiding subselects?
I have a problem with an unwanted ORDER BY on Entity Frameworks GroupJoin.
But at first my query. Its a simple query chain with a JOIN and ORDER.
First step - Simple condition query:
var dataQuery = model.InverterData
.Where(d => d.Timestamp >= from && d.Timestamp < till)
.Select(d => new
{
d.InverterID,
d.Timestamp
});
Second step - OUTER JOIN query:
var dayDataQuery = model.InverterDayData
.GroupJoin(dataQuery
, outer => outer.InverterID
, inner => inner.InverterID
, (outer, inner) => new
{
outer.InverterID,
outer.Date,
outer.DayYield,
InverterData = inner.Select(d => new
{
d.Timestamp
})
});
Third step - ORDER BY:
var orderedQuery = dayDataQuery
.OrderBy(d => d.DayYield)
.ThenBy(d => d.InverterID);
My Problem now is that the GroupJoin builds up a query with an unwanted ORDER BY at the end. That made my order complete senseless.
But in details...
This is the generated SQL after Step 1:
SELECT
1 AS [C1],
[Extent1].[InverterID] AS [InverterID],
[Extent1].[Timestamp] AS [Timestamp]
FROM [data].[InverterData] AS [Extent1]
WHERE ([Extent1].[Timestamp] >= #p__linq__0) AND ([Extent1].[Timestamp] < #p__linq__1)
Looks good!
Generated SQL after Step 2:
SELECT
[Project1].[C1] AS [C1],
[Project1].[InverterID] AS [InverterID],
[Project1].[Date] AS [Date],
[Project1].[DayYield] AS [DayYield],
[Project1].[C2] AS [C2],
[Project1].[Timestamp] AS [Timestamp]
FROM ( SELECT
[Extent1].[InverterID] AS [InverterID],
[Extent1].[Date] AS [Date],
[Extent1].[DayYield] AS [DayYield],
1 AS [C1],
[Extent2].[Timestamp] AS [Timestamp],
CASE WHEN ([Extent2].[InverterID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [data].[InverterDayData] AS [Extent1]
LEFT OUTER JOIN [data].[InverterData] AS [Extent2] ON ([Extent2].[Timestamp] >= #p__linq__0) AND ([Extent2].[Timestamp] < #p__linq__1) AND ([Extent1].[InverterID] = [Extent2].[InverterID])
) AS [Project1]
ORDER BY [Project1].[InverterID] ASC, [Project1].[Date] ASC, [Project1].[C2] ASC
An unwanted ORDER BY is added ORDER BY [Project1].[InverterID] ASC, [Project1].[Date] ASC, [Project1].[C2] ASC!
Generated SQL after Step 3:
SELECT
[Project1].[InverterID] AS [InverterID],
[Project1].[C1] AS [C1],
[Project1].[Date] AS [Date],
[Project1].[DayYield] AS [DayYield],
[Project1].[C2] AS [C2],
[Project1].[Timestamp] AS [Timestamp]
FROM ( SELECT
[Extent1].[InverterID] AS [InverterID],
[Extent1].[Date] AS [Date],
[Extent1].[DayYield] AS [DayYield],
1 AS [C1],
[Extent2].[Timestamp] AS [Timestamp],
CASE WHEN ([Extent2].[InverterID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [data].[InverterDayData] AS [Extent1]
LEFT OUTER JOIN [data].[InverterData] AS [Extent2] ON ([Extent2].[Timestamp] >= #p__linq__0) AND ([Extent2].[Timestamp] < #p__linq__1) AND ([Extent1].[InverterID] = [Extent2].[InverterID])
) AS [Project1]
ORDER BY [Project1].[DayYield] ASC, [Project1].[InverterID] ASC, [Project1].[Date] ASC, [Project1].[C2] ASC
My order (by DayYield and InverterID) will not work because Date and C2 is now also include.
Question:
Why and/or how can I make it work correct?
On-server order is need because on-server paging.
Answer related to #IvanStoev
How do you mean this? Its made an huge different on the resulting order.
I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates really complex query with multiple subqueries which has awful query performance like this:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= #p__linq__0) AND ([Extent2].[Created] <= #p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
I'm trying to write a simple SQL query in LinQ, and no matter how hard I try, I always get a complex query.
Here is the SQL I am trying to achieve (this is not what I'm getting):
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
Here is the code:
from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
ID = clearingAccount.ID,
Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}
The generated query gets the data from the respective table for each sum, so four times. I'm not sure if it's even possible to optimize this?
EDIT Here the is generated query:
SELECT
[Project5].[ID] AS [ID],
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4]
FROM ( SELECT
[Project4].[ID] AS [ID],
[Project4].[C1] AS [C1],
[Project4].[C2] AS [C2],
[Project4].[C3] AS [C3],
(SELECT
SUM([Filter5].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent5]
WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
) AS [Filter5]) AS [C4]
FROM ( SELECT
[Project3].[ID] AS [ID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2],
(SELECT
SUM([Filter4].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent4]
WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
) AS [Filter4]) AS [C3]
FROM ( SELECT
[Project2].[ID] AS [ID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter3].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent3]
WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
) AS [Filter3]) AS [C2]
FROM ( SELECT
[Project1].[ID] AS [ID],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent2]
WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
) AS [Filter2]) AS [C1]
FROM ( SELECT
[Extent1].[ID] AS [ID]
FROM [dbo].[ClearingAccounts] AS [Extent1]
WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]
Edit
Note that as per #usr's comment, that your original Sql Query is broken. By LEFT OUTER joining on two independent tables, and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed colums (and vice versa). e.g. If a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better Sql Query would be:
WITH ctePayments AS
(
SELECT
ClearingAccounts.ID,
-- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
FROM ClearingAccounts
INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
FROM ClearingAccounts
INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2,
COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
ClearingAccounts ca
LEFT OUTER JOIN ctePayments p
ON ca.ID = p.ID
LEFT OUTER JOIN cteDirectDebits d
ON ca.ID = d.ID;
-- GROUP BY not required, since we have already guaranteed at most one row
-- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;
You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.
Old Answer(s)
The Sql construct:
SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something
when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched) . It isn't really a sum at all, its a 'matched' count.
With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:
var result2 = Db.ClearingAccounts
.Include(c => c.Payments)
.Include(c => c.DirectDebits)
.GroupBy(c => c.Id)
.ToList() // or any other means to force materialization here.
.ToDictionary(
grp => grp.Key,
grp => new
{
PaymentsByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
DirectDebitByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
})
.Select(ca => new
{
ID = ca.Key,
Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
});
However, personally, I would leave this pivot projection directly in Sql, and then use something like SqlQuery to then deserialize the result back from Sql
directly into the final Entity type.
1)
Add AsNoTracking in EF to avoid tracking changes.
Check that you have indexes on the columns you are using for the JOINs. Especially the column that you are using to group by. Profile the query and optimize it. EF has also overhead over a stored procedure.
or
2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query will be faster.
I need to query multiple tables with one query, and I need to limit the results from each table individually.
An example ...
I have a ContentItem, Retailer, and Product table.
ContentItem has a Type (int) field that corresponds to an enum of content types like "Retailer" and "Product." I am filtering ContentItem using this field for each sub-subquery.
ContentItem has an Id (pkey) field.
Retailer and Product have an Id (pkey) field. Id is also an FK to ContentItem.Id.
I can select from all three tables with a LEFT JOIN query. From there, I can then limit the total number of rows returned, let's say 6 rows total.
What I want to do is limit the number of rows returned from Retailer and Product individually. This way, I will have 12 rows (max) total: 6 from Retailer, and 6 from Product.
I can already accomplish this with SQL, but I am having a difficult time getting LINQ-to-Entities to "do the right thing."
Here's my SQL
SELECT * From
(
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 0 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Retailers)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 1 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Brands)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 2 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Products)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 3 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Certifications)
UNION ALL
(SELECT * FROM (SELECT * FROM [dbo].[ContentItem] WHERE Type = 4 ORDER BY ContentItem.mtime OFFSET 0 ROWS FETCH NEXT 6 ROWS ONLY) Claims)
) as ContentItem
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id);
Here's one of my many iterations of LINQ queries (which is not returning the desired result).
var queryRetailers = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Retailer)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryBrands = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Brand)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryProducts = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Product)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryCertifications = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Certification)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var queryClaims = contentItemModel
.Where(contentItem => contentItem.Type == ContentTypeEnum.Claim)
.OrderByDescending(o => o.mtime).Skip(skip).Take(take).Select(o => new { Id = o.Id });
var query = from contentItem in
queryRetailers
.Concat(queryBrands)
.Concat(queryProducts)
.Concat(queryCertifications)
.Concat(queryClaims)
join item in context.Retailer on contentItem.Id equals item.Id into retailerGroup
from retailer in retailerGroup.DefaultIfEmpty(null)
join item in context.Brand on contentItem.Id equals item.Id into brandGroup
from brand in brandGroup.DefaultIfEmpty(null)
join item in context.Product on contentItem.Id equals item.Id into productGroup
from product in productGroup.DefaultIfEmpty(null)
join item in context.Certification on contentItem.Id equals item.Id into certificationGroup
from certification in certificationGroup.DefaultIfEmpty(null)
join item in context.Claim on contentItem.Id equals item.Id into claimGroup
from claim in claimGroup.DefaultIfEmpty(null)
select new
{
contentItem,
retailer,
brand,
product,
certification,
claim
};
var results = query.ToList();
This query returns SQL that essentially "nests" my UNION ALL statements, and the server returns all rows from the database.
SELECT
[Distinct4].[C1] AS [C1],
[Distinct4].[C2] AS [C2],
[Extent6].[Id] AS [Id],
[Extent6].[RowVersion] AS [RowVersion],
[Extent6].[ctime] AS [ctime],
[Extent6].[mtime] AS [mtime],
[Extent7].[Id] AS [Id1],
[Extent7].[Recommended] AS [Recommended],
[Extent7].[RowVersion] AS [RowVersion1],
[Extent7].[ctime] AS [ctime1],
[Extent7].[mtime] AS [mtime1],
[Extent8].[Id] AS [Id2],
[Extent8].[OverrideGrade] AS [OverrideGrade],
[Extent8].[PlantBased] AS [PlantBased],
[Extent8].[Recommended] AS [Recommended1],
[Extent8].[RowVersion] AS [RowVersion2],
[Extent8].[ctime] AS [ctime2],
[Extent8].[mtime] AS [mtime2],
[Extent8].[Brand_Id] AS [Brand_Id],
[Extent8].[Grade_Name] AS [Grade_Name],
[Extent8].[Grade_Value] AS [Grade_Value],
[Extent9].[Id] AS [Id3],
[Extent9].[RowVersion] AS [RowVersion3],
[Extent9].[ctime] AS [ctime3],
[Extent9].[mtime] AS [mtime3],
[Extent9].[Grade_Name] AS [Grade_Name1],
[Extent9].[Grade_Value] AS [Grade_Value1],
[Extent10].[Id] AS [Id4],
[Extent10].[RowVersion] AS [RowVersion4],
[Extent10].[ctime] AS [ctime4],
[Extent10].[mtime] AS [mtime4],
[Extent10].[Grade_Name] AS [Grade_Name2],
[Extent10].[Grade_Value] AS [Grade_Value2]
FROM (SELECT DISTINCT
[UnionAll4].[C1] AS [C1],
[UnionAll4].[C2] AS [C2]
FROM (SELECT
[Distinct3].[C1] AS [C1],
[Distinct3].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll3].[C1] AS [C1],
[UnionAll3].[C2] AS [C2]
FROM (SELECT
[Distinct2].[C1] AS [C1],
[Distinct2].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll2].[C1] AS [C1],
[UnionAll2].[C2] AS [C2]
FROM (SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[C2] AS [C2]
FROM ( SELECT DISTINCT
[UnionAll1].[C1] AS [C1],
[UnionAll1].[Id] AS [C2]
FROM (SELECT TOP (1000)
[Project1].[C1] AS [C1],
[Project1].[Id] AS [Id]
FROM ( SELECT [Project1].[Id] AS [Id], [Project1].[mtime] AS [mtime], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent1]
WHERE 0 = CAST( [Extent1].[Type] AS int)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[mtime] DESC
UNION ALL
SELECT TOP (1000)
[Project3].[C1] AS [C1],
[Project3].[Id] AS [Id]
FROM ( SELECT [Project3].[Id] AS [Id], [Project3].[mtime] AS [mtime], [Project3].[C1] AS [C1], row_number() OVER (ORDER BY [Project3].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent2]
WHERE 1 = CAST( [Extent2].[Type] AS int)
) AS [Project3]
) AS [Project3]
WHERE [Project3].[row_number] > 0
ORDER BY [Project3].[mtime] DESC) AS [UnionAll1]
) AS [Distinct1]
UNION ALL
SELECT TOP (1000)
[Project7].[C1] AS [C1],
[Project7].[Id] AS [Id]
FROM ( SELECT [Project7].[Id] AS [Id], [Project7].[mtime] AS [mtime], [Project7].[C1] AS [C1], row_number() OVER (ORDER BY [Project7].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent3].[Id] AS [Id],
[Extent3].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent3]
WHERE 2 = CAST( [Extent3].[Type] AS int)
) AS [Project7]
) AS [Project7]
WHERE [Project7].[row_number] > 0
ORDER BY [Project7].[mtime] DESC) AS [UnionAll2]
) AS [Distinct2]
UNION ALL
SELECT TOP (1000)
[Project11].[C1] AS [C1],
[Project11].[Id] AS [Id]
FROM ( SELECT [Project11].[Id] AS [Id], [Project11].[mtime] AS [mtime], [Project11].[C1] AS [C1], row_number() OVER (ORDER BY [Project11].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent4].[Id] AS [Id],
[Extent4].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent4]
WHERE 3 = CAST( [Extent4].[Type] AS int)
) AS [Project11]
) AS [Project11]
WHERE [Project11].[row_number] > 0
ORDER BY [Project11].[mtime] DESC) AS [UnionAll3]
) AS [Distinct3]
UNION ALL
SELECT TOP (1000)
[Project15].[C1] AS [C1],
[Project15].[Id] AS [Id]
FROM ( SELECT [Project15].[Id] AS [Id], [Project15].[mtime] AS [mtime], [Project15].[C1] AS [C1], row_number() OVER (ORDER BY [Project15].[mtime] DESC) AS [row_number]
FROM ( SELECT
[Extent5].[Id] AS [Id],
[Extent5].[mtime] AS [mtime],
1 AS [C1]
FROM [dbo].[ContentItem] AS [Extent5]
WHERE 4 = CAST( [Extent5].[Type] AS int)
) AS [Project15]
) AS [Project15]
WHERE [Project15].[row_number] > 0
ORDER BY [Project15].[mtime] DESC) AS [UnionAll4] ) AS [Distinct4]
LEFT OUTER JOIN [dbo].[Retailer] AS [Extent6] ON [Distinct4].[C2] = [Extent6].[Id]
LEFT OUTER JOIN [dbo].[Brand] AS [Extent7] ON [Distinct4].[C2] = [Extent7].[Id]
LEFT OUTER JOIN [dbo].[Product] AS [Extent8] ON [Distinct4].[C2] = [Extent8].[Id]
LEFT OUTER JOIN [dbo].[Certification] AS [Extent9] ON [Distinct4].[C2] = [Extent9].[Id]
LEFT OUTER JOIN [dbo].[Claim] AS [Extent10] ON [Distinct4].[C2] = [Extent10].[Id]
So my overall questions are:
1) Is there a simpler SQL query I can execute to get the same results? I know that T-SQL doesn't support offsets per table in a subquery, hence the subquery wrapping.
2) If there isn't, what am I doing wrong in my LINQ query? Is this even possible with LINQ?
I wanted to add the SQL from #radar here all nice and formatted. It at least appears to be an elegant solution to avoid the sub-subqueries, and still accomplishes the offset/fetch.
SELECT *
FROM (SELECT
[ContentItem].*,
row_number() OVER ( PARTITION BY Type ORDER BY ContentItem.mtime ) as rn
FROM [dbo].[ContentItem]
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id)
) as x
WHERE x.rn >= a AND x.rn <= b;
a is the lower threshold (offset) and b is the upper threshold (fetch-ish). The only catch is that b now equals fetch + a instead of just fetch. The first set of results would be WHERE x.rn >= 0 AND x.rn <= 6, the second set WHERE x.rn >= 6 AND x.rn <= 12, third WHERE x.rn >= 12 AND x.rn <= 18, and so on.
As you are looking simpler SQL, you can use row_number analytic function, which would be faster
You need to try and see as there are many left joins and also proper index need to exists in these tables.
select *
from (
select *, row_number() over ( partition by Type order by ContentItem.mtime ) as rn
from [dbo].[ContentItem]
LEFT JOIN [dbo].[Retailer] ON (Retailer.Id = ContentItem.Id)
LEFT JOIN [dbo].[Brand] ON (Brand.Id = ContentItem.Id)
LEFT JOIN [dbo].[Product] ON (Product.Id = ContentItem.Id)
LEFT JOIN [dbo].[Certification] ON (Certification.Id = ContentItem.Id)
LEFT JOIN [dbo].[Claim] ON (Claim.Id = ContentItem.Id);
)
where rn <= 6
Well, it appears that I'm an idiot. That TOP(1000) call should have tipped me off. I assumed that my take variable was set to 6 but it was, in fact, set to 1000. Turns out my giant LINQ query works as expected, but the nested UNION ALL statements threw me off.
Still, I'm going to investigate #radar's answer further. It's hard to argue with better performance.