Convert SQL Sub Query to In to Linq Lambda - c#

How do I convert the following SQL statement into Lambda Expression or Linq Query?
The following query get the single most recent Answer for each Question. Or to phrase it another way, get each Question with the newest Answer.
Also this will be execute by Entity Framework.
SELECT Answers.*
FROM Answers
Where AnswerID IN
(
SELECT Max(AnswerID) AnswerID
FROM Answers
GROUP BY QuestionID
)
Here another way to look at the previous query using an Inner Join
SELECT answers.*
FROM answers
INNER JOIN
(
SELECT Max(answerID) answerID --, QuestionSiteID
FROM answers
GROUP BY QuestionID
) t ON
answers.answerID = t.answerID
I have read that the LINQ Contains method is sub optimal for queries that access SQL.
LINQ to Sql and .Contains() trap.

I think you could do this using something like:
var subQuery = from a in answers
group a by a.QuestionID into grouping
select new
{
QuestionID = grouping.Key,
MaxAnswerID = grouping.Max(x => x.AnswerID)
};
var query = from a in answers
from s in subQuery
where a.AnswerID == s.MaxAnswerID
select a;
This results in a CROSS JOIN in the generated SQL
Also, you could use join in the second part of the query:
var query = from a in answers
join s in subQuery on a.AnswerID equals s.MaxAnswerID
select a;
This results in a INNER JOIN in the SQL
Note for side cases - the above answers make the reasonable assumption that AnswerID is the primary key of Answers - if you happen to have instead a table design which is keyed on (AnswerID, QuestionID) then you will need to join by both AnswerID and QuestionID like:
var subQuery = from a in answers
group a by a.QuestionID into grouping
select new
{
QuestionID = grouping.Key,
MaxAnswerID = grouping.Max(x => x.AnswerID)
};
var query = from a in answers
from s in subQuery
where a.AnswerID == s.MaxAnswerID
&& a.QuestionID == s.QuestionID
select a;
See the comment trail for more discussion on this alternate table design...

You could use a let statement to select the first answer per QuestionID group:
from answer in Answers
group answer by answer.QuestionID into question
let firstAnswer = question.OrderByDescending(q => q.AnswerID).First()
select firstAnswer
EDIT: Linq2Sql translates the above query into a N+1 database calls. This query gets translated to just one SQL query:
from a in Answers
group a by a.QuestionID into grouping
join a2 in Answers on
new {AnswerID = grouping.Max(x => x.AnswerID), QuestionID = grouping.Key}
equals new {a2.AnswerID, a2.QuestionID}
select a2
Makes me wonder in what way Linq2Sql is supposed to be simpler than SQL.

Try to use this query:
var query = from c in context.Childs
group c by c.ParentEntityId into pc
select pc.OrderByDescending(pcc => pcc.Id).Take(1);
I just checked the query in profiler and it produces single SQL query (the ugly one):
SELECT
[Project3].[ParentEntityId] AS [ParentEntityId],
[Project3].[C1] AS [C1],
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[ParentEntityId1] AS [ParentEntityId1]
FROM ( SELECT
[Distinct1].[ParentEntityId] AS [ParentEntityId],
[Limit1].[Id] AS [Id],
[Limit1].[Name] AS [Name],
[Limit1].[ParentEntityId] AS [ParentEntityId1],
CASE WHEN ([Limit1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM (SELECT DISTINCT
[Extent1].[ParentEntityId] AS [ParentEntityId]
FROM [dbo].[ChildEntities] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Id] AS [Id], [Project2].[Name] AS [Name], [Project2].[ParentEntityId] AS [ParentEntityId]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Name] AS [Name],
[Extent2].[ParentEntityId] AS [ParentEntityId]
FROM [dbo].[ChildEntities] AS [Extent2]
WHERE ([Distinct1].[ParentEntityId] = [Extent2].[ParentEntityId]) OR (([Distinct1].[ParentEntityId] IS NULL) AND ([Extent2].[ParentEntityId] IS NULL))
) AS [Project2]
ORDER BY [Project2].[Id] DESC ) AS [Limit1]
) AS [Project3]
ORDER BY [Project3].[ParentEntityId] ASC, [Project3].[C1] ASC

Related

Anyone have an idea how to express complex join logic within a LINQ query?

I've been at it all day. For the life of me, I cannot figure out how to translate either of the final two select statements found within the below code snippet:
declare #Person table
(
[Name] varchar(50),
[ABA] varchar(9)
)
declare #Entity table
(
[Name] varchar(50),
[Respondent] varchar(9),
[TierRespondent] varchar(9)
)
insert into #Person ([Name], [ABA])
select 'Steve', '000000001'
union
select 'Mary', '000000002'
union
select 'Carey', '000000003'
insert into #Entity ([Name], [Respondent], [TierRespondent])
select 'Steve', '000000001', '000000006'
union
select 'Mary', '000000004', '000000002'
union
select 'Carey', '000000005', '000000008'
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] in (e.Respondent ,e.[TierRespondent])
The thing that boggles my mind is the logic found within the ON clause of the join statements.
I'm not a SQL wiz, so I've even failed at trying to restructure these SELECT statements into a different form that gives me the same results, but is also easier to translate to LINQ.
Any ideas, anyone?
Thanks.
After some hours of pondering, I found the following sql to be an acceptable refactoring that can easily be translated to a linq query:
select *
from #entity e
left join #Person p
on 1 = 1
where p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
and p.ABA is not null
And here is the corresponding linq query:
from entity in entities
join p in persons on 1 equals 1 into groupJoin
from person in groupJoin.DefaultIfEmpty()
where person.ABA == entity.Respondent || person.ABA == entity.TierRespondent
&& person.ABA != null
select new
{
entity.Name,
entity.Respondent,
entity.TierRespondent,
person?.Name,
person?.ABA
}
The select statements of the original post contained logic within the join clause. This logic cannot be expressed within a linq join clasue. This refactoring moves the logic from the join, and into the where clause, which can easily be handled expressed in linq.

LINQ and Entity Framework - Avoiding subqueries

I'm having really hard time tuning up one of my Entity Framework generated queries in my application. It is very basic query but for some reason EF uses multiple inner subqueries which seem to perform horribly in DB instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates really complex query with multiple subqueries which has awful query performance like this:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ but no matter what I try to do (using Linqpad for testing) I just can't write similar performant query with LINQ\EF as I can write by hand. I've trie querying the SubValue table and Project table but the endcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query which imitates the hand written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use Joins when I want to instead of nested subqueries.
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve on the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue and you can do this by creating a join (which is also what you do in you hand crafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty() but you already know that.
The joined values (x) are then grouped and the summation of SubValueSum is performed in each group.
Finally, ordering and TOP(10) is applied.
The generated SQL still contains subqueries but I would expect it to more efficient compared to SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= #p__linq__0) AND ([Extent2].[Created] <= #p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC

LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN

Can anyone tell me why the generated SQL does not contain a 2nd INNER JOIN? It seems to have been replaced with a NULL check in the WHERE clause? I'm not clear on why the 2nd INNER JOIN is not in the generated SQL.
C# code:
var cycleList = from o in entities.Orders
join c in entities.Cycles on o.Id equals c.OrderId
join calendar in entities.Calendars on c.CalendarId equals calendar.Id
where o.UnitId == unitId && o.CompanyId == companyId
select c.Id;
Generated SQL:
SELECT
[Extent2].[Id] AS [Id]
FROM [dbo].[Orders] AS [Extent1]
INNER JOIN [dbo].[Cycles] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId]
WHERE ([Extent2].[CalendarId] IS NOT NULL) AND ( CAST( [Extent1].[UnitId] AS int) = #p__linq__0) AND ( CAST( [Extent1].[CompanyId] AS int) = #p__linq__1)
It looks like the query generator is optimizing your query.
Since you are not selecting (or using in your where clause) any fields from the Calendars table in your query, only one join is done between the Orders table and the Cycles table. It's likely faster to check for the non-NULL foreign key than it is to join on a table from which no fields will be used.

Using Linq to Entities and havign a NOT IN clause

I have a SQL query that I am trying to convert to LINQ:
SELECT * FROM TABLE1
WHERE LICENSE_RTK NOT IN(
SELECT KEY_VALUE FROM TABLE2
WHERE REFERENCE_RTK = 'FOO')
So I wrote one query for inner query and then one query for the outer one and used Except:
var insideQuery = (from pkcr in this.Repository.Context.TABLE2 where pkcr.Reference_RTK == "FOO" select pkcr.Key_Value);
var outerQuery = (from pl in this.Repository.Context.TABLE1 select pl).Except(insideQuery);
But this is wrong. Cannot even compile it. What is the correct way of writing this?
You cannot compile second query, because Except should be used on Queryables of same type. But you are trying to apply it on Queryable<TABLE1> and Queryable<TypeOfTABLE2Key_Value>. Also I think you should use Contains here:
var keys = from pkcr in this.Repository.Context.TABLE2
where pkcr.Reference_RTK == "FOO"
select pkcr.Key_Value;
var query = from pl in this.Repository.Context.TABLE1
where !keys.Contains(pl.License_RTK)
select pl;
NOTE: Generated query will be NOT EXISTS instead of NOT IN, but that's what you want
SELECT * FROM FROM [dbo].[TABLE1] AS [Extent1]
WHERE NOT EXISTS
(SELECT 1 AS [C1]
FROM [dbo].[TABLE2] AS [Extent2]
WHERE ([Extent2].[Reference_RTK] == #p0) AND
([Extent2].[Key_Value] = [Extent1].[License_RTK]))

Linq Join with a Group By

Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I exprected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent toSQL Server none of the fields that I was expecting to be return are being included in the Select clause. What am I doing wrong?
UPDATE
It has been a very long day, I re-ran the code again and now this is the SQL that is being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing results when I try to loop through the record set
foreach (var emps in query)
{
}
Not sure why the query does not return what it should return, but it occurred to me that since you only query the group key and not any grouped results you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee equals e.EmployeeNumber
select new
{
IncEmpNum = i.IncidentEmployee
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
But how did your code compile? As xeondev noticed: there should be an equals in stead of an = in a join statement. My compiler (:D) does not swallow it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable and you should either use i.IncidentEmployee.Value or e.EmployeeNumber.Value or both.

Categories

Resources