Linq Join with a Group By


Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee,
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I expected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server, and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent to SQL Server, none of the fields that I was expecting to be returned are included in the SELECT clause. What am I doing wrong?
UPDATE
It has been a very long day. I re-ran the code, and now this is the SQL being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing any results when I try to loop through the result set:
foreach (var emps in query)
{
}

I'm not sure why the query does not return what it should, but it occurred to me that since you only query the group key and not any grouped results, you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on e.EmployeeNumber equals i.IncidentEmployee
select new
{
IncEmpNum = i.IncidentEmployee,
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
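A quick way to check that claim is an in-memory sketch with LINQ to Objects rather than EF. The Employee and Incident classes and their sample rows below are hypothetical stand-ins for the tables in the question. Grouping by the selected columns and projecting only the key yields the same rows as projecting those columns and calling Distinct(); how EF translates each form to SQL is not exercised here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Employees and Incidents tables.
public class Employee
{
    public int EmployeeNumber { get; set; }
    public string TITLE { get; set; }
    public string FNAM { get; set; }
    public string LNAM { get; set; }
}

public class Incident
{
    public int IncidentEmployee { get; set; }
}

public static class GroupVsDistinct
{
    static readonly List<Employee> Employees = new List<Employee>
    {
        new Employee { EmployeeNumber = 1, TITLE = "Clerk", FNAM = "Ann", LNAM = "Lee" },
        new Employee { EmployeeNumber = 2, TITLE = "Driver", FNAM = "Bo", LNAM = "Kim" },
        new Employee { EmployeeNumber = 3, TITLE = "Guard", FNAM = "Cy", LNAM = "Ray" } // no incidents
    };

    // Employee 1 has two incidents, so the join yields a duplicate row for them.
    static readonly List<Incident> Incidents = new List<Incident>
    {
        new Incident { IncidentEmployee = 1 },
        new Incident { IncidentEmployee = 1 },
        new Incident { IncidentEmployee = 2 }
    };

    // Group by the selected columns and project only the group key.
    public static List<string> ViaGroupBy() =>
        (from e in Employees
         join i in Incidents on e.EmployeeNumber equals i.IncidentEmployee
         group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into g
         select $"{g.Key.IncidentEmployee} {g.Key.TITLE} {g.Key.FNAM} {g.Key.LNAM}")
        .ToList();

    // Project the same columns and remove duplicates with Distinct().
    public static List<string> ViaDistinct() =>
        (from e in Employees
         join i in Incidents on e.EmployeeNumber equals i.IncidentEmployee
         select $"{i.IncidentEmployee} {e.TITLE} {e.FNAM} {e.LNAM}")
        .Distinct()
        .ToList();
}
```

Both methods return one row per employee that has at least one incident; employee 3 is dropped by the inner join in either form.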
But how did your code compile? As xeondev noticed, a join clause needs equals instead of =, with the outer range variable (e) on the left-hand side. My compiler (:D) does not accept it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable, and you should use i.IncidentEmployee.Value, e.EmployeeNumber.Value, or both.
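If IncidentEmployee is an int? and EmployeeNumber a plain int, the two join keys have different types, which the C# compiler rejects in a join clause. The sketch below assumes exactly that (the Emp and Inc classes are hypothetical, and the data is in memory) and joins on .Value as suggested. Because LINQ to Objects really executes .Value, which throws on null, it filters with HasValue first; in a LINQ to Entities query .Value is translated to SQL rather than executed, so that filter matters mainly for in-memory data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes: IncidentEmployee is nullable, EmployeeNumber is not.
public class Emp
{
    public int EmployeeNumber { get; set; }
    public string LNAM { get; set; }
}

public class Inc
{
    public int? IncidentEmployee { get; set; }
}

public static class NullableJoin
{
    public static List<string> Run(List<Emp> employees, List<Inc> incidents) =>
        (from i in incidents
         where i.IncidentEmployee.HasValue   // drop incidents without an employee first
         join e in employees on i.IncidentEmployee.Value equals e.EmployeeNumber
         select e.LNAM)
        .ToList();
}
```

With both keys now of type int, the join compiles, and incidents without an employee simply drop out instead of being matched NULL-to-NULL.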

Related

LINQ generating Nested/Sub queries

I am using ASP.NET and Entity Framework with SQL Server as the database, and I am running into a strange issue.
I have this code:
var pricingInfo = (from price in invDB.Pricing.AsNoTracking()
join priceDtl in invDB.PricingDetail.AsNoTracking() on price.PricingId equals priceDtl.PricingId
join tagDtl in invDB.PricingTagDetail.AsNoTracking() on priceDtl.PricingDetailId equals tagDtl.PricingDetailId
join item in invDB.Item.AsNoTracking() on tagDtl.ItemId equals item.ItemId
join party in invDB.Party.AsNoTracking() on tagDtl.PartyId equals party.PartyId
join brd in invDB.Brand.AsNoTracking() on tagDtl.BrandId equals brd.BrandId into t from brand in t.DefaultIfEmpty()
where tagDtl.AvailableQuantity > 0m && price.PricingNo == printNumber
select new
{
TagNo = tagDtl.TagNo,
SellingRate = tagDtl.SellingRate,
Quantity = tagDtl.AvailableQuantity ?? 0m,
ItemCode = item.Name,
UOMId = priceDtl.UOMId,
Brand = brand.BrandCode,
Supplier = party.PartyCode,
Offer = tagDtl.Offer
}).ToList();
This generates the SQL query below, with a subquery that lacks the PricingNo condition, so it pulls full records out of a large volume of data. This results in heavy memory consumption and performance issues.
SELECT
[Filter1].[PricingId1] AS [PricingId],
[Filter1].[TagNo] AS [TagNo],
[Filter1].[SellingRate1] AS [SellingRate],
CASE WHEN ([Filter1].[AvailableQuantity] IS NULL) THEN cast(0 as decimal(18)) ELSE [Filter1].[AvailableQuantity] END AS [C1],
[Filter1].[Name] AS [Name],
[Filter1].[UOMId 1] AS [UOMId ],
[Extent6].[BrandCode] AS [BrandCode],
[Filter1].[PartyCode] AS [PartyCode],
[Filter1].[Offer] AS [Offer]
FROM
(
SELECT [Extent1].[PricingId] AS [PricingId1], [Extent1].[PricingNo] AS [PricingNo], [Extent2].[UnitOfMeasurementId] AS [UnitOfMeasurementId1], [Extent3].[TagNo] AS [TagNo], [Extent3].[BrandId] AS [BrandId1], [Extent3].[SellingRate] AS [SellingRate1], [Extent3].[AvailableQuantity] AS [AvailableQuantity], [Extent3].[Offer] AS [Offer], [Extent4].[Name] AS [Name], [Extent5].[PartyCode] AS [PartyCode]
FROM [PanERP].[Pricing] AS [Extent1]
INNER JOIN [PanERP].[PricingDetail] AS [Extent2] ON [Extent1].[PricingId] = [Extent2].[PricingId]
INNER JOIN [PanERP].[PricingTagDetail] AS [Extent3] ON [Extent2].[PricingDetailId] = [Extent3].[PricingDetailId]
INNER JOIN [PanERP].[Item] AS [Extent4] ON [Extent3].[ItemId] = [Extent4].[ItemId]
INNER JOIN [PanERP].[Party] AS [Extent5] ON [Extent3].[PartyId] = [Extent5].[PartyId]
WHERE [Extent3].[AvailableQuantity] > cast(0 as decimal(18))
) AS [Filter1]
LEFT OUTER JOIN [PanERP].[Brand] AS [Extent6] ON [Filter1].[BrandId1] = [Extent6].[BrandId]
WHERE ([Filter1].[PricingNo] = @p__linq__0) OR (([Filter1].[PricingNo] IS NULL) AND (@p__linq__0 IS NULL))
But when I change the constant in the condition
where tagDtl.AvailableQuantity > 0m
to a variable, EF generates a different SQL query without the nested SELECT statement.
Here is the modified code:
decimal availableQuantity = 0m;
var pricingInfo = (from price in invDB.Pricing.AsNoTracking()
join priceDtl in invDB.PricingDetail.AsNoTracking() on price.PricingId equals priceDtl.PricingId
join tagDtl in invDB.PricingTagDetail.AsNoTracking() on priceDtl.PricingDetailId equals tagDtl.PricingDetailId
join item in invDB.Item.AsNoTracking() on tagDtl.ItemId equals item.ItemId
join party in invDB.Party.AsNoTracking() on tagDtl.PartyId equals party.PartyId
join brd in invDB.Brand.AsNoTracking() on tagDtl.BrandId equals brd.BrandId into t from brand in t.DefaultIfEmpty()
where tagDtl.AvailableQuantity > availableQuantity && price.PricingNo == printNumber
select new
{
TagNo = tagDtl.TagNo,
SellingRate = tagDtl.SellingRate,
Quantity = tagDtl.AvailableQuantity ?? availableQuantity,
ItemCode = item.Name,
UOMId = priceDtl.UOMId,
Brand = brand.BrandCode,
Supplier = party.PartyCode,
Offer = tagDtl.Offer
}).ToList();
And here is the SQL query, without the nested SELECT statement:
SELECT
[Extent1].[PricingId] AS [PricingId],
[Extent3].[TagNo] AS [TagNo],
[Extent3].[SellingRate] AS [SellingRate],
CASE WHEN ([Extent3].[AvailableQuantity] IS NULL) THEN cast(0 as decimal(18)) ELSE [Extent3].[AvailableQuantity] END AS [C1],
[Extent4].[Name] AS [Name],
[Extent2].[UOMId ] AS [UOMId ],
[Extent6].[BrandCode] AS [BrandCode],
[Extent5].[PartyCode] AS [PartyCode],
[Extent3].[Offer] AS [Offer]
FROM [PanERP].[Pricing] AS [Extent1]
INNER JOIN [PanERP].[PricingDetail] AS [Extent2] ON [Extent1].[PricingId] = [Extent2].[PricingId]
INNER JOIN [PanERP].[PricingTagDetail] AS [Extent3] ON [Extent2].[PricingDetailId] = [Extent3].[PricingDetailId]
INNER JOIN [PanERP].[Item] AS [Extent4] ON [Extent3].[ItemId] = [Extent4].[ItemId]
INNER JOIN [PanERP].[Party] AS [Extent5] ON [Extent3].[PartyId] = [Extent5].[PartyId]
LEFT OUTER JOIN [PanERP].[Brand] AS [Extent6] ON [Extent3].[BrandId] = [Extent6].[BrandId]
WHERE ([Extent3].[AvailableQuantity] > @p__linq__0) AND (([Extent1].[PricingNo] = @p__linq__1) OR (([Extent1].[PricingNo] IS NULL) AND (@p__linq__1 IS NULL)))
If I move the where condition onto the data source in the from clause as a lambda expression, like this:
from price in invDB.Pricing.AsNoTracking().Where(c =>
c.PricingNo == printNumber)
then it also works fine.
Why is LINQ generating a nested Select? How can we avoid this?
Thanks in advance for your answers.

Well, I think you have answered your own question in your comments. I will just try to clarify what is going on.
When you use a hard-coded constant, like 0m, the framework translates it into SQL keeping the value as a constant:
WHERE [Extent3].[AvailableQuantity] > cast(0 as decimal(18))
When you use a local variable, like “availableQuantity”, the framework creates a parameter:
([Extent3].[AvailableQuantity] > @p__linq__0)
I might be wrong, but as I see it, this is done to preserve the programmer’s intent when writing the code (constant = constant, variable = parameter).
And what about the subquery?
This is query-optimization logic (probably a bad one, at least in this scenario). When you make a query using parameters, you might run it several times, but SQL Server can reuse the same cached execution plan, making the query faster; when you use constants, each distinct query needs to be re-evaluated (if you check SQL Server Activity Monitor, you will see that queries with parameters are treated as the same query, regardless of the parameter values).
So, in my opinion (sorry, I could not find any documentation about it), Entity Framework is trying to isolate the queries: the outer/generic one, which uses parameters, and the inner/specific one, which uses constants.
I would be happy if anyone could complement it with some Microsoft documentation about this subject…
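The constant-versus-variable difference is visible before EF does any translation: it is already in the expression tree the C# compiler builds. The sketch below (PredicateShapes is a hypothetical helper) inspects the right-hand operand of the predicate. With the literal 0m it is a ConstantExpression; with the local availableQuantity it is a member access on a compiler-generated closure, which EF6 typically turns into a parameter such as @p__linq__0. Why EF then nests a subquery in one case and not the other is not shown by this sketch.

```csharp
using System;
using System.Linq.Expressions;

public static class PredicateShapes
{
    // Returns the node type of the right-hand operand of a comparison predicate.
    public static ExpressionType RightOperand(Expression<Func<decimal, bool>> predicate)
    {
        var comparison = (BinaryExpression)predicate.Body;
        return comparison.Right.NodeType;
    }

    // Literal: the compiler embeds 0m directly as a ConstantExpression.
    public static ExpressionType WithLiteral() => RightOperand(q => q > 0m);

    // Local variable: the compiler captures it in a closure, so the tree holds
    // a field access (MemberAccess) rather than the value itself.
    public static ExpressionType WithLocalVariable()
    {
        decimal availableQuantity = 0m;
        return RightOperand(q => q > availableQuantity);
    }
}
```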

Entity Framework v6.1 query compilation performance

I am confused about how EF LINQ queries are compiled and executed. When I run a piece of code in LINQPad a couple of times, I get varying performance results (each time, the same query takes a different amount of time). Please find my test execution environment below.
Tools used: EF v6.1 & LINQPad v5.08.
Ref DB: ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "'s deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this, I get the result, and LINQPad shows the time taken ranging from 3.515 sec down to 0.004 sec (depending on how long I wait between runs).
If I take the generated SQL query and execute it directly, that query always runs in 0.001 to 0.015 sec.
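To tell a one-off slow first run apart from steady-state variance, it can help to time several back-to-back runs of the same query. The following is a minimal Stopwatch sketch; RunTimer and TimeRuns are hypothetical names, and the action would wrap whatever executes the query (for example, a call ending in ToList()).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class RunTimer
{
    // Runs the same action several times and records each elapsed time,
    // so a slow first (cold) run can be compared with the later (warm) runs.
    public static List<TimeSpan> TimeRuns(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));
        var timings = new List<TimeSpan>(runs);
        for (int n = 0; n < runs; n++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            timings.Add(sw.Elapsed);
        }
        return timings;
    }
}
```

For example, RunTimer.TimeRuns(() => query.ToList(), 5), where query stands for the LINQ query being measured; comparing the first entry with the rest shows how much of the cost is paid only once.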
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
A different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took varied amount of time but the difference is not as big as the first query set run.
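The ToList() variant moves the grouping out of SQL: the flat join is fetched once, and the rest runs as LINQ to Objects. The sketch below shows that second half with hypothetical Person and Department classes and in-memory sample data in place of the ContosoUniversity tables. The trade-off is that every joined row is transferred and held in memory before Skip(1).Take(1) applies, which matters when there are many rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the ContosoUniversity tables.
public class Person { public int ID { get; set; } public string FirstName { get; set; } }
public class Department { public int DepartmentID { get; set; } public string Name { get; set; } public int InstructorID { get; set; } }

public static class InMemoryGrouping
{
    public static List<(int ID, string FirstName, List<string> Deps)> SecondPerson(
        List<Person> persons, List<Department> departments)
    {
        // Step 1: the join, materialized with ToList() (the single flat query in the EF case).
        var joined = (from p in persons
                      join d in departments on p.ID equals d.InstructorID
                      select new { person = p, dept = d }).ToList();

        // Step 2: group, order, page and project entirely in memory.
        return joined
            .GroupBy(pd => pd.person.ID)
            .OrderBy(g => g.Key)
            .Skip(1).Take(1)
            .Select(g => (
                ID: g.Key,
                FirstName: g.First().person.FirstName,
                Deps: g.Select(x => x.dept.Name).Distinct().ToList()))
            .ToList();
    }
}
```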
In my application there are going to be a lot of rows, and I prefer the first query set to the second one, but I am confused.
Please guide.
(Note: I have only a little SQL Server knowledge, so I am using LINQPad to fine-tune queries based on their performance.)
Thanks

LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN

Can anyone tell me why the generated SQL does not contain a 2nd INNER JOIN? It seems to have been replaced with a NULL check in the WHERE clause? I'm not clear on why the 2nd INNER JOIN is not in the generated SQL.
C# code:
var cycleList = from o in entities.Orders
join c in entities.Cycles on o.Id equals c.OrderId
join calendar in entities.Calendars on c.CalendarId equals calendar.Id
where o.UnitId == unitId && o.CompanyId == companyId
select c.Id;
Generated SQL:
SELECT
[Extent2].[Id] AS [Id]
FROM [dbo].[Orders] AS [Extent1]
INNER JOIN [dbo].[Cycles] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId]
WHERE ([Extent2].[CalendarId] IS NOT NULL) AND ( CAST( [Extent1].[UnitId] AS int) = @p__linq__0) AND ( CAST( [Extent1].[CompanyId] AS int) = @p__linq__1)

It looks like the query generator is optimizing your query.
Since you are not selecting (or using in your where clause) any fields from the Calendars table in your query, only one join is done between the Orders table and the Cycles table. It's likely faster to check for the non-NULL foreign key than it is to join on a table from which no fields will be used.
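The optimization is only safe under one assumption: every non-null CalendarId really has a matching Calendars row, which a foreign key guarantees. The in-memory sketch below (hypothetical OrderRow, CycleRow and CalendarRow classes) compares the three-way join with the NOT NULL check from the generated SQL. With consistent data they return the same Cycle IDs; an orphaned CalendarId without an enforcing foreign key would make them differ. The cast to int? assumes CalendarId is nullable, as the IS NOT NULL check suggests, and that Calendar.Id is a plain int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types; CalendarId is assumed nullable, Calendar.Id is not.
public class OrderRow { public int Id { get; set; } public int UnitId { get; set; } public int CompanyId { get; set; } }
public class CycleRow { public int Id { get; set; } public int OrderId { get; set; } public int? CalendarId { get; set; } }
public class CalendarRow { public int Id { get; set; } }

public static class JoinElimination
{
    // The query as written: Orders -> Cycles -> Calendars, selecting only Cycle IDs.
    public static List<int> WithCalendarJoin(List<OrderRow> orders, List<CycleRow> cycles,
        List<CalendarRow> calendars, int unitId, int companyId) =>
        (from o in orders
         join c in cycles on o.Id equals c.OrderId
         join calendar in calendars on c.CalendarId equals (int?)calendar.Id
         where o.UnitId == unitId && o.CompanyId == companyId
         select c.Id).ToList();

    // The shape of the generated SQL: no Calendars join, only a NOT NULL check.
    public static List<int> WithNotNullCheck(List<OrderRow> orders, List<CycleRow> cycles,
        int unitId, int companyId) =>
        (from o in orders
         join c in cycles on o.Id equals c.OrderId
         where c.CalendarId != null && o.UnitId == unitId && o.CompanyId == companyId
         select c.Id).ToList();
}
```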

Using Linq to Entities and having a NOT IN clause

I have a SQL query that I am trying to convert to LINQ:
SELECT * FROM TABLE1
WHERE LICENSE_RTK NOT IN(
SELECT KEY_VALUE FROM TABLE2
WHERE REFERENCE_RTK = 'FOO')
So I wrote one query for the inner query and another for the outer one, and used Except:
var insideQuery = (from pkcr in this.Repository.Context.TABLE2 where pkcr.Reference_RTK == "FOO" select pkcr.Key_Value);
var outerQuery = (from pl in this.Repository.Context.TABLE1 select pl).Except(insideQuery);
But this is wrong; it does not even compile. What is the correct way of writing this?

You cannot compile the second query because Except must be applied to queryables of the same element type, but you are applying it to Queryable<TABLE1> and Queryable<TypeOfTABLE2Key_Value>. Also, I think you should use Contains here:
var keys = from pkcr in this.Repository.Context.TABLE2
where pkcr.Reference_RTK == "FOO"
select pkcr.Key_Value;
var query = from pl in this.Repository.Context.TABLE1
where !keys.Contains(pl.License_RTK)
select pl;
NOTE: The generated query will use NOT EXISTS instead of NOT IN, but that's what you want (unlike NOT IN, NOT EXISTS is not affected by NULL values returned by the subquery):
SELECT * FROM [dbo].[TABLE1] AS [Extent1]
WHERE NOT EXISTS
(SELECT 1 AS [C1]
FROM [dbo].[TABLE2] AS [Extent2]
WHERE ([Extent2].[Reference_RTK] = @p0) AND
([Extent2].[Key_Value] = [Extent1].[License_RTK]))
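Both approaches can be compared in memory with a short sketch. The Table1Row and Table2Row classes are hypothetical stand-ins for TABLE1 and TABLE2, and this is LINQ to Objects, so no SQL is involved. !Contains keeps whole TABLE1 rows (only their keys are returned here, for easy checking). Except compiles once both sides are sequences of the same type, but it then returns license keys rather than rows, and it also removes duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types mirroring TABLE1 and TABLE2 from the question.
public class Table1Row { public string License_RTK { get; set; } }
public class Table2Row { public string Reference_RTK { get; set; } public string Key_Value { get; set; } }

public static class NotInDemo
{
    // NOT IN expressed as !Contains, as in the answer.
    public static List<string> NotIn(List<Table1Row> table1, List<Table2Row> table2)
    {
        var keys = from pkcr in table2
                   where pkcr.Reference_RTK == "FOO"
                   select pkcr.Key_Value;

        return (from pl in table1
                where !keys.Contains(pl.License_RTK)
                select pl.License_RTK).ToList();
    }

    // Except needs both sides to have the same element type, so it works on
    // the key values and returns distinct keys, not TABLE1 rows.
    public static List<string> ExceptOnKeys(List<Table1Row> table1, List<Table2Row> table2) =>
        table1.Select(pl => pl.License_RTK)
              .Except(table2.Where(p => p.Reference_RTK == "FOO").Select(p => p.Key_Value))
              .ToList();
}
```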

Convert SQL Sub Query to In to Linq Lambda

How do I convert the following SQL statement into Lambda Expression or Linq Query?
The following query gets the single most recent Answer for each Question. Or, to phrase it another way, it gets each Question with its newest Answer.
Also, this will be executed by Entity Framework.
SELECT Answers.*
FROM Answers
Where AnswerID IN
(
SELECT Max(AnswerID) AnswerID
FROM Answers
GROUP BY QuestionID
)
Here is another way to look at the previous query, using an INNER JOIN:
SELECT answers.*
FROM answers
INNER JOIN
(
SELECT Max(answerID) answerID --, QuestionSiteID
FROM answers
GROUP BY QuestionID
) t ON
answers.answerID = t.answerID
I have read that the LINQ Contains method is suboptimal for queries that access SQL (see "LINQ to Sql and .Contains() trap").

I think you could do this using something like:
var subQuery = from a in answers
group a by a.QuestionID into grouping
select new
{
QuestionID = grouping.Key,
MaxAnswerID = grouping.Max(x => x.AnswerID)
};
var query = from a in answers
from s in subQuery
where a.AnswerID == s.MaxAnswerID
select a;
This results in a CROSS JOIN in the generated SQL.
Also, you could use join in the second part of the query:
var query = from a in answers
join s in subQuery on a.AnswerID equals s.MaxAnswerID
select a;
This results in an INNER JOIN in the SQL.
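The group-then-join logic can also be checked in memory. The sketch below uses a hypothetical Answer class with only AnswerID and QuestionID, and runs as LINQ to Objects, so no SQL is generated. Like the answer, it assumes AnswerID is unique, and it returns the AnswerID of the newest answer for each question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Answers table.
public class Answer
{
    public int AnswerID { get; set; }
    public int QuestionID { get; set; }
}

public static class LatestAnswers
{
    public static List<int> ViaJoin(List<Answer> answers)
    {
        // Highest AnswerID per question.
        var subQuery = from a in answers
                       group a by a.QuestionID into grouping
                       select new { QuestionID = grouping.Key, MaxAnswerID = grouping.Max(x => x.AnswerID) };

        // Keep only the answers whose ID is one of those maxima.
        return (from a in answers
                join s in subQuery on a.AnswerID equals s.MaxAnswerID
                select a.AnswerID).ToList();
    }
}
```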
Note for side cases - the above answers make the reasonable assumption that AnswerID is the primary key of Answers - if you happen to have instead a table design which is keyed on (AnswerID, QuestionID) then you will need to join by both AnswerID and QuestionID like:
var subQuery = from a in answers
group a by a.QuestionID into grouping
select new
{
QuestionID = grouping.Key,
MaxAnswerID = grouping.Max(x => x.AnswerID)
};
var query = from a in answers
from s in subQuery
where a.AnswerID == s.MaxAnswerID
&& a.QuestionID == s.QuestionID
select a;
See the comment trail for more discussion on this alternate table design...

You could use a let statement to select the first answer per QuestionID group:
from answer in Answers
group answer by answer.QuestionID into question
let firstAnswer = question.OrderByDescending(q => q.AnswerID).First()
select firstAnswer
EDIT: Linq2Sql translates the above query into N+1 database calls. This query gets translated into just one SQL query:
from a in Answers
group a by a.QuestionID into grouping
join a2 in Answers on
new {AnswerID = grouping.Max(x => x.AnswerID), QuestionID = grouping.Key}
equals new {a2.AnswerID, a2.QuestionID}
select a2
Makes me wonder in what way Linq2Sql is supposed to be simpler than SQL.

Try to use this query:
var query = from c in context.Childs
group c by c.ParentEntityId into pc
select pc.OrderByDescending(pcc => pcc.Id).Take(1);
I just checked the query in profiler and it produces single SQL query (the ugly one):
SELECT
[Project3].[ParentEntityId] AS [ParentEntityId],
[Project3].[C1] AS [C1],
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[ParentEntityId1] AS [ParentEntityId1]
FROM ( SELECT
[Distinct1].[ParentEntityId] AS [ParentEntityId],
[Limit1].[Id] AS [Id],
[Limit1].[Name] AS [Name],
[Limit1].[ParentEntityId] AS [ParentEntityId1],
CASE WHEN ([Limit1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM (SELECT DISTINCT
[Extent1].[ParentEntityId] AS [ParentEntityId]
FROM [dbo].[ChildEntities] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Id] AS [Id], [Project2].[Name] AS [Name], [Project2].[ParentEntityId] AS [ParentEntityId]
FROM ( SELECT
[Extent2].[Id] AS [Id],
[Extent2].[Name] AS [Name],
[Extent2].[ParentEntityId] AS [ParentEntityId]
FROM [dbo].[ChildEntities] AS [Extent2]
WHERE ([Distinct1].[ParentEntityId] = [Extent2].[ParentEntityId]) OR (([Distinct1].[ParentEntityId] IS NULL) AND ([Extent2].[ParentEntityId] IS NULL))
) AS [Project2]
ORDER BY [Project2].[Id] DESC ) AS [Limit1]
) AS [Project3]
ORDER BY [Project3].[ParentEntityId] ASC, [Project3].[C1] ASC
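Note that this query yields one single-element sequence per parent rather than a flat list of children. The in-memory sketch below, with a hypothetical Child class, flattens that result with SelectMany. GroupBy in LINQ to Objects also keeps children whose ParentEntityId is null in a group of their own, which corresponds to the IS NULL branch in the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the ChildEntities table.
public class Child
{
    public int Id { get; set; }
    public int? ParentEntityId { get; set; }
}

public static class NewestChildPerParent
{
    public static List<int> Run(List<Child> childs) =>
        (from c in childs
         group c by c.ParentEntityId into pc
         select pc.OrderByDescending(pcc => pcc.Id).Take(1))
        .SelectMany(newest => newest)   // flatten the one-element sequences
        .Select(child => child.Id)
        .ToList();
}
```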
