LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN

LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN - c#

Can anyone tell me why the generated SQL does not contain a 2nd INNER JOIN? It seems to have been replaced with a NULL check in the WHERE clause? I'm not clear on why the 2nd INNER JOIN is not in the generated SQL.
C# code:
var cycleList = from o in entities.Orders
join c in entities.Cycles on o.Id equals c.OrderId
join calendar in entities.Calendars on c.CalendarId equals calendar.Id
where o.UnitId == unitId && o.CompanyId == companyId
select c.Id;
Generated SQL:
SELECT
[Extent2].[Id] AS [Id]
FROM [dbo].[Orders] AS [Extent1]
INNER JOIN [dbo].[Cycles] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId]
WHERE ([Extent2].[CalendarId] IS NOT NULL) AND ( CAST( [Extent1].[UnitId] AS int) = #p__linq__0) AND ( CAST( [Extent1].[CompanyId] AS int) = #p__linq__1)

It looks like the query generator is optimizing your query.
Since you are not selecting (or using in your where clause) any fields from the Calendars table in your query, only one join is done between the Orders table and the Cycles table. It's likely faster to check for the non-NULL foreign key than it is to join on a table from which no fields will be used.

Related

LINQ generating Nested/Sub queries

I am using Asp.NET & Entity Framework with SQL Server as Database, somehow I am getting this strange issue
I have this code:
var pricingInfo = (from price in invDB.Pricing.AsNoTracking()
join priceD in invDB.PricingDetail.AsNoTracking() on price.PricingId equals priceDtl.PricingId
join tagD in invDB.PricingTagDetail.AsNoTracking() on priceDtl.PricingDetailId equals tagDtl.PricingDetailId
join it in invDB.Item.AsNoTracking() on tagDtl.ItemId equals item.ItemId
join par in invDB.Party.AsNoTracking() on tagDtl.PartyId equals party.PartyId
join b in invDB.Brand.AsNoTracking() on tagDtl.BrandId equals brd.BrandId into t from brand in t.DefaultIfEmpty()
where tagDtl.AvailableQuantity > 0m && price.PricingNo == printNumber
select new
{
TagNo = tagDtl.TagNo,
SellingRate = tagDtl.SellingRate,
Quantity = tagDtl.AvailableQuantity ?? 0m,
ItemCode = item.Name,
UOMId = priceDtl.UOMId,
Brand = brand.BrandCode,
Supplier = party.PartyCode,
Offer = tagDtl.Offer
}).ToList();
Which generates the below sql query with a sub query, without where condition and it pulls out full records from a large volume data. This results to a heavy memory consumption and performance issues.
SELECT
[Filter1].[PricingId1] AS [PricingId],
[Filter1].[TagNo] AS [TagNo],
[Filter1].[SellingRate1] AS [SellingRate],
CASE WHEN ([Filter1].[AvailableQuantity] IS NULL) THEN cast(0 as decimal(18)) ELSE [Filter1].[AvailableQuantity] END AS [C1],
[Filter1].[Name] AS [Name],
[Filter1].[UOMId 1] AS [UOMId ],
[Extent6].[BrandCode] AS [BrandCode],
[Filter1].[PartyCode] AS [PartyCode],
[Filter1].[Offer] AS [Offer]
FROM
(
SELECT [Extent1].[PricingId] AS [PricingId1], [Extent1].[PricingNo] AS [PricingNo], [Extent2].[UnitOfMeasurementId] AS [UnitOfMeasurementId1], [Extent3].[TagNo] AS [TagNo], [Extent3].[BrandId] AS [BrandId1], [Extent3].[SellingRate] AS [SellingRate1], [Extent3].[AvailableQuantity] AS [AvailableQuantity], [Extent3].[Offer] AS [Offer], [Extent4].[Name] AS [Name], [Extent5].[PartyCode] AS [PartyCode]
FROM [PanERP].[Pricing] AS [Extent1]
INNER JOIN [PanERP].[PricingDetail] AS [Extent2] ON [Extent1].[PricingId] = [Extent2].[PricingId]
INNER JOIN [PanERP].[PricingTagDetail] AS [Extent3] ON [Extent2].[PricingDetailId] = [Extent3].[PricingDetailId]
INNER JOIN [PanERP].[Item] AS [Extent4] ON [Extent3].[ItemId] = [Extent4].[ItemId]
INNER JOIN [PanERP].[Party] AS [Extent5] ON [Extent3].[PartyId] = [Extent5].[PartyId]
WHERE [Extent3].[AvailableQuantity] > cast(0 as decimal(18))
) AS [Filter1]
LEFT OUTER JOIN [PanERP].[Brand] AS [Extent6] ON [Filter1].[BrandId1] = [Extent6].[BrandId]
WHERE ([Filter1].[PricingNo] = #p__linq__0) OR (([Filter1].[PricingNo] IS NULL) AND (#p__linq__0 IS NULL))
But When i change the condition
where tagDtl.AvailableQuantity > 0m
as a variable it creates another SQL query without nested select statement.
Here is the modified code
decimal availableQuantity = 0m;
var pricingInfo = (from price in invDB.Pricing.AsNoTracking()
join priceD in invDB.PricingDetail.AsNoTracking() on price.PricingId equals priceDtl.PricingId
join tagD in invDB.PricingTagDetail.AsNoTracking() on priceDtl.PricingDetailId equals tagDtl.PricingDetailId
join it in invDB.Item.AsNoTracking() on tagDtl.ItemId equals item.ItemId
join par in invDB.Party.AsNoTracking() on tagDtl.PartyId equals party.PartyId
join b in invDB.Brand.AsNoTracking() on tagDtl.BrandId equals brd.BrandId into t from brand in t.DefaultIfEmpty()
where tagDtl.AvailableQuantity > availableQuantity && price.PricingNo == printNumber
select new
{
TagNo = tagDtl.TagNo,
SellingRate = tagDtl.SellingRate,
Quantity = tagDtl.AvailableQuantity ?? availableQuantity,
ItemCode = item.Name,
UOMId = priceDtl.UOMId,
Brand = brand.BrandCode,
Supplier = party.PartyCode,
Offer = tagDtl.Offer
}).ToList();
and here is the SQL query without nested SQL statement.
SELECT
[Extent1].[PricingId] AS [PricingId],
[Extent3].[TagNo] AS [TagNo],
[Extent3].[SellingRate] AS [SellingRate],
CASE WHEN ([Extent3].[AvailableQuantity] IS NULL) THEN cast(0 as decimal(18)) ELSE [Extent3].[AvailableQuantity] END AS [C1],
[Extent4].[Name] AS [Name],
[Extent2].[UOMId ] AS [UOMId ],
[Extent6].[BrandCode] AS [BrandCode],
[Extent5].[PartyCode] AS [PartyCode],
[Extent3].[Offer] AS [Offer]
FROM [PanERP].[Pricing] AS [Extent1]
INNER JOIN [PanERP].[PricingDetail] AS [Extent2] ON [Extent1].[PricingId] = [Extent2].[PricingId]
INNER JOIN [PanERP].[PricingTagDetail] AS [Extent3] ON [Extent2].[PricingDetailId] = [Extent3].[PricingDetailId]
INNER JOIN [PanERP].[Item] AS [Extent4] ON [Extent3].[ItemId] = [Extent4].[ItemId]
INNER JOIN [PanERP].[Party] AS [Extent5] ON [Extent3].[PartyId] = [Extent5].[PartyId]
LEFT OUTER JOIN [PanERP].[Brand] AS [Extent6] ON [Extent3].[BrandId] = [Extent6].[BrandId]
WHERE ([Extent3].[AvailableQuantity] > #p__linq__0) AND (([Extent1].[PricingNo] = #p__linq__1) OR (([Extent1].[PricingNo] IS NULL) AND (#p__linq__1 IS NULL)))
If I move the where condition to the model definition as lambda expression, like this
from price in inventoryDb.Pricing.AsNoTracking().Where(c =>
c.PricingNo == printNumber))
then also it works fine.
Why is LINQ generating a nested Select? How can we avoid this?
Thanks in advance for your answers.

Well, I think you have answered your own question, on your comments. I will just try to clarify what is going on.
When you use a hard-coded constant, like 0m, the framework translates it into SQL keeping the value as a constant:
WHERE [Extent3].[AvailableQuantity] > cast(0 as decimal(18))
When you use a local variable, like “availableQuantity”, the framework creates a parameter:
([Extent3].[AvailableQuantity] > #p__linq__0)
I might be wrong, but, as I see, this is done in order to preserve the programmer’s goal when writing the code (constant = constant, variable = parameter).
And what about the subquery?
This is a query optimization logic (a bad one, probably, at least on this scenario). When you make a query using parameters, you might run it several times, but SQL Server will always use the same execution plan, making the query faster; when you use constants, each query need to be reevaluated (if you check SQL Server Activity Monitor, you will see that queries with parameters are treated as the same query, regardless the parameters values).
This way, in my opinion (sorry, I could not find any documentation about it), Entity Framework is trying to isolate the queries; the outer/generic one, that use parameters, and the inner/specific one, that use constants.
I would be happy if anyone could complement it with some Microsoft documentation about this subject…

Anyone have an idea how to express complex join logic within a LINQ query?

I've been at it all day. For the life of me, I cannot figure out how to translate either of the final two select statements found within the below code snippet:
declare #Person table
(
[Name] varchar(50),
[ABA] varchar(9)
)
declare #Entity table
(
[Name] varchar(50),
[Respondent] varchar(9),
[TierRespondent] varchar(9)
)
insert into #Person ([Name], [ABA])
select 'Steve', '000000001'
union
select 'Mary', '000000002'
union
select 'Carey', '000000003'
insert into #Entity ([Name], [Respondent], [TierRespondent])
select 'Steve', '000000001', '000000006'
union
select 'Mary', '000000004', '000000002'
union
select 'Carey', '000000005', '000000008'
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] in (e.Respondent ,e.[TierRespondent])
The thing that boggles my mind is the logic found within the ON clause of the join statements.
I'm not a SQL wiz, so I've even failed at trying to restructure these SELECT statements into a different form that gives me the same results, but is also easier to translate to LINQ.
Any ideas, anyone?
Thanks.

After some hours of pondering, I found the following sql to be an acceptable refactoring that can easily be translated to a linq query:
select *
from #entity e
left join #Person p
on 1 = 1
where p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
and p.ABA is not null
And here is the corresponding linq query:
from entity in entities
join p in persons on 1 equals 1 into groupJoin
from person in groupJoin.DefaultIfEmpty()
where person.ABA == entity.Respondent || person.ABA == entity.TierRespondent
&& person.ABA != null
select new
{
entity.Name,
entity.Respondent,
entity.TierRespondent,
person?.Name,
person?.ABA
}
The select statements of the original post contained logic within the join clause. This logic cannot be expressed within a linq join clasue. This refactoring moves the logic from the join, and into the where clause, which can easily be handled expressed in linq.

Entity Framework query with 2 many-to-many joins

I'm working on a project which is using EF Code first and has the following model relationships:
Item (Id, Name, virtual List<Category>, virtual List<Tag>)
Category (Id, Name, virtual List<Item>)
Tag (Id, Name, virtual List<Item>)
I'm running a search where I would like to get all items where the item name = searchTerm, the category id is contained in a list of ints and where the tag name exists in a list of tags.
public IEnumerable<Item> Search(string searchTerm, IEnumerable<int> categoryIds, IEnumerable<string> tags)
{
var query = (
from i in context.Items
from c in context.Categories
from t in context.Tags
where i.Name.Contains(searchTerm)
&& categoryIds.Contains(c.Id)
&& tags.Contains(t.Name)
select i);
return query.ToList();
}
In SQL the query would look like the following:
SELECT I.* FROM Items I
INNER JOIN ItemItemCategories IIC ON IIC.Item_Id = I.Id
INNER JOIN ItemCategories C ON C.Id = IIC.ItemCategory_Id
INNER JOIN ItemItemTags IIT ON IIT.Item_Id = I.Id
INNER JOIN ItemTags T On T.Id = IIT.ItemTag_Id
WHERE I.Question like '%sample%' -- searchTerm
AND C.Id in (1,2) -- categoryIds
AND (T.Text like '%Difficult%' OR T.Text like '%Technical%') -- tags
My question is how can I form my code to return the query above. This is the most efficient way to perform the query from my knowledge. Currently the following query is being run from code:
SELECT
[Filter1].[Id1] AS [Id],
[Filter1].[Name] AS [Name]
FROM (
SELECT
[Extent1].[Id] AS [Id1],
[Extent1].[Name] AS [Name]
FROM [dbo].[Items] AS [Extent1]
CROSS JOIN [dbo].[Categories] AS [Extent2]
WHERE [Extent2].[Id] IN (1, 2) ) AS [Filter1]
CROSS JOIN [dbo].[Tags] AS [Extent3]
WHERE ([Filter1].[Name] LIKE #p__linq__0 ESCAPE N'~') AND ([Extent3].[Name] IN (N'Difficult', N'Technical')) AND ([Extent3].[Name] IS NOT NULL)

Try this:
var query = ( from i in context.Items
from c in i.Categories
from t in i.Tags
where i.Name.Contains(searchTerm)
&& categoryIds.Contains(c.Id)
&& tags.Contains(t.Name)
select i).ToList();
You do not have to search through all the categories and tags elements, only those who are related with you Item.
About the query you want, IMHO I don't think there's a more efficient query in Linq to Entities to get the result you are expecting that the query I propose above. Look the sql code that is generated:
SELECT
[Filter1].[Id] AS [Id],
[Filter1].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name]
FROM [dbo].[Items] AS [Extent1]
INNER JOIN [dbo].[ItemCategories] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Item_Id]
WHERE [Extent2].[Category_Id] IN (1, 2) ) AS [Filter1]
INNER JOIN (SELECT [Extent3].[Item_Id] AS [Item_Id]
FROM [dbo].[TagItems] AS [Extent3]
INNER JOIN [dbo].[Tags] AS [Extent4] ON [Extent4].[Id] = [Extent3].[Tag_Id]
WHERE ([Extent4].[Name] IN (N'Difficult', N'Technical')) AND ([Extent4].[Name] IS NOT NULL) ) AS [Filter2] ON [Filter1].[Id] = [Filter2].[Item_Id]
WHERE [Filter1].[Name] LIKE #p__linq__0 ESCAPE N'~'
As you can see it is quite similar with the query that you expect.

LINQ: Include clause is causing two left join when there should be one

I am using below LINQ query:
CreateObjectSet<ClientCustomFieldValue>()
.Include(scf => scf.ClientCustomField.CustomField)
.Where(str => str.PassengerTripID == passengerTripID).ToList();
Sql corresponding to this query is(as per sql profiler)
exec sp_executesql
N'SELECT
[Extent1].[ClientCustomFieldValueID] AS [ClientCustomFieldValueID],
[Extent1].[ClientCustomFieldID] AS [ClientCustomFieldID],
[Extent1].[PassengerTripID] AS [PassengerTripID],
[Extent1].[DataValue] AS [DataValue],
[Extent1].[RowVersion] AS [RowVersion],
[Extent1].[LastChangeSecSessionID] AS [LastChangeSecSessionID],
[Extent1].[LastChangeTimeUTC] AS [LastChangeTimeUTC],
[Extent2].[ClientCustomFieldID] AS [ClientCustomFieldID1],
[Extent2].[ClientID] AS [ClientID],
[Extent2].[CustomFieldID] AS [CustomFieldID],
[Extent2].[CustomFieldSourceEnumID] AS [CustomFieldSourceEnumID],
[Extent2].[RequiredFlag] AS [RequiredFlag],
[Extent2].[ValidationRegex] AS [ValidationRegex],
[Extent2].[RowVersion] AS [RowVersion1],
[Extent2].[PassengerTripStopTypeEnumID] AS [PassengerTripStopTypeEnumID],
[Extent2].[LastChangeSecSessionID] AS [LastChangeSecSessionID1],
[Extent2].[LastChangeTimeUTC] AS [LastChangeTimeUTC1],
[Extent4].[CustomFieldID] AS [CustomFieldID1],
[Extent4].[CustomFieldCode] AS [CustomFieldCode],
[Extent4].[Description] AS [Description],
[Extent4].[RowVersion] AS [RowVersion2],
[Extent4].[LastChangeSecSessionID] AS [LastChangeSecSessionID2],
[Extent4].[LastChangeTimeUTC] AS [LastChangeTimeUTC2]
FROM [dbo].[ClientCustomFieldValue] AS [Extent1]
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent2]
ON ([Extent2].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent2].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent3]
ON ([Extent3].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent3].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[CustomField] AS [Extent4]
ON ([Extent4].[DeleteFlag] = 0)
AND ([Extent3].[CustomFieldID] = [Extent4].[CustomFieldID])
WHERE ([Extent1].[DeleteFlag] = 0)
AND ([Extent1].[PassengerTripID] = #p__linq__0)
',N'#p__linq__0 int',#p__linq__0=96
I would like to know why there are two left join with 'ClientCustomField' table. Kindly help me understand this.

Here is an assumption.
First left join, denoted as Extent2, is for the SELECT clause to retrieve all necessary fields from ClientCustomField table. This would be presented in the query anyway, no matter if there is an Include method call.
Second left join, denoted as Extent3, is to retrieve CustomField table fields. As you can see it is not used anywhere except for the last left join clause that is created specifically for that as it joins everything with the CustomField. That is something produced by the Include call.
Apparently LINQ is not checking what tables where already joined in the query, and processing each of the parts of the query separately it generated two left joins for each of them.

Linq Join with a Group By

Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I exprected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent toSQL Server none of the fields that I was expecting to be return are being included in the Select clause. What am I doing wrong?
UPDATE
It has been a very long day, I re-ran the code again and now this is the SQL that is being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing results when I try to loop through the record set
foreach (var emps in query)
{
}

Not sure why the query does not return what it should return, but it occurred to me that since you only query the group key and not any grouped results you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee equals e.EmployeeNumber
select new
{
IncEmpNum = i.IncidentEmployee
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
But how did your code compile? As xeondev noticed: there should be an equals in stead of an = in a join statement. My compiler (:D) does not swallow it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable and you should either use i.IncidentEmployee.Value or e.EmployeeNumber.Value or both.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN - c#

Related

LINQ generating Nested/Sub queries

Anyone have an idea how to express complex join logic within a LINQ query?

Entity Framework query with 2 many-to-many joins

LINQ: Include clause is causing two left join when there should be one

Linq Join with a Group By

Categories

Resources