LINQ query losing select when joining two queries - c#

Just to set the context a little, I'm trying to use queries with mysql that use Late row lookup as shown in this article
https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
but that's a story for another day but the idea is that you do a key search on the table and then join it onto the whole table to force a late row lookup and the problem is coming from my LINQ queries when joined together.
-- Key search query --
Calling Code
IQueryable<int> keySearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), whereStatement, clientID).OrderBy(orderBy).Skip(startRow).Take(pageSize).Select(x => x.ID);
Resulting Query
SELECT
`Extent1`.`Sys_InvoiceID`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`FK_StatusID`
ORDER BY
`Extent1`.`InvoiceDate` ASC LIMIT 0,430
-- Full Table Search --
Calling Code
IQueryable<InvoiceDTOModel> tableSearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), null, clientID, true).OrderBy(orderBy);
Resulting Query
SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent1` INNER JOIN `tbl2` AS `Extent2` ON `Extent1`.`SID` = `Extent2`.`SID` INNER JOIN `tbl3` AS `Extent3` ON `Extent1`.`EID` = `Extent3`.`EID` LEFT OUTER JOIN `tbl4` AS `Extent4` ON `Extent1`.`ID` = `Extent4`.`DID`
ORDER BY
`Extent1`.`C4` ASC
-- Joining the Two Together --
Calling Code
keySearch.Join(tableSearch, key => key, table => table.ID, (key, table) => table).OrderBy(orderBy).ToListAsync();
Resulting Query
SELECT
`Join3`.`ID`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`SID`,
`Join3`.`S1,
`Join3`.`S2`,
`Join3`.`S3`,
`Join3`.`EID`,
`Join3`.`E1`,
`Join3`.`DID`,
`Join3`.`D1`,
`Join3`.`D2`,
`Join3`.`D3`,
`Join3`.`D4`,
`Join3`.`D5`
FROM (
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1` INNER JOIN (SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent2` INNER JOIN `tbl2` AS `Extent3` ON `Extent2`.`SID` = `Extent3`.`SID` INNER JOIN `tblstatus` AS `Extent4` ON `Extent2`.`EID` = `Extent4`.`EID` LEFT OUTER JOIN `tbl3` AS `Extent5` ON `Extent2`.`ID` = `Extent5`.`DID`) AS `Join3` ON `Limit1`.`ID` = `Join3`.`ID`
ORDER BY
`Join3`.`C4` ASC
Basically the inner select brings back
FROM (
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
Instead of
FROM (
`Extent1`.`ID`,
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
--Note--
The actual query selects around 15 columns, I've just shortened it to this example, it has an effect on the search as the dataset grows in size and it shouldn't be selecting all of the fields but i suspect there's an error in my join.
Any help is much appreciated.

Related

SQL Query taking too much time to execute on production

SQL Query taking too much time to execute. Working fine at UAT. I need to compare data of two tables and want to get difference. Below mention is my query.
Select *
from tblBrandDetailUsers tbdu
inner join tblBrands tbs on tbs.BrandId = tbdu.BrandId
left join tblBrandDetails tbd on tbd.CategoryId = tbdu.CategoryId
and tbd.BrandId = tbdu.BrandId
and tbd.CityId = tbdu.CityId
inner join tblCategory tc on tbdu.CategoryId = tc.CategoryId
inner join tblCity tcc on tcc.CityId = tbdu.CityId
where isnull(tbdu.SaleAmount,-1) <> isnull(tbd.SaleAmount,-1)
and isnull(tbdu.CityId,0) = 3
and isnull(tbdu.TopLevelCategoryId,0) = 2;
Need to optimize query.
a number of things you need to check:
number of rows for each table. the more rows you have the slower it gets. Do you have the same size of data with UAT?
SELECT * : avoid the * and only retrieve columns you need.
ISNULL function on left side of the WHERE predicate will scan the index because it is non-sargable. you can check the answer here and rewrite your predicate without any function on the left side of WHERE clause.
You need to provide a detailed information like actual execution plan. I can only give you a generic answer because not much detail was provided.
Remember the UAT is very different in PROD. the hardware you used, the number of rows, etc..
Every advises in comment looks right. The difference between UAT and Prod should be the volume of data.
Your issue should come of lack or inefficient indices.
You should add compound index on
tblBrandDetails.CategoryId,tblBrandDetails.BrandId, tblBrandDetails.CityId
and on
tblBrandDetailUsers.CategoryId,tblBrandDetailUsers.BrandId, tblBrandDetailUsers.CityId
ensure that all unique ids have a btree index (or similar type of index depending on your DB)
You can also add conditional indices to filter quicker the null values:
https://www.brentozar.com/archive/2015/09/filtered-indexes-and-is-not-null/
Rewrite your query like this :
Select *--> AVOID "*" put all the necessary columns
from tblBrandDetailUsers AS tbdu
inner join tblBrands AS tbs on tbs.BrandId = tbdu.BrandId
left join tblBrandDetails AS tbd on tbd.CategoryId = tbdu.CategoryId
and tbd.BrandId = tbdu.BrandId
and tbd.CityId = tbdu.CityId
inner join tblCategory AS tc on tbdu.CategoryId = tc.CategoryId
inner join tblCity AS tcc on tcc.CityId = tbdu.CityId
where tbdu.SaleAmount <> tbd.SaleAmount
and tbdu.CityId = 3
and tbdu.TopLevelCategoryId = 2
UNION ALL
SELECT * --> AVOID "*" put all the necessary columns
from tblBrandDetailUsers AS tbdu
inner join tblBrands AS tbs on tbs.BrandId = tbdu.BrandId
left join tblBrandDetails AS tbd on tbd.CategoryId = tbdu.CategoryId
and tbd.BrandId = tbdu.BrandId
and tbd.CityId = tbdu.CityId
inner join tblCategory AS tc on tbdu.CategoryId = tc.CategoryId
inner join tblCity AS tcc on tcc.CityId = tbdu.CityId
where tbdu.SaleAmount IS NULL
AND tbd.SaleAmount IS NULL
and tbdu.CityId = 3
and tbdu.TopLevelCategoryId = 2;
Modify the SELECT clause to have only the necessary columns and not *
Be sure that you have index that are close to :
For tblBrandDetailUsers TABLE :
index KEY (CityId, TopLevelCategoryId, BrandId, CategoryId) INCLUDE (SaleAmount)
index KEY (CityId, TopLevelCategoryId, CategoryId) INCLUDE (SaleAmount)
For tblBrandDetails TABLE :
index (CityId, BrandId, CategoryId)
And also :
tblCategory (CategoryId)
tblCity (CityId)
tblBrands (BrandId)
When you will rectify the query especially the SELECT clause, we can give you more accurate indexes, because selected columns have a big weight on indexes performances !
as other suggested, try to add index on columns used for joins

Anyone have an idea how to express complex join logic within a LINQ query?

I've been at it all day. For the life of me, I cannot figure out how to translate either of the final two select statements found within the below code snippet:
declare #Person table
(
[Name] varchar(50),
[ABA] varchar(9)
)
declare #Entity table
(
[Name] varchar(50),
[Respondent] varchar(9),
[TierRespondent] varchar(9)
)
insert into #Person ([Name], [ABA])
select 'Steve', '000000001'
union
select 'Mary', '000000002'
union
select 'Carey', '000000003'
insert into #Entity ([Name], [Respondent], [TierRespondent])
select 'Steve', '000000001', '000000006'
union
select 'Mary', '000000004', '000000002'
union
select 'Carey', '000000005', '000000008'
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] in (e.Respondent ,e.[TierRespondent])
The thing that boggles my mind is the logic found within the ON clause of the join statements.
I'm not a SQL wiz, so I've even failed at trying to restructure these SELECT statements into a different form that gives me the same results, but is also easier to translate to LINQ.
Any ideas, anyone?
Thanks.
After some hours of pondering, I found the following sql to be an acceptable refactoring that can easily be translated to a linq query:
select *
from #entity e
left join #Person p
on 1 = 1
where p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
and p.ABA is not null
And here is the corresponding linq query:
from entity in entities
join p in persons on 1 equals 1 into groupJoin
from person in groupJoin.DefaultIfEmpty()
where person.ABA == entity.Respondent || person.ABA == entity.TierRespondent
&& person.ABA != null
select new
{
entity.Name,
entity.Respondent,
entity.TierRespondent,
person?.Name,
person?.ABA
}
The select statements of the original post contained logic within the join clause. This logic cannot be expressed within a linq join clasue. This refactoring moves the logic from the join, and into the where clause, which can easily be handled expressed in linq.

LINQ: Include clause is causing two left join when there should be one

I am using below LINQ query:
CreateObjectSet<ClientCustomFieldValue>()
.Include(scf => scf.ClientCustomField.CustomField)
.Where(str => str.PassengerTripID == passengerTripID).ToList();
Sql corresponding to this query is(as per sql profiler)
exec sp_executesql
N'SELECT
[Extent1].[ClientCustomFieldValueID] AS [ClientCustomFieldValueID],
[Extent1].[ClientCustomFieldID] AS [ClientCustomFieldID],
[Extent1].[PassengerTripID] AS [PassengerTripID],
[Extent1].[DataValue] AS [DataValue],
[Extent1].[RowVersion] AS [RowVersion],
[Extent1].[LastChangeSecSessionID] AS [LastChangeSecSessionID],
[Extent1].[LastChangeTimeUTC] AS [LastChangeTimeUTC],
[Extent2].[ClientCustomFieldID] AS [ClientCustomFieldID1],
[Extent2].[ClientID] AS [ClientID],
[Extent2].[CustomFieldID] AS [CustomFieldID],
[Extent2].[CustomFieldSourceEnumID] AS [CustomFieldSourceEnumID],
[Extent2].[RequiredFlag] AS [RequiredFlag],
[Extent2].[ValidationRegex] AS [ValidationRegex],
[Extent2].[RowVersion] AS [RowVersion1],
[Extent2].[PassengerTripStopTypeEnumID] AS [PassengerTripStopTypeEnumID],
[Extent2].[LastChangeSecSessionID] AS [LastChangeSecSessionID1],
[Extent2].[LastChangeTimeUTC] AS [LastChangeTimeUTC1],
[Extent4].[CustomFieldID] AS [CustomFieldID1],
[Extent4].[CustomFieldCode] AS [CustomFieldCode],
[Extent4].[Description] AS [Description],
[Extent4].[RowVersion] AS [RowVersion2],
[Extent4].[LastChangeSecSessionID] AS [LastChangeSecSessionID2],
[Extent4].[LastChangeTimeUTC] AS [LastChangeTimeUTC2]
FROM [dbo].[ClientCustomFieldValue] AS [Extent1]
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent2]
ON ([Extent2].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent2].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[ClientCustomField] AS [Extent3]
ON ([Extent3].[DeleteFlag] = 0)
AND ([Extent1].[ClientCustomFieldID] = [Extent3].[ClientCustomFieldID])
LEFT OUTER JOIN [dbo].[CustomField] AS [Extent4]
ON ([Extent4].[DeleteFlag] = 0)
AND ([Extent3].[CustomFieldID] = [Extent4].[CustomFieldID])
WHERE ([Extent1].[DeleteFlag] = 0)
AND ([Extent1].[PassengerTripID] = #p__linq__0)
',N'#p__linq__0 int',#p__linq__0=96
I would like to know why there are two left join with 'ClientCustomField' table. Kindly help me understand this.
Here is an assumption.
First left join, denoted as Extent2, is for the SELECT clause to retrieve all necessary fields from ClientCustomField table. This would be presented in the query anyway, no matter if there is an Include method call.
Second left join, denoted as Extent3, is to retrieve CustomField table fields. As you can see it is not used anywhere except for the last left join clause that is created specifically for that as it joins everything with the CustomField. That is something produced by the Include call.
Apparently LINQ is not checking what tables where already joined in the query, and processing each of the parts of the query separately it generated two left joins for each of them.

Linq Join with a Group By

Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I exprected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent toSQL Server none of the fields that I was expecting to be return are being included in the Select clause. What am I doing wrong?
UPDATE
It has been a very long day, I re-ran the code again and now this is the SQL that is being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing results when I try to loop through the record set
foreach (var emps in query)
{
}
Not sure why the query does not return what it should return, but it occurred to me that since you only query the group key and not any grouped results you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee equals e.EmployeeNumber
select new
{
IncEmpNum = i.IncidentEmployee
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
But how did your code compile? As xeondev noticed: there should be an equals in stead of an = in a join statement. My compiler (:D) does not swallow it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable and you should either use i.IncidentEmployee.Value or e.EmployeeNumber.Value or both.

Left Outer Join and Exists in Linq To SQL C# .NET 3.5

I'm stuck on translating a left outer join from LINQToSQL that returns unique parent rows.
I have 2 tables (Project, Project_Notes, and it's a 1-many relationship linked by Project_ID). I am doing a keyword search on multiple columns on the 2 table and I only want to return the unique projects if a column in Project_Notes contains a keyword. I have this linqtoSQl sequence going but it seems to be returning multiple Project rows. Maybe do an Exist somehow in LINQ? Or maybe a groupby of some sort?
Here's the LINQToSQL:
query = from p in query
join n in notes on p.PROJECT_ID equals n.PROJECT_ID into projectnotes
from n in notes.DefaultIfEmpty()
where n.NOTES.Contains(cwForm.search1Form)
select p;
here's the SQL it produced from profiler
exec sp_executesql N'SELECT [t2].[Title], [t2].[State], [t2].[PROJECT_ID],
[t2].[PROVIDER_ID], [t2].[CATEGORY_ID], [t2].[City], [t2].[UploadedDate],
[t2].[SubmittedDate], [t2].[Project_Type]FROM ( SELECT ROW_NUMBER() OVER (ORDER BY
[t0].[UploadedDate]) AS [ROW_NUMBER], [t0].[Title], [t0].[State], [t0].[PROJECT_ID],
[t0].[PROVIDER_ID], [t0].[CATEGORY_ID], [t0].[City], [t0].[UploadedDate],
[t0].[SubmittedDate], [t0].[Project_Type] FROM [dbo].[PROJECTS] AS [t0] LEFT OUTER JOIN
[dbo].[PROJECT_NOTES] AS [t1] ON 1=1 WHERE ([t1].[NOTES] LIKE #p0) AND
([t0].SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)) AS [t2] WHERE [t2].[ROW_NUMBER] BETWEEN #p4 + 1
AND #p4 + #p5 ORDER BY [t2].[ROW_NUMBER]',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int,#p4 int,#p5 int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000,#p4=373620,#p5=20
This query returns all mutiples of the 1-many relationship in the results. I found how to do an Exists in LINQ from here. http://www.linq-to-sql.com/linq-to-sql/t-sql-to-linq-upgrade/linq-exists/
Here is the LINQToSQL using Exists:
query = from p in query
where (from n in notes
where n.NOTES.Contains(cwForm.search1Form)
select n.PROJECT_ID).Contains(p.PROJECT_ID)
select p;
The generated SQL statement:
exec sp_executesql N'SELECT COUNT(*) AS [value] FROM [dbo].[PROJECTS] AS [t0] WHERE
(EXISTS(SELECT NULL AS [EMPTY] FROM [dbo].[PROJECT_NOTES] AS [t1] WHERE
([t1].PROJECT_ID] = ([t0].[PROJECT_ID])) AND ([t1].[NOTES] LIKE #p0))) AND
([t0].[SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000
I get a SQL timeout from the databind() from using Exists.
it seems to be returning multiple Project rows
Yes, that's how join works. If a project has 5 matching notes, it show up 5 times.
What if the problem is - "Join" is the wrong idiom!
You want to filter the projects to those whose notes contain certain text:
var query = db.Project
.Where(p => p.Notes.Any(n => n.NoteField.Contains(searchString)));
You are going to have to use the DefaultIfEmpty extension method. There are a few questions on SO already that show how to do this. Here is a good example:
How can I perform a nested Join, Add, and Group in LINQ?

Categories

Resources