Entity Framework v6.1 query compilation performance - c#

I am confused how EF LINQ queries are compiled and executed. When I run a piece of program in LINQPad couple of times, I get varied performance results (each time the same query takes different amount of time). Please find below my test execution environment.
tools used: EF v6.1 & LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "' deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this I get the result and LINQPad shows time taken value from 3.515 sec to 0.004 sec (depending how much gap I take between different runs).
If I take the generated SQL query and execute it, that query always runs between 0.015 sec to 0.001sec.
Generated query:
-- Region Parameters
DECLARE #p0 Int = 1
DECLARE #p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4]. [InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p0 + #p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE #x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1]. [StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE #x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0]. [EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1]. [Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took varied amount of time but the difference is not as big as the first query set run.
In my application, there going to be lot of rows and I prefer first query set to second one but I am confused.
Please guide.
(Note: I have a little SQL Server knowledge so, I am using LINQPad to fine tune queries based on the performance)
Thanks

Related

How to optimize a slow running LINQ query that runs quickly in SQL Server

I am trying to write a LINQ query that returns rows (items) based on certain conditions on one of the sub-collection of the row (item). The query I wrote works but performs very poorly in LINQ. However if I run the generated query in SQL Server, it almost instantly returns the desired rows.
My assumption is that the query is slow because of the OrderByDescending() that is performed multiple times. Is this correct and how can I improve the performance of this query?
Edit: The goal of this query is to select all Foo objects that have a Bar object
where the price property of the last Baz object is within a lower and upper bound value.
var query = dbContext.Foos.AsQueryable();
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price >= filter.LowerBoundPrice));
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price <= filter.UpperBoundPrice));
Edit2: This is the generated SQL query.
-- Region Parameters
DECLARE #p0 Decimal(6,2) = 2000
DECLARE #p1 Decimal(6,2) = 1000
-- EndRegion
SELECT [t0].[FooId]
FROM [Foo] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t1]
WHERE (((
SELECT [t3].[Price]
FROM (
SELECT TOP (1) [t2].[Price]
FROM [Baz] AS [t2]
WHERE [t2].[BarId] = [t1].[BarId]
ORDER BY [t2].[BazId] DESC
) AS [t3]
)) <= #p0) AND ([t1].[FooId] = [t0].[FooId])
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t4]
WHERE (((
SELECT [t6].[Price]
FROM (
SELECT TOP (1) [t5].[Price]
FROM [Baz] AS [t5]
WHERE [t5].[BarId] = [t4].[BarId]
ORDER BY [t5].[BazId] DESC
) AS [t6]
)) >= #p1) AND ([t4].[FooId] = [t0].[FooId])
))

Anyone have an idea how to express complex join logic within a LINQ query?

I've been at it all day. For the life of me, I cannot figure out how to translate either of the final two select statements found within the below code snippet:
declare #Person table
(
[Name] varchar(50),
[ABA] varchar(9)
)
declare #Entity table
(
[Name] varchar(50),
[Respondent] varchar(9),
[TierRespondent] varchar(9)
)
insert into #Person ([Name], [ABA])
select 'Steve', '000000001'
union
select 'Mary', '000000002'
union
select 'Carey', '000000003'
insert into #Entity ([Name], [Respondent], [TierRespondent])
select 'Steve', '000000001', '000000006'
union
select 'Mary', '000000004', '000000002'
union
select 'Carey', '000000005', '000000008'
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
select *
FROM #Entity e
LEFT JOIN #Person p
ON p.[ABA] in (e.Respondent ,e.[TierRespondent])
The thing that boggles my mind is the logic found within the ON clause of the join statements.
I'm not a SQL wiz, so I've even failed at trying to restructure these SELECT statements into a different form that gives me the same results, but is also easier to translate to LINQ.
Any ideas, anyone?
Thanks.
After some hours of pondering, I found the following sql to be an acceptable refactoring that can easily be translated to a linq query:
select *
from #entity e
left join #Person p
on 1 = 1
where p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
and p.ABA is not null
And here is the corresponding linq query:
from entity in entities
join p in persons on 1 equals 1 into groupJoin
from person in groupJoin.DefaultIfEmpty()
where person.ABA == entity.Respondent || person.ABA == entity.TierRespondent
&& person.ABA != null
select new
{
entity.Name,
entity.Respondent,
entity.TierRespondent,
person?.Name,
person?.ABA
}
The select statements of the original post contained logic within the join clause. This logic cannot be expressed within a linq join clasue. This refactoring moves the logic from the join, and into the where clause, which can easily be handled expressed in linq.

Linq Join with a Group By

Ok, I am trying to replicate the following SQL query into a Linq expression:
SELECT
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
FROM
Incidents I INNER JOIN Employees E ON I.IncidentEmployee = E.EmployeeNumber
GROUP BY
I.EmployeeNumber,
E.TITLE,
E.FNAM,
E.LNAM
Simple enough (or at least I thought):
var query = (from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee = e.EmployeeNumber
group e by new { i.IncidentEmployee, e.TITLE, e.FNAM, e.LNAM } into allIncEmps
select new
{
IncEmpNum = allIncEmps.Key.IncidentEmployee
TITLE = allIncEmps.Key.TITLE,
USERFNAM = allIncEmps.Key.FNAM,
USERLNAM = allIncEmps.Key.LNAM
});
But I am not getting back the results I exprected, so I fire up SQL Profiler to see what is being sent down the pipe to SQL Server and this is what I see:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM ( SELECT DISTINCT
[Extent2].[IncidentEmployee] AS [IncidentEmployee],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
) AS [GroupBy1]
As you can see from the SQL string that was sent toSQL Server none of the fields that I was expecting to be return are being included in the Select clause. What am I doing wrong?
UPDATE
It has been a very long day, I re-ran the code again and now this is the SQL that is being sent down the pipe:
SELECT
[Distinct1].[IncidentEmployee] AS [IncidentEmployee],
[Distinct1].[TITLE] AS [TITLE],
[Distinct1].[FNAM] AS [FNAM],
[Distinct1].[LNAM] AS [LNAM]
FROM ( SELECT DISTINCT
[Extent1].[OFFNUM] AS [OFFNUM],
[Extent1].[TITLE] AS [TITLE],
[Extent1].[FNAM] AS [FNAM],
[Extent1].[LNAM] AS [LNAM]
FROM [dbo].[Employees] AS [Extent1]
INNER JOIN [dbo].[INCIDENTS] AS [Extent2] ON ([Extent1].[EmployeeNumber] = [Extent2].[IncidentEmployee]) OR (([Extent1].[EmployeeNumber] IS NULL) AND ([Extent2].[IncidentEmployee] IS NULL))
) AS [Distinct1]
But I am still not seeing results when I try to loop through the record set
foreach (var emps in query)
{
}
Not sure why the query does not return what it should return, but it occurred to me that since you only query the group key and not any grouped results you've got nothing but a Distinct():
var query =
(from e in contextDB.Employees
join i in contextDB.Incidents on i.IncidentEmployee equals e.EmployeeNumber
select new
{
IncEmpNum = i.IncidentEmployee
TITLE = e.TITLE,
USERFNAM = e.FNAM,
USERLNAM = e.LNAM
}).Distinct();
But EF was smart enough to see this as well and created a DISTINCT query too.
You don't specify which result you expected and in what way the actual result was different, but I really can't see how the grouping can produce a different result than a Distinct.
But how did your code compile? As xeondev noticed: there should be an equals in stead of an = in a join statement. My compiler (:D) does not swallow it otherwise. The generated SQL join is strange too: it also matches records where both joined values are NULL. This makes me suspect that at least one of your keys (i.IncidentEmployee or e.EmployeeNumber) is nullable and you should either use i.IncidentEmployee.Value or e.EmployeeNumber.Value or both.

How can I optimise this Linq query to remove the unnecessary SELECT Count(*)

I have three tables, Entity, Period and Result. There is a 1:1 mapping between Entity and Period and a 1:Many between Period and Result.
This is the linq query:
int id = 100;
DateTime start = DateTime.Now;
from p in db.Periods
where p.Entity.ObjectId == id && p.Start == start
select new { Period = p, Results = p.Results })
This is relevant parts of the generated SQL:
SELECT [t0].[EntityId], [t2].[PeriodId], [t2].[Value], (
SELECT COUNT(*)
FROM [dbo].[Result] AS [t3]
WHERE [t3].[PeriodId] = [t0].[Id]
) AS [value2]
FROM [dbo].[Period] AS [t0]
INNER JOIN [dbo].[Entity] AS [t1] ON [t1].[Id] = [t0].[EntityId]
LEFT OUTER JOIN [dbo].[Result] AS [t2] ON [t2].[PeriodId] = [t0].[Id]
WHERE ([t1].[ObjectId] = 100) AND ([t0].[Start] = '2010-02-01 00:00:00')
Where is the SELECT Count(*) coming from and how can I get rid of it? I don't need a count of the "Results" for each "Period" and it slows the query down by an order of magnitude.
Consider using the Context.LoadOptions and specifying for Period to LoadWith(p => p.Results) to eager load the period with results without needing to project into an anonymous type.

Left Outer Join and Exists in Linq To SQL C# .NET 3.5

I'm stuck on translating a left outer join from LINQToSQL that returns unique parent rows.
I have 2 tables (Project, Project_Notes, and it's a 1-many relationship linked by Project_ID). I am doing a keyword search on multiple columns on the 2 table and I only want to return the unique projects if a column in Project_Notes contains a keyword. I have this linqtoSQl sequence going but it seems to be returning multiple Project rows. Maybe do an Exist somehow in LINQ? Or maybe a groupby of some sort?
Here's the LINQToSQL:
query = from p in query
join n in notes on p.PROJECT_ID equals n.PROJECT_ID into projectnotes
from n in notes.DefaultIfEmpty()
where n.NOTES.Contains(cwForm.search1Form)
select p;
here's the SQL it produced from profiler
exec sp_executesql N'SELECT [t2].[Title], [t2].[State], [t2].[PROJECT_ID],
[t2].[PROVIDER_ID], [t2].[CATEGORY_ID], [t2].[City], [t2].[UploadedDate],
[t2].[SubmittedDate], [t2].[Project_Type]FROM ( SELECT ROW_NUMBER() OVER (ORDER BY
[t0].[UploadedDate]) AS [ROW_NUMBER], [t0].[Title], [t0].[State], [t0].[PROJECT_ID],
[t0].[PROVIDER_ID], [t0].[CATEGORY_ID], [t0].[City], [t0].[UploadedDate],
[t0].[SubmittedDate], [t0].[Project_Type] FROM [dbo].[PROJECTS] AS [t0] LEFT OUTER JOIN
[dbo].[PROJECT_NOTES] AS [t1] ON 1=1 WHERE ([t1].[NOTES] LIKE #p0) AND
([t0].SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)) AS [t2] WHERE [t2].[ROW_NUMBER] BETWEEN #p4 + 1
AND #p4 + #p5 ORDER BY [t2].[ROW_NUMBER]',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int,#p4 int,#p5 int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000,#p4=373620,#p5=20
This query returns all mutiples of the 1-many relationship in the results. I found how to do an Exists in LINQ from here. http://www.linq-to-sql.com/linq-to-sql/t-sql-to-linq-upgrade/linq-exists/
Here is the LINQToSQL using Exists:
query = from p in query
where (from n in notes
where n.NOTES.Contains(cwForm.search1Form)
select n.PROJECT_ID).Contains(p.PROJECT_ID)
select p;
The generated SQL statement:
exec sp_executesql N'SELECT COUNT(*) AS [value] FROM [dbo].[PROJECTS] AS [t0] WHERE
(EXISTS(SELECT NULL AS [EMPTY] FROM [dbo].[PROJECT_NOTES] AS [t1] WHERE
([t1].PROJECT_ID] = ([t0].[PROJECT_ID])) AND ([t1].[NOTES] LIKE #p0))) AND
([t0].[SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000
I get a SQL timeout from the databind() from using Exists.
it seems to be returning multiple Project rows
Yes, that's how join works. If a project has 5 matching notes, it show up 5 times.
What if the problem is - "Join" is the wrong idiom!
You want to filter the projects to those whose notes contain certain text:
var query = db.Project
.Where(p => p.Notes.Any(n => n.NoteField.Contains(searchString)));
You are going to have to use the DefaultIfEmpty extension method. There are a few questions on SO already that show how to do this. Here is a good example:
How can I perform a nested Join, Add, and Group in LINQ?

Categories

Resources