What's the difference between these two LINQ to SQL statements? - c#

These two statements look the same logically to me, but they're resulting in different SQL being generated:
#1
var people = _DB.People.Where(p => p.Status == MyPersonEnum.STUDENT.ToString());
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
#2
string s = MyPersonEnum.STUDENT.ToString();
var people = _DB.People.Where(p => p.Status == s);
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
Example #1 doesn't work, but example #2 does.
The generated SQL for the people query is identical in both cases, but the SQL for the final query differs like this:
#1
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = (CONVERT(NVarChar,@p0)))
)
#2
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = @p0)
)
Why is there this difference?
Edit:
Up until now, all I've done to get the generated SQL is inspect the queryable in the debugger. However, after setting up a logger as Jon suggested, it seems that the SQL actually executed is different.
#1
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = (CONVERT(NVarChar,@p0)))
)) AND ([t1].[Status] = @p1)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
#2
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = @p0)
)) AND ([t1].[Status] = @p1)
-- @p0: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
-- @p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]

First, think of the dual nature of an enum:
enum MyPersonEnum
{
STUDENT, // implicitly 0
TEACHER, // implicitly 1
DIRECTOR = 10 // explicitly 10
}
...
Assert.AreEqual(0, (int)MyPersonEnum.STUDENT);
Assert.AreEqual("STUDENT", MyPersonEnum.STUDENT.ToString());
In the second example, C# has already converted the enum to a string, so no conversion is needed; this assumes that your database People.Status column stores "STUDENT", "TEACHER" and "DIRECTOR" as its valid values.
The difference is that an enum's internal representation in the CLR is an integer. In the first example, the parameter is passed as an integer (that's how the LINQ to SQL query builder behaves), which is why the CONVERT appears.
The first one would work if your database column were an int that takes the values assigned to the enum members ({0, 1, 10} in my example).
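To see both forms side by side without a database, here is a minimal sketch using the answer's enum with its default 0-based numbering. The EnumForms class and its helper names are made up for illustration; the integer form corresponds to the Int parameter visible in the first logged query.

```csharp
using System;

// The answer's enum: members without an initializer count up from 0.
public enum MyPersonEnum
{
    STUDENT,       // 0
    TEACHER,       // 1
    DIRECTOR = 10  // 10
}

// Hypothetical helper class showing the two forms an enum value can take.
public static class EnumForms
{
    // Integer form: what ends up in the SQL parameter when the enum value itself is sent.
    public static int AsInt(MyPersonEnum value) => (int)value;

    // Name form: what a string column such as People.Status would need to contain.
    public static string AsName(MyPersonEnum value) => value.ToString();

    // Reading a stored name back into the enum.
    public static MyPersonEnum FromName(string name) =>
        (MyPersonEnum)Enum.Parse(typeof(MyPersonEnum), name);

    public static void Main()
    {
        Console.WriteLine(AsInt(MyPersonEnum.STUDENT));   // 0
        Console.WriteLine(AsName(MyPersonEnum.STUDENT));  // STUDENT
        Console.WriteLine(FromName("DIRECTOR") == MyPersonEnum.DIRECTOR); // True
    }
}
```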

No, they're different. In the first version, the expression MyPersonEnum.STUDENT.ToString() is within the expression tree - it's part of what LINQ to SQL has to convert into SQL. I'd be interested to see what @p0 is when the query is executed...
In the second version, you've already evaluated the expression, so LINQ to SQL just sees a reference to a variable which is already a string.
We know that they mean the same thing, but presumably LINQ to SQL doesn't have quite enough knowledge to understand that.
Out of interest, do both of them work?
EDIT: Okay, so the second version works. I suggest you use that form then :) In an ideal world, both would work - but in this case it seems you need to help LINQ to SQL a bit.
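A quick way to see the distinction Jon describes is to build both predicates as expression trees in memory and look at the right-hand side of the comparison. This is a sketch; Person and MyPersonEnum are minimal stand-ins for the question's types, and TreeInspector is a hypothetical helper.

```csharp
using System;
using System.Linq.Expressions;

// Minimal stand-ins for the question's types (assumed shapes).
public enum MyPersonEnum { STUDENT, TEACHER }
public class Person { public string Status { get; set; } }

public static class TreeInspector
{
    // Node type of the right-hand operand of p => p.Status == <something>.
    public static ExpressionType RightOperandKind(Expression<Func<Person, bool>> predicate) =>
        ((BinaryExpression)predicate.Body).Right.NodeType;

    public static void Main()
    {
        // Version #1: the ToString() call itself is recorded in the expression tree.
        Expression<Func<Person, bool>> inline = p => p.Status == MyPersonEnum.STUDENT.ToString();

        // Version #2: ToString() runs first; the tree only reads the captured local.
        string s = MyPersonEnum.STUDENT.ToString();
        Expression<Func<Person, bool>> captured = p => p.Status == s;

        Console.WriteLine(RightOperandKind(inline));   // Call
        Console.WriteLine(RightOperandKind(captured)); // MemberAccess
    }
}
```

In version #1 the provider is handed a method call that it has to translate; in version #2 it only sees a field holding an already-computed string, which it can send as a plain parameter.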

Related

How to optimize a slow running LINQ query that runs quickly in SQL Server

I am trying to write a LINQ query that returns rows (items) based on certain conditions on one of the row's sub-collections. The query I wrote works, but it performs very poorly through LINQ. However, if I run the generated query in SQL Server, it returns the desired rows almost instantly.
My assumption is that the query is slow because of the OrderByDescending() that is performed multiple times. Is this correct, and how can I improve the performance of this query?
Edit: The goal of this query is to select all Foo objects that have a Bar object where the Price property of the last Baz object is within a lower and an upper bound.
var query = dbContext.Foos.AsQueryable();
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price >= filter.LowerBoundPrice));
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price <= filter.UpperBoundPrice));
Edit2: This is the generated SQL query.
-- Region Parameters
DECLARE @p0 Decimal(6,2) = 2000
DECLARE @p1 Decimal(6,2) = 1000
-- EndRegion
SELECT [t0].[FooId]
FROM [Foo] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t1]
WHERE (((
SELECT [t3].[Price]
FROM (
SELECT TOP (1) [t2].[Price]
FROM [Baz] AS [t2]
WHERE [t2].[BarId] = [t1].[BarId]
ORDER BY [t2].[BazId] DESC
) AS [t3]
)) <= @p0) AND ([t1].[FooId] = [t0].[FooId])
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t4]
WHERE (((
SELECT [t6].[Price]
FROM (
SELECT TOP (1) [t5].[Price]
FROM [Baz] AS [t5]
WHERE [t5].[BarId] = [t4].[BarId]
ORDER BY [t5].[BazId] DESC
) AS [t6]
)) >= @p1) AND ([t4].[FooId] = [t0].[FooId])
))
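One way to avoid sorting the Bazs twice, sketched in memory with LINQ to Objects: project each Bar to its latest price once and test both bounds inside a single Any(). Note that this is not exactly equivalent to the two-Where version: there, the lower and the upper bound can be satisfied by two different Bars, while the stated goal is a single Bar whose last Baz is in range. The entity classes, PriceFilter, and the plain lower/upper parameters (standing in for filter.LowerBoundPrice and filter.UpperBoundPrice) are assumptions, and I haven't checked what SQL LINQ to SQL or EF generates for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Foo/Bar/Baz entities in the question.
public class Baz { public int BazId; public decimal Price; }
public class Bar { public List<Baz> Bazs = new List<Baz>(); }
public class Foo { public int FooId; public List<Bar> Bars = new List<Bar>(); }

public static class PriceFilter
{
    // The question's shape: two separate Any() calls, each sorting the Bazs again.
    // The two bounds may be satisfied by two different Bars.
    public static List<Foo> TwoAnys(IEnumerable<Foo> foos, decimal lower, decimal upper) =>
        foos.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price >= lower))
            .Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price <= upper))
            .ToList();

    // Sketch: find each Bar's latest price once, then test both bounds on that same Bar.
    // The nullable cast makes a Bar without Bazs yield null, which fails both comparisons.
    public static List<Foo> OneAny(IEnumerable<Foo> foos, decimal lower, decimal upper) =>
        foos.Where(e => e.Bars
                .Select(p => p.Bazs.OrderByDescending(s => s.BazId).Select(s => (decimal?)s.Price).FirstOrDefault())
                .Any(last => last >= lower && last <= upper))
            .ToList();

    public static void Main()
    {
        // Foo 1: one Bar ends at 5000, another ends at 100; no single Bar is in [1000, 2000].
        var foo1 = new Foo { FooId = 1 };
        foo1.Bars.Add(new Bar { Bazs = { new Baz { BazId = 1, Price = 500m }, new Baz { BazId = 2, Price = 5000m } } });
        foo1.Bars.Add(new Bar { Bazs = { new Baz { BazId = 3, Price = 100m } } });

        // Foo 2: its only Bar ends at 1500.
        var foo2 = new Foo { FooId = 2 };
        foo2.Bars.Add(new Bar { Bazs = { new Baz { BazId = 4, Price = 1500m } } });

        var foos = new List<Foo> { foo1, foo2 };
        Console.WriteLine(string.Join(",", TwoAnys(foos, 1000m, 2000m).Select(f => f.FooId))); // 1,2
        Console.WriteLine(string.Join(",", OneAny(foos, 1000m, 2000m).Select(f => f.FooId)));  // 2
    }
}
```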

A more refined version of this LINQ to SQL query

My conundrum is with trying to convert the following T-SQL query into a near equivalent (performance wise) LINQ to SQL query:
SELECT
j1.JOB,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'PREP' THEN 'X' ELSE ' ' END) AS prep,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'PRINT' THEN 'X' ELSE ' ' END) AS press,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'BIND' THEN 'X' ELSE ' ' END) AS bind,
max(CASE WHEN ISNULL(logs.statcategory, ' ') = 'SHIP' THEN 'X' ELSE ' ' END) AS ship
from
job j1
left outer join
(
select
j.job,
l.statcategory,
cnt=count(*)
from
job j
join
jobloc jl
join location l
on
l.code = jl.location and
l.site = jl.site
on j.job = jl.job
WHERE
j.stat = 'O'
group by
j.job,l.statcategory
) logs
on
j1.job = logs.job
WHERE
j1.stat = 'O'
group by
j1.job
This query currently runs just under 0.2 seconds on MS SQL Server. The following LINQ query is what I've come up with that returns the exact same records, but runs nearly 30x slower:
from a0 in Jobs
join a1 in
(
from a0 in Jobs
join a1 in JobLocs on a0.Content equals a1.Job
join a2 in Locations on new {Code = a1.Location, a1.Site} equals new {a2.Code, a2.Site}
where a0.Stat == 'O'
select new {a0.Content, a2.StatCategory}
) on a0.Content equals a1.Content into a1
from a2 in a1.DefaultIfEmpty()
where a0.Stat == 'O'
group a2 by a0.Content into a0
orderby a0.Key
select new
{
Job = a0.Key,
Prep = (bool?)a0.Max(a1 => a1.StatCategory == "PREP" ? true : false),
Print = (bool?)a0.Max(a1 => a1.StatCategory == "PRINT" ? true : false),
BIND = (bool?)a0.Max(a1 => a1.StatCategory == "BIND" ? true : false),
SHIP = (bool?)a0.Max(a1 => a1.StatCategory == "SHIP" ? true : false),
}
Here is the generated SQL from the LINQ query (using LINQPad):
-- Region Parameters
DECLARE @p0 Int = 79
DECLARE @p1 Int = 79
DECLARE @p2 VarChar(1000) = 'PREP'
DECLARE @p3 VarChar(1000) = 'PRINT'
DECLARE @p4 VarChar(1000) = 'BIND'
DECLARE @p5 VarChar(1000) = 'SHIP'
-- EndRegion
SELECT [t4].[Job], [t4].[value] AS [Prep], [t4].[value2] AS [Print], [t4].[value3] AS [BIND], [t4].[value4] AS [SHIP]
FROM (
SELECT MAX(
(CASE
WHEN [t3].[StatCategory] = @p2 THEN 1
WHEN NOT ([t3].[StatCategory] = @p2) THEN 0
ELSE NULL
END)) AS [value], MAX(
(CASE
WHEN [t3].[StatCategory] = @p3 THEN 1
WHEN NOT ([t3].[StatCategory] = @p3) THEN 0
ELSE NULL
END)) AS [value2], MAX(
(CASE
WHEN [t3].[StatCategory] = @p4 THEN 1
WHEN NOT ([t3].[StatCategory] = @p4) THEN 0
ELSE NULL
END)) AS [value3], MAX(
(CASE
WHEN [t3].[StatCategory] = @p5 THEN 1
WHEN NOT ([t3].[StatCategory] = @p5) THEN 0
ELSE NULL
END)) AS [value4], [t0].[Job]
FROM [Job] AS [t0]
LEFT OUTER JOIN ([Job] AS [t1]
INNER JOIN [JobLoc] AS [t2] ON [t1].[Job] = [t2].[Job]
INNER JOIN [Location] AS [t3] ON ([t2].[Location] = [t3].[Code]) AND ([t2].[Site] = [t3].[Site])) ON ([t0].[Job] = [t1].[Job]) AND (UNICODE([t1].[Stat]) = @p0)
WHERE UNICODE([t0].[Stat]) = @p1
GROUP BY [t0].[Job]
) AS [t4]
ORDER BY [t4].[Job]
One thing that stands out is that the generated SQL from the LINQ query runs the aggregate for each column returned in a subquery, whereas in the original it is part of the outer SELECT. I can imagine part of the performance decrease is there.
I'm (tentatively) willing to accept that there is no better way to write this, and just use the DataContext.ExecuteQuery() method in the LINQ API (and just run and shape the first SQL statement directly). However, I'm trying to not include embedded SQL as much as possible in a project that I'm currently working on, so if it can be made to be near the performance of the original query, that'd be ideal. I've been hacking away at this for some time (partly as an academic exercise, and also to actually use this or similar queries like it), and this is the best I've come up with (I did not write the original query BTW--it was part of an older project that is being migrated to a newer one).
Thanks for any assistance.
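As per our discussion in the comments:
The issue is the UNICODE conversion that LINQ to SQL adds around the Stat column. The query compares a char with 'O', and C# performs char comparisons as int comparisons, which is why the parameter is the Int 79 (the code for 'O').
The DB cannot use the index because of this (unnecessary) conversion.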
You can use .Equals instead of == and it will not use UNICODE or change the type to varchar(1) in the db.
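The likely source of the UNICODE() call is visible in the expression tree itself. C# has no == operator for char, so j.Stat == 'O' is compiled as a comparison of two ints (hence the Int parameters with value 79, the code for 'O'), whereas .Equals('O') stays a char method call. A minimal sketch; Job here is a hypothetical stand-in with Stat assumed to be mapped as a char property.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the Job entity; Stat is assumed to be a char.
public class Job { public char Stat { get; set; } }

public static class CharCompare
{
    public static void Main()
    {
        // No char == char operator exists, so both sides are promoted to int
        // and the tree wraps j.Stat in a conversion to Int32.
        Expression<Func<Job, bool>> viaOperator = j => j.Stat == 'O';

        // char.Equals(char) is kept as a method call on the char itself.
        Expression<Func<Job, bool>> viaEquals = j => j.Stat.Equals('O');

        var left = ((BinaryExpression)viaOperator.Body).Left;
        Console.WriteLine($"{left.NodeType} to {left.Type.Name}"); // Convert to Int32
        Console.WriteLine(viaEquals.Body.NodeType);                // Call
    }
}
```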

Entity Framework v6.1 query compilation performance

I am confused about how EF LINQ queries are compiled and executed. When I run a piece of code in LINQPad a couple of times, I get varying performance results (the same query takes a different amount of time on each run). My test environment is described below.
Tools used: EF v6.1 & LINQPad v5.08.
Ref DB: ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "'s deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this, I get the result, and LINQPad shows the time taken as anything from 3.515 sec to 0.004 sec (depending on how long a gap I leave between runs).
If I take the generated SQL query and execute it, that query always runs in between 0.001 sec and 0.015 sec.
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
A different question:
I have modified the first query to execute immediately (calling ToList() before the second query). This time the generated SQL is very simple, as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took a varying amount of time, but the difference is not as big as with the first query set.
In my application there are going to be a lot of rows, and I prefer the first query set to the second one, but I am confused.
Please advise.
(Note: I have only a little SQL Server knowledge, so I am using LINQPad to fine-tune queries based on their performance.)
Thanks
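On question 1, a group join expresses "each instructor with their departments" directly, without the GroupBy and the grp.First() lookup. Also, because the original uses an inner join, dept can never be null there, so the x.dept != null filter appears to be redundant. The following is a sketch run against in-memory lists; the Person/Department classes, PersonDeps, SecondInstructor and the sample rows are all made up, and I haven't compared the SQL this produces in LINQPad.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the ContosoUniversity Person and Department tables.
public class Person { public int ID; public string FirstName; }
public class Department { public int DepartmentID; public string Name; public int InstructorID; }
public class PersonDeps { public int ID; public string FirstName; public List<Department> Deps; }

public static class SecondInstructor
{
    // Group join: each person is paired with the departments they instruct.
    public static List<PersonDeps> Query(IEnumerable<Person> persons, IEnumerable<Department> departments) =>
        (from p in persons
         join d in departments on p.ID equals d.InstructorID into deps
         where deps.Any()   // keep only people who instruct something, as the inner join did
         orderby p.ID
         select new PersonDeps { ID = p.ID, FirstName = p.FirstName, Deps = deps.Distinct().ToList() })
        .Skip(1).Take(1).ToList();

    public static void Main()
    {
        // Made-up sample rows.
        var persons = new List<Person> {
            new Person { ID = 1, FirstName = "Kim" },
            new Person { ID = 2, FirstName = "Fadi" },
            new Person { ID = 3, FirstName = "Roger" } };
        var departments = new List<Department> {
            new Department { DepartmentID = 1, Name = "English", InstructorID = 1 },
            new Department { DepartmentID = 2, Name = "Mathematics", InstructorID = 2 },
            new Department { DepartmentID = 3, Name = "Economics", InstructorID = 2 } };

        foreach (var r in Query(persons, departments))
        {
            Console.WriteLine("person is..." + r.FirstName);
            foreach (var d in r.Deps) Console.WriteLine(d.Name);
        }
    }
}
```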

Querying in advance with linq2sql for data with gaps

Hi, I have a table, Values, with ValueId, Timestamp, Value and BelongTo columns. Every 15 minutes a new row is inserted into that table with a new value, the current timestamp and a specific BelongTo value. Now I want to find gaps, i.e. values where the next value's timestamp is more than 15 minutes later.
I was trying this:
var gaps = from p1 in db.T_Values
join p2 in db.T_Values on p1.TimeStamp.AddMinutes(15) equals p2.TimeStamp
into grups where !grups.Any() select new {p1};
It works, but I don't know if it is optimal; what do you think? Also, I don't know how to add where p1.BelongTo == 1, because this query looks at all the data.
Jon suggested:
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
where !db.T_Values.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.TimeStamp)
select p1;
Jon, this last query is translated to:
exec sp_executesql N'SELECT [t0].[ValueID], [t0].[TimeStamp], [t0].[Value],
[t0].[BelongTo], [t0].[Type]
FROM [dbo].[T_Values] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(ms, (CONVERT(BigInt,@p0 * 60000)) % 86400000,
DATEADD(day, (CONVERT(BigInt,@p0 * 60000)) / 86400000, [t0].[TimeStamp])) = [t1].[TimeStamp]
))) AND ([t0].[BelongTo] = @p1)',N'@p0 float,@p1 int',@p0=15,@p1=1
It works as long as all rows have the same BelongTo, but when there are rows with many different BelongTo values, I've noticed that I need to add and [t1].BelongTo = 1 to the SQL, so that it finally looks like this:
N'SELECT [t0].[ValueID], [t0].[TimeStamp], [t0].[Value], [t0].[BelongTo], [t0].[Type]
FROM [dbo].[T_Values] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(ms, (CONVERT(BigInt,@p0 * 60000)) % 86400000,
DATEADD(day, (CONVERT(BigInt,@p0 * 60000)) / 86400000, [t0].[TimeStamp])) = [t1].[TimeStamp]
and [t1].BelongTo = 1
))) AND ([t0].[BelongTo] = @p1)',N'@p0 float,@p1 int',@p0=15,@p1=1
In other words:
SELECT TimeStamp
FROM [dbo].[T_Values] AS [t0]
WHERE NOT( (EXISTS (SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(MINUTE, 15, [t0].[TimeStamp]) = [t1].[TimeStamp])))
AND ([t0].[BelongTo] = 1)
should change to
SELECT TimeStamp
FROM [dbo].[T_Values] AS [t0]
WHERE NOT( (EXISTS (SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(MINUTE, 15, [t0].[TimeStamp]) = [t1].[TimeStamp] and [t1].BelongTo=1)))
AND ([t0].[BelongTo] = 1)
but I am still trying to work out how to add this to LINQ.
Well adding the extra where clause is easy (and I'll remove the pointless anonymous type at the same time):
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
join p2 in db.T_Values
on p1.TimeStamp.AddMinutes(15) equals p2.TimeStamp
into grups
where !grups.Any()
select p1;
I'm not sure why you're grouping though... I would have thought this would be simpler:
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
where !db.T_Values.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.TimeStamp)
select p1;
As for performance - look at the generated SQL and how it looks in SQL profiler.
EDIT: If you need the BelongTo check in both versions (makes sense) I'd suggest this:
var sequence = db.T_Values.Where(p => p.BelongTo == 1);
var gaps = from p1 in sequence
where !sequence.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.TimeStamp)
select p1;
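Jon's last version can be tried against in-memory rows to see why the BelongTo filter belongs on both sides. This is a sketch with a hypothetical ValueRow class standing in for T_Values and a hypothetical GapFinder helper. Like the SQL, it relies on timestamps landing exactly 15 minutes apart, since == on DateTime is an exact comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a T_Values row.
public class ValueRow { public int ValueId; public DateTime TimeStamp; public int BelongTo; }

public static class GapFinder
{
    // Jon's final shape: filter by BelongTo once and use that filtered sequence on both
    // sides, so a reading from another BelongTo cannot "fill" a gap.
    public static List<ValueRow> Gaps(IEnumerable<ValueRow> values, int belongTo)
    {
        var sequence = values.Where(p => p.BelongTo == belongTo);
        return (from p1 in sequence
                where !sequence.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.TimeStamp)
                select p1).ToList();
    }

    public static void Main()
    {
        var t0 = new DateTime(2024, 1, 1, 0, 0, 0);
        var rows = new List<ValueRow> {
            new ValueRow { ValueId = 1, TimeStamp = t0, BelongTo = 1 },
            new ValueRow { ValueId = 2, TimeStamp = t0.AddMinutes(15), BelongTo = 1 },
            new ValueRow { ValueId = 3, TimeStamp = t0.AddMinutes(30), BelongTo = 1 },
            new ValueRow { ValueId = 4, TimeStamp = t0.AddMinutes(45), BelongTo = 2 },
            new ValueRow { ValueId = 5, TimeStamp = t0.AddMinutes(60), BelongTo = 1 } };

        foreach (var gap in Gaps(rows, 1))
            Console.WriteLine(gap.ValueId);
    }
}
```

With this sample, reading 3 is reported because the only row 15 minutes after it belongs to BelongTo 2, and reading 5 is reported simply because nothing follows it yet; the newest row always shows up this way, so you may want to exclude it.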
How about
var gaps = db.T_Values.Take(db.T_Values.Count() - 1)
.Select((p, index) => new {P1 = p, P2 = db.T_Values.ElementAt(index + 1)})
.Where(p => p.P1.BelongTo == 1 && !p.P1.TimeStamp.AddMinutes(15).Equals(p.P2.TimeStamp)).Select(p => p.P1);

Linq distinct not working correctly

I'm having a strange problem with a LINQ query. I'm using LINQPad 4 to write a query that uses a regular expression, with LINQ to SQL as the LINQPad driver.
Here's the query I'm trying to write:
(from match in
from s in SystemErrors
select Regex.Match(s.Description, "...")
select new
{
FamilyCode = match.Groups["FamilyCode"].Value,
ProductPrefix = match.Groups["ProductPrefix"].Value,
BillingGroup = match.Groups["BillingGroup"].Value,
Debtor = match.Groups["Debtor"].Value
}).Distinct()
As you can see, I'm trying to extract data from a text description in a log table using groups. The query works, but Distinct doesn't: it returns a row for every match.
I have read that Distinct should work with anonymous types, comparing each property. Stranger still, Distinct does actually do something: it orders the values alphabetically by FamilyCode (and then by ProductPrefix, etc.).
Does anyone have an idea why this isn't working?
Thanks
Here is what is displayed in the SQL tab of LINQPad:
DECLARE @p0 NVarChar(1000) = 'Big Regexp'
DECLARE @p1 NVarChar(1000) = 'FamilyCode'
DECLARE @p2 NVarChar(1000) = 'ProductPrefix'
DECLARE @p3 NVarChar(1000) = 'BillingGroup'
DECLARE @p4 NVarChar(1000) = 'Debtor'
SELECT DISTINCT [t2].[Description] AS [input], [t2].[value], [t2].[value2], [t2].[value3], [t2].[value4], [t2].[value5]
FROM (
SELECT [t1].[Description], [t1].[value], @p1 AS [value2], @p2 AS [value3], @p3 AS [value4], @p4 AS [value5]
FROM (
SELECT [t0].[Description], @p0 AS [value]
FROM [SystemError] AS [t0]
) AS [t1]
) AS [t2]
var result = from eachError in SystemErrors
let match = Regex.Match(eachError.Description, "...")
group eachError by new
{
FamilyCode = match.Groups["FamilyCode"].Value,
ProductPrefix = match.Groups["ProductPrefix"].Value,
BillingGroup = match.Groups["BillingGroup"].Value,
Debtor = match.Groups["Debtor"].Value
}
into unique
select unique.Key;
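Distinct() isn't comparing the values you extract. Regex.Match can't be translated into SQL, so, as the SQL tab shows, LINQ to SQL applies SELECT DISTINCT to the raw Description column (plus the constant parameters) and only runs the regex on the client afterwards. Every distinct Description therefore produces its own row, even when the extracted groups are identical. (Anonymous types themselves compare by property value, so Distinct() on them works fine in memory.) Try using group by instead, or call AsEnumerable() before the projection so that Distinct() runs in memory.
Alternatively, once the query runs in memory, you can use the Distinct(IEqualityComparer<T>) overload and provide an equality comparer for the type you want to process.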
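For comparison, here is a sketch of the in-memory route: once the descriptions have been pulled across (for example with AsEnumerable()), Regex.Match runs per row and Distinct() compares the anonymous objects by their property values. The pattern, the ErrorCodes helper and the sample descriptions are hypothetical, since the question elides the real regex, and only two of the four groups are used to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ErrorCodes
{
    // Hypothetical pattern: the question elides the real one as "...".
    const string Pattern = @"^(?<FamilyCode>[A-Z]+\d+)-(?<ProductPrefix>[A-Z]+\d+)";

    // Runs entirely in LINQ to Objects: Regex.Match executes per row,
    // and Distinct() compares the anonymous objects by their property values.
    public static List<string> DistinctCodes(IEnumerable<string> descriptions) =>
        descriptions
            .Select(d => Regex.Match(d, Pattern))
            .Select(m => new
            {
                FamilyCode = m.Groups["FamilyCode"].Value,
                ProductPrefix = m.Groups["ProductPrefix"].Value
            })
            .Distinct()
            .Select(x => x.FamilyCode + "/" + x.ProductPrefix)
            .ToList();

    public static void Main()
    {
        // In LINQPad the input could come from SystemErrors.Select(s => s.Description).AsEnumerable().
        var descriptions = new[] { "FAM1-PRE1 timeout", "FAM1-PRE1 retry", "FAM2-PRE3 timeout" };
        foreach (var code in DistinctCodes(descriptions))
            Console.WriteLine(code);
    }
}
```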
