Querying in advance with linq2sql for data with gaps

Querying in advance with linq2sql for data with gaps - c#

Hi i have table: Values with ValueId, Timestamp , Value and BelongTo. Each 15 minutes there is insreted new row into that table with new value, current timestamp and specific BelongTo field. And now i want to find gaps i mean values where one after another has timestamp more then 15 minutes.
I was trying this:
var gaps = from p1 in db.T_Values
join p2 in db.T_Values on p1.TimeStamp.AddMinutes(15) equals p2.TimeStamp
into grups where !grups.Any() select new {p1};
and it works but i don't know if this is optimall, what do you think? and i don't know how can i add where p1.BelongTo == 1. Cos this query looks for all data.
Jon told
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
where !db.T_Values.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.Timestamp)
select p1;
Jon this last query is translated to:
exec sp_executesql N'SELECT [t0].[ValueID], [t0].[TimeStamp], [t0].[Value],
[t0].[BelongTo], [t0].[Type]
FROM [dbo].[T_Values] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(ms, (CONVERT(BigInt,#p0 * 60000)) % 86400000,
DATEADD(day, (CONVERT(BigInt,#p0 * 60000)) / 86400000, [t0].[TimeStamp])) = [t1].[TimeStamp]
))) AND ([t0].[BelongTo] = #p1)',N'#p0 float,#p1 int',#p0=15,#p1=1
and it works unless all rows have the same belongTo, when there are rows with BelongTo with many diferent values then i've noticed I need to add to sql:and [t1].BelongTo = 1 which should finally look like this
N'SELECT [t0].[ValueID], [t0].[TimeStamp], [t0].[Value], [t0].[BelongTo], [t0].[Type]
FROM [dbo].[T_Values] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(ms, (CONVERT(BigInt,#p0 * 60000)) % 86400000,
DATEADD(day, (CONVERT(BigInt,#p0 * 60000)) / 86400000, [t0].[TimeStamp])) = [t1].[TimeStamp]
and [t1].BelongTo = 1
))) AND ([t0].[BelongTo] = #p1)',N'#p0 float,#p1 int',#p0=15,#p1=1
other words:
SELECT TimeStamp
FROM [dbo].[T_Values] AS [t0]
WHERE NOT( (EXISTS (SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(MINUTE, 15, [t0].[TimeStamp]) = [t1].[TimeStamp])))
AND ([t0].[BelongTo] = 1)
shoud change to
SELECT TimeStamp
FROM [dbo].[T_Values] AS [t0]
WHERE NOT( (EXISTS (SELECT NULL AS [EMPTY]
FROM [dbo].[T_Values] AS [t1]
WHERE DATEADD(MINUTE, 15, [t0].[TimeStamp]) = [t1].[TimeStamp] and [t1].BelongTo=1)))
AND ([t0].[BelongTo] = 1)
but I am still thinking how can I add this to linkq

Well adding the extra where clause is easy (and I'll remove the pointless anonymous type at the same time):
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
join p2 in db.T_Values
on p1.TimeStamp.AddMinutes(15) equals p2.TimeStamp
into grups
where !grups.Any()
select p1;
I'm not sure why you're grouping though... I would have thought this would be simpler:
var gaps = from p1 in db.T_Values
where p1.BelongTo == 1
where !db.T_Values.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.Timestamp)
select p1;
As for performance - look at the generated SQL and how it looks in SQL profiler.
EDIT: If you need the BelongTo check in both versions (makes sense) I'd suggest this:
var sequence = db.T_Values.Where(p => p.BelongTo == 1);
var gaps = from p1 in sequence
where !sequence.Any(p2 => p1.TimeStamp.AddMinutes(15) == p2.Timestamp)
select p1;

How about
var gaps = dbT_Values.Take(dbT_Values.Count()-1)
.Select((p, index) => new {P1 = p, P2 = dbT_Values.ElementAt(index + 1)})
.Where(p => p.P1.BelongsTo == 1 && p.P1.TimeStamp.AddMinutes(15).Equals(p.P2.TimeStamp)).Select(p => p.P1);

Related

How to Convert Sub Query to Joins for Fast Result?

I want to convert Sub Queries into joins to improve performance.
The following sub-queries take to long to load.
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
(SELECT TOP 1 b.Level
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= #Start
order by b.Timestamp ) AS Level,
(select top 1 b.Timestamp
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= #Start
order by b.Timestamp ) AS TimeStamp,
(SELECT top 1 b.Temp
from Microframe.dbo.TrackMessages b
where b.IMEI = a.IMEI
AND b.Timestamp >= #Start
order by b.Timestamp ) AS Temp
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN ({Tanks})

You can move the subquery to the FROM clause and use CROSS APPLY. Since you seem to be dealing with IoT data though, you should investigate T-SQL's ranking, windowing and analytic functions. Performance will depend heavily on the table's indexes.
Given these tables :
create table #TrackMessages (
Message_ID bigint primary key,
imei nvarchar(50) ,
[timestamp] datetime2,
Level int,
temp numeric(5,2)
);
create table #device (
imei nvarchar(50) primary key,
owner_id int
);
create table #tbl_static_tank_info (
tank_id int not null primary key,
tank_name nvarchar(20),
fuel_type nvarchar(20),
capacity numeric(9,2),
owner_id int,
client_id int
)
And indexes :
create nonclustered index IX_MSG_IMEI_Time on #TrackMessages (imei,timestamp) include(level,temp) ;
create INDEX IX_Device_OwnerID on #device (Owner_ID)
create INDEX IX_Tank_Client on #tbl_static_tank_info (Client_ID);
create INDEX IX_Tank_Owner on #tbl_static_tank_info (Owner_ID);
The TOP 1 query would look like this :
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
Level,
TimeStamp,
Temp
FROM #device as a
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
cross apply (SELECT top 1 imei,Temp,Level,timestamp
from #TrackMessages b
where b.IMEI = a.imei
AND b.Timestamp >= #start
order by b.Timestamp ) msg
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
If there is a 1-M relation between tanks, devices and messages, the FIRST_VALUE analytic function can be used to return the first record ber device, without using a subquery :
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
first_value(Temp) over (partition by b.imei order by timestamp) as temp,
first_value(Level) over (partition by b.imei order by timestamp) as level,
min(timestamp) over (partition by b.imei) as timestamp
from #TrackMessages b
inner join #device as a on b.IMEI = a.imei
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
Performance will depend heavily on the indexes, the table statistics and whether the index and OVER order matches.
This query can be modified to return both the first and last value per device using LAST_VALUE :
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
first_value(Temp) over (partition by b.imei order by timestamp) as StartTemp,
first_value(Level) over (partition by b.imei order by timestamp) as StartLevel,
min(timestamp) over (partition by b.imei) as StartTime,
last_value(Temp) over (partition by b.imei order by timestamp) as EndTemp,
lastt_value(Level) over (partition by b.imei order by timestamp) as EndLevel,
max(timestamp) over (partition by b.imei) as EndTime
from #TrackMessages b
inner join #device as a on b.IMEI = a.imei
inner JOIN #tbl_static_tank_info as c ON c.tank_id = a.owner_id
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN (1,5,7)
the server would have to sort the measurements both by ascending timestamp order (that's what the IX_MSG_IMEI_Time index already does) and descending order.

Here's a solution with CROSS APPLY which is like a function you can declare on the go and use it as a joining clause. You can change CROSS APPLY to OUTER APPLY if the returning set might not exist, in this case if there might not be any record on TrackMessages for a particular IMEI (will return NULL values).
SELECT
c.tank_name,
c.fuel_type,
c.capacity,
c.tank_id,
T.Level,
T.Timestamp,
T.Temp
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
CROSS APPLY (
SELECT TOP 1 -- Retrieve only the first record
-- And return as many columns as you need
b.Level,
b.Timestamp,
b.Temp
FROM
Microframe.dbo.TrackMessages AS b
WHERE
a.IMEI = b.IMEI AND -- With matching IMEI
b.Timestamp >= #Start
ORDER BY
b.Timestamp) T -- Ordered by Timestamp
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN ({Tanks})
However I believe the key point here would be indexes on your tables. If you are already sure that the problem is the subquery, then make sure that TrackMessages has the following index:
CREATE NONCLUSTERED INDEX NCI_TrackMessages_IMEI_TimeStamp ON Microframe.dbo.TrackMessages (IMEI, Timestamp)
Indexes have pros and cons, make sure to check them out before creating or dropping one.

Not having the structures, my guees for the solution is:
WITH CTE AS
(SELECT B.IMEI,
b.Level,
b.Timetamp,
b.Temp,
ROW_NUMBER() OVER (PARTITION BY b.IMEI ORDER BY Timestamp) AS Row
FROM Microframe.dbo.TrackMessages b
WHERE b.Timestamp >= #Start
)
SELECT c.tank_name, c.fuel_type, c.capacity, c.tank_id,
CTE.Level, CTE.Timestamp, CTE.Temp
FROM GatexServerDB.dbo.device as a
INNER JOIN GatexReportsDB.dbo.tbl_static_tank_info as c ON c.tank_id = a.owner_id
INNER JOIN CTE ON CTE.IMEI = a.IMEI
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN ({Tanks})
AND CTE.Row = 1;
I cannot test it but it should be very close to the solution. Please, confirm if it works.

You can compare and go with either of the below solutions
The JOIN way where ordering is done via row-number windowing function
SELECT * FROM
(
SELECT
c.tank_name,
c.fuel_type,
c.capacity,
c.tank_id,
Level=b.Level,
TimeStamp=b.Timestamp,
Temp=b.Temp,
r=Row_number() over ( order by b.timestamp)
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c
ON c.tank_id = a.owner_id
JOIN Microframe.dbo.TrackMessages as b
ON b.IMEI = a.IMEI AND b.Timestamp >= #Start
WHERE c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN ({Tanks})
)T
where r=1
or the CROSS APPLY way like below
SELECT * FROM
(
SELECT
c.tank_name, c.fuel_type, c.capacity, c.tank_id
FROM GatexServerDB.dbo.device as a
JOIN GatexReportsDB.dbo.tbl_static_tank_info as c
ON c.tank_id = a.owner_id
AND c.client_id = 65
AND a.IMEI IS NOT NULL
AND c.tank_id IN ({Tanks})
) A
CROSS APPLY
(
SELECT
TOP 1
b.Level, b.Timestamp,b.Temp
FROM Microframe.dbo.TrackMessages b
WHERE b.IMEI = a.IMEI
AND b.Timestamp >= #Start
ORDER BY b.Timestamp
)D

Improving SQL query generated by LINQ to EF

I have student table and I want to write a Linq query to get multiple counts. The SQL query that Linq generates is too complicated and un-optimized.
Following is definition of my table:
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](100) NULL,
[Age] [int] NULL,
I need to get one count with students with name = test and one count for students with age > 10.
This is one of the query I have tried:
var sql = from st in school.Students
group st by 1 into grp
select new
{
NameCount = grp.Count(k => k.Name == "Test"),
AgeCount = grp.Count(k => k.Age > 5)
};
The SQL query that is generated is:
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[C2] AS [C2],
[Limit1].[C3] AS [C3]
FROM ( SELECT TOP (1)
[Project2].[C1] AS [C1],
[Project2].[C2] AS [C2],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Student] AS [Extent3]
WHERE ([Project2].[C1] = 1) AND ([Extent3].[Age] > 5)) AS [C3]
FROM ( SELECT
[Distinct1].[C1] AS [C1],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Student] AS [Extent2]
WHERE ([Distinct1].[C1] = 1) AND (N'Test' = [Extent2].[Name])) AS [C2]
FROM ( SELECT DISTINCT
1 AS [C1]
FROM [dbo].[Student] AS [Extent1]
) AS [Distinct1]
) AS [Project2]
) AS [Limit1]
For me this seems to be complex. This can be achieved by following simple query:
select COUNT(CASE WHEN st.Name = 'Test' THEN 1 ELSE 0 END) NameCount,
COUNT(CASE WHEN st.Age > 5 THEN 1 ELSE 0 END) AgeCount from Student st
Is there a way in LINQ with which the SQL query that gets generated will have both the aggregation rather than having it two separate queries joined with nested queries?

From my experience with EF6, conditional Sum (i.e. Sum(condition ? 1 : 0)) is translated much better to SQL than Count with predicate (i.e. Count(condition)):
var query =
from st in school.Students
group st by 1 into grp
select new
{
NameCount = grp.Sum(k => k.Name == "Test" ? 1 : 0),
AgeCount = grp.Sum(k => k.Age > 5 ? 1 : 0)
};
Btw, your SQL example should be using SUM as well. In order to utilize the SQL COUNT which excludes NULLs, it should be ELSE NULL or no ELSE:
select COUNT(CASE WHEN st.Name = 'Test' THEN 1 END) NameCount,
COUNT(CASE WHEN st.Age > 5 THEN 1 END) AgeCount
from Student st
But there is no equivalent LINQ construct for this, hence no way to let EF6 generate such translation. But IMO the SUM is good enough equivalent.

A much simpler query results from not using the unnecessary group by and just querying the table twice in the select:
var sql = from st in school.Students.Take(1)
select new {
NameCount = school.Students.Count(k => k.Name == "Test"),
AgeCount = school.Students.Count(k => k.Age > 5)
};

Entity Framework SUM CASE not optimized

I'm trying to write a simple SQL query in LinQ, and no matter how hard I try, I always get a complex query.
Here is the SQL I am trying to achieve (this is not what I'm getting):
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
Here is the code:
from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
ID = clearingAccount.ID,
Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}
The generated query gets the data from the respective table for each sum, so four times. I'm not sure if it's even possible to optimize this?
EDIT Here the is generated query:
SELECT
[Project5].[ID] AS [ID],
[Project5].[C1] AS [C1],
[Project5].[C2] AS [C2],
[Project5].[C3] AS [C3],
[Project5].[C4] AS [C4]
FROM ( SELECT
[Project4].[ID] AS [ID],
[Project4].[C1] AS [C1],
[Project4].[C2] AS [C2],
[Project4].[C3] AS [C3],
(SELECT
SUM([Filter5].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent5]
WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
) AS [Filter5]) AS [C4]
FROM ( SELECT
[Project3].[ID] AS [ID],
[Project3].[C1] AS [C1],
[Project3].[C2] AS [C2],
(SELECT
SUM([Filter4].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent4]
WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
) AS [Filter4]) AS [C3]
FROM ( SELECT
[Project2].[ID] AS [ID],
[Project2].[C1] AS [C1],
(SELECT
SUM([Filter3].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[DirectDebits] AS [Extent3]
WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
) AS [Filter3]) AS [C2]
FROM ( SELECT
[Project1].[ID] AS [ID],
(SELECT
SUM([Filter2].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
FROM [dbo].[Payments] AS [Extent2]
WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
) AS [Filter2]) AS [C1]
FROM ( SELECT
[Extent1].[ID] AS [ID]
FROM [dbo].[ClearingAccounts] AS [Extent1]
WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
) AS [Project1]
) AS [Project2]
) AS [Project3]
) AS [Project4]
) AS [Project5]

Edit
Note that as per #usr's comment, that your original Sql Query is broken. By LEFT OUTER joining on two independent tables, and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed colums (and vice versa). e.g. If a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better Sql Query would be:
WITH ctePayments AS
(
SELECT
ClearingAccounts.ID,
-- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
FROM ClearingAccounts
INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
SELECT
ClearingAccounts.ID,
SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
FROM ClearingAccounts
INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2,
COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
ClearingAccounts ca
LEFT OUTER JOIN ctePayments p
ON ca.ID = p.ID
LEFT OUTER JOIN cteDirectDebits d
ON ca.ID = d.ID;
-- GROUP BY not required, since we have already guaranteed at most one row
-- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;
You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.
Old Answer(s)
The Sql construct:
SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something
when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched) . It isn't really a sum at all, its a 'matched' count.
With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:
var result2 = Db.ClearingAccounts
.Include(c => c.Payments)
.Include(c => c.DirectDebits)
.GroupBy(c => c.Id)
.ToList() // or any other means to force materialization here.
.ToDictionary(
grp => grp.Key,
grp => new
{
PaymentsByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
DirectDebitByStatus = grp.SelectMany(x => x.Payments)
.GroupBy(p => p.StatusId),
})
.Select(ca => new
{
ID = ca.Key,
Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
.Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
.Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
});
However, personally, I would leave this pivot projection directly in Sql, and then use something like SqlQuery to then deserialize the result back from Sql
directly into the final Entity type.

1)
Add AsNoTracking in EF to avoid tracking changes.
Check that you have indexes on the columns you are using for the JOINs. Especially the column that you are using to group by. Profile the query and optimize it. EF has also overhead over a stored procedure.
or
2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query will be faster.

How can I optimise this Linq query to remove the unnecessary SELECT Count(*)

I have three tables, Entity, Period and Result. There is a 1:1 mapping between Entity and Period and a 1:Many between Period and Result.
This is the linq query:
int id = 100;
DateTime start = DateTime.Now;
from p in db.Periods
where p.Entity.ObjectId == id && p.Start == start
select new { Period = p, Results = p.Results })
This is relevant parts of the generated SQL:
SELECT [t0].[EntityId], [t2].[PeriodId], [t2].[Value], (
SELECT COUNT(*)
FROM [dbo].[Result] AS [t3]
WHERE [t3].[PeriodId] = [t0].[Id]
) AS [value2]
FROM [dbo].[Period] AS [t0]
INNER JOIN [dbo].[Entity] AS [t1] ON [t1].[Id] = [t0].[EntityId]
LEFT OUTER JOIN [dbo].[Result] AS [t2] ON [t2].[PeriodId] = [t0].[Id]
WHERE ([t1].[ObjectId] = 100) AND ([t0].[Start] = '2010-02-01 00:00:00')
Where is the SELECT Count(*) coming from and how can I get rid of it? I don't need a count of the "Results" for each "Period" and it slows the query down by an order of magnitude.

Consider using the Context.LoadOptions and specifying for Period to LoadWith(p => p.Results) to eager load the period with results without needing to project into an anonymous type.

What's the difference between these two LINQtoSQL statements?

These two statements look the same logically to me, but they're resulting in different SQL being generated:
#1
var people = _DB.People.Where(p => p.Status == MyPersonEnum.STUDENT.ToString());
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
#2
string s = MyPersonEnum.STUDENT.ToString();
var people = _DB.People.Where(p => p.Status == s);
var ids = people.Select(p => p.Id);
var cars = _DB.Cars.Where(c => ids.Contains(c.PersonId));
Example #1 doesn't work, but example #2 does.
The generated SQL for the var people query is identical for both, but the SQL in the final query differs like this:
#1
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = (CONVERT(NVarChar,#p0)))
)
#2
SELECT [t0].[PersonId], [t0].[etc].....
FROM [Cars] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND ([t1].[Status] = #p0)
)
Why is there this difference?
Edit:
Up until now all I've done to get the SQL generated is to inspect the queryable in the debugger. However, after setting up a logger as Jon suggested, it seems that the real sql executed is different.
#1
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = (CONVERT(NVarChar,#p0)))
)) AND ([t1].[Status] = #p1)
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- #p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
#2
SELECT [t1].[Id], [t1].etc ... [t0].Id, [t1].etc ...
FROM [Cars] AS [t0], [People] AS [t1]
WHERE ([t1].[Id] = [t0].[PersonId]) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [People] AS [t2]
WHERE ([t2].[Id] = [t0].[PersonId]) AND ([t2].[Status] = #p0)
)) AND ([t1].[Status] = #p1)
-- #p0: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]
-- #p1: Input NVarChar (Size = 7; Prec = 0; Scale = 0) [STUDENT]

First, think of dual nature of e Enum:
enum MyPersonEnum
{
STUDENT, // implicit 1
TEACHER, // implicit 2
DIRECTOR = 10 // explicit 10
}
...
Assert.AreEqual(1, (int)MyPersonEnum.STUDENT);
Assert.AreEqual("STUDENT", MyPersonEnum.STUDENT.ToString());
In the second example, C# have converted Enum to string, so no conversion needed, and it's assumed that your database People.Status column accepts "STUDENT", "TEACHER", "DIRECTOR" strings as valid values in the logic.
The difference is, enum internal representation in CLR is integer, and the first example, #p parameter is passed as an integer, it's an L2S query builder behaviour, that's why the conversion.
The first one would work, if your database column was an int that takes values assigned to the Enum members {1,2,10} in my example.

No, they're different. In the first version, the expression MyPersonEnum.STUDENT.ToString() is within the expression tree - it's part of what LINQ to SQL has to convert into SQL. I'd be interested to see what #p0 is when the query is executed...
In the second version, you've already evaluated the expression, so LINQ to SQL just sees a reference to a variable which is already a string.
We know that they mean the same thing, but presumably LINQ to SQL doesn't have quite enough knowledge to understand that.
Out of interest, do both of them work?
EDIT: Okay, so the second version works. I suggest you use that form then :) In an ideal world, both would work - but in this case it seems you need to help LINQ to SQL a bit.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Querying in advance with linq2sql for data with gaps - c#

How about var gaps = dbT_Values.Take(dbT_Values.Count()-1) .Select((p, index) => new {P1 = p, P2 = dbT_Values.ElementAt(index + 1)}) .Where(p => p.P1.BelongsTo == 1 && p.P1.TimeStamp.AddMinutes(15).Equals(p.P2.TimeStamp)).Select(p => p.P1);

Related

How to Convert Sub Query to Joins for Fast Result?

Improving SQL query generated by LINQ to EF

Entity Framework SUM CASE not optimized

How can I optimise this Linq query to remove the unnecessary SELECT Count(*)

What's the difference between these two LINQtoSQL statements?

Categories

Resources