Linq to SQL | Top 5 Distinct Order by Date - c#

I have an SQL query which I want to call from LINQ to SQL in asp.net application.
SELECT TOP 5 *
FROM (SELECT SongId,
DateInserted,
ROW_NUMBER()
OVER(
PARTITION BY SongId
ORDER BY DateInserted DESC) rn
FROM DownloadHistory) t
WHERE t.rn = 1
ORDER BY DateInserted DESC
I don't know whether its possible or not through linq to sql, if not then please provide any other way around.

I think you'd have to change the SQL partition to a Linq group-by. (Effectively all the partition does is group by song, and select the newest row for each group.) So something like this:
IEnumerable<DownloadHistory> top5Results = DownloadHistory
// group by SongId
.GroupBy(row => row.SongId)
// for each group, select the newest row
.Select(grp =>
grp.OrderByDescending(historyItem => historyItem.DateInserted)
.FirstOrDefault()
)
// get the newest 5 from the results of the newest-1-per-song partition
.OrderByDescending(historyItem => historyItem.DateInserted)
.Take(5);

Although McGarnagle answer solves the problem, but when i see the execution plan for the two queries, it was really amazing to see that linq to sql was really too slow as compare to native sql queries. See the generated query for the above linq to sql:
--It took 99% of the two execution
SELECT TOP (5) [t3].[SongId], [t3].[DateInserted]
FROM (
SELECT [t0].[SongId]
FROM [dbo].[DownloadHistory] AS [t0]
GROUP BY [t0].[SongId]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[SongId], [t2].[DateInserted]
FROM [dbo].[DownloadHistory] AS [t2]
WHERE [t1].[SongId] = [t2].[SongId]
ORDER BY [t2].[DateInserted] DESC
) AS [t3]
ORDER BY [t3].[DateInserted] DESC
--It took 1% of the two execution
SELECT TOP 5 t.SongId,t.DateInserted
FROM (SELECT SongId,
DateInserted,
ROW_NUMBER()
OVER(
PARTITION BY SongId
ORDER BY DateInserted DESC) rn
FROM DownloadHistory) t
WHERE t.rn = 1
ORDER BY DateInserted DESC

Related

How to optimize a slow running LINQ query that runs quickly in SQL Server

I am trying to write a LINQ query that returns rows (items) based on certain conditions on one of the sub-collection of the row (item). The query I wrote works but performs very poorly in LINQ. However if I run the generated query in SQL Server, it almost instantly returns the desired rows.
My assumption is that the query is slow because of the OrderByDescending() that is performed multiple times. Is this correct and how can I improve the performance of this query?
Edit: The goal of this query is to select all Foo objects that have a Bar object
where the price property of the last Baz object is within a lower and upper bound value.
var query = dbContext.Foos.AsQueryable();
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price >= filter.LowerBoundPrice));
query = query.Where(e => e.Bars.Any(p => p.Bazs.OrderByDescending(s => s.BazId).First().Price <= filter.UpperBoundPrice));
Edit2: This is the generated SQL query.
-- Region Parameters
DECLARE #p0 Decimal(6,2) = 2000
DECLARE #p1 Decimal(6,2) = 1000
-- EndRegion
SELECT [t0].[FooId]
FROM [Foo] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t1]
WHERE (((
SELECT [t3].[Price]
FROM (
SELECT TOP (1) [t2].[Price]
FROM [Baz] AS [t2]
WHERE [t2].[BarId] = [t1].[BarId]
ORDER BY [t2].[BazId] DESC
) AS [t3]
)) <= #p0) AND ([t1].[FooId] = [t0].[FooId])
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Bar] AS [t4]
WHERE (((
SELECT [t6].[Price]
FROM (
SELECT TOP (1) [t5].[Price]
FROM [Baz] AS [t5]
WHERE [t5].[BarId] = [t4].[BarId]
ORDER BY [t5].[BazId] DESC
) AS [t6]
)) >= #p1) AND ([t4].[FooId] = [t0].[FooId])
))

LINQ query losing select when joining two queries

Just to set the context a little, I'm trying to use queries with mysql that use Late row lookup as shown in this article
https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
but that's a story for another day but the idea is that you do a key search on the table and then join it onto the whole table to force a late row lookup and the problem is coming from my LINQ queries when joined together.
-- Key search query --
Calling Code
IQueryable<int> keySearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), whereStatement, clientID).OrderBy(orderBy).Skip(startRow).Take(pageSize).Select(x => x.ID);
Resulting Query
SELECT
`Extent1`.`Sys_InvoiceID`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`FK_StatusID`
ORDER BY
`Extent1`.`InvoiceDate` ASC LIMIT 0,430
-- Full Table Search --
Calling Code
IQueryable<InvoiceDTOModel> tableSearch = _defaultQueryFactory.Load(ContextEnums.ClientContext, MapEntityToDTO(), null, clientID, true).OrderBy(orderBy);
Resulting Query
SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent1` INNER JOIN `tbl2` AS `Extent2` ON `Extent1`.`SID` = `Extent2`.`SID` INNER JOIN `tbl3` AS `Extent3` ON `Extent1`.`EID` = `Extent3`.`EID` LEFT OUTER JOIN `tbl4` AS `Extent4` ON `Extent1`.`ID` = `Extent4`.`DID`
ORDER BY
`Extent1`.`C4` ASC
-- Joining the Two Together --
Calling Code
keySearch.Join(tableSearch, key => key, table => table.ID, (key, table) => table).OrderBy(orderBy).ToListAsync();
Resulting Query
SELECT
`Join3`.`ID`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`C1`,
`Join3`.`SID`,
`Join3`.`S1,
`Join3`.`S2`,
`Join3`.`S3`,
`Join3`.`EID`,
`Join3`.`E1`,
`Join3`.`DID`,
`Join3`.`D1`,
`Join3`.`D2`,
`Join3`.`D3`,
`Join3`.`D4`,
`Join3`.`D5`
FROM (
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1` INNER JOIN (SELECT
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`,
`Extent2`.`SID`,
`Extent2`.`S1,
`Extent2`.`S2`,
`Extent2`.`S3`,
`Extent3`.`EID`,
`Extent3`.`E1`,
`Extent4`.`DID`,
`Extent4`.`D1`,
`Extent4`.`D2`,
`Extent4`.`D3`,
`Extent4`.`D4`,
`Extent4`.`D5`
FROM `tbl1` AS `Extent2` INNER JOIN `tbl2` AS `Extent3` ON `Extent2`.`SID` = `Extent3`.`SID` INNER JOIN `tblstatus` AS `Extent4` ON `Extent2`.`EID` = `Extent4`.`EID` LEFT OUTER JOIN `tbl3` AS `Extent5` ON `Extent2`.`ID` = `Extent5`.`DID`) AS `Join3` ON `Limit1`.`ID` = `Join3`.`ID`
ORDER BY
`Join3`.`C4` ASC
Basically the inner select brings back
FROM (
`Extent1`.`ID`,
`Extent1`.`C1`,
`Extent1`.`C2`,
`Extent1`.`C3`,
`Extent1`.`C4`,
`Extent1`.`C5`,
`Extent1`.`C6`
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
Instead of
FROM (
`Extent1`.`ID`,
FROM `tblinvoice` AS `Extent1`
WHERE 3 = `Extent1`.`EID`
ORDER BY
`Extent1`.`C4` ASC LIMIT 0,430) AS `Limit1`
--Note--
The actual query selects around 15 columns, I've just shortened it to this example, it has an effect on the search as the dataset grows in size and it shouldn't be selecting all of the fields but i suspect there's an error in my join.
Any help is much appreciated.

Entity Framework v6.1 query compilation performance

I am confused how EF LINQ queries are compiled and executed. When I run a piece of program in LINQPad couple of times, I get varied performance results (each time the same query takes different amount of time). Please find below my test execution environment.
tools used: EF v6.1 & LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "' deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this I get the result and LINQPad shows time taken value from 3.515 sec to 0.004 sec (depending how much gap I take between different runs).
If I take the generated SQL query and execute it, that query always runs between 0.015 sec to 0.001sec.
Generated query:
-- Region Parameters
DECLARE #p0 Int = 1
DECLARE #p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4]. [InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p0 + #p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE #x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1]. [StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE #x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0]. [EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1]. [Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took varied amount of time but the difference is not as big as the first query set run.
In my application, there going to be lot of rows and I prefer first query set to second one but I am confused.
Please guide.
(Note: I have a little SQL Server knowledge so, I am using LINQPad to fine tune queries based on the performance)
Thanks

select top and a selected row in SQL

I have a league with 3000 entrants. What I want to do is select only 50 of these but if the user does not exist in this top 50 I want to select that row also and display their current position, the below query shows I am selecting the top 50 positions from the main inner query which brings back all players and their positions. Now I want to show the current logged in user and their position so is there any way to select top 50 and the user's entry from the subset by amending the below? I.e. is it possible to run two selects on a subset like
SELECT TOP 50 AND SELECT TOP 1 (Where condition)
FROM
(
Subset
)
my Query
SELECT TOP 50 [LeagueID],
[EntryID],
[UserID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastEvnet],
[TotalBonusPointsLastRaceEvent],
[Prize],
[dbo].[GetTotalPool]([LeagueID]) AS [TotalPool],
DENSE_RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC) AS [Position],
DENSE_RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPointsLastRace] DESC, [TotalBonusPointsLastRace] DESC) AS [PositionLastRace]
FROM
(
// inner query here bringing back all entrants
) AS DATA
You can use union for joining two subset of results: http://msdn.microsoft.com/en-au/library/ms180026.aspx
Also if you need to populate result from same suset then you can use common table expression:
http://technet.microsoft.com/en-us/library/ms190766%28v=sql.105%29.aspx
The pseodocode:
with subset(...)
select top 50
union
select top 1
You can do this without a union. You just need or:
WITH data as (
// inner query here bringing back all entrants
)
SELECT * -- or whatever columns you really want
FROM (SELECT data.*
DENSE_RANK() OVER (PARTITION BY [LeagueID]
ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC
) AS [Position],
DENSE_RANK() OVER (PARTITION BY [LeagueID]
ORDER BY [TotalPointsLastRace] DESC, [TotalBonusPointsLastRace] DESC
) AS [PositionLastRace],
ROW_NUMBER() OVER (PARTITION BY [LeagueID]
ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC
) as Position_Rownum
FROM data
) d
WHERE Position_RowNum <= 50 or UserId = #Current_userid
This uses row_number() to do what you wanted top to do. I note that your question does not include an order by clause, so I am guessing that you want the ordering by Position.
You can use a CTE instead of sub-query and SELECT TOP 50 and then SELECT TOP 1 and union all something like this....
;WITH CTE AS
(
// inner query here bringing back all entrants
)
SELECT TOP 50 * FROM CTE WHERE <Some Codition>
UNION ALL
SELECT TOP 1 * FROM CTE WHERE <Some other Codition>
Please try the following:
;WITH entrants
([LeagueID],[EntryID],[UserID],[TotalPoints],[TotalBonusPoints],[TotalPointsLastEvnet],
[TotalBonusPointsLastRaceEvent],[Prize],[TotalPool],[Position],[PositionLastRace])
AS
(
SELECT [LeagueID],
[EntryID],
[UserID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastEvnet],
[TotalBonusPointsLastRaceEvent],
[Prize],
[dbo].[GetTotalPool]([LeagueID]) AS [TotalPool],
DENSE_RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC) AS [Position],
DENSE_RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPointsLastRace] DESC, [TotalBonusPointsLastRace] DESC) AS [PositionLastRace]
FROM
(
// inner query here bringing back all entrants
) AS DATA
)
SELECT TOP 50 * FROM entrants
UNION
SELECT * FROM entrants WHERE UserID = #current_userid
ORDER BY Position;

Left Outer Join and Exists in Linq To SQL C# .NET 3.5

I'm stuck on translating a left outer join from LINQToSQL that returns unique parent rows.
I have 2 tables (Project, Project_Notes, and it's a 1-many relationship linked by Project_ID). I am doing a keyword search on multiple columns on the 2 table and I only want to return the unique projects if a column in Project_Notes contains a keyword. I have this linqtoSQl sequence going but it seems to be returning multiple Project rows. Maybe do an Exist somehow in LINQ? Or maybe a groupby of some sort?
Here's the LINQToSQL:
query = from p in query
join n in notes on p.PROJECT_ID equals n.PROJECT_ID into projectnotes
from n in notes.DefaultIfEmpty()
where n.NOTES.Contains(cwForm.search1Form)
select p;
here's the SQL it produced from profiler
exec sp_executesql N'SELECT [t2].[Title], [t2].[State], [t2].[PROJECT_ID],
[t2].[PROVIDER_ID], [t2].[CATEGORY_ID], [t2].[City], [t2].[UploadedDate],
[t2].[SubmittedDate], [t2].[Project_Type]FROM ( SELECT ROW_NUMBER() OVER (ORDER BY
[t0].[UploadedDate]) AS [ROW_NUMBER], [t0].[Title], [t0].[State], [t0].[PROJECT_ID],
[t0].[PROVIDER_ID], [t0].[CATEGORY_ID], [t0].[City], [t0].[UploadedDate],
[t0].[SubmittedDate], [t0].[Project_Type] FROM [dbo].[PROJECTS] AS [t0] LEFT OUTER JOIN
[dbo].[PROJECT_NOTES] AS [t1] ON 1=1 WHERE ([t1].[NOTES] LIKE #p0) AND
([t0].SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)) AS [t2] WHERE [t2].[ROW_NUMBER] BETWEEN #p4 + 1
AND #p4 + #p5 ORDER BY [t2].[ROW_NUMBER]',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int,#p4 int,#p5 int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000,#p4=373620,#p5=20
This query returns all mutiples of the 1-many relationship in the results. I found how to do an Exists in LINQ from here. http://www.linq-to-sql.com/linq-to-sql/t-sql-to-linq-upgrade/linq-exists/
Here is the LINQToSQL using Exists:
query = from p in query
where (from n in notes
where n.NOTES.Contains(cwForm.search1Form)
select n.PROJECT_ID).Contains(p.PROJECT_ID)
select p;
The generated SQL statement:
exec sp_executesql N'SELECT COUNT(*) AS [value] FROM [dbo].[PROJECTS] AS [t0] WHERE
(EXISTS(SELECT NULL AS [EMPTY] FROM [dbo].[PROJECT_NOTES] AS [t1] WHERE
([t1].PROJECT_ID] = ([t0].[PROJECT_ID])) AND ([t1].[NOTES] LIKE #p0))) AND
([t0].[SubmittedDate] >= #p1) AND ([t0].[SubmittedDate] < #p2) AND ([t0].[PROVIDER_ID] =
#p3) AND ([t0].[CATEGORY_ID] IS NULL)',N'#p0 varchar(9),#p1 datetime,#p2 datetime,#p3
int',#p0='%chicago%',#p1=''2000-09-02 00:00:00:000'',#p2=''2009-03-02
00:00:00:000'',#p3=1000
I get a SQL timeout from the databind() from using Exists.
it seems to be returning multiple Project rows
Yes, that's how join works. If a project has 5 matching notes, it show up 5 times.
What if the problem is - "Join" is the wrong idiom!
You want to filter the projects to those whose notes contain certain text:
var query = db.Project
.Where(p => p.Notes.Any(n => n.NoteField.Contains(searchString)));
You are going to have to use the DefaultIfEmpty extension method. There are a few questions on SO already that show how to do this. Here is a good example:
How can I perform a nested Join, Add, and Group in LINQ?

Categories

Resources