GroupBy with EF Core 6.0 and SQL Server - c#

I have a Blazor Web Application that has been working in the field for a few months. I want to extend the DB querying to group similar "Detections".
It was written starting with .NET 5, and just today I updated it to .NET 6 trying to get this working.
I would like the results ordered by TimeStamp (a DateTime property). I have a working example with an in-memory DB, but production will be on SQL Server. I am not that strong in SQL, and I have played around with it in Management Studio for a while with no luck.
Whether or not the first OrderByDescending() is commented out, things group properly but the results are not in the correct order. The EF translation process seems to drop that line completely: it makes no difference to the generated query or the result set.
var results = context.Detections
    // Line below makes no difference: it is ignored with SQL Server, but works with the in-memory DB.
    //.OrderByDescending(det => det.TimeStamp)
    .GroupBy(det => new
    {
        Year = det.TimeStamp.Year,
        Month = det.TimeStamp.Month,
        Day = det.TimeStamp.Day,
        Hour = det.TimeStamp.Hour,
    })
    .Select(grp => new
    {
        Count = grp.Count(),
        Detection = grp.OrderByDescending(det => det.TimeStamp).First(),
    })
    // The following line will not translate
    //.OrderByDescending(det => det.Detection.TimeStamp)
    .ToList();
If any of this matters:
Visual Studio 2022 (4.8.04084)
.Net 6.0
SQL Server 2019 (15.0.2080.9)
*All NuGet packages related to EF have been updated to 6.0
Edit for clarification
The above code segment produces the following SQL query.
SELECT [t].[c], [t0].[Id], [t0].[TimeStamp]
FROM (
SELECT COUNT(*) AS [c], DATEPART(year, [d].[TimeStamp]) AS [c0], DATEPART(month, [d].[TimeStamp]) AS [c1], DATEPART(day, [d].[TimeStamp]) AS [c2], DATEPART(hour, [d].[TimeStamp]) AS [c3]
FROM [Detections] AS [d]
WHERE [d].[TimeStamp] > DATEADD(day, CAST(-16.0E0 AS int), GETUTCDATE())
GROUP BY DATEPART(year, [d].[TimeStamp]), DATEPART(month, [d].[TimeStamp]), DATEPART(day, [d].[TimeStamp]), DATEPART(hour, [d].[TimeStamp])
) AS [t]
OUTER APPLY (
SELECT TOP(1) [d0].[Id], [d0].[TimeStamp]
FROM [Detections] AS [d0]
WHERE ([d0].[TimeStamp] > DATEADD(day, CAST(-30.0E0 AS int), GETUTCDATE())) AND (((([t].[c0] = DATEPART(year, [d0].[TimeStamp])) AND ([t].[c1] = DATEPART(month, [d0].[TimeStamp]))) AND ([t].[c2] = DATEPART(day, [d0].[TimeStamp]))) AND ([t].[c3] = DATEPART(hour, [d0].[TimeStamp])))
ORDER BY [d0].[TimeStamp] DESC
) AS [t0]
It produces results similar to the following (columns: Count, Id, TimeStamp). Notice that they are not sorted by time.
1 628591 2021-11-02 14:34:06.0442966
10 628601 2021-11-12 05:43:27.7015291
150 628821 2021-11-12 21:59:27.6444236
20 628621 2021-11-12 06:17:13.7798282
50 628671 2021-11-12 15:17:23.8893856
If I add ORDER BY [t0].[TimeStamp] DESC at the end of that SQL query in Management Studio, I get the results I am looking for (see below). I just need to know how to write that in LINQ.
150 628821 2021-11-12 21:59:27.6444236
50 628671 2021-11-12 15:17:23.8893856
20 628621 2021-11-12 06:17:13.7798282
10 628601 2021-11-12 05:43:27.7015291
1 628591 2021-11-02 14:34:06.0442966
Adding .OrderByDescending(det => det.Detection.TimeStamp) at the end, before ToList(), was my first thought, but that "could not be translated". I will need to paginate these results, so I would really like to do the sorting in SQL.

GroupBy has to do its own ordering, so the fact that an OrderBy placed before it gets ignored is not totally unexpected.
Move it to below the grouping:
var results = context.Detections
    //.OrderByDescending(det => det.TimeStamp)
    .GroupBy(det => new
    {
        Year = det.TimeStamp.Year,
        Month = det.TimeStamp.Month,
        Day = det.TimeStamp.Day,
        Hour = det.TimeStamp.Hour,
    })
    // .OrderByDescending(grp => grp.Key) // may have to split into y/m/d/h again
    .OrderByDescending(grp => grp.Key.Year)
    .ThenByDescending(grp => grp.Key.Month)
    .ThenByDescending(grp => grp.Key.Day)
    .ThenByDescending(grp => grp.Key.Hour)
    .Select(grp => new
    {
        Count = grp.Count(),
        Detection = grp.OrderByDescending(det => det.TimeStamp).First(),
    })
    .ToList();
When EF supports it, the ordering and grouping might become a little easier with:
.GroupBy(det => new
{
    Date = det.TimeStamp.Date,
    Hour = det.TimeStamp.Hour,
})
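To see what the (Date, Hour) key buys, here is a minimal LINQ to Objects sketch over a hypothetical in-memory array of timestamps standing in for context.Detections. It only shows the in-memory semantics: the buckets match the Year/Month/Day/Hour grouping, but ordering needs two keys instead of four. Whether EF Core 6 translates this exact GroupBy/OrderBy combination for SQL Server has not been checked here.

```csharp
using System;
using System.Linq;

// Hypothetical sample timestamps; in the answer these come from context.Detections.
var stamps = new[]
{
    new DateTime(2021, 11, 12, 5, 43, 27),
    new DateTime(2021, 11, 2, 14, 34, 6),
    new DateTime(2021, 11, 12, 21, 59, 27),
    new DateTime(2021, 11, 12, 5, 10, 0),
};

// Grouping on (Date, Hour) yields the same buckets as (Year, Month, Day, Hour),
// so newest-first ordering needs only Key.Date and Key.Hour.
var buckets = stamps
    .GroupBy(ts => new { ts.Date, ts.Hour })
    .OrderByDescending(grp => grp.Key.Date)
    .ThenByDescending(grp => grp.Key.Hour)
    .Select(grp => new { grp.Key.Date, grp.Key.Hour, Count = grp.Count() })
    .ToList();

foreach (var b in buckets)
    Console.WriteLine($"{b.Date:yyyy-MM-dd} {b.Hour:00}:00  x{b.Count}");
```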

For anyone looking at this in the future.
I was able to make this work by projecting a TimeStamp property in the Select() and calling OrderByDescending() on it at the end. I am not sure if this is the best solution, but it did solve my problem.
var results = context.Detections
    .GroupBy(det => new
    {
        Year = det.TimeStamp.Year,
        Month = det.TimeStamp.Month,
        Day = det.TimeStamp.Day,
        Hour = det.TimeStamp.Hour,
    })
    .Select(grp => new
    {
        Count = grp.Count(),
        TimeStamp = grp.OrderByDescending(det => det.TimeStamp).First().TimeStamp,
        Detection = grp.OrderByDescending(det => det.TimeStamp).First(),
    })
    // Sort on the projected scalar; ordering by det.Detection.TimeStamp would not translate.
    .OrderByDescending(det => det.TimeStamp)
    .ToList();
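A minimal in-memory sketch of the same idea, with a paging step added since the question mentions pagination. The sample rows, pageIndex and pageSize are hypothetical. It uses grp.Max(det => det.TimeStamp) for the sort column, which yields the same value as grp.OrderByDescending(det => det.TimeStamp).First().TimeStamp; whether that variant translates more simply on EF Core 6 is an untested assumption.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for context.Detections (Id, TimeStamp).
var detections = new[]
{
    (Id: 1, TimeStamp: new DateTime(2021, 11, 2, 14, 34, 6)),
    (Id: 2, TimeStamp: new DateTime(2021, 11, 12, 5, 43, 27)),
    (Id: 3, TimeStamp: new DateTime(2021, 11, 12, 5, 10, 0)),
    (Id: 4, TimeStamp: new DateTime(2021, 11, 12, 21, 59, 27)),
    (Id: 5, TimeStamp: new DateTime(2021, 11, 12, 15, 17, 23)),
};

int pageIndex = 0, pageSize = 2; // hypothetical paging parameters

var page = detections
    // Group by calendar hour, as in the question.
    .GroupBy(det => new { det.TimeStamp.Year, det.TimeStamp.Month, det.TimeStamp.Day, det.TimeStamp.Hour })
    .Select(grp => new
    {
        Count = grp.Count(),
        // Scalar sort column projected alongside the entity.
        TimeStamp = grp.Max(det => det.TimeStamp),
        Detection = grp.OrderByDescending(det => det.TimeStamp).First(),
    })
    .OrderByDescending(x => x.TimeStamp) // newest hour bucket first
    .Skip(pageIndex * pageSize)          // then page through the sorted groups
    .Take(pageSize)
    .ToList();

foreach (var row in page)
    Console.WriteLine($"{row.Count} {row.Detection.Id} {row.TimeStamp:yyyy-MM-dd HH:mm:ss}");
```

Sorting before Skip/Take matters: paging an unordered result gives no stable page contents.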

Related

Implement an SQL query in LINQ

I'm trying to implement the following query in LINQ, but I can't find a solution:
SQL:
SELECT COUNT(*) AS AmountMonths
FROM (SELECT SUBSTRING(CONVERT(NVARCHAR(12), pay_date, 112), 1, 6) AS Month
FROM #tmp
GROUP BY SUBSTRING(CONVERT(NVARCHAR(12), pay_date, 112), 1, 6)) AS AmountMonths
What I need is the number of months in which the clients made payments, bearing in mind that there may be months in which no payments were made.
In C# I tried the following:
int amountMonths = payDetail.GroupBy(x => Convert.ToDateTime(x.PayDate)).Count();
and
int amountMonths = payDetail.GroupBy(x => Convert.ToDateTime(x.PayDate).Month).Count();
But I am not getting the expected result.
(Assuming you're using EF Core)
You're almost there. You could do:
var amountMonths = context.AmountMonths.GroupBy(c => new { c.PayDate.Year, c.PayDate.Month }).Count();
This will translate to something like:
SELECT COUNT(*)
FROM (
SELECT DATEPART(year, [a].[PayDate]) AS [a]
FROM [AmountMonths] AS [a]
GROUP BY DATEPART(year, [a].[PayDate]), DATEPART(month, [a].[PayDate])
) AS [t]
I find this preferable to creating a string and chopping it up. EOMONTH isn't a standard mapped function, alas; otherwise it could be used to convert a date to month-level granularity.
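A small LINQ to Objects sketch, with hypothetical payment dates, of why the grouping key matters for the attempts in the question: grouping on the full date counts distinct days, grouping on Month alone merges the same month of different years, and only (Year, Month) counts calendar months with at least one payment. Months with no payment simply produce no group.

```csharp
using System;
using System.Linq;

// Hypothetical payment dates; there is deliberately no payment in February 2021.
var payDates = new[]
{
    new DateTime(2021, 1, 5),
    new DateTime(2021, 1, 20),
    new DateTime(2021, 3, 2),
    new DateTime(2022, 1, 7), // same month number as the first two, different year
};

// One group per calendar month that has at least one payment.
int amountMonths = payDates.GroupBy(d => new { d.Year, d.Month }).Count();

// Month alone merges January 2021 and January 2022 into one group.
int byMonthOnly = payDates.GroupBy(d => d.Month).Count();

// The full DateTime counts distinct dates, not months.
int byDate = payDates.GroupBy(d => d).Count();

Console.WriteLine($"{amountMonths} {byMonthOnly} {byDate}"); // prints "3 2 4"
```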

LINQ Using ROW_NUMBER() function?

I have a table that shows a list of syncs from our mobile users back to our database, so each user could have thousands of sync records.
I have written a query that uses the ROW_NUMBER() function to pull the most recent sync for every user, for active users only, as I don't want to see synced data from terminated employees (e.g. User A synced yesterday at noon and again today at noon, but I only want to see today's sync).
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY [SerialNumber] ORDER BY SyncDate DESC )as RN
FROM [TSCH].[dbo].[SYNCREPORT]
) as T
Where RN = 1 and WorkerStatus = 'ACTIVE' and SerialNumber = ######;
What would be the best approach for writing this using LINQ in C# for my .NET web application? Thanks for the help!
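Could be something like this (value1 and value2 stand for the columns you need):
var result = yourtable
    .Where(x => x.SerialNumber == "####")
    .OrderByDescending(x => x.SyncDate)    // ORDER BY SyncDate DESC
    .GroupBy(x => x.SerialNumber)          // PARTITION BY SerialNumber
    .Select(g => new { g, count = g.Count() })
    // Number the rows of each group 1..count, like ROW_NUMBER()
    .SelectMany(t => t.g.Zip(Enumerable.Range(1, t.count), (c, i) => new { c.value1, c.value2, c.WorkerStatus, rn = i }))
    .Where(x => x.rn == 1 && x.WorkerStatus == "ACTIVE");   // WHERE RN = 1 AND WorkerStatus = 'ACTIVE'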
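An alternative worth comparing: when only RN = 1 is needed, keep the newest row per partition with GroupBy plus OrderByDescending(...).First(), the same shape EF Core 6 translated for the first question on this page (its OUTER APPLY ... TOP(1) query). The sketch runs over a hypothetical in-memory array with made-up rows; against the database you would start from the DbSet, and the translation of this exact query has not been verified. Like the SQL, it applies the WorkerStatus filter after picking each user's latest row.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for [TSCH].[dbo].[SYNCREPORT].
var syncReport = new[]
{
    (SerialNumber: "A1", WorkerStatus: "ACTIVE", SyncDate: new DateTime(2021, 6, 1, 12, 0, 0)),
    (SerialNumber: "A1", WorkerStatus: "ACTIVE", SyncDate: new DateTime(2021, 6, 2, 12, 0, 0)),
    (SerialNumber: "B2", WorkerStatus: "TERMINATED", SyncDate: new DateTime(2021, 6, 3, 9, 0, 0)),
    (SerialNumber: "C3", WorkerStatus: "ACTIVE", SyncDate: new DateTime(2021, 6, 1, 8, 0, 0)),
};

// ROW_NUMBER() OVER (PARTITION BY SerialNumber ORDER BY SyncDate DESC) ... WHERE RN = 1:
// partition with GroupBy, then keep the first row of each partition by descending SyncDate.
var latestPerUser = syncReport
    .GroupBy(s => s.SerialNumber)
    .Select(grp => grp.OrderByDescending(s => s.SyncDate).First())
    .Where(s => s.WorkerStatus == "ACTIVE") // applied after choosing RN = 1, as in the SQL
    .OrderBy(s => s.SerialNumber)
    .ToList();

foreach (var s in latestPerUser)
    Console.WriteLine($"{s.SerialNumber} {s.SyncDate:yyyy-MM-dd HH:mm}");
```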

How can i improve the performance of this LINQ?

UPDATE
Thanks to @usr I have got this down to ~3 seconds simply by changing
.Select(
    log => log.OrderByDescending(
        d => d.DateTimeUtc
    ).FirstOrDefault()
)
to
.Select(
    log => log.OrderByDescending(
        d => d.Id
    ).FirstOrDefault()
)
I have a database with two tables - Logs and Collectors - which I am using Entity Framework to read. There are 86 collector records and each one has 50000+ corresponding Log records.
I want to get the most recent log record for each collector, which is easily done with this SQL:
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue,
CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC,
CollectorSettingsModels.Target, CollectorSettingsModels.TypeName
FROM
(SELECT CollectorId, MAX(Id) AS Id
FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs
INNER JOIN CollectorLogModels AS CollectorLogModels_1
ON RecentLogs.Id = CollectorLogModels_1.Id
INNER JOIN CollectorSettingsModels
ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
This takes ~2 seconds to execute.
The closest I have been able to get with LINQ is the following:
var logs = context.Logs.Include(co => co.Collector)
    .GroupBy(
        log => log.CollectorId, log => log
    )
    .Select(
        log => log.OrderByDescending(
            d => d.DateTimeUtc
        ).FirstOrDefault()
    )
    .Join(
        context.Collectors,
        (l => l.CollectorId),
        (c => c.Id),
        (l, c) => new
        {
            c.Target,
            DateTimeUTC = l.DateTimeUtc,
            l.Status,
            l.StringValue,
            CollectorName = c.TypeName
        }
    ).OrderBy(
        o => o.Target
    ).ThenBy(
        o => o.CollectorName
    )
    ;
This produces the results I want but takes ~35 seconds to execute.
This becomes the following SQL
SELECT
[Distinct1].[CollectorId] AS [CollectorId],
[Extent3].[Target] AS [Target],
[Limit1].[DateTimeUtc] AS [DateTimeUtc],
[Limit1].[Status] AS [Status],
[Limit1].[StringValue] AS [StringValue],
[Extent3].[TypeName] AS [TypeName]
FROM (SELECT DISTINCT
[Extent1].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId]
FROM ( SELECT
[Extent2].[Status] AS [Status],
[Extent2].[StringValue] AS [StringValue],
[Extent2].[DateTimeUtc] AS [DateTimeUtc],
[Extent2].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent2]
WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId]
) AS [Project2]
ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1]
INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id]
ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I get performance closer to what is achievable with SQL alone?
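Your original SQL picks the row with MAX(Id) for each collector, so the DateTimeUTC it returns may come from a different row than the one with the latest DateTimeUTC. That's probably a bug. The EF query does not have that problem, because it orders by DateTimeUtc. The two are not semantically identical: the EF one is a harder query.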
If you rewrite the EF query to be structurally the same as the SQL query you'll get identical performance. I see nothing here that EF would not support.
Compute the max(id) with EF as well and join on that.
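A sketch of that suggestion (compute MAX(Id) per collector, then join on it), written as LINQ to Objects over hypothetical in-memory arrays so it runs on its own. Against EF you would use context.Logs and context.Collectors instead of the arrays; the SQL it produces and its performance are untested here. Like the original SQL, it assumes the highest Id is the most recent log row.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for context.Logs and context.Collectors.
var logs = new[]
{
    (Id: 1, CollectorId: 10, Status: "OK",   DateTimeUtc: new DateTime(2015, 1, 1, 10, 0, 0)),
    (Id: 2, CollectorId: 10, Status: "WARN", DateTimeUtc: new DateTime(2015, 1, 1, 11, 0, 0)),
    (Id: 3, CollectorId: 20, Status: "OK",   DateTimeUtc: new DateTime(2015, 1, 1, 9, 0, 0)),
};
var collectors = new[]
{
    (Id: 10, Target: "server-a", TypeName: "Ping"),
    (Id: 20, Target: "server-b", TypeName: "Disk"),
};

// Step 1: the RecentLogs derived table (SELECT CollectorId, MAX(Id) ... GROUP BY CollectorId).
var recentLogIds = logs
    .GroupBy(l => l.CollectorId)
    .Select(g => new { CollectorId = g.Key, Id = g.Max(l => l.Id) });

// Step 2: join back to the log rows on Id, then to the collectors, mirroring the SQL joins.
var result =
    (from r in recentLogIds
     join l in logs on r.Id equals l.Id
     join c in collectors on l.CollectorId equals c.Id
     orderby c.Target, c.TypeName
     select new { c.Target, l.DateTimeUtc, l.Status, CollectorName = c.TypeName })
    .ToList();

foreach (var row in result)
    Console.WriteLine($"{row.Target} {row.CollectorName} {row.Status} {row.DateTimeUtc:HH:mm}");
```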
I had the exact same issue; I solved it by adding indexes.
A query of mine would take 45 seconds to complete; I managed to get it completing in less than a second.

SQL Time Duration Between Records

I have the following table:
CamId RegNumber DateSeen
5 G1234B 18/02/2014 11:54
3 G1234B 18/02/2014 11:51
5 G11854 18/02/2014 11:50
3 G11854 18/02/2014 11:49
3 G24581 18/02/2014 11:48
I need to know the time taken from when a registration number is seen at CamId 3 to CamId 5, a reg number must exist in both CamId 3 and 5 for this to work.
The result I am looking for is a list of registration numbers together with a time difference in seconds (for the purpose of this demo, in minutes):
RegNumber Duration
G1234B 3
G11854 1
I then want to add up all these durations and get the median or average value.
Hopefully someone can assist; a LINQ or SQL statement would be ideal.
You can use GroupBy, then take the latest record with CamId == 5, subtract the earliest record with CamId == 3 from it, and use TimeSpan.TotalSeconds:
var query = db.Registration
    .GroupBy(r => r.RegNumber)
    // A registration must have been seen by both cameras.
    .Where(grp => grp.Any(r => r.CamId == 3) && grp.Any(r => r.CamId == 5))
    .Select(grp => new
    {
        RegNumber = grp.Key,
        Duration = (grp.Where(r => r.CamId == 5)
                       .OrderByDescending(r => r.DateSeen)
                       .Select(r => r.DateSeen)
                       .FirstOrDefault()
                  - grp.Where(r => r.CamId == 3)
                       .OrderBy(r => r.DateSeen)
                       .Select(r => r.DateSeen)
                       .FirstOrDefault()).TotalSeconds
    });
Update: "Would you be able to provide the above in an SQL statement?"
WITH CTE AS
(
SELECT [CamId], [RegNumber], [DateSeen],
Duration = DATEDIFF(second,
(SELECT MIN(DateSeen)FROM dbo.Registration r2
WHERE r1.RegNumber=r2.RegNumber
AND r2.CamId = 3),
(SELECT MAX(DateSeen)FROM dbo.Registration r2
WHERE r1.RegNumber=r2.RegNumber
AND r2.CamId = 5)),
RN = ROW_NUMBER() OVER (PARTITION BY RegNumber ORDER BY DateSeen)
FROM dbo.Registration r1
)
SELECT [RegNumber], [Duration]
FROM CTE
WHERE [Duration] IS NOT NULL AND RN = 1
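A sketch of the last step the question asks for (average and median of the durations), using the sample rows from the question as an in-memory stand-in for db.Registration. It reports minutes to match the expected result in the question; use TotalSeconds instead for seconds. The median is computed in memory; this does not attempt a SQL translation.

```csharp
using System;
using System.Linq;

// Sample rows from the question (dates are dd/MM/yyyy HH:mm there).
var sightings = new[]
{
    (CamId: 5, RegNumber: "G1234B", DateSeen: new DateTime(2014, 2, 18, 11, 54, 0)),
    (CamId: 3, RegNumber: "G1234B", DateSeen: new DateTime(2014, 2, 18, 11, 51, 0)),
    (CamId: 5, RegNumber: "G11854", DateSeen: new DateTime(2014, 2, 18, 11, 50, 0)),
    (CamId: 3, RegNumber: "G11854", DateSeen: new DateTime(2014, 2, 18, 11, 49, 0)),
    (CamId: 3, RegNumber: "G24581", DateSeen: new DateTime(2014, 2, 18, 11, 48, 0)),
};

var durations = sightings
    .GroupBy(r => r.RegNumber)
    // Only registrations seen by both cameras.
    .Where(g => g.Any(r => r.CamId == 3) && g.Any(r => r.CamId == 5))
    .Select(g => new
    {
        RegNumber = g.Key,
        // Latest sighting at camera 5 minus earliest sighting at camera 3.
        Minutes = (g.Where(r => r.CamId == 5).Max(r => r.DateSeen)
                 - g.Where(r => r.CamId == 3).Min(r => r.DateSeen)).TotalMinutes,
    })
    .ToList();

double average = durations.Average(d => d.Minutes);

// Median: middle value of the sorted list, or the mean of the two middle values.
var sorted = durations.Select(d => d.Minutes).OrderBy(m => m).ToList();
double median = sorted.Count % 2 == 1
    ? sorted[sorted.Count / 2]
    : (sorted[sorted.Count / 2 - 1] + sorted[sorted.Count / 2]) / 2.0;

Console.WriteLine($"average={average} median={median}");
```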

Entity Framework: Efficiently grouping by month

I've done a bit of research on this, and the best I've found so far is to use AsEnumerable() on the whole dataset, so that the grouping happens in LINQ to Objects rather than on the DB. I'm using the latest EF.
My working (but very slow) code is:
var trendData =
from d in ExpenseItemsViewableDirect.AsEnumerable()
group d by new {Period = d.Er_Approved_Date.Year.ToString() + "-" + d.Er_Approved_Date.Month.ToString("00") } into g
select new
{
Period = g.Key.Period,
Total = g.Sum(x => x.Item_Amount),
AveragePerTrans = Math.Round(g.Average(x => x.Item_Amount),2)
};
This gives me months in format YYYY-MM, along with the total amount and average amount. However it takes several minutes every time.
My other workaround is to do an update query in SQL so I have a YYYYMM field to group natively by. Changing the DB isn't an easy fix however so any suggestions would be appreciated.
The thread I found the above code idea (http://stackoverflow.com/questions/1059737/group-by-weeks-in-linq-to-entities) mentions 'waiting until .NET 4.0'. Is there anything recently introduced that helps in this situation?
The reason for the poor performance is that the whole table is fetched into memory (AsEnumerable()). Instead, group by Year and Month on the server, like this:
var trendData =
    (from d in ExpenseItemsViewableDirect
     group d by new
     {
         Year = d.Er_Approved_Date.Year,
         Month = d.Er_Approved_Date.Month
     } into g
     select new
     {
         Year = g.Key.Year,
         Month = g.Key.Month,
         Total = g.Sum(x => x.Item_Amount),
         AveragePerTrans = Math.Round(g.Average(x => x.Item_Amount), 2)
     }
    ).AsEnumerable()
    .Select(g => new
    {
        Period = g.Year + "-" + g.Month.ToString("00"), // YYYY-MM, formatted in memory
        Total = g.Total,
        AveragePerTrans = g.AveragePerTrans
    });
Edit:
The original query in my answer tried to concatenate an int and a string, which EF cannot translate into SQL. I could use the SqlFunctions class, but the query gets kind of ugly. So I added AsEnumerable() after the grouping: EF executes the grouping query on the server and gets the year, month, totals, etc., while the custom projection (everything after AsEnumerable()) runs over objects.
When it comes to grouping by month, I prefer to do it this way:
using System.Data.SqlTypes;           // SqlDateTime
using System.Data.Entity.SqlServer;  // SqlFunctions (EF6)

var sqlMinDate = (DateTime) SqlDateTime.MinValue;
var trendData = ExpenseItemsViewableDirect
    .GroupBy(x => SqlFunctions.DateAdd("month", SqlFunctions.DateDiff("month", sqlMinDate, x.Er_Approved_Date), sqlMinDate))
    .Select(g => new
    {
        Period = g.Key // a DateTime (nullable), not a string
    });
This keeps the DateTime type in the grouping result.
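To illustrate what the DATEADD/DATEDIFF trick computes, here is a plain C# model of it with hypothetical dates: counting the month boundaries between 1753-01-01 (SqlDateTime.MinValue) and a date, then adding that many months back to 1753-01-01, lands on the first day of the date's month, so every date in a month maps to the same DateTime key. This runs in memory only; TruncateToMonth is a hypothetical helper, not part of SqlFunctions.

```csharp
using System;
using System.Linq;

// Same anchor as SqlDateTime.MinValue.
DateTime min = new DateTime(1753, 1, 1);

// Hypothetical helper modelling DATEADD(month, DATEDIFF(month, min, d), min).
DateTime TruncateToMonth(DateTime d)
{
    // DATEDIFF(month, ...) counts month boundaries crossed: only year and month matter.
    int monthsCrossed = (d.Year - min.Year) * 12 + (d.Month - min.Month);
    // DATEADD(month, n, min) then lands on the 1st of d's month at midnight.
    return min.AddMonths(monthsCrossed);
}

var approvedDates = new[]
{
    new DateTime(2014, 2, 18, 11, 54, 0),
    new DateTime(2014, 2, 1),
    new DateTime(2014, 3, 31, 23, 59, 59),
};

var periods = approvedDates
    .GroupBy(d => TruncateToMonth(d))
    .Select(g => new { Period = g.Key, Count = g.Count() })
    .ToList();

foreach (var p in periods)
    Console.WriteLine($"{p.Period:yyyy-MM-dd} {p.Count}");
```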
Similarly to what cryss wrote, I am doing the following for EF. Note that we use EntityFunctions so the query works with all DB providers supported by EF; SqlFunctions only works with SQL Server.
var sqlMinDate = (DateTime) SqlDateTime.MinValue;
(from x in ExpenseItemsViewableDirect
 let month = EntityFunctions.AddMonths(sqlMinDate, EntityFunctions.DiffMonths(sqlMinDate, x.Er_Approved_Date))
 group x by month
 into g
 select new
 {
     Period = g.Key,
     Total = g.Sum(x => x.Item_Amount),
     AveragePerTrans = Math.Round(g.Average(x => x.Item_Amount), 2)
 }).Dump(); // Dump() is LINQPad's output method
A taste of generated SQL (from a similar schema):
-- Region Parameters
DECLARE @p__linq__0 DateTime2 = '1753-01-01 00:00:00.0000000'
DECLARE @p__linq__1 DateTime2 = '1753-01-01 00:00:00.0000000'
-- EndRegion
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3]
FROM ( SELECT
[Project1].[C1] AS [K1],
FROM ( SELECT
DATEADD (month, DATEDIFF (month, @p__linq__1, [Extent1].[CreationDate]), @p__linq__0) AS [C1]
FROM [YourTable] AS [Extent1]
) AS [Project1]
GROUP BY [Project1].[C1]
) AS [GroupBy1]
