LINQ Multiple GroupBy Query Performing several times slower than T-SQL - c#

I'm totally new to LINQ.
I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow.
What I'm trying to achieve is to fetch the average monthly duration of a certain database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0, 7),
       AVG(duration)
from (select CONVERT(CHAR(8), mmud.logDateTime, 112) as yyyyMMdd,
             DATEDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
      from mydb.mydbo.updateData mmud
      left join mydb.mydbo.updateDataKeyValue mmudkv
        on mmud.updateDataid = mmudkv.updateDataId
      left join mydb.mydbo.updateDataDetailKey mmuddk
        on mmudkv.updateDataDetailKeyid = mmuddk.Id
      where dbname = 'MY_NEW_DB'
        and mmudkv.value in ('start', 'finish')
      group by CONVERT(CHAR(8), mmud.logDateTime, 112)
     ) as resultSet
group by SUBSTRING(yyyyMMdd, 0, 7)
order by SUBSTRING(yyyyMMdd, 0, 7)
in LINQ => I first fetch the record from a table that links information of the Database Name and UpdateData and then do filtering and groupby on the related information.
var updatedataInDateRangeWithDescriptions =
    entry.updatedata
        .Where(ue => ue.updatedataKeyValue.Any(uedkv =>
            uedkv.Value.ToLower() == "starting update" ||
            uedkv.Value.ToLower() == "client release"))
        .Select(ue => new
        {
            logDateTimeyyyyMMdd = ue.logDateTime.Date,
            logDateTime = ue.logDateTime
        })
        .GroupBy(updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
        .Select(groupedupdatedata => new
        {
            UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
            Duration =
                (groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
                 groupedupdatedata.Min(groupMember => groupMember.logDateTime))
                .TotalMilliseconds
        })
        .ToList();
var updatedataMonthlyDurations =
    updatedataInDateRangeWithDescriptions
        .GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0, 6))
        .Select(group => new updatedataMonthlyAverageDuration
        {
            DbName = entry.DbName,
            UpdateDateyyyyMM = group.Key.Substring(0, 6),
            Duration = group.Average(gmember => gmember.Duration)
        })
        .ToList();
I know that GroupBy in LINQ isn't the same as GroupBy in T-SQL, but not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I did the .ToList() after the first GroupBy things got a little faster. But even then this way of finding average duration is really slow.
What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?

Your linq query is doing most of its work in linq-to-objects. You should be constructing a linq-to-entities/sql query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
            from mmudkv in context.updateDataKeyValue
                .Where(x => mmud.updateDataid == x.updateDataId)
                .DefaultIfEmpty()
            from mmuddk in context.updateDataDetailKey
                .Where(x => mmudkv.updateDataDetailKeyid == x.Id)
                .DefaultIfEmpty()
            where mmud.dbname == "MY_NEW_DB"
            where mmudkv.value == "start" || mmudkv.value == "finish"
            group mmud by mmud.logDateTime.Date into g
            select new
            {
                Date = g.Key,
                // DiffMilliseconds(a, b) computes b - a, so Min goes first to get a positive duration
                Average = EntityFunctions.DiffMilliseconds(g.Min(x => x.logDateTime), g.Max(x => x.logDateTime)),
            };
var queryByMonth = from x in query
                   group x by new { x.Date.Year, x.Date.Month } into g
                   select new
                   {
                       Year = g.Key.Year,
                       Month = g.Key.Month,
                       Average = g.Average(y => y.Average)
                   };
// A single SQL statement is sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know whether you are using Entity Framework or LINQ to SQL, and you will need to provide your context/model information.
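To sanity-check the numbers that come back from the database, the same two-stage aggregation (duration per day, then average per month) can be reproduced in memory on a small sample. This is only a sketch in LINQ to Objects: the class name `MonthlyDurationSketch`, the method `MonthlyAverages` and the sample timestamps are hypothetical and not part of the original model.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class MonthlyDurationSketch
{
    // Stage 1: duration per day = latest minus earliest timestamp, in milliseconds.
    // Stage 2: average of those daily durations per month, keyed as "yyyyMM".
    public static Dictionary<string, double> MonthlyAverages(IEnumerable<DateTime> logTimes)
    {
        return logTimes
            .GroupBy(t => t.Date)
            .Select(day => new
            {
                Date = day.Key,
                DurationMs = (day.Max() - day.Min()).TotalMilliseconds
            })
            .GroupBy(d => d.Date.ToString("yyyyMM", CultureInfo.InvariantCulture))
            .ToDictionary(month => month.Key, month => month.Average(d => d.DurationMs));
    }

    public static void Main()
    {
        // Hypothetical sample data: two days in January, one day in February.
        var sample = new[]
        {
            new DateTime(2013, 1, 1, 10, 0, 0), new DateTime(2013, 1, 1, 10, 0, 2), // 2000 ms
            new DateTime(2013, 1, 2, 9, 0, 0),  new DateTime(2013, 1, 2, 9, 0, 4),  // 4000 ms
            new DateTime(2013, 2, 5, 8, 0, 0),  new DateTime(2013, 2, 5, 8, 0, 1),  // 1000 ms
        };

        foreach (var row in MonthlyAverages(sample).OrderBy(r => r.Key))
            Console.WriteLine(row.Key + ": " + row.Value + " ms");
    }
}
```

Running this kind of grouping over the whole table in memory is exactly what makes the original version slow; the sketch is meant for checking results on a handful of rows, not as a replacement for the database query.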

Related

Need some guidance understanding what EF Core does with this query internally [duplicate]

This question already has an answer here:
DefaultIfEmpty Exception "bug or limitation" with EF Core
(1 answer)
Closed 1 year ago.
I am constructing this IQueryable in code:
var query = (from t in context.Tasks.AsNoTracking()
             join wt in context.WorkTemplates on t.ID equals wt.TaskID into tmp
             from tt in tmp.DefaultIfEmpty(new WorkTemplate())
             group tt by t into grp
             select new
             {
                 grp.Key.ID,
                 grp.Key.Name,
                 grp.Key.Instructions,
                 grp.Key.ManualOnly,
                 grp.Key.Timestamp,
                 TemplateCount = grp.Where(x => x.ID > 0).Count()
             })
            .Select(x => new Task { ID = x.ID, Name = x.Name, Instructions = x.Instructions, ManualOnly = x.ManualOnly, Timestamp = x.Timestamp, TemplateCount = x.TemplateCount })
            .AsQueryable();
My requirement is that it has to be a single IQueryable statement because it is handed off to a grid control that uses it to fetch data in an on-going fashion.
EF Core 2.2.6 did not complain about this query, but apparently that was because it was doing what it had to in order to execute at least part of it locally. Now that I have moved to EF Core 5+, it throws this exception:
I think the problem is the group part of the query. It is there because it derives the TemplateCount, but I can't find a way to rewrite this that works.
Since this query is never going to pull enough data to cause a performance problem, local execution is fine. It is more important that it works as a single IQueryable.
Try this one:
var query = (from t in context.Tasks.AsNoTracking()
             join wt in context.WorkTemplates on t.ID equals wt.TaskID into tmp
             from tt in tmp.DefaultIfEmpty()
             group tt by new { t.ID, t.Name, t.Instructions, t.ManualOnly, t.Timestamp } into grp
             select new
             {
                 grp.Key.ID,
                 grp.Key.Name,
                 grp.Key.Instructions,
                 grp.Key.ManualOnly,
                 grp.Key.Timestamp,
                 TemplateCount = grp.Sum(x => x.ID > 0 ? 1 : 0)
             })
            .Select(x => new Task { ID = x.ID, Name = x.Name, Instructions = x.Instructions, ManualOnly = x.ManualOnly, Timestamp = x.Timestamp, TemplateCount = x.TemplateCount });
I simplified tmp.DefaultIfEmpty(new WorkTemplate()) to tmp.DefaultIfEmpty(), corrected the grouping key, and replaced Count with Sum.
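The pattern (left join via DefaultIfEmpty(), group by the scalar columns, conditional count via Sum) can be tried out in memory first. The following is a minimal LINQ to Objects sketch, not the EF Core query itself: `TaskRow`, `TemplateRow` and `CountTemplates` are hypothetical stand-ins for the real entities. Note one difference from the translated SQL: in memory the unmatched side of the join is null, so the lambda needs an explicit null check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TemplateCountSketch
{
    // Hypothetical, simplified stand-ins for the Task and WorkTemplate entities.
    public sealed class TaskRow { public int ID; public string Name; }
    public sealed class TemplateRow { public int ID; public int TaskID; }

    // Returns the number of templates per task name, including tasks with none.
    public static Dictionary<string, int> CountTemplates(
        IEnumerable<TaskRow> tasks, IEnumerable<TemplateRow> templates)
    {
        var query = from t in tasks
                    join wt in templates on t.ID equals wt.TaskID into tmp
                    from tt in tmp.DefaultIfEmpty()   // tt is null when the task has no template
                    group tt by new { t.ID, t.Name } into grp
                    select new
                    {
                        grp.Key.Name,
                        // Conditional count expressed as a sum of 1s and 0s.
                        TemplateCount = grp.Sum(x => x != null && x.ID > 0 ? 1 : 0)
                    };

        return query.ToDictionary(x => x.Name, x => x.TemplateCount);
    }

    public static void Main()
    {
        var tasks = new[]
        {
            new TaskRow { ID = 1, Name = "Backup" },
            new TaskRow { ID = 2, Name = "Cleanup" },
        };
        var templates = new[]
        {
            new TemplateRow { ID = 10, TaskID = 1 },
            new TemplateRow { ID = 11, TaskID = 1 },
        };

        foreach (var pair in CountTemplates(tasks, templates))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

With this sample data, "Backup" has two templates and "Cleanup" has none, so the left join keeps both tasks in the result.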

EF Core 2.1 GROUP BY and select first item in each group

Let's imagine a forum that has a list of topics with posts in them.
I want to get the list of topics and a title of last post (by date) for each topic.
Is there a way to achieve this using EF Core (2.1)?
In SQL it could be done like
SELECT Posts.Title, Posts.CreatedDate, Posts.TopicId FROM
(SELECT Max(CreatedDate) AS CreatedDate, TopicId FROM Posts GROUP BY TopicId) lastPosts
JOIN Posts ON Posts.CreatedDate = lastPosts.CreatedDate AND Posts.TopicId = lastPosts.TopicId
In EFCore I can select LastDates
_context.Posts.GroupBy(x => x.TopicId, (x, y) => new
{
CreatedDate = y.Max(z => z.CreatedDate),
TopicId = x,
});
And if I run .ToList() the query is correctly translated to GROUP BY.
But I can't go further.
The following is executed in memory, not in SQL (resulting in SELECT * FROM Posts):
.GroupBy(...)
.Select(x => new
{
x.TopicId,
Post = x.Posts.Where(z => z.CreatedDate == x.CreatedDate)
//Post = x.Posts.FirstOrDefault(z => z.CreatedDate == x.CreatedDate)
})
Attempting to JOIN gives NotSupportedException (Could not parse expression):
.GroupBy(...)
.Join(_context.Posts,
(x, y) => x.TopicId == y.TopicId && x.CreatedDate == y.CreatedDate,
(x, post) => new
{
post.Title,
post.CreatedDate,
})
I know I can do it using SELECT N+1 (running a separate query per topic), but I'd like to avoid that.
I don't know since which version of EFCore it's possible, but there's a simpler single-query alternative now:
context.Topic
.SelectMany(topic => topic.Posts.OrderByDescending(z => z.CreatedDate).Take(1),
(topic, post) => new {topic.Id, topic.Title, post.Text, post.CreatedDate})
.OrderByDescending(x => x.CreatedDate)
.ToList();
Basically what I'm doing now is after running
var topics = _context.Posts.GroupBy(x => x.TopicId, (x, y) => new
{
CreatedDate = y.Max(z => z.CreatedDate),
TopicId = x,
}).ToList();
I build the following query:
Expression<Func<Post, bool>> lastPostsQuery = post => false;
foreach (var topic in topics)
{
    lastPostsQuery = lastPostsQuery.Or(post => post.TopicId == topic.TopicId && post.CreatedDate == topic.CreatedDate); // .Or is implemented in PredicateBuilder
}
var lastPosts = _context.Posts.Where(lastPostsQuery).ToList();
Which results in one query (instead of N) like SELECT * FROM Posts WHERE (Posts.TopicId = 1 AND Posts.CreatedDate = '2017-08-01') OR (Posts.TopicId = 2 AND Posts.CreatedDate = '2017-08-02') OR ....
Not extremely efficient but since the number of topics per page is quite low it does the trick.
In EF Core 2.1, the GroupBy LINQ operator is translated to the SQL GROUP BY clause only in the most common cases, i.e. when the groups are reduced with an aggregate function such as Sum, Max, ...
linq-groupby-translation
Until EF Core fully supports GroupBy, you can use Dapper.
I am not sure from which version of EF Core this is possible, but you can try something like the following. It groups the posts by topic, orders each group by id descending, and returns the record with the highest id from each group.
var firstProducts = Context.Posts
.GroupBy(p => p.TopicId)
.Select(g => g.OrderByDescending(p => p.id).FirstOrDefault())
.ToList();
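The "latest row per group" shape used in these answers can be checked in memory before relying on a particular EF Core version to translate it. Below is a small LINQ to Objects sketch, ordering by date rather than by id; the `Post` class and the `LatestPerTopic` method are hypothetical and only mirror the columns mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestPostSketch
{
    // Hypothetical, simplified version of the Post entity from the question.
    public sealed class Post
    {
        public int TopicId;
        public string Title;
        public DateTime CreatedDate;
    }

    // For every topic, keep only the post with the most recent CreatedDate.
    public static List<Post> LatestPerTopic(IEnumerable<Post> posts)
    {
        return posts
            .GroupBy(p => p.TopicId)
            .Select(g => g.OrderByDescending(p => p.CreatedDate).First())
            .OrderBy(p => p.TopicId)
            .ToList();
    }

    public static void Main()
    {
        var posts = new[]
        {
            new Post { TopicId = 1, Title = "first",  CreatedDate = new DateTime(2017, 8, 1) },
            new Post { TopicId = 1, Title = "second", CreatedDate = new DateTime(2017, 8, 3) },
            new Post { TopicId = 2, Title = "only",   CreatedDate = new DateTime(2017, 8, 2) },
        };

        foreach (var post in LatestPerTopic(posts))
            Console.WriteLine(post.TopicId + ": " + post.Title);
    }
}
```

Whether the same expression is translated to SQL or evaluated on the client depends on the provider and version, so it is worth inspecting the generated SQL when running it against a real context.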

How do I implement tsql's sum(...) over () in linq to SQL?

I have this functional t-sql query that counts the entries in a group by clause, and at the same time produces a percentage of the count compared to the entire set.
It is blazing fast (~90 ms) in Azure. I'd like to implement in a similar manner with LINQ to SQL, but I can't figure it out...
select f.worktype, f.counted, (100.0 * f.counted) / (sum(f.counted) over ()) as percentage
from
    (SELECT
        wa.skillEN AS workType,
        count(wa.skillEN) counted
     FROM [dbo].WorkAssignments as WA
     join [dbo].WorkOrders as WO ON (WO.ID = WA.workorderID)
     WHERE wo.dateTimeOfWork < ('1/1/2014')
       and wo.dateTimeOfWork > ('1/1/2013')
       and wo.statusEN = 'Completed'
     group by wa.skillEN) as f
group by f.worktype, f.counted
The LINQ I've been trying in LINQPad...
WorkAssignments
    .Where(wa => wa.WorkOrder.DateTimeofWork > DateTime.Now.AddYears(-2)
              && wa.WorkOrder.DateTimeofWork < DateTime.Now)
    .GroupBy(wa => wa.SkillEN)
    .Select(g => new
    {
        label = g.Key,
        count = g.Count()
    })
    .GroupBy(g => new { g.label, g.count })
    .Select(gg => new
    {
        label = gg.Key.label,
        count = gg.Key.count,
        pct = gg.Sum(a => a.count)
    })
(The dates in the where clause are slightly different, but I don't think it's relevant)
So, how would I implement the over () feature in LINQ to SQL?
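For reference, the arithmetic that sum(...) over () performs here is simply "each group's count divided by the total of all counts". A minimal in-memory sketch of that calculation follows; it assumes the grouped counts are small enough to materialize first, and it makes no claim about how LINQ to SQL would translate a single-query equivalent. `PercentageSketch` and `Percentages` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PercentageSketch
{
    // Counts each label, then expresses every count as a percentage of the grand total.
    public static Dictionary<string, double> Percentages(IEnumerable<string> skills)
    {
        var counts = skills
            .GroupBy(s => s)
            .Select(g => new { Label = g.Key, Count = g.Count() })
            .ToList();

        // Plays the role of SUM(counted) OVER (): one total shared by all rows.
        double total = counts.Sum(c => c.Count);

        return counts.ToDictionary(c => c.Label, c => 100.0 * c.Count / total);
    }

    public static void Main()
    {
        // Hypothetical skill labels, one entry per work assignment.
        var skills = new[] { "Painting", "Painting", "Painting", "Moving" };

        foreach (var pair in Percentages(skills))
            Console.WriteLine(pair.Key + ": " + pair.Value + "%");
    }
}
```

Because only one row per skill is brought into memory, the second step is cheap even when the underlying table is large.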

Slow EF query grouping data by Month/Year

I have a record set of approximately 1 million records. I'm trying to query the records to report monthly figures.
The following MySQL query executes in about 0.3 seconds
SELECT SUM(total), MONTH(create_datetime), YEAR(create_datetime)
FROM orders GROUP BY MONTH(create_datetime), YEAR(create_datetime)
However, I am unable to figure out an Entity Framework lambda expression that executes anywhere near as fast.
The only statement I have come up with that actually works is
var monthlySales = db.Orders
    .Select(c => new
    {
        Total = c.Total,
        CreateDateTime = c.CreateDateTime
    })
    .GroupBy(c => new { c.CreateDateTime.Year, c.CreateDateTime.Month })
    .Select(c => new
    {
        CreateDateTime = c.FirstOrDefault().CreateDateTime,
        Total = c.Sum(d => d.Total)
    })
    .OrderBy(c => c.CreateDateTime)
    .ToList();
But it is horribly slow.
How can I get this query to execute as quickly as it does directly in MySQL?
When you call ".ToList()" in the middle of a query (before the grouping), EF effectively loads all orders from the database into memory and then does the grouping in C#. Depending on the amount of data in your table, that can take a while, and I think this is why your query is so slow.
Try to rewrite your query so that only one call, at the very end, materializes the results (ToList, ToArray), and nothing earlier in the chain switches to in-memory evaluation (AsEnumerable).
Try this:
var monthlySales = (from c in db.Orders
                    group c by new { y = c.CreateDateTime.Year, m = c.CreateDateTime.Month } into g
                    select new
                    {
                        Total = g.Sum(t => t.Total),
                        Year = g.Key.y,
                        Month = g.Key.m
                    }).ToList();
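The point about having a single enumerating call rests on deferred execution: composing Where/Select/GroupBy does not read any data, only the final ToList does. The sketch below demonstrates this with plain LINQ to Objects and a counter; `DeferredExecutionSketch`, `Orders` and `Run` are hypothetical names, and the iterator merely stands in for a database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Counts how often the (hypothetical) data source is actually read.
    public static int Reads;

    // Stand-in for a table; the body only runs once something enumerates it.
    public static IEnumerable<int> Orders()
    {
        Reads++;
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // Returns { reads after composing the query, reads after ToList, sum of the results }.
    public static int[] Run()
    {
        Reads = 0;

        // Composing operators builds a pipeline but does not touch the source yet.
        var query = Orders().Where(total => total > 10).Select(total => total * 2);
        int readsAfterComposing = Reads;

        // ToList is the single point where the source is enumerated.
        List<int> results = query.ToList();
        int readsAfterToList = Reads;

        return new[] { readsAfterComposing, readsAfterToList, results.Sum() };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine("reads after composing: " + r[0]);
        Console.WriteLine("reads after ToList:    " + r[1]);
        Console.WriteLine("sum of results:        " + r[2]);
    }
}
```

With an IQueryable provider the same principle applies, except that everything composed before the enumerating call becomes part of the generated SQL.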
I came across this setup which executes quickly
var monthlySales = db.Orders
    .GroupBy(c => new { Year = c.CreateDateTime.Year, Month = c.CreateDateTime.Month })
    .Select(c => new
    {
        Month = c.Key.Month,
        Year = c.Key.Year,
        Total = c.Sum(d => d.Total)
    })
    .OrderByDescending(a => a.Year)
    .ThenByDescending(a => a.Month)
    .ToList();

Is there any way to optimize this LINQ to Entities query?

I was asked to produce a report that is driven by a fairly complex SQL query against a SQL Server database. Since the site of the report was already using Entity Framework 4.1, I thought I would attempt to write the query using EF and LINQ:
var q = from r in ctx.Responses
            .Where(x => ctx.Responses.Where(u => u.UserId == x.UserId).Count() >= VALID_RESPONSES)
            .GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
        orderby r.FirstOrDefault().User.AwardCity, r.FirstOrDefault().Category.Label, r.Count() descending
        select new
        {
            City = r.FirstOrDefault().User.AwardCity,
            Category = r.FirstOrDefault().Category.Label,
            Response = r.FirstOrDefault().ResponseText,
            Votes = r.Count()
        };
This query tallies votes, but only from users who have submitted a certain number of required minimum votes.
This approach was a complete disaster from a performance perspective, so we switched to ADO.NET and the query ran very quickly. I did look at the LINQ generated SQL using the SQL Profiler, and although it looked atrocious as usual I didn't see any clues as to how to optimize the LINQ statement to make it more efficient.
Here's the straight TSQL version:
WITH ValidUsers(UserId)
AS
(
SELECT UserId
FROM Responses
GROUP BY UserId
HAVING COUNT(*) >= 103
)
SELECT d.AwardCity
, c.Label
, r.ResponseText
, COUNT(*) AS Votes
FROM ValidUsers u
JOIN Responses r ON r.UserId = u.UserId
JOIN Categories c ON r.CategoryId = c.CategoryId
JOIN Demographics d ON r.UserId = d.Id
GROUP BY d.AwardCity, c.Label, r.ResponseText
ORDER BY d.AwardCity, c.Label, COUNT(*) DESC
What I'm wondering is: is this query just too complex for EF and LINQ to handle efficiently or have I missed a trick?
Using a let to reduce the number of r.First()'s will probably improve performance. It's probably not enough yet.
var q = from r in ctx.Responses
            .Where()
            .GroupBy()
        let response = r.First()
        orderby response.User.AwardCity, response.Category.Label, r.Count() descending
        select new
        {
            City = response.User.AwardCity,
            Category = response.Category.Label,
            Response = response.ResponseText,
            Votes = r.Count()
        };
Maybe this change improves performance by removing the nested SQL select that the where clause generates.
First get the number of votes of each user and put them in a Dictionary:
var userVotes = ctx.Responses.GroupBy(x => x.UserId)
    .Select(g => new { UserId = g.Key, Count = g.Count() })
    .ToDictionary(a => a.UserId, b => b.Count);
var cityQuery = ctx.Responses.ToList().Where(x => userVotes[x.UserId] >= VALID_RESPONSES)
    .GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
    .Select(r => new
    {
        City = r.First().User.AwardCity,
        Category = r.First().Category.Label,
        Response = r.First().ResponseText,
        Votes = r.Count()
    })
    .OrderBy(r => r.City)
    .ThenBy(r => r.Category)
    .ThenByDescending(r => r.Votes);
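The two-step idea (first find the users with enough responses, then tally only their votes) mirrors the HAVING COUNT(*) CTE in the T-SQL version. Here is a small in-memory sketch of it using a HashSet for the lookup; the `Response` class, the `Tally` method and the sample rows are hypothetical and much simpler than the real model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValidVotesSketch
{
    // Hypothetical, flattened version of a response row.
    public sealed class Response
    {
        public int UserId;
        public string City;
        public string Text;
    }

    // Tallies votes per "City/Text", counting only users with at least minResponses responses.
    public static Dictionary<string, int> Tally(IReadOnlyCollection<Response> responses, int minResponses)
    {
        // Step 1: the equivalent of GROUP BY UserId HAVING COUNT(*) >= minResponses.
        var validUsers = new HashSet<int>(
            responses.GroupBy(r => r.UserId)
                     .Where(g => g.Count() >= minResponses)
                     .Select(g => g.Key));

        // Step 2: count the votes of the valid users only.
        return responses
            .Where(r => validUsers.Contains(r.UserId))
            .GroupBy(r => new { r.City, r.Text })
            .ToDictionary(g => g.Key.City + "/" + g.Key.Text, g => g.Count());
    }

    public static void Main()
    {
        var responses = new List<Response>
        {
            new Response { UserId = 1, City = "Oslo", Text = "A" },
            new Response { UserId = 1, City = "Oslo", Text = "B" },
            new Response { UserId = 2, City = "Oslo", Text = "A" }, // user 2 has too few responses
        };

        foreach (var pair in Tally(responses, 2).OrderBy(p => p.Key))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

In this sample only user 1 reaches the minimum, so user 2's vote for "A" is not counted.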
