I have this working T-SQL query that counts the entries in each group of a GROUP BY clause and, at the same time, computes each count as a percentage of the entire set.
It is blazing fast (~90 ms) in Azure. I'd like to implement it in a similar manner with LINQ to SQL, but I can't figure it out...
select f.worktype, f.counted, (100.0 * f.counted) / (sum(f.counted) over ()) as percentage
from (select
        wa.skillEN AS workType,
        count(wa.skillEN) counted
      FROM [dbo].WorkAssignments as WA
      join [dbo].WorkOrders as WO ON (WO.ID = WA.workorderID)
      WHERE wo.dateTimeOfWork < ('1/1/2014')
        and wo.dateTimeOfWork > ('1/1/2013')
        and wo.statusEN = 'Completed'
      group by wa.skillEN) as f
group by f.worktype, f.counted
The LINQ I've been trying in LINQPad...
WorkAssignments
    .Where(wa => wa.WorkOrder.DateTimeofWork > DateTime.Now.AddYears(-2)
              && wa.WorkOrder.DateTimeofWork < DateTime.Now)
    .GroupBy(wa => wa.SkillEN)
    .Select(g => new
    {
        label = g.Key,
        count = g.Count()
    })
    .GroupBy(g => new { g.label, g.count })
    .Select(gg => new
    {
        label = gg.Key.label,
        count = gg.Key.count,
        pct = gg.Sum(a => a.count)
    })
(The dates in the WHERE clause are slightly different, but I don't think that's relevant.)
So, how would I implement the OVER () feature in LINQ to SQL?
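LINQ has no direct counterpart to `SUM(...) OVER ()`. One common workaround is to let the grouped count produce one small row per skill, bring that small result back, and compute the grand total and the percentages in memory. The sketch below shows that shape in LINQ to Objects; the `WorkAssignment` and `SkillShare` records are hypothetical stand-ins for the real entities, not part of the original schema. Against LINQ to SQL, the `GroupBy`/`Select` up to the first `ToList()` would be written directly against the table so that only the grouped rows cross the wire.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a WorkAssignments row; only the skill column matters here.
public record WorkAssignment(string SkillEN);

// One output row: the skill, its count and its share of all counted rows.
public record SkillShare(string Label, int Count, double Pct);

public static class SkillShares
{
    public static List<SkillShare> Compute(IEnumerable<WorkAssignment> rows)
    {
        // GROUP BY wa.skillEN with COUNT(...): one small row per skill.
        var counts = rows
            .GroupBy(wa => wa.SkillEN)
            .Select(g => new { Label = g.Key, Count = g.Count() })
            .ToList();

        // The grand total stands in for SUM(f.counted) OVER ().
        int total = counts.Sum(c => c.Count);

        // 100.0 * counted / total, guarding against an empty input.
        return counts
            .Select(c => new SkillShare(c.Label, c.Count, total == 0 ? 0.0 : 100.0 * c.Count / total))
            .ToList();
    }
}
```

Because the grouped list has only one entry per skill, the second pass that computes the total and the percentages is cheap.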
I wrote a SQL query that gets the count of tickets, the count of closed tickets and the closure rate (%), grouped by month for the current year, but I would like to express it as a LINQ query that achieves the same result.
SELECT *, (ClosedCount * 100 / TicketCount) AS ClosureRate FROM (
    SELECT COUNT(Id) AS TicketCount, MONTH(InsertDate) AS MonthNumber, DATENAME(MONTH, E1.InsertDate) AS MonthName,
        (SELECT COUNT(Id) FROM EvaluationHistoryTable E2 WHERE TicketStatus = 'CLOSED' AND YEAR(E2.InsertDate) = '2021') AS 'ClosedCount'
    FROM EvaluationHistoryTable E1
    WHERE YEAR(E1.InsertDate) = 2021
    GROUP BY MONTH(InsertDate), DATENAME(MONTH, E1.InsertDate)
) AS Monthly; -- SQL Server requires an alias on a derived table; the name is arbitrary
This is the code I'm working on:
var ytdClosureRateData = _context.EvaluationHistoryTable
    .Where(t => t.InsertDate.Value.Year == DateTime.Now.Year)
    .GroupBy(m => new
    {
        Month = m.InsertDate.Value.Month
    })
    .Select(g => new YtdTicketClosureRateModel
    {
        MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(g.Key.Month),
        MonthNumber = g.Key.Month,
        ItemCount = g.Count(),
        ClosedCount = // problem
        ClosureRate = // problem
    })
    .OrderBy(a => a.MonthNumber)
I am having trouble expressing the count of closed tickets (ClosedCount) in LINQ; I need that count to calculate the ClosureRate.
This won't be the same SQL but it should produce the same result in memory:
var ytdClosureRateData = _context.EvaluationHistoryTable
    .Where(t => t.InsertDate.Value.Year == DateTime.Now.Year)
    .GroupBy(m => new
    {
        Month = m.InsertDate.Value.Month
    })
    .Select(g => new
    {
        Month = g.Key.Month,
        ItemCount = g.Count(),
        ClosedCount = g.Where(t => t.TicketStatus == "CLOSED").Count()
    })
    .OrderBy(a => a.Month)
    .AsEnumerable() // everything below runs as LINQ to Objects
    .Select(x => new YtdTicketClosureRateModel
    {
        MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(x.Month),
        MonthNumber = x.Month,
        ItemCount = x.ItemCount,
        ClosedCount = x.ClosedCount,
        ClosureRate = x.ClosedCount * 100D / x.ItemCount
    })
    .ToList();
Two techniques have been implemented here:
1. Use fluent (method) syntax to specify the filter for the ClosedCount set. You can combine fluent and query syntax to your heart's content; each has pros and cons, and in this instance the fluent form simply made the syntax shorter.
2. Focus the DB query on bringing back only the data you need; the rest can easily be calculated in memory after the initial DB execution. That is why there are two projections here: the first should be expressed purely in SQL, and the rest is evaluated as LINQ to Objects.
The general assumption is that network traffic and serialization are usually the bottlenecks for simple queries like this, so we force LINQ to Entities (or LINQ to SQL) to produce the smallest practical payload and build the rest of the values and calculations in memory.
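The two-projection idea can be tried out without a database. The sketch below assumes a hypothetical `Ticket` record in place of the `EvaluationHistoryTable` entity; the first projection keeps only the counts, and the second does the month-name formatting and the percentage. It uses `CultureInfo.InvariantCulture` instead of `DateTimeFormatInfo.CurrentInfo`, an assumption made only so the month names are predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical in-memory stand-in for an EvaluationHistoryTable row.
public record Ticket(DateTime InsertDate, string TicketStatus);

public record MonthlyClosure(string MonthName, int MonthNumber, int ItemCount, int ClosedCount, double ClosureRate);

public static class ClosureReport
{
    public static List<MonthlyClosure> Build(IEnumerable<Ticket> tickets, int year)
    {
        return tickets
            .Where(t => t.InsertDate.Year == year)
            // Stage 1: only the raw aggregates (the part a provider would translate to SQL).
            .GroupBy(t => t.InsertDate.Month)
            .Select(g => new
            {
                Month = g.Key,
                ItemCount = g.Count(),
                ClosedCount = g.Count(t => t.TicketStatus == "CLOSED")
            })
            .OrderBy(a => a.Month)
            // Stage 2: formatting and arithmetic done in memory.
            .Select(x => new MonthlyClosure(
                CultureInfo.InvariantCulture.DateTimeFormat.GetAbbreviatedMonthName(x.Month),
                x.Month,
                x.ItemCount,
                x.ClosedCount,
                x.ClosedCount * 100D / x.ItemCount))
            .ToList();
    }
}
```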
Svyatoslav Danyliv makes a really good point in this answer:
The logic can be simplified, from both an SQL and a LINQ perspective, by using a CASE expression on TicketStatus that returns 1 or 0 and then simply summing that column. That avoids the nested subquery, and you can work directly with the grouped results.
Original query can be simplified to this one:
SELECT *,
    (ClosedCount * 100 / TicketCount) AS ClosureRate
FROM (
    SELECT
        COUNT(Id) AS TicketCount,
        MONTH(InsertDate) AS MonthNumber,
        DATENAME(MONTH, E1.InsertDate) AS MonthName,
        SUM(CASE WHEN TicketStatus = 'CLOSED' THEN 1 ELSE 0 END) AS 'ClosedCount'
    FROM EvaluationHistoryTable E1
    WHERE YEAR(E1.InsertDate) = 2021
    GROUP BY MONTH(InsertDate), DATENAME(MONTH, E1.InsertDate)
) AS Monthly;
Which is easily convertible to server-side LINQ:
var grouped =
    from eh in _context.EvaluationHistoryTable
    where eh.InsertDate.Value.Year == DateTime.Now.Year
    group eh by new { eh.InsertDate.Value.Month } into g
    select new
    {
        Month = g.Key.Month,
        ItemCount = g.Count(),
        ClosedCount = g.Sum(t => t.TicketStatus == "CLOSED" ? 1 : 0)
    };

var query =
    from x in grouped
    orderby x.Month
    select new YtdTicketClosureRateModel
    {
        MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(x.Month),
        MonthNumber = x.Month,
        ItemCount = x.ItemCount,
        ClosedCount = x.ClosedCount,
        ClosureRate = x.ClosedCount * 100D / x.ItemCount
    };

var result = query.ToList();
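Summing a 0/1 expression and counting a filtered group give the same number, which is why the CASE-based version and the `g.Where(...).Count()` version agree. One difference is worth checking, though: in the SQL, `ClosedCount * 100 / TicketCount` divides two integers (COUNT returns `int`, and SUM over the integer literals 1 and 0 is also `int`), so SQL Server truncates the rate, whereas the C# projections multiply by `100D` and keep the fraction. The helper names in this sketch are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosedCountDemo
{
    // Mirrors SUM(CASE WHEN TicketStatus = 'CLOSED' THEN 1 ELSE 0 END).
    public static int ClosedBySum(IEnumerable<string> statuses) =>
        statuses.Sum(s => s == "CLOSED" ? 1 : 0);

    // The same count via a filtered Count, as in the fluent-syntax answer.
    public static int ClosedByCount(IEnumerable<string> statuses) =>
        statuses.Count(s => s == "CLOSED");

    // Integer arithmetic, like ClosedCount * 100 / TicketCount on INT values: truncates.
    public static int RateAsSqlInt(int closed, int total) => closed * 100 / total;

    // Floating-point arithmetic, like x.ClosedCount * 100D / x.ItemCount.
    public static double RateAsDouble(int closed, int total) => closed * 100D / total;
}
```

For one closed ticket out of three, the integer form yields 33 while the double form yields roughly 33.33, so the two reports can differ slightly.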
How can I convert this query into LINQ expression?
WITH MyRowSet AS (
    SELECT RH.Id, RH.Reputation, RH.ReputationIpId, CheckDateTime, IpGroup.Name,
        ROW_NUMBER() OVER (PARTITION BY ReputationIpId ORDER BY CheckDateTime DESC) AS RowNum
    FROM dbo.ReputationHistory AS RH
    INNER JOIN dbo.ReputationIps AS IP ON RH.ReputationIpId = IP.Id
    INNER JOIN dbo.ReputationMonitorGroups AS IpGroup ON IP.ReputationMonitorGroupId = IpGroup.Id
)
SELECT AVG(Reputation), Name, RowNum FROM MyRowSet WHERE RowNum <= 5 GROUP BY RowNum, Name ORDER BY RowNum
What is this query doing?
It takes the last five Reputation records for each IP in the history,
and then computes the average value for each IpGroup.
The first part of the expression is:
.GroupBy(el => el.ReputationIpId)
.Select(grp => grp
    .OrderByDescending(gr => gr.CheckDateTime)
    .Take(5))
It takes the last 5 records for each IP. Then I need the average value for each row position: for example, the average of the first row across all IPs, then of the second row, and so on.
I solved this. Here is the answer:
var result = _reputationHistoryRepository.GetAll()
    .Include(ips => ips.ReputationIp)
        .ThenInclude(ipGroups => ipGroups.ReputationMonitorGroup)
    .GroupBy(repHistory => repHistory.ReputationIpId)
    .Select(group => group
        // newest first, like ORDER BY CheckDateTime DESC
        .OrderByDescending(repHistory => repHistory.CheckDateTime)
        // five rows per IP, like WHERE RowNum <= 5
        .Take(5)
        .Select((reputationHistory, index) => new
        {
            GroupName = reputationHistory.ReputationIp.ReputationMonitorGroup.Name,
            ReputationHistoryObject = reputationHistory,
            RowNum = index + 1 // ROW_NUMBER() starts at 1
        }))
    .SelectMany(reputationHistory => reputationHistory)
    .GroupBy(reputationHistory => new
    {
        reputationHistory.GroupName,
        reputationHistory.RowNum
    })
    .Select(reputationHistoryGroup => new
    {
        Reputation = reputationHistoryGroup.Average(x => x.ReputationHistoryObject.Reputation),
        GrpName = reputationHistoryGroup.Key.GroupName,
        RowNum = reputationHistoryGroup.Key.RowNum
    });
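The ROW_NUMBER pattern in the SQL (partition by IP, order newest first, number from 1, keep rows 1 to 5, then average per group name and row number) can be checked in LINQ to Objects. This sketch assumes a hypothetical flattened `HistoryRow` record that already carries the group name, i.e. the result of the two joins; it makes no claim about whether a particular EF version can translate the indexed `Select` overload to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: one history entry joined to its IP group name.
public record HistoryRow(int IpId, string GroupName, DateTime CheckDateTime, double Reputation);

public record RowAverage(string GroupName, int RowNum, double Average);

public static class RowNumberDemo
{
    public static List<RowAverage> LastFiveAverages(IEnumerable<HistoryRow> rows)
    {
        return rows
            // PARTITION BY ReputationIpId
            .GroupBy(r => r.IpId)
            // ORDER BY CheckDateTime DESC, then number the rows 1, 2, 3, ... within each IP
            .SelectMany(p => p
                .OrderByDescending(r => r.CheckDateTime)
                .Select((r, i) => new { Row = r, RowNum = i + 1 }))
            // WHERE RowNum <= 5
            .Where(x => x.RowNum <= 5)
            // GROUP BY RowNum, Name with AVG(Reputation)
            .GroupBy(x => new { x.Row.GroupName, x.RowNum })
            .Select(g => new RowAverage(g.Key.GroupName, g.Key.RowNum, g.Average(x => x.Row.Reputation)))
            .OrderBy(a => a.RowNum)
            .ThenBy(a => a.GroupName)
            .ToList();
    }
}
```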
I have a record set of approximately 1 million records, and I'm trying to query them to report monthly figures.
The following MySQL query executes in about 0.3 seconds
SELECT SUM(total), MONTH(create_datetime), YEAR(create_datetime)
FROM orders GROUP BY MONTH(create_datetime), YEAR(create_datetime)
However, I am unable to figure out an Entity Framework lambda expression that executes anywhere near as fast.
The only statement I have come up with that actually works is:
var monthlySales = db.Orders
    .Select(c => new
    {
        Total = c.Total,
        CreateDateTime = c.CreateDateTime
    })
    .GroupBy(c => new { c.CreateDateTime.Year, c.CreateDateTime.Month })
    .Select(c => new
    {
        CreateDateTime = c.FirstOrDefault().CreateDateTime,
        Total = c.Sum(d => d.Total)
    })
    .OrderBy(c => c.CreateDateTime)
But it is horribly slow.
How can I get this query to execute as quickly as it does directly in MySQL?
When you call .ToList() in the middle of a query (before the grouping), EF effectively loads all orders from the database into memory and then does the grouping in C#. Depending on the amount of data in your table, that can take a while, and I think this is why your query is so slow.
Try to rewrite your query so that only one call enumerates the results (ToList, ToArray, AsEnumerable), at the very end.
Try this:
var monthlySales = (from c in db.Orders
                    group c by new { y = c.CreateDateTime.Year, m = c.CreateDateTime.Month } into g
                    select new
                    {
                        Total = g.Sum(t => t.Total),
                        Year = g.Key.y,
                        Month = g.Key.m
                    }).ToList();
I came across this setup, which executes quickly:
var monthlySales = db.Orders
    .GroupBy(c => new { Year = c.CreateDateTime.Year, Month = c.CreateDateTime.Month })
    .Select(c => new
    {
        Month = c.Key.Month,
        Year = c.Key.Year,
        Total = c.Sum(d => d.Total)
    })
    .OrderByDescending(a => a.Year)
    .ThenByDescending(a => a.Month)
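The fast version groups on the (Year, Month) key and projects only the key and the aggregate, without reaching into each group for an individual row as the slow version's `c.FirstOrDefault()` does. A LINQ to Objects sketch of that shape, assuming a hypothetical `Order` record with `CreateDateTime` and `Total`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the orders table.
public record Order(DateTime CreateDateTime, decimal Total);

public record MonthlyTotal(int Year, int Month, decimal Total);

public static class MonthlySales
{
    public static List<MonthlyTotal> Summarize(IEnumerable<Order> orders) =>
        orders
            // GROUP BY YEAR(create_datetime), MONTH(create_datetime)
            .GroupBy(o => new { o.CreateDateTime.Year, o.CreateDateTime.Month })
            // Project only the key and SUM(total); nothing per-row.
            .Select(g => new MonthlyTotal(g.Key.Year, g.Key.Month, g.Sum(o => o.Total)))
            .OrderByDescending(m => m.Year)
            .ThenByDescending(m => m.Month)
            .ToList();
}
```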
I have this query in SQL:
SELECT AVG(CAST(r.users AS NUMERIC)) FROM description r WHERE r.company = p.id
) DESC) AS rownb, p.* FROM company p
) rs WHERE rownb <= 2
The farthest I've come with converting this query to LINQ is:
.SelectMany(r =>
.GroupBy(t => t.Company)
.Select(t => new {
Average = t.Average(a => (double)a.Users),
Company = t.Key })
.OrderByDescending(t => t.Average)
, (p, r) => new { Companies = p, Descriptions = r })
.Where(t => t.Companies.Id == t.Descriptions.Company)
.GroupBy(t => t.Companies.Category)
.Select(t => t.Take(2))
.SelectMany(t => t)
.Select(t => new { t.Companies.Name, t.Descriptions.Average, t.Companies.Category})
.OrderBy(t => t.Category)
But the problem is the performance. While the SQL query's cost is 28% (relative to the batch), the LINQ query's is 72%.
I already replaced the Join with SelectMany in LINQ, which reduced the cost by 20%, but now I don't know how to optimize this query any further.
Also, I understand there is no ROW_NUMBER in LINQ.
I'm using LINQPad to inspect the resulting SQL query.
Question: Is ROW_NUMBER responsible for this performance difference? Is it possible to optimize the LINQ query further?
You can emulate ROW_NUMBER in your Select like this:
.Select((t,i) => new { rowindex = i+1, t.Companies.Name, t.Descriptions.Average, t.Companies.Category})
As for optimizations, I'm not too sure.
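The indexed `Select((t, i) => ...)` overload numbers the elements of whatever sequence it is applied to, so applied to the final flattened sequence it gives one running number across all categories. To get per-partition numbers like `ROW_NUMBER() OVER (PARTITION BY ...)`, it has to be applied inside each group. A LINQ to Objects sketch, assuming (as the LINQ attempt suggests) that the SQL partitions by category and orders by the average user count, with a hypothetical flattened `CompanyUsers` record:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: a company together with one description's user count.
public record CompanyUsers(int CompanyId, string Name, string Category, int Users);

public record RankedCompany(string Category, string Name, double Average, int RowNb);

public static class TopPerCategory
{
    public static List<RankedCompany> TopTwo(IEnumerable<CompanyUsers> rows) =>
        rows
            // AVG(users) per company
            .GroupBy(r => new { r.CompanyId, r.Name, r.Category })
            .Select(g => new { g.Key.Name, g.Key.Category, Average = g.Average(r => (double)r.Users) })
            // PARTITION BY category ORDER BY average DESC, numbered inside each partition
            .GroupBy(c => c.Category)
            .SelectMany(p => p
                .OrderByDescending(c => c.Average)
                .Select((c, i) => new RankedCompany(c.Category, c.Name, c.Average, i + 1)))
            // WHERE rownb <= 2
            .Where(c => c.RowNb <= 2)
            .OrderBy(c => c.Category)
            .ThenBy(c => c.RowNb)
            .ToList();
}
```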
I'm totally new to LINQ.
I have an SQL GROUP BY query that runs in just a few milliseconds, but when I try to achieve the same thing via LINQ, it is awfully slow.
What I'm trying to achieve is to fetch the average monthly duration of a certain database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0, 7),
       AVG(duration)
from (select CONVERT(CHAR(8), mmud.logDateTime, 112) as yyyyMMdd,
             DATEDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
      from mydb.mydbo.updateData mmud
      join mydb.mydbo.updateDataKeyValue mmudkv
        on mmud.updateDataid = mmudkv.updateDataId
      join mydb.mydbo.updateDataDetailKey mmuddk
        on mmudkv.updateDataDetailKeyid = mmuddk.Id
      where dbname = 'MY_NEW_DB'
        and mmudkv.value in ('start', 'finish')
      group by CONVERT(CHAR(8), mmud.logDateTime, 112)
     ) as resultSet
group by substring(yyyyMMdd, 0, 7)
order by substring(yyyyMMdd, 0, 7)
In LINQ, I first fetch the record from a table that links the database name to the UpdateData, and then do the filtering and GroupBy on the related information.
ue => ue.updatedataKeyValue.Any(
uedkv =>
uedkv.Value.ToLower() == "starting update" ||
uedkv.Value.ToLower() == "client release"))
ue =>
logDateTimeyyyyMMdd = ue.logDateTime.Date,
logDateTime = ue.logDateTime
updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
groupedupdatedata => new
UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
Duration =
(groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
groupedupdatedata.Min(groupMember => groupMember.logDateTime)
var updatedataMonthlyDurations =
updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
group =>
new updatedataMonthlyAverageDuration
DbName = entry.DbName,
UpdateDateyyyyMM = group.Key.Substring(0,6),
Duration =
gmember =>
I know that GroupBy in LINQ isn't the same as GROUP BY in T-SQL, but I'm not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I added .ToList() after the first GroupBy, things got a little faster, but even then this way of finding the average duration is really slow.
What would be the best alternative, and are there ways to improve a slow LINQ statement using Visual Studio 2012?
Your LINQ query is doing most of its work in LINQ to Objects. You should construct a LINQ to Entities/SQL query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
            from mmudkv in context.updateDataKeyValue
                .Where(x => mmud.updateDataid == x.updateDataId)
            from mmuddk in context.updateDataDetailKey
                .Where(x => mmudkv.updateDataDetailKeyid == x.Id)
            where mmud.dbname == "MY_NEW_DB"
            where mmudkv.value == "start" || mmudkv.value == "finish"
            group mmud by mmud.logDateTime.Date into g
            select new
            {
                Date = g.Key,
                // milliseconds from the first to the last log entry of the day,
                // like DATEDIFF(ms, MIN(...), MAX(...))
                Duration = EntityFunctions.DiffMilliseconds(g.Min(x => x.logDateTime), g.Max(x => x.logDateTime))
            };

var queryByMonth = from x in query
                   group x by new { x.Date.Year, x.Date.Month } into g
                   select new
                   {
                       Year = g.Key.Year,
                       Month = g.Key.Month,
                       Average = g.Average(y => y.Duration)
                   };

// A single SQL statement is sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know whether you are using Entity Framework or LINQ to SQL, and you will need to provide your context/model information.
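The two-stage logic (one duration per day, then the average of those daily durations per month) can be illustrated in LINQ to Objects. This sketch assumes the input is already filtered to the relevant start/finish log timestamps; the `MonthlyAverage` record and the method name are hypothetical. It only demonstrates the arithmetic: against a database, the point of the answer is to keep both stages inside one translated query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record MonthlyAverage(int Year, int Month, double AverageMs);

public static class UpdateDurations
{
    public static List<MonthlyAverage> MonthlyAverages(IEnumerable<DateTime> logTimes) =>
        logTimes
            // Stage 1: one duration per day, like DATEDIFF(ms, MIN(...), MAX(...)) grouped by day
            .GroupBy(t => t.Date)
            .Select(g => new { Day = g.Key, Ms = (g.Max() - g.Min()).TotalMilliseconds })
            // Stage 2: average of those daily durations per month
            .GroupBy(d => new { d.Day.Year, d.Day.Month })
            .Select(g => new MonthlyAverage(g.Key.Year, g.Key.Month, g.Average(d => d.Ms)))
            .OrderBy(m => m.Year)
            .ThenBy(m => m.Month)
            .ToList();
}
```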