Write a comparable LINQ query for aggregate distinct count in sql? - c#

I want to get a count for each month, but each day should be counted at most once even if it has multiple occurrences. I have a SQL query that works, but I am having trouble converting it to LINQ:
select
count(DISTINCT DAY(date)) as Monthly_Count,
MONTH(date) as Month,
YEAR(date)
from
activity
where
id = @id
group by
YEAR(date),
MONTH(date)
Could anyone help me translate the above query to LINQ? Thanks!

Per "LINQ to SQL using GROUP BY and COUNT(DISTINCT)" given by @Rick, this should work:
var query = from act in db.Activity
where act.Id == id
group act by new { act.Date.Year, act.Date.Month } into g
select new
{
MonthlyCount = g.Select(act => act.Date.Day).Distinct().Count(),
Month = g.Key.Month,
Year = g.Key.Year
};
I don't know whether L2S can translate the inner g.Select(act => act.Date.Day).Distinct().Count() properly.
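Whether a given provider translates the inner expression is a separate question, but the logic itself can be checked in memory. The following is a minimal LINQ to Objects sketch; the sample rows and the `activity` array are hypothetical stand-ins for the table, not the asker's data.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (Id, Date) pairs standing in for the activity table.
var activity = new[]
{
    (Id: 1, Date: new DateTime(2021, 1, 5, 9, 0, 0)),
    (Id: 1, Date: new DateTime(2021, 1, 5, 17, 30, 0)), // same day, counted once
    (Id: 1, Date: new DateTime(2021, 1, 6, 8, 0, 0)),
    (Id: 1, Date: new DateTime(2021, 2, 1, 12, 0, 0)),
    (Id: 2, Date: new DateTime(2021, 1, 7, 12, 0, 0)),  // different id, filtered out
};

var id = 1;

// Group by (year, month), then count the distinct day numbers inside each group.
var monthly = activity
    .Where(a => a.Id == id)
    .GroupBy(a => new { a.Date.Year, a.Date.Month })
    .Select(g => new
    {
        g.Key.Year,
        g.Key.Month,
        MonthlyCount = g.Select(a => a.Date.Day).Distinct().Count()
    })
    .OrderBy(r => r.Year).ThenBy(r => r.Month)
    .ToList();

foreach (var row in monthly)
    Console.WriteLine($"{row.Year}-{row.Month:D2}: {row.MonthlyCount}");
```

With this sample data, January has two distinct days and February has one.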

var results = db.activities.Where(a => a.id == myID)
.GroupBy(a => new
{
Month = a.date.Month,
Year = a.date.Year
})
.Select(g => new
{
Month = g.Key.Month,
Year = g.Key.Year,
Monthly_Count = g.Select(d => d.date.Day)
.Distinct()
.Count()
});

Related

SQL Server query with subquery to LINQ query

I wrote a SQL query that gets the ticket count, the closed-ticket count, and the closure rate (%), grouped by month for the current year. I would like to express it as a LINQ query that produces the same result.
SELECT *, (ClosedCount * 100 / TicketCount) AS ClosureRate FROM (
SELECT COUNT(Id) as TicketCount, MONTH(InsertDate) as MonthNumber, DATENAME(MONTH, E1.InsertDate) as MonthName,
(SELECT COUNT(Id) FROM EvaluationHistoryTable E2 WHERE TicketStatus = 'CLOSED' AND YEAR(E2.InsertDate) = '2021') AS 'ClosedCount'
FROM EvaluationHistoryTable E1
WHERE YEAR(E1.InsertDate) = 2021
GROUP BY MONTH(InsertDate), DATENAME(MONTH, E1.InsertDate));
This is code that I'm working on:
var ytdClosureRateData = _context.EvaluationHistoryTable
.Where(t => t.InsertDate.Value.Year == DateTime.Now.Year)
.GroupBy(m => new
{
Month = m.InsertDate.Value.Month
})
.Select(g => new YtdTicketClosureRateModel
{
MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(g.Key.Month),
MonthNumber = g.Key.Month,
ItemCount = g.Count(),
ClosedCount = // problem
ClosureRate = // problem
}).AsEnumerable()
.OrderBy(a => a.MonthNumber)
.ToList();
I am having trouble expressing the count of closed tickets (ClosedCount) in LINQ; I need that count to calculate the ClosureRate.
This won't be the same SQL but it should produce the same result in memory:
var ytdClosureRateData = _context.EvaluationHistoryTable
.Where(t => t.InsertDate.Value.Year == DateTime.Now.Year)
.GroupBy(m => new
{
Month = m.InsertDate.Value.Month
})
.Select(g => new
{
Month = g.Key.Month,
ItemCount = g.Count(),
ClosedCount = g.Where(t => t.TicketStatus == "CLOSED").Count()
}).OrderBy(a => a.Month)
.ToList()
.Select(x => new YtdTicketClosureRateModel
{
MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(x.Month),
MonthNumber = x.Month,
ItemCount = x.ItemCount,
ClosedCount = x.ClosedCount,
ClosureRate = x.ClosedCount * 100D / x.ItemCount
})
.ToList();
Two techniques have been implemented here:
Use fluent (method) syntax to specify the filter for the ClosedCount set. You can combine fluent and query syntax to your heart's content; each has pros and cons, and in this instance the fluent form simply made the syntax simpler.
Focus the DB query on bringing back only the data that you need; the rest is easily calculated in memory after the initial DB execution. That is why there are two projections here: the first should be translated entirely to SQL, and the second is evaluated as LINQ to Objects.
The general assumption is that traffic over the wire and serialization are the usual bottlenecks for simple queries like this, so we force LINQ to Entities (or LINQ to SQL) to produce the smallest practical payload and build the rest of the values and calculations in memory.
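The two-projection shape described above can be sketched against in-memory data. This is only an illustration: the `tickets` array is a hypothetical stand-in for EvaluationHistoryTable, the invariant culture is used so the month names are predictable, and with a real provider the first projection is the part that would be sent to the database.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical rows standing in for EvaluationHistoryTable.
var tickets = new[]
{
    (InsertDate: new DateTime(2021, 1, 10), TicketStatus: "CLOSED"),
    (InsertDate: new DateTime(2021, 1, 20), TicketStatus: "OPEN"),
    (InsertDate: new DateTime(2021, 2, 3),  TicketStatus: "CLOSED"),
    (InsertDate: new DateTime(2021, 2, 4),  TicketStatus: "CLOSED"),
};

// First projection: only the aggregates (the part meant to run as SQL).
var aggregates = tickets
    .GroupBy(t => t.InsertDate.Month)
    .Select(g => new
    {
        Month = g.Key,
        ItemCount = g.Count(),
        ClosedCount = g.Count(t => t.TicketStatus == "CLOSED")
    })
    .OrderBy(a => a.Month)
    .ToList(); // with a database provider, the query would execute here

// Second projection: formatting and arithmetic on the small result set, in memory.
var report = aggregates
    .Select(a => (
        MonthName: CultureInfo.InvariantCulture.DateTimeFormat.GetAbbreviatedMonthName(a.Month),
        a.ItemCount,
        a.ClosedCount,
        ClosureRate: a.ClosedCount * 100D / a.ItemCount))
    .ToList();

foreach (var r in report)
    Console.WriteLine($"{r.MonthName}: {r.ClosedCount}/{r.ItemCount} = {r.ClosureRate}%");
```

Only three numbers per month cross the boundary between the two projections, which is the point of the technique.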
UPDATE:
Svyatoslav Danyliv makes a really good point in this answer
The logic can be simplified, from both an SQL and LINQ perspective by using a CASE expression on the TicketStatus to return 1 or 0 and then we can simply sum that column, which means you can avoid a nested query and can simply join on the results.
Original query can be simplified to this one:
SELECT *,
(ClosedCount * 100 / TicketCount) AS ClosureRate
FROM (
SELECT
COUNT(Id) AS TicketCount,
MONTH(InsertDate) AS MonthNumber,
DATENAME(MONTH, E1.InsertDate) AS MonthName,
SUM(CASE WHEN TicketStatus = 'CLOSED' THEN 1 ELSE 0 END) AS 'ClosedCount'
FROM EvaluationHistoryTable E1
WHERE YEAR(E1.InsertDate) = 2021
GROUP BY MONTH(InsertDate), DATENAME(MONTH, E1.InsertDate)) AS Monthly;
Which is easily convertible to server-side LINQ:
var grouped =
from eh in _context.EvaluationHistoryTable
where eh.InsertDate.Value.Year == DateTime.Now.Year
group eh by new { eh.InsertDate.Value.Month } into g
select new
{
g.Key.Month,
ItemCount = g.Count(),
ClosedCount = g.Sum(t => t.TicketStatus == "CLOSED" ? 1 : 0)
};
var query =
from x in grouped
orderby x.Month
select new YtdTicketClosureRateModel
{
MonthName = DateTimeFormatInfo.CurrentInfo.GetAbbreviatedMonthName(x.Month),
MonthNumber = x.Month,
ItemCount = x.ItemCount,
ClosedCount = x.ClosedCount,
ClosureRate = x.ClosedCount * 100D / x.ItemCount
};
var result = query.ToList();
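As a quick sanity check of the CASE-and-SUM idea, the sketch below shows that summing a 1-or-0 flag gives the same number as counting with a predicate. It runs in memory only, and the `statuses` array is hypothetical sample data.

```csharp
using System;
using System.Linq;

// Hypothetical ticket statuses for a single month.
var statuses = new[] { "CLOSED", "OPEN", "CLOSED", "PENDING", "CLOSED" };

// SUM(CASE WHEN TicketStatus = 'CLOSED' THEN 1 ELSE 0 END)
var viaSum = statuses.Sum(s => s == "CLOSED" ? 1 : 0);

// The same count expressed with a predicate, and with Where + Count.
var viaCount = statuses.Count(s => s == "CLOSED");
var viaWhere = statuses.Where(s => s == "CLOSED").Count();

Console.WriteLine($"{viaSum} {viaCount} {viaWhere}");
```

All three forms agree in memory; which one a given provider translates most cleanly to SQL depends on the provider and version.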

How to convert Mssql query into LINQ?

How can I convert this query into LINQ expression?
WITH MyRowSet
AS
(
SELECT RH.Id,RH.Reputation,RH.ReputationIpId,CheckDateTime,IpGroup.Name,ROW_NUMBER()
OVER (PARTITION BY ReputationIpId ORDER BY CheckDateTime DESC) AS RowNum
FROM dbo.ReputationHistory As RH
INNER JOIN dbo.ReputationIps as IP ON RH.ReputationIpId=IP.Id
INNER JOIN dbo.ReputationMonitorGroups as IpGroup ON IP.ReputationMonitorGroupId=IpGroup.Id
)
SELECT AVG(Reputation), Name, RowNum FROM MyRowSet WHERE RowNum <= 5 GROUP BY RowNum, Name ORDER BY RowNum
What is this query doing?
It takes the last five "Reputation" records for each "IP" in the history,
and then we get the average value for each IpGroup.
First part of expression is:
reputationHistoryRepository.GetAll()
.GroupBy(el => el.ReputationIpId)
.Select(grp => grp
.OrderBy(gr => gr.CheckDateTime)
.TakeLast(lastRecordNum))
It takes the last 5 records for each IP. Then I need the average value for each row number: for example, the average of the first row across all IPs, then the second, and so on.
I solved this. Here is the answer.
var result = _reputationHistoryRepository.GetAll()
.Include(ips => ips.ReputationIp)
.ThenInclude(ipGroups => ipGroups.ReputationMonitorGroup)
.GroupBy(repHistory => repHistory.ReputationIpId)
.Select(group => group
.OrderBy(repHistory => repHistory.CheckDateTime)
.TakeLast(countOfRecords)
.Select((reputationHistory, index) => new
{
GroupName = reputationHistory.ReputationIp.ReputationMonitorGroup.Name,
ReputationHistoryObject = reputationHistory,
RowNum = index
}))
.SelectMany(reputationHistory => reputationHistory)
.GroupBy(reputationHistory => new
{
reputationHistory.RowNum,
reputationHistory.GroupName
})
.Select(reputationHistoryGroup => new
{
Reputation = reputationHistoryGroup.Average(x => x.ReputationHistoryObject.Reputation),
GrpName = reputationHistoryGroup.Key.GroupName,
RowNum = reputationHistoryGroup.Key.RowNum
});
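The shape of that query (take the last N rows per IP, number them, then average by row number and group name) can be tried on in-memory data. This is a LINQ to Objects sketch with hypothetical sample rows; it says nothing about whether a particular database provider can translate TakeLast or the indexed Select.

```csharp
using System;
using System.Linq;

// Hypothetical history rows: (IpId, GroupName, CheckDateTime, Reputation).
var history = new[]
{
    (IpId: 1, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 1), Reputation: 10.0),
    (IpId: 1, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 2), Reputation: 20.0),
    (IpId: 1, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 3), Reputation: 30.0),
    (IpId: 2, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 1), Reputation: 50.0),
    (IpId: 2, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 2), Reputation: 60.0),
    (IpId: 2, GroupName: "A", CheckDateTime: new DateTime(2021, 1, 3), Reputation: 70.0),
};

const int countOfRecords = 2;

var averages = history
    .GroupBy(h => h.IpId)
    // Per IP: keep the latest N rows and number them 0..N-1 (oldest of the kept rows first).
    .SelectMany(g => g
        .OrderBy(h => h.CheckDateTime)
        .TakeLast(countOfRecords)
        .Select((h, index) => (h.GroupName, h.Reputation, RowNum: index)))
    // Across IPs: average rows that share a row number and group name.
    .GroupBy(x => (x.RowNum, x.GroupName))
    .Select(g => (g.Key.GroupName, g.Key.RowNum, Reputation: g.Average(x => x.Reputation)))
    .OrderBy(r => r.RowNum)
    .ToList();

foreach (var a in averages)
    Console.WriteLine($"{a.GroupName} row {a.RowNum}: {a.Reputation}");
```

Note that the index here counts up from the oldest of the kept rows, whereas ROW_NUMBER() in the SQL version numbers from the newest row, so the row numbers run in the opposite direction.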

Slow EF query grouping data by Month/Year

I have a record set of approximately 1 million records. I'm trying to query the records to report monthly figures.
The following MySQL query executes in about 0.3 seconds
SELECT SUM(total), MONTH(create_datetime), YEAR(create_datetime)
FROM orders GROUP BY MONTH(create_datetime), YEAR(create_datetime)
However, I am unable to figure out an Entity Framework lambda expression that executes anywhere near as fast.
The only statement I have come up with that actually works is:
var monthlySales = db.Orders
.Select(c => new
{
Total = c.Total,
CreateDateTime = c.CreateDateTime
})
.GroupBy(c => new { c.CreateDateTime.Year, c.CreateDateTime.Month })
.Select(c => new
{
CreateDateTime = c.FirstOrDefault().CreateDateTime,
Total = c.Sum(d => d.Total)
})
.OrderBy(c => c.CreateDateTime)
.ToList();
But it is horribly slow.
How can I get this query to execute as quickly as it does directly in MySQL?
When you call ".ToList()" in the middle of a query (before the grouping), EF fetches all orders from the database into memory and then does the grouping in C#. Depending on the amount of data in your table, that can take a while, and I think this is why your query is so slow.
Try to rewrite your query so that only one expression enumerates the results (ToList, ToArray, AsEnumerable).
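The difference between composing on IQueryable and materializing first can be seen even without a database. In this sketch, `AsQueryable()` over a hypothetical in-memory array stands in for `db.Orders`; no SQL is generated here, it only shows which parts of the pipeline remain a translatable query.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for db.Orders; AsQueryable() mimics a provider-backed set.
var orders = new[]
{
    new { CreateDateTime = new DateTime(2021, 1, 5), Total = 10m },
    new { CreateDateTime = new DateTime(2021, 1, 9), Total = 15m },
    new { CreateDateTime = new DateTime(2021, 2, 1), Total = 7m },
}.AsQueryable();

// Composed entirely on IQueryable: the grouping is part of the expression tree,
// so a database provider gets the chance to translate it to GROUP BY.
var composed = orders
    .GroupBy(o => new { o.CreateDateTime.Year, o.CreateDateTime.Month })
    .Select(g => new { g.Key.Year, g.Key.Month, Total = g.Sum(o => o.Total) });

// Materialized first: everything after ToList() is LINQ to Objects over all rows.
var inMemory = orders
    .ToList()
    .GroupBy(o => new { o.CreateDateTime.Year, o.CreateDateTime.Month })
    .Select(g => new { g.Key.Year, g.Key.Month, Total = g.Sum(o => o.Total) });

Console.WriteLine(composed is IQueryable); // still a query a provider could translate
Console.WriteLine(inMemory is IQueryable); // plain in-memory enumerable
```

Both pipelines produce the same rows; the difference is where the grouping work happens once a real provider is involved.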
Try this:
var monthlySales = (from c in db.Orders
group c by new { y = c.CreateDateTime.Year, m = c.CreateDateTime.Month } into g
select new {
Total = g.Sum(t => t.Total),
Year = g.Key.y,
Month = g.Key.m }).ToList();
I came across this setup which executes quickly
var monthlySales = db.Orders
.GroupBy(c => new { Year = c.CreateDateTime.Year, Month = c.CreateDateTime.Month })
.Select(c => new
{
Month = c.Key.Month,
Year = c.Key.Year,
Total = c.Sum(d => d.Total)
})
.OrderByDescending(a => a.Year)
.ThenByDescending(a => a.Month)
.ToList();

LINQ Multiple GroupBy Query Performing several times slower than T-SQL

I'm totally new to LINQ.
I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow.
What I'm trying to achieve is to fetch the average monthly duration of a certain database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0,7),
AVG (duration)
from (select (CONVERT(CHAR(8), mmud.logDateTime, 112)) as yyyyMMdd,
DateDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
from mydb.mydbo.updateData mmud
left
join mydb.mydbo.updateDataKeyValue mmudkv
on mmud.updateDataid = mmudkv.updateDataId
left
join mydb.mydbo.updateDataDetailKey mmuddk
on mmudkv.updateDataDetailKeyid = mmuddk.Id
where dbname = 'MY_NEW_DB'
and mmudkv.value in ('start', 'finish')
group
by (CONVERT(CHAR(8), mmud.logDateTime, 112))
) as resultSet
group
by substring(yyyyMMdd, 0,7)
order
by substring(yyyyMMdd, 0,7)
in LINQ => I first fetch the record from a table that links information of the Database Name and UpdateData and then do filtering and groupby on the related information.
var updatedataInDateRangeWithDescriptions = entry.updatedata.Where(
ue => ue.updatedataKeyValue.Any(
uedkv =>
uedkv.Value.ToLower() == "starting update" ||
uedkv.Value.ToLower() == "client release"))
.Select(
ue =>
new
{
logDateTimeyyyyMMdd = ue.logDateTime.Date,
logDateTime = ue.logDateTime
})
.GroupBy(
updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
.Select(
groupedupdatedata => new
{
UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
Duration =
(groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
groupedupdatedata.Min(groupMember => groupMember.logDateTime)
)
.TotalMilliseconds
}
).
ToList();
var updatedataMonthlyDurations =
updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
.Select(
group =>
new updatedataMonthlyAverageDuration
{
DbName = entry.DbName,
UpdateDateyyyyMM = group.Key.Substring(0,6),
Duration =
group.Average(
gmember =>
(gmember.Duration))
}
).ToList();
I know that GroupBy in LINQ isn't the same as GroupBy in T-SQL, but not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I did the .ToList() after the first GroupBy things got a little faster. But even then this way of finding average duration is really slow.
What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?
Your linq query is doing most of its work in linq-to-objects. You should be constructing a linq-to-entities/sql query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
from mmudkv in context.updateDataKeyValue
.Where(x => mmud.updateDataid == x.updateDataId)
.DefaultIfEmpty()
from mmuddk in context.updateDataDetailKey
.Where(x => mmudkv.updateDataDetailKeyid == x.Id)
.DefaultIfEmpty()
where mmud.dbname == "MY_NEW_DB"
where mmudkv.value == "start" || mmudkv.value == "finish"
group mmud by mmud.logDateTime.Date into g
select new
{
Date = g.Key,
Average = EntityFunctions.DiffMilliseconds(g.Min(x => x.logDateTime), g.Max(x => x.logDateTime)),
};
var queryByMonth = from x in query
group x by new { x.Date.Year, x.Date.Month } into x
select new
{
Year = x.Key.Year,
Month = x.Key.Month,
Average = x.Average(y => y.Average)
};
// A single SQL statement is sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know whether you are using Entity Framework or LINQ to SQL, and you will need to provide your context/model information.
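The two-level grouping used above (one duration per day, then an average of those durations per month) can be checked on in-memory data. This is a LINQ to Objects sketch with hypothetical timestamps; it uses plain DateTime subtraction rather than EntityFunctions, which only applies inside a provider-translated query.

```csharp
using System;
using System.Linq;

// Hypothetical log timestamps standing in for updateData.logDateTime.
var logTimes = new[]
{
    new DateTime(2021, 1, 4, 10, 0, 0),
    new DateTime(2021, 1, 4, 10, 0, 2),   // day span: 2000 ms
    new DateTime(2021, 1, 5, 9, 0, 0),
    new DateTime(2021, 1, 5, 9, 0, 4),    // day span: 4000 ms
    new DateTime(2021, 2, 1, 8, 0, 0),
    new DateTime(2021, 2, 1, 8, 0, 1),    // day span: 1000 ms
};

// Level 1: one duration per calendar day (latest minus earliest entry).
var daily = logTimes
    .GroupBy(t => t.Date)
    .Select(g => (Date: g.Key, DurationMs: (g.Max() - g.Min()).TotalMilliseconds));

// Level 2: average the daily durations per (year, month).
var monthly = daily
    .GroupBy(d => (d.Date.Year, d.Date.Month))
    .Select(g => (g.Key.Year, g.Key.Month, AverageMs: g.Average(d => d.DurationMs)))
    .OrderBy(m => m.Year).ThenBy(m => m.Month)
    .ToList();

foreach (var m in monthly)
    Console.WriteLine($"{m.Year}-{m.Month:D2}: {m.AverageMs} ms");
```

Grouping on the year and month numbers avoids the string formatting and Substring calls used in the question.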

LINQ - how to sort by date

I have the following table (Records):
RecordID int,
Nickname nvarchar(max),
DateAdded datetime
I need to group by Nickname and rank the nicknames by their number of records. I did it like this:
var users = (from i in db.Records
where i.Form.CompetitionID == cID
group i by i.Nickname into g
orderby g.Count() descending
select new TopUserModel()
{
Nickname = g.Key,
Position = g.Count()
}).Take(100).ToList();
It works.
Now I also need to sort by date (who reached the max record count first).
I need a query like this:
select Nickname, Count(*) as Result, MAX(DateAdded) as MDate from Records group by Nickname order by Result Desc, MDate Asc
How can I do this with LINQ?
I think this is what you want. I've used the extension-method syntax of LINQ, which is probably easier. The idea is to calculate MaxCount and MaxDate after the GroupBy so you can use them in the following OrderBy clauses.
db.Records
.Where(i => i.Form.CompetitionID == cID)
.GroupBy(i => i.Nickname)
.Select(g => new { MaxCount = g.Count(), MaxDate = g.Max(i => i.DateAdded), Nickname = g.Key})
.OrderByDescending(gx => gx.MaxCount)
.ThenBy(gx => gx.MaxDate)
.Select(gx => new TopUserModel()
{
Nickname = gx.Nickname,
Position = gx.MaxCount
}).Take(100).ToList();
I think what you're asking for is:
...
select new TopUserModel()
{
Nickname = g.Key,
Position = g.Count(),
Date = g.Max(r => r.DateAdded)
}).Take(100).OrderByDescending(t => t.Position).ThenBy(t => t.Date).ToList();
When you use group the Key is your grouping but the enumerable is all the records you've grouped so you can still use aggregate functions on them.
If you want to sort by multiple columns, you can put them in order using the chaining above.
var users = (from i in db.Records
where i.Form.CompetitionID == cID
group i by new {i.Nickname} into g
orderby g.Count() descending
select new TopUserModel()
{
Nickname = g.Key.Nickname,
Position = g.Count(),
Date = g.Max(r => r.DateAdded)
}).Take(100)
.OrderBy(c => c.Date).ToList();
Just add max date for nickname to ordering. You also can introduce new range variable for position:
var users = (from r in db.Records
where r.Form.CompetitionID == cID
group r by r.Nickname into g
let position = g.Count()
orderby position descending, g.Max(r => r.DateAdded)
select new TopUserModel {
Nickname = g.Key,
Position = position
}).Take(100).ToList();
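The ordering from the SQL in the question (count descending, then latest date ascending) can be checked on in-memory data. This is a minimal LINQ to Objects sketch; the `records` array and its nicknames are hypothetical sample data.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the Records table.
var records = new[]
{
    (Nickname: "ann", DateAdded: new DateTime(2021, 3, 1)),
    (Nickname: "ann", DateAdded: new DateTime(2021, 3, 5)),
    (Nickname: "bob", DateAdded: new DateTime(2021, 3, 2)),
    (Nickname: "bob", DateAdded: new DateTime(2021, 3, 3)),
    (Nickname: "cat", DateAdded: new DateTime(2021, 3, 1)),
};

var ranking = records
    .GroupBy(r => r.Nickname)
    // Compute both aggregates once so they can be reused in the ordering.
    .Select(g => (Nickname: g.Key, Count: g.Count(), LastAdded: g.Max(r => r.DateAdded)))
    .OrderByDescending(x => x.Count)   // most records first
    .ThenBy(x => x.LastAdded)          // ties: whoever reached that count earlier
    .Take(100)
    .Select(x => x.Nickname)
    .ToList();

Console.WriteLine(string.Join(",", ranking));
```

Here "ann" and "bob" both have two records, and "bob" ranks first because his latest record is older. Ordering before Take(100) matters: sorting after the cut only reorders whichever 100 rows were already selected.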
Question is already answered here: OrderBy a date field with linq and return ToList()
Add at the end of your users statement
.OrderBy(e => e.Date).ToList();
