Slow EF query grouping data by Month/Year - c#

I have a record set of approximately 1 million records. I'm trying to query the records to report monthly figures.
The following MySQL query executes in about 0.3 seconds:
SELECT SUM(total), MONTH(create_datetime), YEAR(create_datetime)
FROM orders GROUP BY MONTH(create_datetime), YEAR(create_datetime)
However, I am unable to figure out an Entity Framework lambda expression that executes anywhere near as fast.
The only statement I have come up with that actually works is:
var monthlySales = db.Orders
.Select(c => new
{
Total = c.Total,
CreateDateTime = c.CreateDateTime
})
.GroupBy(c => new { c.CreateDateTime.Year, c.CreateDateTime.Month })
.Select(c => new
{
CreateDateTime = c.FirstOrDefault().CreateDateTime,
Total = c.Sum(d => d.Total)
})
.OrderBy(c => c.CreateDateTime)
.ToList();
But it is horribly slow.
How can I get this query to execute as quickly as it does directly in MySQL?

When you call ".ToList()" in the middle of a query (before the grouping), EF effectively loads all orders from the database into memory and then does the grouping in C#. Depending on the amount of data in your table, that can take a while, and I think this is why your query is so slow.
Try to rewrite your query so that only one call materializes the results (ToList, ToArray, AsEnumerable), and put that call at the very end.

Try this:
var monthlySales = (from c in db.Orders
group c by new { y = c.CreateDateTime.Year, m = c.CreateDateTime.Month } into g
select new {
Total = g.Sum(t => t.Total),
Year = g.Key.y,
Month = g.Key.m }).ToList();

I came across this setup, which executes quickly:
var monthlySales = db.Orders
.GroupBy(c => new { Year = c.CreateDateTime.Year, Month = c.CreateDateTime.Month })
.Select(c => new
{
Month = c.Key.Month,
Year = c.Key.Year,
Total = c.Sum(d => d.Total)
})
.OrderByDescending(a => a.Year)
.ThenByDescending(a => a.Month)
.ToList();
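The shape of this query can be tried without a database. The following is a minimal sketch, assuming a hypothetical in-memory Order class and LINQ to Objects, so it only illustrates the grouping and ordering, not the SQL that EF generates or its timing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Orders entity; only the two columns used here.
public class Order
{
    public decimal Total { get; set; }
    public DateTime CreateDateTime { get; set; }
}

public static class MonthlySalesSketch
{
    // Groups by (Year, Month), sums Total per group, newest month first.
    public static List<(int Year, int Month, decimal Total)> Summarize(IEnumerable<Order> orders)
    {
        return orders
            .GroupBy(o => new { o.CreateDateTime.Year, o.CreateDateTime.Month })
            .Select(g => (Year: g.Key.Year, Month: g.Key.Month, Total: g.Sum(o => o.Total)))
            .OrderByDescending(r => r.Year)
            .ThenByDescending(r => r.Month)
            .ToList();
    }

    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { Total = 10m, CreateDateTime = new DateTime(2014, 1, 5) },
            new Order { Total = 15m, CreateDateTime = new DateTime(2014, 1, 20) },
            new Order { Total = 7m,  CreateDateTime = new DateTime(2014, 2, 3) },
        };

        foreach (var row in Summarize(orders))
            Console.WriteLine($"{row.Year}-{row.Month:00}: {row.Total}");
    }
}
```

Against a real context the same chain would be written on db.Orders, with the single ToList() kept at the end so that the grouping can be translated to SQL.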

Related

How do I implement tsql's sum(...) over () in linq to SQL?

I have this functional t-sql query that counts the entries in a group by clause, and at the same time produces a percentage of the count compared to the entire set.
It is blazing fast (~90 ms) in Azure. I'd like to implement in a similar manner with LINQ to SQL, but I can't figure it out...
select f.worktype, f.counted, (100.0 * f.counted)/ (sum(f.counted) over ()) as percentage
from
(SELECT
wa.skillEN AS workType,
count(wa.skillEN) counted
FROM [dbo].WorkAssignments as WA
join [dbo].WorkOrders as WO ON (WO.ID = WA.workorderID)
WHERE wo.dateTimeOfWork < ('1/1/2014')
and wo.dateTimeOfWork > ('1/1/2013')
and wo.statusEN = 'Completed'
group by wa.skillEN) as f
group by f.worktype, f.counted
The LINQ I've been trying in LINQPad...
WorkAssignments
.Where(wa => wa.WorkOrder.DateTimeofWork > DateTime.Now.AddYears(-2)
&& wa.WorkOrder.DateTimeofWork < DateTime.Now)
.GroupBy(wa => wa.SkillEN)
.Select(g => new
{
label = g.Key,
count = g.Count()
})
.GroupBy(g => new {g.label, g.count})
.Select(gg => new
{
label = gg.Key.label,
count = gg.Key.count,
pct = gg.Sum(a => a.count)
})
(The dates in the where clause are slightly different, but I don't think it's relevant)
So, how would I implement the over () feature in LINQ to SQL?
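LINQ has no direct counterpart to a window aggregate such as sum(...) over (). One possible workaround, sketched here with LINQ to Objects over hypothetical in-memory data, is to compute the per-group counts first, take the grand total once, and then divide. The class and method names are made up for illustration, and the sketch does not show what SQL a LINQ to SQL provider would generate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PercentageSketch
{
    // Returns each distinct label with its count and its share of all entries.
    public static List<(string Label, int Count, double Pct)> CountWithPercentage(IEnumerable<string> skills)
    {
        // Step 1: the inner "group by" from the T-SQL query.
        var counts = skills
            .GroupBy(s => s)
            .Select(g => new { Label = g.Key, Count = g.Count() })
            .ToList();

        // Step 2: the grand total, standing in for sum(f.counted) over ().
        double total = counts.Sum(c => c.Count);

        // Step 3: the percentage per group.
        return counts
            .Select(c => (Label: c.Label, Count: c.Count, Pct: 100.0 * c.Count / total))
            .ToList();
    }

    public static void Main()
    {
        var skills = new[] { "painting", "painting", "roofing", "moving" };
        foreach (var row in CountWithPercentage(skills))
            Console.WriteLine($"{row.Label}: {row.Count} ({row.Pct}%)");
    }
}
```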

LINQ Multiple GroupBy Query Performing several times slower than T-SQL

I'm totally new to LINQ.
I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow.
What I'm trying to achieve is to fetch the average monthly duration of a certain database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0,7),
AVG (duration)
from (select (CONVERT(CHAR(8), mmud.logDateTime, 112)) as yyyyMMdd,
DateDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
from mydb.mydbo.updateData mmud
left
join mydb.mydbo.updateDataKeyValue mmudkv
on mmud.updateDataid = mmudkv.updateDataId
left
join mydb.mydbo.updateDataDetailKey mmuddk
on mmudkv.updateDataDetailKeyid = mmuddk.Id
where dbname = 'MY_NEW_DB'
and mmudkv.value in ('start', 'finish')
group
by (CONVERT(CHAR(8), mmud.logDateTime, 112))
) as resultSet
group
by substring(yyyyMMdd, 0,7)
order
by substring(yyyyMMdd, 0,7)
In LINQ => I first fetch the record from a table that links the database name to UpdateData, and then do the filtering and grouping on the related information.
var updatedataInDateRangeWithDescriptions = entry.updatedata.Where(
ue => ue.updatedataKeyValue.Any(
uedkv =>
uedkv.Value.ToLower() == "starting update" ||
uedkv.Value.ToLower() == "client release"))
.Select(
ue =>
new
{
logDateTimeyyyyMMdd = ue.logDateTime.Date,
logDateTime = ue.logDateTime
})
.GroupBy(
updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
.Select(
groupedupdatedata => new
{
UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
Duration =
(groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
groupedupdatedata.Min(groupMember => groupMember.logDateTime)
)
.TotalMilliseconds
}
).
ToList();
var updatedataMonthlyDurations =
updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
.Select(
group =>
new updatedataMonthlyAverageDuration
{
DbName = entry.DbName,
UpdateDateyyyyMM = group.Key.Substring(0,6),
Duration =
group.Average(
gmember =>
(gmember.Duration))
}
).ToList();
I know that GroupBy in LINQ isn't the same as GroupBy in T-SQL, but not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I did the .ToList() after the first GroupBy things got a little faster. But even then this way of finding average duration is really slow.
What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?
Your LINQ query is doing most of its work in LINQ to Objects. You should be constructing a LINQ to Entities/LINQ to SQL query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
from mmudkv in context.updateDataKeyValue
.Where(x => mmud.updateDataid == x.updateDataId)
.DefaultIfEmpty()
from mmuddk in context.updateDataDetailKey
.Where(x => mmudkv.updateDataDetailKeyid == x.Id)
.DefaultIfEmpty()
where mmud.dbname == "MY_NEW_DB"
where mmudkv.value == "start" || mmudkv.value == "finish"
group mmud by mmud.logDateTime.Date into g
select new
{
Date = g.Key,
Average = EntityFunctions.DiffMilliseconds(g.Min(x => x.logDateTime), g.Max(x => x.logDateTime)),
};
var queryByMonth = from x in query
group x by new { x.Date.Year, x.Date.Month } into x
select new
{
Year = x.Key.Year,
Month = x.Key.Month,
Average = x.Average(y => y.Average)
};
// A single SQL statement is sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know whether you are using Entity Framework or LINQ to SQL, and you will need to provide your context/model information.

Entity Framework: use an already selected value, saved in a new variable, later in the select statement

I wrote some entity framework select:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * count ...
});
At the moment I always have to select the total like this:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id) * a.OtherTable.Where(b => b.id == id).Sum(c => c.value)
});
Is it possible to select it like in my first example, because I have already retrieved the value (and how to do that) or should I select it again?
One possible approach is to use two successive selects:
var query = context.MyTable
.Select(a => new
{
count = a.OtherTable.Where(b => b.id == id).Sum(c => c.value),
total = a.OtherTable2.Where(d => d.id == id)
})
.Select(x => new
{
count = x.count,
total = x.total * x.count
});
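The two-step projection can be tried in isolation. This is a minimal sketch over in-memory data with LINQ to Objects; the row shape (an array of values plus a hypothetical Factor standing in for the elided OtherTable2 expression) is an assumption made purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainedSelectSketch
{
    // First Select computes count once; second Select reuses it for total.
    public static List<(int Count, int Total)> Compute(IEnumerable<(int[] Values, int Factor)> rows)
    {
        return rows
            .Select(a => new { count = a.Values.Sum(), factor = a.Factor })
            .Select(x => (Count: x.count, Total: x.factor * x.count))
            .ToList();
    }

    public static void Main()
    {
        var rows = new[]
        {
            (Values: new[] { 1, 2, 3 }, Factor: 2),
            (Values: new[] { 10 },      Factor: 5),
        };

        foreach (var r in Compute(rows))
            Console.WriteLine($"count={r.Count}, total={r.Total}");
    }
}
```

The point of the design is that the sum expression is written once, in the first projection, and the second projection only refers to its result by name.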
You would simply do:
var listFromDatabase = context.MyTable;
var query1 = listFromDatabase.Select(a => /* do something */);
var query2 = listFromDatabase.Select(a => /* do something */);
Although to be fair, Select requires you to return a value, and you aren't returning one; you're getting count and total somewhere and setting their values. If you want to do that, I would advise:
var listFromDatabase = context.MyTable.ToList();
listFromDatabase.ForEach(x =>
{
count = do_some_counting;
total = do_some_totalling;
});
Note that ToList() stops the query from being an IQueryable and materializes it into a list; the List<T> type also provides the ForEach method.
If you're going to do complex work inside the Select, I would always do:
context.MyTable.AsEnumerable()
Because that way you are no longer querying the database.
So to recap: for the first part, my point is to get all the table contents into variables and use ToList() to get the actual results (that is where the work is done). Second, if you are working from a plain query, use AsEnumerable to allow more complex functions inside the Select.

LINQ: how to get a group of a table ordering with a related table?

I have a question about the IGrouping object that results from a LINQ query that uses a "group by" clause.
I have two tables in the database, Products and Responses, with a one-to-many relationship. The Responses table has a column called FinalRate, which is the rating of the product. A product can have any number of responses (ratings).
I want to get the Products ordered by the sum of FinalRate divided by the number of ratings, that is, ordered by average rating from highest to lowest.
As can be seen in the code (at the end of the question), I get the responses first. To sum all the final rates and divide them by the count, I use a group.
There are 2 problems with the code, even though the current code works:
1.-I tried to get the Products in a single query, but it is impossible because I cannot use the Products table in the group and then use the Responses table in the "orderby". Also, LINQ only lets you group one table; it is impossible to write "group prod, response".
I couldn't express this SQL statement in LINQ:
select prod.ProductID,prod.Commercial_Product_Name,prod.Manufacturer_Name,
prod.ProductImageUrl
from rev_product prod
inner join rev_response res on res.AtProductid=prod.ProductID
group by prod.ProductID,prod.Commercial_Product_Name,prod.Manufacturer_Name
,prod.ProductImageUrl
order by (sum(res.FinalRate)/count(res.AtProductid))
I tried this:
var gruposproductos = (from prod in ctx.Products
join res in ctx.Responses on prod.ProductID equals res.AtProductId
group prod by prod.ProductID into g
orderby (g.Sum(ra =>ra.FinalRate)/g.Count())
descending select g).Take(2);
But as I said, the "orderby (g.Sum..." gives an error, because "into g" groups the Products table, not the Responses table.
This is why my final code does not get the products in the same LINQ statement.
2.-Having accepted this, the problem is that I get an IGrouping, and I cannot obtain a list of Responses to iterate over without the two foreach loops in the code. I wanted only one loop, as you would have with a "List" object.
It is not a very elegant method, but it works. Moreover, I have to make sure that each product is added only once in the second loop.
Is there any better code?
var groupproducts = (from res in ctx.Responses
group res by res.AtProductId into g
orderby (g.Sum(ra =>ra.FinalRate)/g.Count())
descending select g).Take(2).ToList();
List<Product> theproducts = new List<Product>();
foreach (var groupresponse in groupproducts)
{
foreach (var response in groupresponse)
{
var producttemp= (from prod in ctx.Products
where prod.ProductID == response.AtProductId
select prod).First();
theproducts.Add(producttemp);
}
}
FINAL SOLUTION (thx a lot @Daniel)
var productsanonymtype = ctx.Products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Count() == 0 ? 0 : x.Responses.Select(r => (double)r.FinalRate).Sum() / x.Responses.Count()
}).OrderByDescending(x => x.AverageRating);
List<Product> products = new List<Product>();
foreach (var prod in productsanonymtype)
{
products.Add(prod.Product);
}
Try this:
products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Sum(r => r.FinalRate) /
x.Responses.Count()
});
The Sum overload I am using is not implemented in all providers. If that's a problem for you, you can use this alternate version:
products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Select(r => r.FinalRate)
.Sum() /
x.Responses.Count()
});
If there is no navigation property from product to its responses you should first try to fix that. If you can't you can use this version:
products.Join(responses, x => x.Id, x => x.ProductId,
(p, r) => new { Product = p, Response = r })
.GroupBy(x => x.Product)
.Select(g => new { Product = g.Key,
AverageRating = g.Select(x => x.Response.FinalRate)
.Sum() /
g.Count()
});
Assuming FinalRate is an int, both methods will calculate the average rating with an int, i.e. there will be no 4.5 rating. And there will be no rounding, i.e. an actual average rating of 4.9 will result in 4. You can fix that by casting one of the operands of the division to double.
Another problem is the case with no ratings so far. The code above will result in an exception in this case. If that's a problem for you, you can change the calculation to this:
AverageRating = g.Count() == 0
? 0
: g.Select(x => (double)x.Response.FinalRate).Sum() / g.Count()
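Both pitfalls described above (integer division and the empty case) can be reproduced on plain arrays. This is a minimal sketch with LINQ to Objects; the class and method names are hypothetical, and a database provider may report the empty case differently than an in-memory query does:

```csharp
using System;
using System.Linq;

public static class AverageSketch
{
    // int / int truncates: the fractional part of the average is lost.
    public static int IntAverage(int[] rates) => rates.Sum() / rates.Length;

    // Casting one operand to double keeps the fraction; the guard avoids
    // dividing by zero when there are no ratings yet.
    public static double SafeAverage(int[] rates) =>
        rates.Length == 0 ? 0 : (double)rates.Sum() / rates.Length;

    public static void Main()
    {
        var rates = new[] { 4, 5 };
        int truncated = IntAverage(rates);   // value is 4
        double exact = SafeAverage(rates);   // value is 4.5
        Console.WriteLine($"{truncated} vs {exact}");

        try
        {
            IntAverage(new int[0]);          // 0 / 0 with integers
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("no ratings: integer average throws");
        }

        Console.WriteLine(SafeAverage(new int[0]));
    }
}
```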
ctx.Products.GroupBy(x => new {
ProductId = x.ProductID,
FinalRate = x.Responses.Sum(y => y.FinalRate),
CountProductId = x.Responses.Count
})
.OrderBy(x => x.Key.FinalRate / x.Key.CountProductId);
And here with the projection.....
ctx.Products.Select(x => new {
ProductID = x.ProductID,
Commercial_Product_Name = x.Commercial_Product_Name,
Manufacturer_Name = x.Manufacturer_Name,
ProductImageUrl = x.ProductImageUrl,
FinalRate = x.Responses.Sum(y => y.FinalRate),
CountProductId = x.Responses.Count
})
.GroupBy(x => new {
ProductId = x.ProductID,
FinalRate = x.FinalRate,
CountProductId = x.CountProductId
})
.OrderBy(x => x.Key.FinalRate / x.Key.CountProductId);

Write a comparable LINQ query for aggregate distinct count in sql?

I want to get a count for each month, but the count should be at most one per day even if there are multiple occurrences. I have a SQL query that works, but I'm having trouble converting it to LINQ:
select
count(DISTINCT DAY(date)) as Monthly_Count,
MONTH(date) as Month,
YEAR(date)
from
activity
where
id=@id
group by
YEAR(date),
MONTH(date)
Could anyone help me translate the above query to LINQ? Thanks!
Per LINQ to SQL using GROUP BY and COUNT(DISTINCT) given by @Rick, this should work:
var query = from act in db.Activity
where act.Id == id
group act by new { act.Date.Year, act.Date.Month } into g
select new
{
MonthlyCount = g.Select(act => act.Date.Day).Distinct().Count(),
Month = g.Key.Month,
Year = g.Key.Year
};
I don't know if L2S can translate the inner g.Select(act => act.Date.Day).Distinct().Count() properly.
var results = db.activities.Where(a => a.id == myID)
.GroupBy(a => new
{
Month = a.date.Month,
Year = a.date.Year
})
.Select(g => new
{
Month = g.Key.Month,
Year = g.Key.Year,
Monthly_Count = g.Select(d => d.date.Day)
.Distinct()
.Count()
});
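The counting logic itself can be checked on in-memory data. This is a minimal sketch with LINQ to Objects, assuming the activity rows are reduced to their DateTime values; the class and method names are hypothetical, and whether a given provider turns the inner Distinct().Count() into COUNT(DISTINCT ...) is not something the sketch shows:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDaySketch
{
    // For each (Year, Month), counts the distinct days that have any entry.
    public static List<(int Year, int Month, int MonthlyCount)> CountActiveDays(IEnumerable<DateTime> dates)
    {
        return dates
            .GroupBy(d => new { d.Year, d.Month })
            .Select(g => (Year: g.Key.Year,
                          Month: g.Key.Month,
                          MonthlyCount: g.Select(d => d.Day).Distinct().Count()))
            .OrderBy(r => r.Year)
            .ThenBy(r => r.Month)
            .ToList();
    }

    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2014, 1, 5, 10, 0, 0),
            new DateTime(2014, 1, 5, 15, 0, 0), // same day, counted once
            new DateTime(2014, 1, 7),
            new DateTime(2014, 2, 1),
        };

        foreach (var row in CountActiveDays(dates))
            Console.WriteLine($"{row.Year}-{row.Month:00}: {row.MonthlyCount}");
    }
}
```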
