I have this query in SQL:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY (
SELECT AVG(CAST(r.users AS NUMERIC)) FROM description r WHERE r.company = p.id
) DESC) AS rownb, p.* FROM company p
) rs WHERE rownb <= 2
The farthest I've come with converting this query to LINQ is:
Companies
.SelectMany(r =>
Descriptions
.GroupBy(t => t.Company)
.Select(t => new {
Average = t.Average(a => (double)a.Users),
Company = t.Key })
.OrderByDescending(t => t.Average)
, (p, r) => new { Companies = p, Descriptions = r })
.Where(t => t.Companies.Id == t.Descriptions.Company)
.GroupBy(t => t.Companies.Category)
.Select(t => t.Take(2))
.SelectMany(t => t)
.Select(t => new { t.Companies.Name, t.Descriptions.Average, t.Companies.Category})
.OrderBy(t => t.Category)
But the problem is the performance. While the SQL query cost is 28% (relative to the batch), the LINQ query is 72%.
I already replaced the Join with a SelectMany in LINQ, which reduced the cost by about 20%. But now I don't know how to optimize this query any further.
Also, I understand there is no ROW_NUMBER in LINQ.
I'm using LINQPad to inspect the resulting SQL query.
Question: Is ROW_NUMBER responsible for this performance difference? Is it possible to optimize the LINQ query further?
You can emulate ROW_NUMBER in your select with the indexed Select overload:
.Select((t, i) => new { rownb = i + 1, t.Companies.Name, t.Descriptions.Average, t.Companies.Category })
Note that this overload is not translated to SQL, so the numbering is applied client-side after the results are materialized. As for optimizations, I'm not too sure.
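If you need the numbering restarted per partition, as ROW_NUMBER() OVER (PARTITION BY ...) does, one option is to number within each group after materializing the query. A client-side sketch (results here is a hypothetical list already fetched from the query above, with Name, Average and Category members):

```csharp
// Client-side emulation of ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Average DESC).
// This runs in memory, so filter as much as possible server-side before materializing.
var topPerCategory = results
    .GroupBy(t => t.Category)
    .SelectMany(g => g
        .OrderByDescending(t => t.Average)
        // the indexed Select restarts i at 0 for each group
        .Select((t, i) => new { rownb = i + 1, t.Name, t.Average, t.Category }))
    .Where(t => t.rownb <= 2);
```

This reproduces the "top 2 per category" semantics of the original SQL, at the cost of doing the windowing in memory.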
Related
I have this SQL statement which is pretty instantaneous when running it:
select Distinct statuses.Description, count(*) as count
from referrals
inner join statuses on referrals.StatusId = statuses.id
group by statuses.Description
But when I run the below linq code with Entity Framework Core, it takes almost 5 minutes to run and there are only 680 rows in the database.
var data = context.Referrals
.Include(s => s.Status).AsEnumerable()
.GroupBy(r => r.Status)
.Select(g => new StatusCountItem
{
Status = g.Key.Description,
Count = g.Select(r => r).Count()
}).ToList();
Is there a way to write a similar Linq statement that won't take forever to run or do I need to figure out a different way to do what I want?
EDIT: when I don't have the AsEnumerable, I get the error message below, which is why I added it:
The LINQ expression 'DbSet().Join(inner: DbSet(),
outerKeySelector: r => EF.Property<int?>(r, "StatusId"),
innerKeySelector: s => EF.Property<int?>(s, "Id"),
resultSelector: (o, i) => new TransparentIdentifier<Referral, Status>(Outer = o, Inner = i))
.GroupBy(r => r.Inner)' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync
Try this:
var data = context.Referrals
.GroupBy(r => r.StatusId) // notice the change here, you need to group by the id
.Select(g => new StatusCountItem()
{
Status = g.First().Status.Description,
Count = g.Count()
}).ToList();
Your SQL query is built from context.Referrals.Include(s => s.Status).AsEnumerable(), which is equivalent to:
select *
from referrals
inner join statuses on referrals.StatusId = statuses.id
Note the star: you're querying every column, and everything after the AsEnumerable() (the grouping and counting) runs in memory. In other words, remove the stray AsEnumerable() in the middle of your query.
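For completeness, a sketch that keeps the whole aggregation server-side by grouping on a scalar instead of the entity (EF Core 5+ can translate grouping on a navigation property's scalar member; StatusCountItem is the same DTO as above):

```csharp
var data = context.Referrals
    .GroupBy(r => r.Status.Description)  // group by a scalar, not the Status entity
    .Select(g => new StatusCountItem
    {
        Status = g.Key,
        Count = g.Count()
    })
    .ToList();
```

This translates to roughly the same GROUP BY statement as the hand-written SQL in the question, with no client evaluation.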
Use this one; it is simple and should improve query performance.
var data = from r in context.Referrals
           join s in context.Statuses on r.StatusId equals s.Id
           group s by s.Description into g
           select new
           {
               Status = g.Key,
               Count = g.Count()
           };
I want to query aggregated values from multiple tables like this.
SELECT
(SELECT MAX(`A`) FROM `TableA`) as `MaxA`,
(SELECT COUNT(`A`) FROM `TableA` WHERE A > 55) as `CountA`,
(SELECT MIN(`B`) FROM `TableB`) as `MinB`
Is there a way to do something like this in EF Core in one query?
You can do a UNION ALL over the queries with Concat.
var query = ctx.TableA.GroupBy(x => 1)
.Select(g => g.Max(a => a.A))
.Concat(
ctx.TableB.GroupBy(x => 1)
.Select(g => g.Min(b => b.B))
);
Also, if you are not restricted from using third-party extensions, there is linq2db.EntityFrameworkCore (note that I'm one of the creators). Then this query can be written in a simple way:
using var db = ctx.CreateLinqToDBConnection();
var result = db.Select(() => new
{
A = ctx.TableA.Max(a => a.A),
B = ctx.TableB.Min(b => b.B)
});
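Staying with plain EF Core, it may also be possible to get all three aggregates from the question in a single round trip by putting scalar subqueries inside one Select; a sketch (whether this translates to a single SQL statement depends on the provider and EF Core version):

```csharp
var result = ctx.TableA
    .GroupBy(x => 1)                      // collapse TableA to a single group
    .Select(g => new
    {
        MaxA = g.Max(a => a.A),
        CountA = g.Count(a => a.A > 55),
        MinB = ctx.TableB.Min(b => b.B)   // uncorrelated scalar subquery
    })
    .Single();
```

If the provider cannot translate the cross-table subquery, it will throw at runtime rather than silently evaluating client-side, so it is easy to verify.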
I have a table with the following structure (and sample data):
Identifier | UseDate    | PartId
-----------|------------|-------
a123       | 05/01/2000 | 237
a123       | 05/01/2000 | 4656
a123       | 01/01/2000 | 2134
a124       | 04/01/2000 | 5234
a124       | 01/01/2000 | 2890
I need to get the most recent entry of every (non-unique) identifier, but at most one per identifier.
The SQL-Query (MariaDB) that seems to fulfill my problem is the following:
SELECT a.Identifier, a.MaxDate, b.PartId, b.UseDate
FROM
(SELECT Identifier, MAX(UseDate) AS MaxDate FROM MyTable GROUP BY Identifier) a
LEFT JOIN MyTable b ON a.Identifier = b.Identifier
WHERE a.MaxDate = b.UseDate GROUP BY a.Identifier;
However, I need this to work with C# and EF Core (Pomelo.EntityFrameworkCore.MySql 5.0.3); my attempts have been:
var q1 = db.MyTable
.GroupBy(t => t.Identifier)
.Select(t => new { Identifier = t.Key, MaxDate = t.Max(x => x.UseDate) });
return new ObjectResult(db.MyTable
.Join(
q1,
t1 => t1.Identifier,
t2 => t2.Identifier,
(t1, t2) => new { Identifier = t2.Identifier, PartId = t1.PartId, MaxDate = t1.MaxDate, UseDate = t1.UseDate })
.Where(t => t.UseDate == q1.First(x => x.Identifier == t.Identifier).MaxDate)
.GroupBy(t => t.Identifier)
.ToList()
);
and
return new ObjectResult(db.MyTable
.GroupBy(t => t.Identifier)
.Select(t => t.OrderByDescending(x => x.UseDate).FirstOrDefault())
.ToList()
);
The first one throws this error:
System.InvalidOperationException: "Unable to translate the given 'GroupBy' pattern. Call 'AsEnumerable' before 'GroupBy' to evaluate it client-side."
The second one essentially yields the same, just complaining about the LINQ expression instead of the GroupBy.
I want to avoid using raw SQL, but how do I correctly (and hopefully efficiently) implement this?
There are many ways to write such a query in LINQ, most of which EF Core 5/6+ is able to translate.
The straightforward approach, once you have defined a subquery for the necessary grouping and aggregates, is to join it back to the data table: not with the join operator, but with a row-limiting correlated subquery (SelectMany with Where and Take), e.g.
var query = db.MyTable
.GroupBy(t => t.Identifier)
.Select(t => new { Identifier = t.Key, MaxDate = t.Max(x => x.UseDate) })
.SelectMany(g => db.MyTable
.Where(t => t.Identifier == g.Identifier && t.UseDate == g.MaxDate)
.Take(1));
If the ordering field is unique per key value (i.e. in your case, if UseDate is unique for each Identifier value), you can use the Join operator directly (since no row limiting is needed), e.g.
var query = db.MyTable
.GroupBy(t => t.Identifier)
.Select(t => new { Identifier = t.Key, MaxDate = t.Max(x => x.UseDate) })
.Join(db.MyTable,
g => new { g.Identifier, UseDate = g.MaxDate },
t => new { t.Identifier, t.UseDate },
(g, t) => t);
or directly apply Max based Where condition to the data table:
var query = db.MyTable
.Where(t => t.UseDate == db.MyTable
.Where(t2 => t2.Identifier == t.Identifier)
.Max(t2 => t2.UseDate)
);
Finally, there is the "standard" LINQ way of getting the top 1 item per group.
For EF Core 6.0+:
var query = db.MyTable
.GroupBy(t => t.Identifier)
.Select(g => g
.OrderByDescending(t => t.UseDate)
.First());
For EF Core 5.0 the grouping result set inside the query must be emulated:
var query = db.MyTable
.GroupBy(t => t.Identifier)
.Select(g => db.MyTable
.Where(t => t.Identifier == g.Key)
.OrderByDescending(t => t.UseDate)
.First());
I was asked to produce a report that is driven by a fairly complex SQL query against a SQL Server database. Since the site of the report was already using Entity Framework 4.1, I thought I would attempt to write the query using EF and LINQ:
var q = from r in ctx.Responses
.Where(x => ctx.Responses.Where(u => u.UserId == x.UserId).Count() >= VALID_RESPONSES)
.GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
orderby r.FirstOrDefault().User.AwardCity, r.FirstOrDefault().Category.Label, r.Count() descending
select new
{
City = r.FirstOrDefault().User.AwardCity,
Category = r.FirstOrDefault().Category.Label,
Response = r.FirstOrDefault().ResponseText,
Votes = r.Count()
};
This query tallies votes, but only from users who have submitted a certain number of required minimum votes.
This approach was a complete disaster from a performance perspective, so we switched to ADO.NET and the query ran very quickly. I did look at the LINQ generated SQL using the SQL Profiler, and although it looked atrocious as usual I didn't see any clues as to how to optimize the LINQ statement to make it more efficient.
Here's the straight TSQL version:
WITH ValidUsers(UserId)
AS
(
SELECT UserId
FROM Responses
GROUP BY UserId
HAVING COUNT(*) >= 103
)
SELECT d.AwardCity
, c.Label
, r.ResponseText
, COUNT(*) AS Votes
FROM ValidUsers u
JOIN Responses r ON r.UserId = u.UserId
JOIN Categories c ON r.CategoryId = c.CategoryId
JOIN Demographics d ON r.UserId = d.Id
GROUP BY d.AwardCity, c.Label, r.ResponseText
ORDER BY d.AwardCity, c.Label, COUNT(*) DESC
What I'm wondering is: is this query just too complex for EF and LINQ to handle efficiently or have I missed a trick?
Using a let to avoid the repeated r.First() calls will probably improve performance, though it's probably still not enough.
var q = from r in ctx.Responses
.Where()
.GroupBy()
let response = r.First()
orderby response.User.AwardCity, response.Category.Label, r.Count() descending
select new
{
City = response.User.AwardCity,
Category = response.Category.Label,
Response = response.ResponseText,
Votes = r.Count()
};
Maybe this change improves the performance by removing the nested SQL select in the where clause.
First get the vote count of each user and put it in a Dictionary:
var userVotes = ctx.Responses.GroupBy(x => x.UserId)
.ToDictionary(g => g.Key, g => g.Count());
var cityQuery = ctx.Responses.ToList().Where(x => userVotes[x.UserId] >= VALID_RESPONSES)
.GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
.Select(r => new
{
City = r.First().User.AwardCity,
Category = r.First().Category.Label,
Response = r.First().ResponseText,
Votes = r.Count()
})
.OrderBy(r => r.City).ThenBy(r => r.Category).ThenByDescending(r => r.Votes);
Consider a SQL Server table that's used to store events for auditing.
The need is to get only the latest entry for each CustID. We want the entire object/row. I'm assuming a GroupBy() will be needed in the query. Here's the query so far:
var custsLastAccess = db.CustAccesses
.Where(c => c.AccessReason.Length > 0)
.GroupBy(c => c.CustID)
// .Select()
.ToList();
// (?) where to put the c.Max(cu=>cu.AccessDate)
Question:
How can I create the query to select the latest(the maximum AccessDate) record/object for each CustID?
I'm wondering if something like:
var custsLastAccess = db.CustAccesses
.Where(c => c.AccessReason.Length > 0)
.GroupBy(c => c.CustID)
.Select(grp => new {
grp.Key,
LastAccess = grp
.OrderByDescending(x => x.AccessDate)
.FirstOrDefault()
}).ToList();
You could also try OrderBy() and Last().
Using LINQ syntax, which I think looks cleaner:
var custsLastAccess = from c in db.CustAccesses
group c by c.CustID into grp
select grp.OrderByDescending(c => c.AccessDate).FirstOrDefault();
This one uses Max rather than OrderByDescending, so it should be more efficient.
var subquery = from c in db.CustAccesses
group c by c.CustID into g
select new
{
CustID = g.Key,
AccessDate = g.Max(a => a.AccessDate)
};
var query = from c in db.CustAccesses
join s in subquery
on c.CustID equals s.CustID
where c.AccessDate == s.AccessDate
&& !string.IsNullOrEmpty(c.AccessReason)
select c;