I have this SQL statement which is pretty instantaneous when running it:
select Distinct statuses.Description, count(*) as count
from referrals
inner join statuses on referrals.StatusId = statuses.id
group by statuses.Description
But when I run the below linq code with Entity Framework Core, it takes almost 5 minutes to run and there are only 680 rows in the database.
var data = context.Referrals
.Include(s => s.Status).AsEnumerable()
.GroupBy(r => r.Status)
.Select(g => new StatusCountItem
{
Status = g.Key.Description,
Count = g.Select(r => r).Count()
}).ToList();
Is there a way to write a similar Linq statement that won't take forever to run or do I need to figure out a different way to do what I want?
EDIT: when I don't have the AsEnumerable I get this error message which is why I added it:
The LINQ expression 'DbSet().Join(inner: DbSet(),
outerKeySelector: r => EF.Property<int?>(r, "StatusId"),
innerKeySelector: s => EF.Property<int?>(s, "Id"),
resultSelector: (o, i) => new TransparentIdentifier<Referral, Status>(Outer = o, Inner = i))
.GroupBy(r => r.Inner)' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync
Try this:
var data = context.Referrals
.GroupBy(r => r.StatusId) // notice the change here, you need to group by the id
.Select(g => new StatusCountItem()
{
Status = g.First().Status.Description,
Count = g.Count()
}).ToList();
Your Sql query is built based on context.Referrals.Include(s => s.Status).AsEnumerable(), which is equivalent to:
select *
from referrals
inner join statuses on referrals.StatusId = statuses.id
Note the star, you're querying every column. In other words, remove the random AsEnumerable() in the middle of your query.
Use this one, it is simple and will improve query performance.
from r in context.Referrals
join s in context.statuses on r.StatusId equals s.Id
select new { s.Description, r.StatusId , S.Id) into result
group result by new { s.Description } into g
select new {
CompanyName = g.Key.Description,
Count = g.Count()
}
Related
I have this query in SQL:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY (
SELECT AVG(CAST(r.users AS NUMERIC)) FROM description r WHERE r.company = p.id
) DESC) AS rownb, p.* FROM company p
) rs WHERE rownb <= 2
The farthest I've come with converting this query to LINQ is:
Companies
.SelectMany(r =>
Descriptions
.GroupBy(t => t.Company)
.Select(t => new {
Average = t.Average(a => (double)a.Users),
Company = t.Key })
.OrderByDescending(t => t.Average)
, (p, r) => new { Companies = p, Descriptions = r })
.Where(t => t.Companies.Id == t.Descriptions.Company)
.GroupBy(t => t.Companies.Category)
.Select(t => t.Take(2))
.SelectMany(t => t)
.Select(t => new { t.Companies.Name, t.Descriptions.Average, t.Companies.Category})
.OrderBy(t => t.Category)
But the problem is the performance. While the SQL query cost is 28% (relative to the batch), the LINQ query is 72%.
I already replaced the Join with SelectMany in LINQ, which reduced by 20% the cost. But now I don't know how to optimize this query anymore.
Also, I understand there is no ROW_NUMBER in LINQ.
I'm using LINQPad to inspect the resulting SQL query.
Question: Is ROW_NUMBER the responsible for this performance difference? Is it possible to optimize the LINQ query further?
you can emulate row_number in your select like this
.Select((t,i) => new { rowindex = i+1, t.Companies.Name, t.Descriptions.Average, t.Companies.Category})
As for Optimizations, I'm not too sure.
I'm totally new to LINQ.
I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow.
What I'm trying to achieve is fetch an average monthly duration of a ceratin database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0,7),
AVG (duration)
from (select (CONVERT(CHAR(8), mmud.logDateTime, 112)) as yyyyMMdd,
DateDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
from mydb.mydbo.updateData mmud
left
join mydb.mydbo.updateDataKeyValue mmudkv
on mmud.updateDataid = mmudkv.updateDataId
left
join mydb.mydbo.updateDataDetailKey mmuddk
on mmudkv.updateDataDetailKeyid = mmuddk.Id
where dbname = 'MY_NEW_DB'
and mmudkv.value in ('start', 'finish')
group
by (CONVERT(CHAR(8), mmud.logDateTime, 112))
) as resultSet
group
by substring(yyyyMMdd, 0,7)
order
by substring(yyyyMMdd, 0,7)
in LINQ => I first fetch the record from a table that links information of the Database Name and UpdateData and then do filtering and groupby on the related information.
entry.updatedata.Where(
ue => ue.updatedataKeyValue.Any(
uedkv =>
uedkv.Value.ToLower() == "starting update" ||
uedkv.Value.ToLower() == "client release"))
.Select(
ue =>
new
{
logDateTimeyyyyMMdd = ue.logDateTime.Date,
logDateTime = ue.logDateTime
})
.GroupBy(
updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
.Select(
groupedupdatedata => new
{
UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
Duration =
(groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
groupedupdatedata.Min(groupMember => groupMember.logDateTime)
)
.TotalMilliseconds
}
).
ToList();
var updatedataMonthlyDurations =
updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
.Select(
group =>
new updatedataMonthlyAverageDuration
{
DbName = entry.DbName,
UpdateDateyyyyMM = group.Key.Substring(0,6),
Duration =
group.Average(
gmember =>
(gmember.Duration))
}
).ToList();
I know that GroupBy in LINQ isn't the same as GroupBy in T-SQL, but not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I did the .ToList() after the first GroupBy things got a little faster. But even then this way of finding average duration is really slow.
What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?
Your linq query is doing most of its work in linq-to-objects. You should be constructing a linq-to-entities/sql query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
from mmudkv in context.updateDataKeyValue
.Where(x => mmud.updateDataid == x.updateDataId)
.DefaultIfEmpty()
from mmuddk in context.updateDataDetailKey
.Where(x => mmudkv.updateDataDetailKeyid == x.Id)
.DefaultIfEmpty()
where mmud.dbname == "MY_NEW_DB"
where mmudkv.value == "start" || mmudkv.value == "finish"
group mmud by mmud.logDateTime.Date into g
select new
{
Date = g.Key,
Average = EntityFunctions.DiffMilliseconds(g.Max(x => x.logDateTime), g.Min(x => x.logDateTime)),
};
var queryByMonth = from x in query
group x by new { x.Date.Year, x.Date.Month } into x
select new
{
Year = x.Key.Year,
Month = x.Key.Month,
Average = x.Average(y => y.Average)
};
// Single sql statement is to sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know if you are using entityframework or linq-to-sql. And you will need to provide your context/model information
I was asked to produce a report that is driven by a fairly complex SQL query against a SQL Server database. Since the site of the report was already using Entity Framework 4.1, I thought I would attempt to write the query using EF and LINQ:
var q = from r in ctx.Responses
.Where(x => ctx.Responses.Where(u => u.UserId == x.UserId).Count() >= VALID_RESPONSES)
.GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
orderby r.FirstOrDefault().User.AwardCity, r.FirstOrDefault().Category.Label, r.Count() descending
select new
{
City = r.FirstOrDefault().User.AwardCity,
Category = r.FirstOrDefault().Category.Label,
Response = r.FirstOrDefault().ResponseText,
Votes = r.Count()
};
This query tallies votes, but only from users who have submitted a certain number of required minimum votes.
This approach was a complete disaster from a performance perspective, so we switched to ADO.NET and the query ran very quickly. I did look at the LINQ generated SQL using the SQL Profiler, and although it looked atrocious as usual I didn't see any clues as to how to optimize the LINQ statement to make it more efficient.
Here's the straight TSQL version:
WITH ValidUsers(UserId)
AS
(
SELECT UserId
FROM Responses
GROUP BY UserId
HAVING COUNT(*) >= 103
)
SELECT d.AwardCity
, c.Label
, r.ResponseText
, COUNT(*) AS Votes
FROM ValidUsers u
JOIN Responses r ON r.UserId = u.UserId
JOIN Categories c ON r.CategoryId = c.CategoryId
JOIN Demographics d ON r.UserId = d.Id
GROUP BY d.AwardCity, c.Label, r.ResponseText
ORDER BY d.AwardCity, s.SectionName, COUNT(*) DESC
What I'm wondering is: is this query just too complex for EF and LINQ to handle efficiently or have I missed a trick?
Using a let to reduce the number of r.First()'s will probably improve performance. It's probably not enough yet.
var q = from r in ctx.Responses
.Where()
.GroupBy()
let response = r.First()
orderby response.User.AwardCity, response.Category.Label, r.Count() descending
select new
{
City = response.User.AwardCity,
Category = response.Category.Label,
Response = response.ResponseText,
Votes = r.Count()
};
Maybe this change improve the performance, removing the resulting nested sql select in the where clause
First get the votes of each user and put them in a Dictionary
var userVotes = ctx.Responses.GroupBy(x => x.UserId )
.ToDictionary(a => a.Key.UserId, b => b.Count());
var cityQuery = ctx.Responses.ToList().Where(x => userVotes[x.UserId] >= VALID_RESPONSES)
.GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
.Select(r => new
{
City = r.First().User.AwardCity,
Category = r.First().Category.Label,
Response = r.First().ResponseText,
Votes = r.Count()
})
.OrderByDescending(r => r.City, r.Category, r.Votes());
i have 4 table in SQL: DocumentType,ClearanceDocument,Request, RequestDocument.
i want when page load and user select one request, show all Document Based on clearanceType in RequestTable and check in RequestDocument and when exist set is_exist=true
I have written this query with SqlServer Query Editor for get result this Scenario but i can't convert this Query to Linq
select *,
is_Orginal=
(select is_orginal from CLEARANCE_REQUEST_DOCUMENT
where
DOCUMENT_ID=a.DOCUMENT_ID and REQUEST_ID=3)
from
DOCUMENT_TYPES a
where
DOCUMENT_ID in
(select DOCUMENT_ID from CLEARANCE_DOCUMENTS dt
where
dt.CLEARANCE_ID=
(SELECT R.CLEARANCE_TYPE FROM CLEARANCE_REQUEST R
WHERE
R.REQUEST_ID=3))
i write this Query in linq but not work
var list = (from r in context.CLEARANCE_REQUEST
where r.REQUEST_ID == 3
join cd in context.CLEARANCE_DOCUMENTS on r.CLEARANCE_TYPE equals cd.CLEARANCE_ID
join dt in context.DOCUMENT_TYPES on cd.DOCUMENT_ID equals dt.DOCUMENT_ID into outer
from t in outer.DefaultIfEmpty()
select new
{
r.REQUEST_ID,
cd.CLEARANCE_ID,
t.DOCUMENT_ID,
t.DOCUMENT_NAME,
is_set=(from b in context.CLEARANCE_REQUEST_DOCUMENT where
b.REQUEST_ID==r.REQUEST_ID && b.DOCUMENT_ID==t.DOCUMENT_ID
select new{b.IS_ORGINAL})
}
).ToList();
I want convert this Query to LINQ. Please help me. Thanks.
There is no need to manually join objects returned from an Entity Framework context.
See Why use LINQ Join on a simple one-many relationship?
If you use the framework as intended your job will be much easier.
var result = var clearanceTypes = context.CLEARANCE_REQUEST
.Single(r => r.REQUEST_ID == 3)
.CLEARANCE_DOCUMENTS
.SelectMany(dt => dt.DOCUMENT_TYPES)
.Select(a => new
{
DocumentType = a,
IsOriginal = a.CLEARANCE_REQUEST_DOCUMENT.is_original
});
Since your query won't be executed untill you iterate over the data, you can split your query in several subqueries to help you obtain the results like this:
var clearanceIds = context.CLEARANCE_REQUEST
.Where(r => r.REQUEST_ID == 3)
.Select(r => r.CLEARANCE_TYPE);
var documentIds = context.CLEARANCE_DOCUMENTS
.Where(dt => clearanceIds.Contains(dt.CLEARANCE_ID))
.Select(dt => dt.DOCUMENT_ID);
var result = context.DOCUMENT_TYPES
.Where(a => documentIds.Contains(a.DOCUMENT_ID))
.Select(a => new
{
// Populate properties here
IsOriginal = context.CLEARANCE_REQUEST_DOCUMENT
.Single(item => item.DOCUMENT_ID == a.DOCUMENT_ID &&
item.REQUEST_ID == 3)
.IS_ORIGINAL
})
.ToList();
Consider a SQL Server table that's used to store events for auditing.
The need is to get only that latest entry for each CustID. We want to get the entire object/row. I am assuming that a GroupBy() will be needed in the query. Here's the query so far:
var custsLastAccess = db.CustAccesses
.Where(c.AccessReason.Length>0)
.GroupBy(c => c.CustID)
// .Select()
.ToList();
// (?) where to put the c.Max(cu=>cu.AccessDate)
Question:
How can I create the query to select the latest(the maximum AccessDate) record/object for each CustID?
I'm wondering if something like:
var custsLastAccess = db.CustAccesses
.Where(c.AccessReason.Length>0)
.GroupBy(c => c.CustID)
.Select(grp => new {
grp.Key,
LastAccess = grp
.OrderByDescending(x => x.AccessDate)
.Select(x => x.AccessDate)
.FirstOrDefault()
}).ToList();
you could also try OrderBy() and Last()
Using LINQ syntax, which I think looks cleaner:
var custsLastAccess = from c in db.CustAccesses
group c by c.CustID into grp
select grp.OrderByDescending(c => c.AccessDate).FirstOrDefault();
Here: this uses max rather than OrderByDesc, so should be more efficient.
var subquery = from c in CustAccesses
group c by c.CustID into g
select new
{
CustID = g.Key,
AccessDate = g.Max(a => a.AccessDate)
};
var query = from c in CustAccesses
join s in subquery
on c.CustID equals s.CustID
where c.AccessDate == s.AccessDate
&& !string.IsNullOrEmpty(c.AccessReason)
select c;