I was asked to produce a report that is driven by a fairly complex SQL query against a SQL Server database. Since the site of the report was already using Entity Framework 4.1, I thought I would attempt to write the query using EF and LINQ:
var q = from r in ctx.Responses
.Where(x => ctx.Responses.Where(u => u.UserId == x.UserId).Count() >= VALID_RESPONSES)
.GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
orderby r.FirstOrDefault().User.AwardCity, r.FirstOrDefault().Category.Label, r.Count() descending
select new
{
City = r.FirstOrDefault().User.AwardCity,
Category = r.FirstOrDefault().Category.Label,
Response = r.FirstOrDefault().ResponseText,
Votes = r.Count()
};
This query tallies votes, but only from users who have submitted at least the required minimum number of votes.
This approach was a complete disaster from a performance perspective, so we switched to ADO.NET and the query ran very quickly. I did look at the LINQ-generated SQL using SQL Profiler, and although it looked atrocious as usual, I didn't see any clues as to how to optimize the LINQ statement to make it more efficient.
Here's the straight T-SQL version:
WITH ValidUsers(UserId)
AS
(
SELECT UserId
FROM Responses
GROUP BY UserId
HAVING COUNT(*) >= 103
)
SELECT d.AwardCity
, c.Label
, r.ResponseText
, COUNT(*) AS Votes
FROM ValidUsers u
JOIN Responses r ON r.UserId = u.UserId
JOIN Categories c ON r.CategoryId = c.CategoryId
JOIN Demographics d ON r.UserId = d.Id
GROUP BY d.AwardCity, c.Label, r.ResponseText
ORDER BY d.AwardCity, c.Label, COUNT(*) DESC
What I'm wondering is: is this query just too complex for EF and LINQ to handle efficiently or have I missed a trick?
Using a let to reduce the number of r.First() calls will probably improve performance, although it is probably not enough on its own.
var q = from r in ctx.Responses
.Where()
.GroupBy()
let response = r.First()
orderby response.User.AwardCity, response.Category.Label, r.Count() descending
select new
{
City = response.User.AwardCity,
Category = response.Category.Label,
Response = response.ResponseText,
Votes = r.Count()
};
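Since the three projected values are exactly the members of the grouping key, another option is to read them from the group's Key instead of calling First() at all. The following is a minimal in-memory sketch of that idea, not a tested EF query: the tuple fields (City, Category, Text), the sample rows, and the threshold of 2 are all assumptions made for illustration, and it runs on LINQ to Objects, so it says nothing about the SQL that EF 4.1 would generate.

```csharp
using System;
using System.Linq;

const int ValidResponses = 2; // hypothetical threshold (the T-SQL version uses 103)

// Hypothetical flattened stand-in for Responses and its navigation properties.
var responses = new[]
{
    (UserId: 1, City: "Austin", Category: "Food",  Text: "Tacos"),
    (UserId: 1, City: "Austin", Category: "Music", Text: "Blues"),
    (UserId: 2, City: "Austin", Category: "Food",  Text: "Tacos"),
    (UserId: 2, City: "Austin", Category: "Music", Text: "Jazz"),
    (UserId: 3, City: "Austin", Category: "Food",  Text: "Pizza"), // user 3 has too few responses
};

// Users with enough responses, mirroring the ValidUsers CTE.
var validUsers = responses
    .GroupBy(r => r.UserId)
    .Where(g => g.Count() >= ValidResponses)
    .Select(g => g.Key)
    .ToHashSet();

// Group by the composite key and project from g.Key rather than g.First().
var tally = responses
    .Where(r => validUsers.Contains(r.UserId))
    .GroupBy(r => new { r.City, r.Category, r.Text })
    .Select(g => new { g.Key.City, g.Key.Category, Response = g.Key.Text, Votes = g.Count() })
    .OrderBy(x => x.City).ThenBy(x => x.Category).ThenByDescending(x => x.Votes)
    .ToList();

foreach (var row in tally)
    Console.WriteLine($"{row.City} {row.Category} {row.Response} {row.Votes}");
```

Whether the same shape translates to a single GROUP BY under EF would still need to be checked against the generated SQL.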
Maybe this change improves performance by removing the nested SQL select that the where clause produces.
First get the number of votes for each user and put them in a Dictionary:
var userVotes = ctx.Responses.GroupBy(x => x.UserId)
    .Select(g => new { UserId = g.Key, Votes = g.Count() })
    .ToDictionary(a => a.UserId, a => a.Votes);
// ToList() loads every response; the filtering and grouping below run in memory.
var cityQuery = ctx.Responses.ToList().Where(x => userVotes[x.UserId] >= VALID_RESPONSES)
    .GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
    .Select(r => new
    {
        City = r.First().User.AwardCity,
        Category = r.First().Category.Label,
        Response = r.First().ResponseText,
        Votes = r.Count()
    })
    .OrderBy(r => r.City).ThenBy(r => r.Category).ThenByDescending(r => r.Votes);
Is there a way I can rewrite this query so that it does not use correlated subqueries?
var query = (from o in dbcontext.Orders
let lastStatus = o.OrderStatus.Where(x => x.OrderId == o.Id).OrderByDescending(x => x.CreatedDate).FirstOrDefault()
where lastStatus.OrderId != 1
select new { o.Name, lastStatus.Id }
).ToList();
This resulted in:
SELECT [o].[Name], (
SELECT TOP(1) [x0].[Id]
FROM [OrderStatus] AS [x0]
WHERE ([x0].[OrderId] = [o].[Id]) AND ([o].[Id] = [x0].[OrderId])
ORDER BY [x0].[CreatedDate] DESC
) AS [Id]
FROM [Orders] AS [o]
WHERE (
SELECT TOP(1) [x].[OrderId]
FROM [OrderStatus] AS [x]
WHERE ([x].[OrderId] = [o].[Id]) AND ([o].[Id] = [x].[OrderId])
ORDER BY [x].[CreatedDate] DESC
) <> 1
I have tried to join on a subquery, but EF Core 2.1 does not generate the SQL I expected:
var query = (from o in dbcontext.Orders
join lastStat in (from os in dbcontext.OrderStatus
orderby os.CreatedDate descending
select new { os }
) on o.Id equals lastStat.os.OrderId
where lastStat.os.StatusId != 1
select new { o.Name, lastStat.os.StatusId }).ToList();
In EF6 replacing
let x = (...).FirstOrDefault()
with
from x in (...).Take(1).DefaultIfEmpty()
usually generates better SQL.
So normally I would suggest
var query = (from o in db.Set<Order>()
from lastStatus in o.OrderStatus
.OrderByDescending(s => s.CreatedDate)
.Take(1)
where lastStatus.Id != 1
select new { o.Name, StatusId = lastStatus.Id }
).ToList();
(There is no need for DefaultIfEmpty (left join), because the where condition would turn it into an inner join anyway.)
Unfortunately, there is currently (EF Core 2.1.4) a translation issue, so the above leads to client evaluation.
The current workaround is to replace the navigation property accessor o.OrderStatus with a correlated subquery:
var query = (from o in db.Set<Order>()
from lastStatus in db.Set<OrderStatus>()
.Where(s => o.Id == s.OrderId)
.OrderByDescending(s => s.CreatedDate)
.Take(1)
where lastStatus.Id != 1
select new { o.Name, StatusId = lastStatus.Id }
).ToList();
which produces the following SQL for SqlServer database (lateral join):
SELECT [o].[Name], [t].[Id] AS [StatusId]
FROM [Orders] AS [o]
CROSS APPLY (
SELECT TOP(1) [s].*
FROM [OrderStatus] AS [s]
WHERE [s].[OrderId] = [o].[Id]
ORDER BY [s].[CreatedDate] DESC
) AS [t]
WHERE [t].[Id] <> 1
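The inner-join versus left-join remark in this answer can be checked in memory. The sketch below uses LINQ to Objects with made-up tuples standing in for Order and OrderStatus (the data and field names are assumptions), so it only illustrates the semantics of Take(1) with and without DefaultIfEmpty(), not the SQL that any EF version emits.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data: order 2 has no status rows.
var orders = new[] { (Id: 1, Name: "A"), (Id: 2, Name: "B") };
var statuses = new[]
{
    (Id: 10, OrderId: 1, CreatedDate: new DateTime(2018, 1, 1)),
    (Id: 11, OrderId: 1, CreatedDate: new DateTime(2018, 2, 1)),
};

// Take(1) alone: orders without any status drop out (inner-join semantics).
var inner = (from o in orders
             from statusId in statuses.Where(st => st.OrderId == o.Id)
                                      .OrderByDescending(st => st.CreatedDate)
                                      .Select(st => (int?)st.Id)
                                      .Take(1)
             select (o.Name, StatusId: statusId)).ToList();

// Take(1).DefaultIfEmpty(): every order is kept, with a null status when none exists (left-join semantics).
var left = (from o in orders
            from statusId in statuses.Where(st => st.OrderId == o.Id)
                                     .OrderByDescending(st => st.CreatedDate)
                                     .Select(st => (int?)st.Id)
                                     .Take(1)
                                     .DefaultIfEmpty()
            select (o.Name, StatusId: statusId)).ToList();

Console.WriteLine($"inner: {inner.Count} row(s), left: {left.Count} row(s)");
```

Adding a filter on the status, such as the where clause in the answer, removes the null rows again, which is why the left join buys nothing in that query.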
I will assume that you are not actually fetching all the Orders, but only a portion of them (a page or a batch for processing).
In this case, it might be better to split it into two queries (not tested though):
var orders = dbcontext.Orders.Where(o => /* some filter logic */);
var orderIds = orders.Select(o => o.Id).ToList();
// get status for latest change - this should query OrderStatus only
var statusNameMap = dbcontext.OrderStatus
    .Where(os => orderIds.Contains(os.OrderId))
    .GroupBy(os => os.OrderId)
    .Select(grp => grp.OrderByDescending(os => os.CreatedDate).First())
    .ToDictionary(os => os.OrderId, os => os.StatusId);
// aggregate the results
// the orders might fetch only the needed columns to have less data on the wire
var result = orders
    .ToList()
    .Select(o => new { o.Name, StatusId = statusNameMap[o.Id] });
I do not think the queries will be nicer, but it might be easier to understand what is going on here.
If you really have to process all Orders and you have many of them (or many Statuses), you might consider maintaining a LastStatusId column directly in the Order table (it would need to be updated whenever a status changes).
I'm wondering if it is even possible to write the SQL query below as a LINQ to Entities statement. It is a simplified example of a real-world problem that I'm trying to figure out:
Select
c.CustomerID,
c.CustomerName,
(Select count(p.ProductID) from Products p
where p.CustomerID = c.CustomerID and p.Category = 'HomeAppliance') as ApplianceCount,
(Select count(p.ProductID) from Products p
where p.CustomerID = c.CustomerID and p.Category = 'Furnishing') as FurnishingCount
from Customer c
where
c.CustomerMarket = 'GB'
order by c.CustomerID desc;
Any suggestions would be appreciated. Performance of the LINQ to Entities query would need to be considered, as it would involve retrieving a lot of rows.
Something like (assuming the obvious context):
var res = await (from c in dbCtx.Customers
where c.CustomerMarket == "GB"
let homeCount = c.Products.Where(p => p.Category == "HomeAppliance").Count()
let furnCount = c.Products.Where(p => p.Category == "Furnishing").Count()
orderby c.CustomerID descending
select new {
CustomerID = c.CustomerID,
CustomerName = c.CustomerName,
ApplianceCount = homeCount,
FurnishingCount = furnCount
}).ToListAsync();
"Performance of the LINQ to Entity would need to be considered as it would involve retrieving lot of rows."
You'll need to confirm that the generated SQL is reasonable (the best way to help with that is to select no more columns than you need); after that, performance is down to how well the server runs that SQL.
Yes, it is possible:
customers
.Where(cust => cust.CustomerMarket == "GB")
.Select(cust => new
{
cust.CustomerId,
cust.CustomerName,
ApplianceCount = products
.Where(prod => prod.CustomerId == cust.CustomerId && prod.Category == "HomeAppliance")
.Select(prod => prod.ProductId)
.Count(),
FurnishingCount = products
.Where(prod => prod.CustomerId == cust.CustomerId && prod.Category == "Furnishing")
.Select(prod => prod.ProductId)
.Count(),
});
Here both customers and products are IQueryable<T>s of the respective type.
I'm totally new to LINQ.
I have an SQL GroupBy which runs in barely a few milliseconds. But when I try to achieve the same thing via LINQ, it just seems awfully slow.
What I'm trying to achieve is to fetch the average monthly duration of a certain database update.
In SQL =>
select SUBSTRING(yyyyMMdd, 0,7),
AVG (duration)
from (select (CONVERT(CHAR(8), mmud.logDateTime, 112)) as yyyyMMdd,
DateDIFF(ms, min(mmud.logDateTime), max(mmud.logDateTime)) as duration
from mydb.mydbo.updateData mmud
left
join mydb.mydbo.updateDataKeyValue mmudkv
on mmud.updateDataid = mmudkv.updateDataId
left
join mydb.mydbo.updateDataDetailKey mmuddk
on mmudkv.updateDataDetailKeyid = mmuddk.Id
where dbname = 'MY_NEW_DB'
and mmudkv.value in ('start', 'finish')
group
by (CONVERT(CHAR(8), mmud.logDateTime, 112))
) as resultSet
group
by substring(yyyyMMdd, 0,7)
order
by substring(yyyyMMdd, 0,7)
In LINQ => I first fetch the record from a table that links the database name to UpdateData, and then filter and group the related information.
var updatedataInDateRangeWithDescriptions = entry.updatedata.Where(
ue => ue.updatedataKeyValue.Any(
uedkv =>
uedkv.Value.ToLower() == "starting update" ||
uedkv.Value.ToLower() == "client release"))
.Select(
ue =>
new
{
logDateTimeyyyyMMdd = ue.logDateTime.Date,
logDateTime = ue.logDateTime
})
.GroupBy(
updateDataDetail => updateDataDetail.logDateTimeyyyyMMdd)
.Select(
groupedupdatedata => new
{
UpdateDateyyyyMM = groupedupdatedata.Key.ToString("yyyyMMdd"),
Duration =
(groupedupdatedata.Max(groupMember => groupMember.logDateTime) -
groupedupdatedata.Min(groupMember => groupMember.logDateTime)
)
.TotalMilliseconds
}
).
ToList();
var updatedataMonthlyDurations =
updatedataInDateRangeWithDescriptions.GroupBy(ue => ue.UpdateDateyyyyMM.Substring(0,6))
.Select(
group =>
new updatedataMonthlyAverageDuration
{
DbName = entry.DbName,
UpdateDateyyyyMM = group.Key.Substring(0,6),
Duration =
group.Average(
gmember =>
(gmember.Duration))
}
).ToList();
I know that GroupBy in LINQ isn't the same as GROUP BY in T-SQL, but I'm not sure what happens behind the scenes. Could anyone explain the difference and what happens in memory when I run the LINQ version? After I added the .ToList() after the first GroupBy, things got a little faster. But even then, this way of finding the average duration is really slow.
What would be the best alternative and are there ways of improving a slow LINQ statement using Visual Studio 2012?
Your LINQ query is doing most of its work in LINQ to Objects. You should be constructing a LINQ to Entities/LINQ to SQL query that generates the complete query in one shot.
Your query seems to have a redundant group by clause, and I am not sure which table dbname comes from, but the following query should get you on the right track.
var query = from mmud in context.updateData
from mmudkv in context.updateDataKeyValue
.Where(x => mmud.updateDataid == x.updateDataId)
.DefaultIfEmpty()
from mmuddk in context.updateDataDetailKey
.Where(x => mmudkv.updateDataDetailKeyid == x.Id)
.DefaultIfEmpty()
where mmud.dbname == "MY_NEW_DB"
where mmudkv.value == "start" || mmudkv.value == "finish"
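// Under Entity Framework, .Date may not translate; EntityFunctions.TruncateTime is the usual substitute
group mmud by mmud.logDateTime.Date into g
select new
{
Date = g.Key,
// DiffMilliseconds(start, end) returns end - start, so Min goes first, as in DATEDIFF(ms, min, max)
Average = EntityFunctions.DiffMilliseconds(g.Min(x => x.logDateTime), g.Max(x => x.logDateTime)),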
};
var queryByMonth = from x in query
group x by new { x.Date.Year, x.Date.Month } into x
select new
{
Year = x.Key.Year,
Month = x.Key.Month,
Average = x.Average(y => y.Average)
};
// A single SQL statement is sent to your database
var result = queryByMonth.ToList();
If you are still having problems, we will need to know whether you are using Entity Framework or LINQ to SQL, and you will need to provide your context/model information.
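To make the LINQ to Objects point concrete, here is a small sketch of how the same GroupBy call binds differently depending on the static type it is applied to. AsQueryable() over an array is only a stand-in for a database-backed set (an assumption for the sake of a runnable example); with a real provider, the IQueryable version is what gets translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// AsQueryable() stands in for a database-backed set here (assumption).
IQueryable<DateTime> logs = new[]
{
    new DateTime(2013, 1, 1, 10, 0, 0),
    new DateTime(2013, 1, 1, 10, 0, 5),
    new DateTime(2013, 2, 3, 9, 0, 0),
}.AsQueryable();

// Still IQueryable: this GroupBy is Queryable.GroupBy and is only recorded in an
// expression tree, which a provider such as EF can translate into SQL.
IQueryable<IGrouping<DateTime, DateTime>> composed = logs.GroupBy(d => d.Date);

// After ToList() the rows are in memory; this GroupBy is Enumerable.GroupBy and
// runs in the application process over every materialized row.
IEnumerable<IGrouping<DateTime, DateTime>> inMemory = logs.ToList().GroupBy(d => d.Date);

Console.WriteLine(composed.Expression); // the recorded GroupBy call
Console.WriteLine(inMemory.Count());    // number of distinct days
```

In the question's code, entry.updatedata is a navigation collection, so every operator after it is of the second kind: the rows are loaded first and then filtered, grouped, and averaged in memory.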
I am struggling converting the following SQL query I wrote into Linq. I think I'm on the right track, but I must be missing something.
The error I'm getting right now is:
System.Linq.IQueryable does not contain a definition for .Contains
This is confusing to me, because it should, right?
SQL
select Users.*
from Users
where UserID in (select distinct(UserID)
from UserPermission
where SupplierID in (select SupplierID
from UserPermission
where UserID = 6))
LINQ
var Users = (from u in _db.Users
where (from up in _db.UserPermissions select up.UserID)
.Distinct()
.Contains((from up2 in _db.UserPermissions
where up2.UserID == 6
select up2.SupplierID))
select u);
EDIT: I ended up going back to SqlCommand objects as this was something I had to get done today and couldn't waste too much time trying to figure out how to do it the right way with Linq and EF. I hate code hacks :(
I think there is no need to do a distinct here (maybe I am wrong). But here is a simpler version (assuming you have all the navigational properties defined correctly)
var lstUsers = DBContext.Users.Where(
x => x.UserPermissions.Any(
y => y.Suppliers.Any(z => z.UserID == 6)
)
).ToList();
The above works if the Supplier entity has a UserID field. If it does not, you can again use the navigation property:
var lstUsers = DBContext.Users.Where(
x => x.UserPermissions.Any(
y => y.Suppliers.Any(z => z.User.UserID == 6)
)
).ToList();
Contains() only expects a single element, so it won't work as you have it written. Try this as an alternate:
var Users = _db.Users
    .Where(u => _db.UserPermissions
        // permissions for any supplier that user 6 has access to
        .Where(x => _db.UserPermissions
            .Where(y => y.UserID == 6)
            .Select(y => y.SupplierID)
            .Contains(x.SupplierID))
        .Select(x => x.UserID)
        .Distinct()
        .Contains(u.UserID));
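The point about Contains() taking a single element can be seen more easily when the nested IN clauses are split into named steps. This is an in-memory sketch with made-up rows (the tuples and values are assumptions, not the asker's schema); each Contains call receives one value and tests it against a sequence.

```csharp
using System;
using System.Linq;

// Hypothetical rows: users 6 and 7 share supplier 100; user 8 does not.
var permissions = new[]
{
    (UserID: 6, SupplierID: 100),
    (UserID: 7, SupplierID: 100),
    (UserID: 8, SupplierID: 200),
};
var users = new[] { (UserID: 6, Name: "a"), (UserID: 7, Name: "b"), (UserID: 8, Name: "c") };

// Innermost IN: suppliers that user 6 has a permission for.
var supplierIds = permissions.Where(p => p.UserID == 6).Select(p => p.SupplierID);

// Middle IN: users holding a permission for any of those suppliers.
var userIds = permissions.Where(p => supplierIds.Contains(p.SupplierID))
                         .Select(p => p.UserID)
                         .Distinct();

// Outer query: Contains is given one UserID at a time.
var result = users.Where(u => userIds.Contains(u.UserID)).Select(u => u.UserID).ToList();

Console.WriteLine(string.Join(", ", result)); // prints "6, 7"
```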
I haven't tried this on my side, but you could use the let keyword:
var Users = (from u in _db.Users
             let supplierIds = (from up2 in _db.UserPermissions
                                where up2.UserID == 6
                                select up2.SupplierID)
             let userIds = (from up in _db.UserPermissions
                            where supplierIds.Contains(up.SupplierID)
                            select up.UserID).Distinct()
             where userIds.Contains(u.UserID)
             select u);
I have four tables in SQL Server: DocumentType, ClearanceDocument, Request, and RequestDocument.
When the page loads and the user selects a request, I want to show all documents for the request's clearance type (from the Request table), check each one against RequestDocument, and set is_exist = true when a matching row exists.
I have written the following query in the SQL Server query editor to get this result, but I can't convert it to LINQ:
select *,
is_Orginal=
(select is_orginal from CLEARANCE_REQUEST_DOCUMENT
where
DOCUMENT_ID=a.DOCUMENT_ID and REQUEST_ID=3)
from
DOCUMENT_TYPES a
where
DOCUMENT_ID in
(select DOCUMENT_ID from CLEARANCE_DOCUMENTS dt
where
dt.CLEARANCE_ID=
(SELECT R.CLEARANCE_TYPE FROM CLEARANCE_REQUEST R
WHERE
R.REQUEST_ID=3))
I wrote this query in LINQ, but it does not work:
var list = (from r in context.CLEARANCE_REQUEST
where r.REQUEST_ID == 3
join cd in context.CLEARANCE_DOCUMENTS on r.CLEARANCE_TYPE equals cd.CLEARANCE_ID
join dt in context.DOCUMENT_TYPES on cd.DOCUMENT_ID equals dt.DOCUMENT_ID into outer
from t in outer.DefaultIfEmpty()
select new
{
r.REQUEST_ID,
cd.CLEARANCE_ID,
t.DOCUMENT_ID,
t.DOCUMENT_NAME,
is_set=(from b in context.CLEARANCE_REQUEST_DOCUMENT where
b.REQUEST_ID==r.REQUEST_ID && b.DOCUMENT_ID==t.DOCUMENT_ID
select new{b.IS_ORGINAL})
}
).ToList();
I want to convert this query to LINQ. Please help me. Thanks.
There is no need to manually join objects returned from an Entity Framework context.
See Why use LINQ Join on a simple one-many relationship?
If you use the framework as intended your job will be much easier.
var result = context.CLEARANCE_REQUEST
.Single(r => r.REQUEST_ID == 3)
.CLEARANCE_DOCUMENTS
.SelectMany(dt => dt.DOCUMENT_TYPES)
.Select(a => new
{
DocumentType = a,
IsOriginal = a.CLEARANCE_REQUEST_DOCUMENT.is_original
});
Since your query won't be executed until you iterate over the data, you can split it into several subqueries to help you obtain the results, like this:
var clearanceIds = context.CLEARANCE_REQUEST
.Where(r => r.REQUEST_ID == 3)
.Select(r => r.CLEARANCE_TYPE);
var documentIds = context.CLEARANCE_DOCUMENTS
.Where(dt => clearanceIds.Contains(dt.CLEARANCE_ID))
.Select(dt => dt.DOCUMENT_ID);
var result = context.DOCUMENT_TYPES
.Where(a => documentIds.Contains(a.DOCUMENT_ID))
.Select(a => new
{
// Populate properties here
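// FirstOrDefault rather than Single: EF accepts Single() only as the final operator of a query
IsOriginal = context.CLEARANCE_REQUEST_DOCUMENT
.Where(item => item.DOCUMENT_ID == a.DOCUMENT_ID &&
item.REQUEST_ID == 3)
.Select(item => item.IS_ORGINAL)
.FirstOrDefault()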
})
.ToList();
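The deferred execution that this answer relies on can be observed with plain LINQ to Objects. In the sketch below (the counter and sample list are illustrative assumptions), building the query runs nothing; the predicate only executes once ToList() enumerates it, and it sees items added to the source in the meantime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var evaluated = 0;
var source = new List<int> { 1, 2, 3 };

// Building the query does not run the predicate.
var query = source.Where(n => { evaluated++; return n > 1; });
var before = evaluated;

source.Add(4);               // added after the query was defined, still picked up
var result = query.ToList(); // the predicate runs here, once per element

Console.WriteLine($"{before} {evaluated} {result.Count}"); // prints "0 4 3"
```

With an IQueryable provider the same principle is what lets clearanceIds and documentIds be folded into the final statement instead of being executed as separate round trips.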