I need help generating a more efficient LINQ query:
Table: Positions
-PositionID
-Name
Table: Person
-PersonID
-Name, etc...
Table: PersonPosition
-PersonID
-PositionID
I need a result set that groups the people assigned to each position:
PositionID  Person
1           John
            Bob
            Frank
2           Bill
            Tom
            Frank, etc...
My first thought was this LINQ query:
from perspos in PersonPositions
join pers in Persons on perspos.PersonID equals pers.PersonID
group pers by perspos.PositionID into groups
select new {groups.Key, groups}
Which works great, but produces the following SQL:
SELECT [t0].[PositionID] AS [Key]
FROM [PersonPosition] AS [t0]
INNER JOIN [Person] AS [t1] ON [t0].[PersonID] = [t1].[PersonID]
GROUP BY [t0].[PositionID]
GO
-- Region Parameters
DECLARE @x1 Int = 3
-- EndRegion
SELECT [t1].[PersonID], [t1].[UserID], [t1].[Firstname], [t1].[Lastname], [t1].[Email], [t1].[Phone], [t1].[Mobile], [t1].[Comment], [t1].[Permissions]
FROM [PersonPosition] AS [t0]
INNER JOIN [Person] AS [t1] ON [t0].[PersonID] = [t1].[PersonID]
WHERE @x1 = [t0].[PositionID]
GO
-- Region Parameters
DECLARE @x1 Int = 4
-- EndRegion
SELECT [t1].[PersonID], [t1].[UserID], [t1].[Firstname], [t1].[Lastname], [t1].[Email], [t1].[Phone], [t1].[Mobile], [t1].[Comment], [t1].[Permissions]
FROM [PersonPosition] AS [t0]
INNER JOIN [Person] AS [t1] ON [t0].[PersonID] = [t1].[PersonID]
WHERE @x1 = [t0].[PositionID]
GO
-- Region Parameters
DECLARE @x1 Int = 5
-- EndRegion
SELECT [t1].[PersonID], [t1].[UserID], [t1].[Firstname], [t1].[Lastname], [t1].[Email], [t1].[Phone], [t1].[Mobile], [t1].[Comment], [t1].[Permissions]
FROM [PersonPosition] AS [t0]
INNER JOIN [Person] AS [t1] ON [t0].[PersonID] = [t1].[PersonID]
WHERE @x1 = [t0].[PositionID]
GO
on and on...
Is there a better LINQ query that translates to a more efficient SQL statement?
You should already have the relationship defined in your database, and also on your dbml.
Avoid doing joins when you don't have to; they are really tedious. Let LINQ-to-SQL do this for you. Something like this should work:
var data = context.PersonPositions
.Select(pos => new { pos.PositionID, pos.Person });
return data.GroupBy(pos => pos.PositionID);
or
return context.Positions.Select(pos =>
new { pos, People = pos.PersonPositions.Select(pp => pp.Person).ToList() }).ToList();
I'm fairly sure you have to just join the tables and select the result, then call .AsEnumerable(), and group after that:
(from perspos in PersonPositions
join pers in Persons
on perspos.PersonID equals pers.PersonID
select new { perspos.PositionID, Person = pers })
.AsEnumerable().GroupBy(p => p.PositionID, p => p.Person);
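To see the shape this produces, here is a minimal sketch that runs the same join-then-group pattern over in-memory lists. The `Person` and `PersonPosition` classes and the sample rows are hypothetical stand-ins for the mapped entities. Against a real DataContext, the part before `AsEnumerable()` would be the portion sent to the database, and the grouping would happen on the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the mapped Person and PersonPosition classes.
public class Person { public int PersonID { get; set; } public string Name { get; set; } }
public class PersonPosition { public int PersonID { get; set; } public int PositionID { get; set; } }

public static class GroupAfterJoinSketch
{
    public static List<string> Run()
    {
        var persons = new List<Person>
        {
            new Person { PersonID = 1, Name = "John" },
            new Person { PersonID = 2, Name = "Bob" },
            new Person { PersonID = 3, Name = "Frank" },
            new Person { PersonID = 4, Name = "Bill" },
            new Person { PersonID = 5, Name = "Tom" },
        };
        var personPositions = new List<PersonPosition>
        {
            new PersonPosition { PersonID = 1, PositionID = 1 },
            new PersonPosition { PersonID = 2, PositionID = 1 },
            new PersonPosition { PersonID = 3, PositionID = 1 },
            new PersonPosition { PersonID = 4, PositionID = 2 },
            new PersonPosition { PersonID = 5, PositionID = 2 },
            new PersonPosition { PersonID = 3, PositionID = 2 },
        };

        // Step 1: a flat join with no grouping. With a DataContext this is the single SQL statement.
        var flat = from perspos in personPositions
                   join pers in persons on perspos.PersonID equals pers.PersonID
                   select new { perspos.PositionID, Person = pers };

        // Step 2: AsEnumerable() switches to LINQ to Objects, so GroupBy runs in memory.
        return flat.AsEnumerable()
                   .GroupBy(x => x.PositionID, x => x.Person)
                   .Select(g => g.Key + ": " + string.Join(", ", g.Select(p => p.Name)))
                   .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

The trade-off is that every joined row is pulled to the client before grouping, which is usually acceptable when the whole result set is needed anyway.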
I've been at it all day. For the life of me, I cannot figure out how to translate either of the final two select statements in the code snippet below into LINQ:
declare @Person table
(
[Name] varchar(50),
[ABA] varchar(9)
)
declare @Entity table
(
[Name] varchar(50),
[Respondent] varchar(9),
[TierRespondent] varchar(9)
)
insert into @Person ([Name], [ABA])
select 'Steve', '000000001'
union
select 'Mary', '000000002'
union
select 'Carey', '000000003'
insert into @Entity ([Name], [Respondent], [TierRespondent])
select 'Steve', '000000001', '000000006'
union
select 'Mary', '000000004', '000000002'
union
select 'Carey', '000000005', '000000008'
select *
FROM @Entity e
LEFT JOIN @Person p
ON p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
select *
FROM @Entity e
LEFT JOIN @Person p
ON p.[ABA] in (e.Respondent ,e.[TierRespondent])
The thing that boggles my mind is the logic found within the ON clause of the join statements.
I'm not a SQL wiz, so I've even failed at trying to restructure these SELECT statements into a different form that gives me the same results, but is also easier to translate to LINQ.
Any ideas, anyone?
Thanks.
After some hours of pondering, I found the following SQL to be an acceptable refactoring that can easily be translated to a LINQ query:
select *
from @Entity e
left join @Person p
on 1 = 1
where p.[ABA] = e.Respondent
or p.[ABA] = e.[TierRespondent]
and p.ABA is not null
And here is the corresponding LINQ query:
from entity in entities
join p in persons on 1 equals 1 into groupJoin
from person in groupJoin.DefaultIfEmpty()
where person.ABA == entity.Respondent || person.ABA == entity.TierRespondent
&& person.ABA != null
select new
{
entity.Name,
entity.Respondent,
entity.TierRespondent,
person?.Name,
person?.ABA
}
The select statements of the original post contained logic within the join clause. This logic cannot be expressed within a LINQ join clause, which only supports equality comparisons. This refactoring moves the logic out of the join and into the where clause, which can easily be expressed in LINQ.
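One caveat with the refactoring above: because the where clause filters on columns of the left-joined table, entities with no matching person are dropped, so it behaves like an inner join rather than the original LEFT JOIN. A commonly used alternative is a second `from` over a filtered, `DefaultIfEmpty()` sequence. The sketch below runs that pattern with LINQ to Objects on the sample rows from the question; the `Person` and `Entity` classes are hypothetical stand-ins, and whether a given provider turns this shape into a LEFT OUTER JOIN is an assumption worth checking against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classes mirroring the @Person and @Entity table variables.
public class Person { public string Name { get; set; } public string ABA { get; set; } }
public class Entity
{
    public string Name { get; set; }
    public string Respondent { get; set; }
    public string TierRespondent { get; set; }
}

public static class NonEquiLeftJoinSketch
{
    public static List<string> Run()
    {
        var persons = new[]
        {
            new Person { Name = "Steve", ABA = "000000001" },
            new Person { Name = "Mary",  ABA = "000000002" },
            new Person { Name = "Carey", ABA = "000000003" },
        };
        var entities = new[]
        {
            new Entity { Name = "Steve", Respondent = "000000001", TierRespondent = "000000006" },
            new Entity { Name = "Mary",  Respondent = "000000004", TierRespondent = "000000002" },
            new Entity { Name = "Carey", Respondent = "000000005", TierRespondent = "000000008" },
        };

        var query =
            from entity in entities
            // The OR condition lives in the filter; DefaultIfEmpty() keeps unmatched entities.
            from person in persons
                .Where(p => p.ABA == entity.Respondent || p.ABA == entity.TierRespondent)
                .DefaultIfEmpty()
            select entity.Name + " -> " + (person == null ? "(no match)" : person.Name);

        return query.ToList();
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

With the sample data, the Carey entity has no person whose ABA matches either column, and it still appears in the output with a placeholder instead of disappearing.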
I am confused about how EF LINQ queries are compiled and executed. When I run the same program in LINQPad a couple of times, I get varying performance results (the same query takes a different amount of time on each run). My test environment is described below.
Tools used: EF v6.1 and LINQPad v5.08.
Ref DB : ContosoUniversity DB downloaded from MSDN.
For queries, I am using Persons, Courses & Departments tables from the above DB; see below.
Now, I have below data:
Query goal: get the second person and associated departments.
Query:
var test = (
from p in Persons
join d in Departments on p.ID equals d.InstructorID
select new {
person = p,
dept = d
}
);
var result = (from pd in test
group pd by pd.person.ID into grp
orderby grp.Key
select new {
ID = grp.Key,
FirstName = grp.First().person.FirstName,
Deps = grp.Where(x => x.dept != null).Select(x => x.dept).Distinct().ToList()
}).Skip(1).Take(1).ToList();
foreach(var r in result)
{
Console.WriteLine("person is..." + r.FirstName);
Console.WriteLine(r.FirstName + "' deps are...");
foreach(var d in r.Deps){
Console.WriteLine(d.Name);
}
}
When I run this, I get the expected result, but LINQPad reports an execution time anywhere from 3.515 sec down to 0.004 sec (depending on how long I wait between runs).
If I take the generated SQL query and execute it directly, it always runs in 0.001 sec to 0.015 sec.
Generated query:
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 Int = 1
-- EndRegion
SELECT [t7].[ID], [t7].[value] AS [FirstName]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t6].[ID]) AS [ROW_NUMBER], [t6].[ID], [t6].[value]
FROM (
SELECT [t2].[ID], (
SELECT [t5].[FirstName]
FROM (
SELECT TOP (1) [t3].[FirstName]
FROM [Person] AS [t3]
INNER JOIN [Department] AS [t4] ON ([t3].[ID]) = [t4].[InstructorID]
WHERE [t2].[ID] = [t3].[ID]
) AS [t5]
) AS [value]
FROM (
SELECT [t0].[ID]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
GROUP BY [t0].[ID]
) AS [t2]
) AS [t6]
) AS [t7]
WHERE [t7].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t7].[ROW_NUMBER]
GO
-- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT DISTINCT [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
WHERE @x1 = [t0].[ID]
My questions:
1) Are those LINQ statements correct? Or can they be optimized?
2) Is the time difference for LINQ query execution normal?
Another different question:
I have modified the first query to execute immediately (called ToList before the second query). This time generated SQL is very simple as shown below (it doesn't look like there is a SQL query for the first LINQ statement with ToList() included):
SELECT [t0].[ID], [t0].[LastName], [t0].[FirstName], [t0].[HireDate], [t0].[EnrollmentDate], [t0].[Discriminator], [t1].[DepartmentID], [t1].[Name], [t1].[Budget], [t1].[StartDate], [t1].[InstructorID], [t1].[RowVersion]
FROM [Person] AS [t0]
INNER JOIN [Department] AS [t1] ON ([t0].[ID]) = [t1].[InstructorID]
Running this modified query also took a varying amount of time, but the difference was not as big as with the first query set.
In my application there are going to be a lot of rows, and I prefer the first query set to the second one, but I am confused.
Please guide.
(Note: I have only a little SQL Server knowledge, so I am using LINQPad to fine-tune queries based on performance.)
Thanks
How do I use a left outer join in an Entity Framework query?
I have two tables: one is item and the other is stock available. I want to get all items, together with each item's quantity from the stock available table, which depends on the particular department.
For example
LINQ Query
var query = (from p in dc.GetTable<Person>()
join pa in dc.GetTable<PersonAddress>() on p.Id equals pa.PersonId into tempAddresses
from addresses in tempAddresses.DefaultIfEmpty()
select new { p.FirstName, p.LastName, addresses.State });
SQL Translation
SELECT [t0].[FirstName], [t0].[LastName], [t1].[State] AS [State]
FROM [dbo].[Person] AS [t0]
LEFT OUTER JOIN [dbo].[PersonAddress] AS [t1] ON [t0].[Id] = [t1].[PersonID]
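Applied to the item and stock scenario described above, the same group-join plus `DefaultIfEmpty()` pattern might look like the following sketch. All names here (`Item`, `StockAvailable`, `ItemID`, `DepartmentID`, `Quantity`) are assumptions, since the question does not show the schema, and the query runs over in-memory lists rather than a real context. The department filter is applied to the stock side before the join so that items without stock in that department are still returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity classes; the real column names are not given in the question.
public class Item { public int ItemID { get; set; } public string Name { get; set; } }
public class StockAvailable
{
    public int ItemID { get; set; }
    public int DepartmentID { get; set; }
    public int Quantity { get; set; }
}

public static class LeftJoinStockSketch
{
    public static List<string> Run(int departmentId)
    {
        var items = new List<Item>
        {
            new Item { ItemID = 1, Name = "Pen" },
            new Item { ItemID = 2, Name = "Paper" },
            new Item { ItemID = 3, Name = "Stapler" },
        };
        var stocks = new List<StockAvailable>
        {
            new StockAvailable { ItemID = 1, DepartmentID = 10, Quantity = 25 },
            new StockAvailable { ItemID = 2, DepartmentID = 20, Quantity = 40 },
            new StockAvailable { ItemID = 3, DepartmentID = 10, Quantity = 5 },
        };

        // Filter the right-hand side first; filtering after the join would drop unmatched items.
        var departmentStock = stocks.Where(st => st.DepartmentID == departmentId);

        var query =
            from item in items
            join s in departmentStock on item.ItemID equals s.ItemID into stockForItem
            from stock in stockForItem.DefaultIfEmpty()
            // stock is null when the item has no row for this department.
            select item.Name + ": " + (stock == null ? 0 : stock.Quantity);

        return query.ToList();
    }

    public static void Main()
    {
        foreach (var line in Run(10)) Console.WriteLine(line);
    }
}
```

The explicit null check matters in LINQ to Objects, where dereferencing the empty side throws; a SQL-backed provider would typically return NULL for those columns instead.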
I was working with the first method below, but then I found the second and want to know the difference and which is best.
What is the difference between:
from a in this.dataContext.reglements
join b in this.dataContext.Clients on a.Id_client equals b.Id
select...
and
from a in this.dataContext.reglements
from b in this.dataContext.Clients
where a.Id_client == b.Id
select...
I created a test case to check the difference, and in your scenario it turns out they are the same.
My test example used AdventureWorks but basically there is an association between
Products->CategoryId->Categories
var q = (
from p in Products
from c in Categories
where p.CategoryID==c.CategoryID
select p
);
q.ToList();
Produces this SQL:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[CategoryID]
FROM [Products] AS [t0], [Categories] AS [t1]
WHERE [t0].[CategoryID] = ([t1].[CategoryID])
var q2 = (
from p in Products
join c in Categories
on p.CategoryID equals c.CategoryID
select p);
q2.ToList();
Produces this SQL:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[CategoryID]
FROM [Products] AS [t0]
INNER JOIN [Categories] AS [t1] ON [t0].[CategoryID] = ([t1].[CategoryID])
The difference between these two syntaxes will be in the way they are translated into SQL. You can trace Entity Framework or LINQ to SQL to determine the SQL:
LINQ to SQL: http://www.reflectionit.nl/Blog/PermaLinkcba15978-c792-44c9-aff2-26dbcc0da81e.aspx
Check the resulting SQL to determine if there are any differences that could affect performance.
I have the following group join LINQ statement:
from c in Categories
join p in Products on c equals p.Category into ps
select new { Category = new {c.CategoryID, c.CategoryName}, Products = ps };
However this generates the following left outer join query and returns all categories even if there are no products associated.
SELECT [t0].[CategoryID], [t0].[CategoryName], [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID] AS [CategoryID2], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued], (
SELECT COUNT(*)
FROM [Products] AS [t2]
WHERE [t0].[CategoryID] = [t2].[CategoryID]
) AS [value]
FROM [Categories] AS [t0]
LEFT OUTER JOIN [Products] AS [t1] ON [t0].[CategoryID] = [t1].[CategoryID]
ORDER BY [t0].[CategoryID], [t1].[ProductID]
What I really want is to return only those categories that have associated products. But if I rewrite the LINQ query like so:
from c in Categories
join p in Products on c equals p.Category
group p by new {c.CategoryID, c.CategoryName} into ps
select new { Category = ps.Key, Products = ps };
This gives me the desired result but a query is generated for each category:
SELECT [t0].[CategoryID], [t0].[CategoryName]
FROM [Categories] AS [t0]
INNER JOIN [Products] AS [t1] ON [t0].[CategoryID] = [t1].[CategoryID]
GROUP BY [t0].[CategoryID], [t0].[CategoryName]
GO
-- Region Parameters
DECLARE @x1 Int SET @x1 = 1
DECLARE @x2 NVarChar(9) SET @x2 = 'Beverages'
-- EndRegion
SELECT [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued]
FROM [Categories] AS [t0]
INNER JOIN [Products] AS [t1] ON [t0].[CategoryID] = [t1].[CategoryID]
WHERE (@x1 = [t0].[CategoryID]) AND (@x2 = [t0].[CategoryName])
GO
-- Region Parameters
DECLARE @x1 Int SET @x1 = 2
DECLARE @x2 NVarChar(10) SET @x2 = 'Condiments'
-- EndRegion
SELECT [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued]
FROM [Categories] AS [t0]
INNER JOIN [Products] AS [t1] ON [t0].[CategoryID] = [t1].[CategoryID]
WHERE (@x1 = [t0].[CategoryID]) AND (@x2 = [t0].[CategoryName])
GO
...
Is there a way to do the equivalent of an inner join and group by and still only produce a single query like the group join?
var queryYouWant =
from c in Categories
join p in Products on c equals p.Category
select new {Category = c, Product = p};
var result =
from x in queryYouWant.AsEnumerable()
group x.Product by x.Category into g
select new { Category = g.Key, Products = g };
Is there a way to do the equivalent of an inner join and group by and still only produce a single query like the group join?
No. When you say GroupBy followed by non-aggregated access of the group elements, that's a repeated query with the group key as a filter.
What is the purpose of that join?
Your original query is identical to this:
from c in Categories
select new { Category = new { c.CategoryID, c.CategoryName }, c.Products }
Am I somehow missing something obvious???
If you want only categories with products, then do this:
from c in Categories
where c.Products.Any()
select new { Category = new { c.CategoryID, c.CategoryName }, c.Products }
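As a rough illustration of what the `Any()` filter does, the sketch below runs the same query shape over in-memory objects. The `Category` and `Product` classes are hypothetical stand-ins for the mapped entities with a `Products` navigation collection, and the sample rows are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for mapped entities with a Products navigation collection.
public class Product { public string ProductName { get; set; } }
public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
    public List<Product> Products { get; set; } = new List<Product>();
}

public static class CategoriesWithProductsSketch
{
    public static List<string> Run()
    {
        var categories = new List<Category>
        {
            new Category
            {
                CategoryID = 1, CategoryName = "Beverages",
                Products = { new Product { ProductName = "Chai" }, new Product { ProductName = "Chang" } }
            },
            // No products: this category should be filtered out.
            new Category { CategoryID = 2, CategoryName = "Condiments" },
            new Category
            {
                CategoryID = 3, CategoryName = "Produce",
                Products = { new Product { ProductName = "Tofu" } }
            },
        };

        var query =
            from c in categories
            where c.Products.Any()   // keep only categories that have at least one product
            select c.CategoryName + " (" + c.Products.Count + ")";

        return query.ToList();
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
    }
}
```

No explicit join appears in the query; the association between category and products carries the relationship.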
Or, if you want to flatten the results:
from p in Products
select new { p, p.Category.CategoryID, p.Category.CategoryName }
The latter will translate into an inner or outer join - depending on whether that relationship is nullable. You can force the equivalent of an inner join as follows:
from p in Products
where p.Category != null
select new { p, p.Category.CategoryID, p.Category.CategoryName }