Entity Framework query with 2 many-to-many joins - c#

I'm working on a project which is using EF Code first and has the following model relationships:
Item (Id, Name, virtual List<Category>, virtual List<Tag>)
Category (Id, Name, virtual List<Item>)
Tag (Id, Name, virtual List<Item>)
I'm running a search where I would like to get all items where the item name contains searchTerm, the category id is contained in a list of ints, and the tag name exists in a list of tags.
public IEnumerable<Item> Search(string searchTerm, IEnumerable<int> categoryIds, IEnumerable<string> tags)
{
var query = (
from i in context.Items
from c in context.Categories
from t in context.Tags
where i.Name.Contains(searchTerm)
&& categoryIds.Contains(c.Id)
&& tags.Contains(t.Name)
select i);
return query.ToList();
}
In SQL the query would look like the following:
SELECT I.* FROM Items I
INNER JOIN ItemItemCategories IIC ON IIC.Item_Id = I.Id
INNER JOIN ItemCategories C ON C.Id = IIC.ItemCategory_Id
INNER JOIN ItemItemTags IIT ON IIT.Item_Id = I.Id
INNER JOIN ItemTags T On T.Id = IIT.ItemTag_Id
WHERE I.Question like '%sample%' -- searchTerm
AND C.Id in (1,2) -- categoryIds
AND (T.Text like '%Difficult%' OR T.Text like '%Technical%') -- tags
My question is how I can write my code so that it produces the query above, which is, to my knowledge, the most efficient way to perform this search. Currently the following query is being run from code:
SELECT
[Filter1].[Id1] AS [Id],
[Filter1].[Name] AS [Name]
FROM (
SELECT
[Extent1].[Id] AS [Id1],
[Extent1].[Name] AS [Name]
FROM [dbo].[Items] AS [Extent1]
CROSS JOIN [dbo].[Categories] AS [Extent2]
WHERE [Extent2].[Id] IN (1, 2) ) AS [Filter1]
CROSS JOIN [dbo].[Tags] AS [Extent3]
WHERE ([Filter1].[Name] LIKE @p__linq__0 ESCAPE N'~') AND ([Extent3].[Name] IN (N'Difficult', N'Technical')) AND ([Extent3].[Name] IS NOT NULL)

Try this:
var query = ( from i in context.Items
from c in i.Categories
from t in i.Tags
where i.Name.Contains(searchTerm)
&& categoryIds.Contains(c.Id)
&& tags.Contains(t.Name)
select i).ToList();
You do not have to search through all the categories and tags, only those that are related to your Item.
As for the query you want, I don't think there is a more efficient query in LINQ to Entities that gets the result you are expecting than the one I propose above. Look at the SQL that is generated:
SELECT
[Filter1].[Id] AS [Id],
[Filter1].[Name] AS [Name]
FROM (SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name]
FROM [dbo].[Items] AS [Extent1]
INNER JOIN [dbo].[ItemCategories] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Item_Id]
WHERE [Extent2].[Category_Id] IN (1, 2) ) AS [Filter1]
INNER JOIN (SELECT [Extent3].[Item_Id] AS [Item_Id]
FROM [dbo].[TagItems] AS [Extent3]
INNER JOIN [dbo].[Tags] AS [Extent4] ON [Extent4].[Id] = [Extent3].[Tag_Id]
WHERE ([Extent4].[Name] IN (N'Difficult', N'Technical')) AND ([Extent4].[Name] IS NOT NULL) ) AS [Filter2] ON [Filter1].[Id] = [Filter2].[Item_Id]
WHERE [Filter1].[Name] LIKE @p__linq__0 ESCAPE N'~'
As you can see, it is quite similar to the query that you expect.
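One caveat with the double `from` form: it flattens each item against its matching categories and tags, so an item that matches two categories (or two tags) can come back more than once. Below is a minimal in-memory sketch of an `Any`-based alternative. The `Item`, `Category`, and `Tag` classes and the sample data are made up for illustration, and the code runs on LINQ to Objects rather than against EF, so it says nothing about the SQL EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (illustration only).
public class Category { public int Id; public string Name; }
public class Tag { public int Id; public string Name; }
public class Item
{
    public int Id;
    public string Name;
    public List<Category> Categories = new List<Category>();
    public List<Tag> Tags = new List<Tag>();
}

public static class SearchSketch
{
    // Flattening form from the answer: one row per (item, category, tag) match.
    public static List<Item> SearchFlattened(IEnumerable<Item> items, string searchTerm,
        ICollection<int> categoryIds, ICollection<string> tags)
    {
        return (from i in items
                from c in i.Categories
                from t in i.Tags
                where i.Name.Contains(searchTerm)
                      && categoryIds.Contains(c.Id)
                      && tags.Contains(t.Name)
                select i).ToList();
    }

    // Any-based form: each item appears at most once.
    public static List<Item> SearchWithAny(IEnumerable<Item> items, string searchTerm,
        ICollection<int> categoryIds, ICollection<string> tags)
    {
        return items.Where(i => i.Name.Contains(searchTerm)
                                && i.Categories.Any(c => categoryIds.Contains(c.Id))
                                && i.Tags.Any(t => tags.Contains(t.Name)))
                    .ToList();
    }

    // Made-up data: the first item matches both category ids, the second fails the name test.
    public static List<Item> SampleItems()
    {
        var first = new Item { Id = 1, Name = "sample question" };
        first.Categories.Add(new Category { Id = 1, Name = "General" });
        first.Categories.Add(new Category { Id = 2, Name = "Advanced" });
        first.Tags.Add(new Tag { Id = 1, Name = "Difficult" });

        var second = new Item { Id = 2, Name = "other question" };
        second.Categories.Add(new Category { Id = 1, Name = "General" });
        second.Tags.Add(new Tag { Id = 2, Name = "Technical" });
        return new List<Item> { first, second };
    }

    public static void Main()
    {
        var items = SampleItems();
        var ids = new List<int> { 1, 2 };
        var tags = new List<string> { "Difficult", "Technical" };
        Console.WriteLine(SearchFlattened(items, "sample", ids, tags).Count); // 2 (same item twice)
        Console.WriteLine(SearchWithAny(items, "sample", ids, tags).Count);   // 1
    }
}
```

If the flattened form is kept, appending `.Distinct()` before `ToList()` is another way to drop the repeats.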

Related

How to improve LINQ statement to use INNER JOIN in resulting SQL statement?

Assume the following code, which applies filtering logic to a collection that is passed in:
private IQueryable<Customer> ApplyCustomerFilter(CustomerFilter filter, IQueryable<Customer> customers)
{
...
if (filter.HasProductInBackOrder == true)
{
customers = customers.Where(c => c.Orders.Any(o => o.Products.Any(p => p.Status == ProductStatus.BackOrder)));
}
....
return customers;
}
Results in this SQL statement:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
WHERE
(
EXISTS
(
SELECT 1 AS [C1]
FROM
(
SELECT [Extent3].[OrderId] AS [OrderId]
FROM [Orders] AS [Extent3]
WHERE [Extent1].[CustomerId] = [Extent3].[CustomerId]
) AS [Project1]
WHERE EXISTS
(
SELECT 1 AS [C1]
FROM [Products] AS [Extent4]
WHERE ([Project1].[OrderId] = [Extent4].[OrderId])
AND ([Extent4].[Status] = @p__linq__6)
)
)
)
However, I would like to optimize this by forcing EF to use INNER JOINs, so that the result is similar to this:
SELECT [Extent1].[CustomerId] AS [CustomerId],
[Extent1].[Status] AS [Status]
FROM [Customers] AS [Extent1]
INNER JOIN [Orders] AS [Extent2] ON [Extent1].[CustomerId] = [Extent2].[CustomerId]
INNER JOIN [Products] AS [Extent3] ON [Extent2].[OrderId] = [Extent3].[OrderId]
WHERE [Extent3].[Status] = @p__linq__6
I've tried multiple approaches, but I was unable to get the desired result. Any suggestions on how to force the correct joins and avoid subselects?
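Note that the two SQL statements above are not equivalent as written: the EXISTS form returns each customer once, while the plain INNER JOIN form returns one row per matching product. The in-memory sketch below illustrates that difference and shows a flattened query that matches the `Any` version once `Distinct()` is added. The classes and sample data are hypothetical, and the code runs on LINQ to Objects; whether EF would translate the flattened form to INNER JOINs, or whether that would be faster than EXISTS, is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (illustration only).
public enum ProductStatus { Available, BackOrder }
public class Product { public ProductStatus Status; }
public class Order { public List<Product> Products = new List<Product>(); }
public class Customer
{
    public int CustomerId;
    public List<Order> Orders = new List<Order>();
}

public static class FilterSketch
{
    // Nested Any, as in the question: a semi-join, each customer at most once.
    public static List<int> WithAny(IEnumerable<Customer> customers)
    {
        return customers
            .Where(c => c.Orders.Any(o => o.Products.Any(p => p.Status == ProductStatus.BackOrder)))
            .Select(c => c.CustomerId)
            .ToList();
    }

    // Flattened form: one row per matching product unless Distinct is applied.
    public static List<int> WithFlattening(IEnumerable<Customer> customers, bool distinct)
    {
        var rows = from c in customers
                   from o in c.Orders
                   from p in o.Products
                   where p.Status == ProductStatus.BackOrder
                   select c.CustomerId;
        return (distinct ? rows.Distinct() : rows).ToList();
    }

    // Made-up data: customer 1 has two back-ordered products, customer 2 has none.
    public static List<Customer> Sample()
    {
        var order1 = new Order();
        order1.Products.Add(new Product { Status = ProductStatus.BackOrder });
        order1.Products.Add(new Product { Status = ProductStatus.BackOrder });
        var c1 = new Customer { CustomerId = 1 };
        c1.Orders.Add(order1);

        var order2 = new Order();
        order2.Products.Add(new Product { Status = ProductStatus.Available });
        var c2 = new Customer { CustomerId = 2 };
        c2.Orders.Add(order2);

        return new List<Customer> { c1, c2 };
    }

    public static void Main()
    {
        var customers = Sample();
        Console.WriteLine(string.Join(",", WithAny(customers)));               // 1
        Console.WriteLine(string.Join(",", WithFlattening(customers, false))); // 1,1
        Console.WriteLine(string.Join(",", WithFlattening(customers, true)));  // 1
    }
}
```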

Entity Framework Include returning different values for Single and Where

My application can be used to send messages to a combination of single users and teams containing sets of users. The structure is such, that a Message entity has one Recipients entity, which in turn has a list of User entities and a list of Team entities.
I am trying to get a list of Message entities using EF Code First and Linq-to-Entities, and I want to Include the Recipients and Teams to avoid large amounts of lazy loading requests later on.
The strange thing is, the Teams list is always empty if I use the Include clause. After some experimenting, it boils down to this:
var messages = GetAll()
.Include(m => m.Recipients.Teams)
.Where(m => m.Id == 123)
.ToList();
returns a list with one message, where the Teams list is empty. (GetAll() just returns an IQueryable<Message>.) But if I do
var message = GetAll()
.Include(m => m.Recipients.Teams)
.Single(m => m.Id == 123);
then I get the single message, with the Teams correctly populated.
Any ideas why this is happening?
Edit: Here's the generated SQL (taken from Entity Framework Profiler)
Where statement
SELECT *
FROM (SELECT [Extent1].[Id] AS [Id],
[Extent1].[ParentRelation] AS [ParentRelation],
[Extent1].[CreatedUtc] AS [CreatedUtc],
[Extent1].[Subject] AS [Subject],
[Extent1].[Introduction] AS [Introduction],
[Extent1].[Body] AS [Body],
[Extent1].[GlobalId] AS [GlobalId],
[Extent1].[Team_Id] AS [Team_Id],
[Extent1].[Creator_Id] AS [Creator_Id],
[Extent1].[Parent_Id] AS [Parent_Id],
[Extent1].[ReplyTo_Id] AS [ReplyTo_Id],
[Join1].[Id1] AS [Id1],
[Join1].[ToSupervisors] AS [ToSupervisors],
[Join1].[Organisation_Id] AS [Organisation_Id],
[Join1].[Id2] AS [Id2],
[Join4].[Id3] AS [Id3],
[Join4].[Name] AS [Name],
[Join4].[CreatedUtc] AS [CreatedUtc1],
[Join4].[Description] AS [Description],
[Join4].[Color] AS [Color],
[Join4].[Status] AS [Status],
[Join4].[Organisation_Id] AS [Organisation_Id1],
CASE
WHEN ([Join4].[Recipients_Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END AS [C1]
FROM [dbo].[Messages] AS [Extent1]
INNER JOIN (SELECT [Extent2].[Id] AS [Id1],
[Extent2].[ToSupervisors] AS [ToSupervisors],
[Extent2].[Organisation_Id] AS [Organisation_Id],
[Extent3].[Id] AS [Id2]
FROM [dbo].[Recipients] AS [Extent2]
LEFT OUTER JOIN [dbo].[MessageExtensions] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[Recipients_Id]) AS [Join1]
ON [Extent1].[Recipients_Id] = [Join1].[Id1]
LEFT OUTER JOIN (SELECT [Extent4].[Recipients_Id] AS [Recipients_Id1],
[Extent5].[Id] AS [Id3],
[Extent5].[Name] AS [Name],
[Extent5].[CreatedUtc] AS [CreatedUtc],
[Extent5].[Description] AS [Description],
[Extent5].[Color] AS [Color],
[Extent5].[Status] AS [Status],
[Extent5].[Organisation_Id] AS [Organisation_Id],
[Extent6].[Recipients_Id] AS [Recipients_Id2]
FROM [dbo].[RecipientsTeams] AS [Extent4]
INNER JOIN [dbo].[Teams] AS [Extent5]
ON [Extent4].[Team_Id] = [Extent5].[Id]
INNER JOIN [dbo].[MessageExtensions] AS [Extent6]
ON 1 = 1) AS [Join4]
ON ([Extent1].[Recipients_Id] = [Join4].[Recipients_Id2])
AND ([Extent1].[Recipients_Id] = [Join4].[Recipients_Id1])
WHERE 11021 = [Extent1].[Id]) AS [Project1]
ORDER BY [Project1].[Id] ASC,
[Project1].[Id1] ASC,
[Project1].[Id2] ASC,
[Project1].[C1] ASC
Single statement
SELECT *
FROM (SELECT [Limit1].[Id1] AS [Id],
[Limit1].[ParentRelation] AS [ParentRelation],
[Limit1].[CreatedUtc] AS [CreatedUtc],
[Limit1].[Subject] AS [Subject],
[Limit1].[Introduction] AS [Introduction],
[Limit1].[Body1] AS [Body],
[Limit1].[GlobalId1] AS [GlobalId],
[Limit1].[Team_Id] AS [Team_Id],
[Limit1].[Creator_Id] AS [Creator_Id],
[Limit1].[Parent_Id] AS [Parent_Id],
[Limit1].[ReplyTo_Id] AS [ReplyTo_Id],
[Limit1].[Id2] AS [Id1],
[Limit1].[ToSupervisors] AS [ToSupervisors],
[Limit1].[Organisation_Id] AS [Organisation_Id],
[Limit1].[Id3] AS [Id2],
[Join5].[Id4] AS [Id3],
[Join5].[Name] AS [Name],
[Join5].[CreatedUtc1] AS [CreatedUtc1],
[Join5].[Description] AS [Description],
[Join5].[Color] AS [Color],
[Join5].[Status] AS [Status],
[Join5].[Organisation_Id] AS [Organisation_Id1],
CASE
WHEN ([Join5].[Recipients_Id1] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END AS [C1]
FROM (SELECT TOP (2) [Extent1].[Id] AS [Id1],
[Extent1].[ParentRelation] AS [ParentRelation],
[Extent1].[CreatedUtc] AS [CreatedUtc],
[Extent1].[Subject] AS [Subject],
[Extent1].[Introduction] AS [Introduction],
[Extent1].[Body] AS [Body1],
[Extent1].[GlobalId] AS [GlobalId1],
[Extent1].[Team_Id] AS [Team_Id],
[Extent1].[Creator_Id] AS [Creator_Id],
[Extent1].[Parent_Id] AS [Parent_Id],
[Extent1].[ReplyTo_Id] AS [ReplyTo_Id],
[Join1].[Id2],
[Join1].[ToSupervisors],
[Join1].[Organisation_Id],
[Join1].[Id3]
FROM [dbo].[Messages] AS [Extent1]
INNER JOIN (SELECT [Extent2].[Id] AS [Id2],
[Extent2].[ToSupervisors] AS [ToSupervisors],
[Extent2].[Organisation_Id] AS [Organisation_Id],
[Extent3].[Id] AS [Id3]
FROM [dbo].[Recipients] AS [Extent2]
LEFT OUTER JOIN [dbo].[MessageExtensions] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[Recipients_Id]) AS [Join1]
ON [Extent1].[Recipients_Id] = [Join1].[Id2]
WHERE 11021 = [Extent1].[Id]) AS [Limit1]
LEFT OUTER JOIN (SELECT [Extent4].[Recipients_Id] AS [Recipients_Id1],
[Extent5].[Id] AS [Id4],
[Extent5].[Name] AS [Name],
[Extent5].[CreatedUtc] AS [CreatedUtc1],
[Extent5].[Description] AS [Description],
[Extent5].[Color] AS [Color],
[Extent5].[Status] AS [Status],
[Extent5].[Organisation_Id] AS [Organisation_Id],
[Join4].[Id5],
[Join4].[Recipients_Id2]
FROM [dbo].[RecipientsTeams] AS [Extent4]
INNER JOIN [dbo].[Teams] AS [Extent5]
ON [Extent4].[Team_Id] = [Extent5].[Id]
INNER JOIN (SELECT [Extent6].[Id] AS [Id5],
[Extent6].[Recipients_Id] AS [Recipients_Id2]
FROM [dbo].[Messages] AS [Extent6]
LEFT OUTER JOIN [dbo].[MessageExtensions] AS [Extent7]
ON [Extent6].[Recipients_Id] = [Extent7].[Recipients_Id]) AS [Join4]
ON [Extent4].[Recipients_Id] = [Join4].[Recipients_Id2]) AS [Join5]
ON [Limit1].[Id1] = [Join5].[Id5]) AS [Project1]
ORDER BY [Project1].[Id] ASC,
[Project1].[Id1] ASC,
[Project1].[Id2] ASC,
[Project1].[C1] ASC
When I run these queries by hand, I have the same result. For the Where, the Team related properties are all NULL, while for the Single, they are populated.
Edit 2 The GetAll method is a repository method
public virtual IQueryable<T> GetAll()
{
return Context.Set<T>();
}
where T is Message
Can you try this?
var messages = GetAll().Include(m => m.Recipients.Teams) .Where(m => m.Id == 123).Select(m=>m);

LINQ and Entity Framework - Avoiding subqueries

I'm having a really hard time tuning one of the Entity Framework generated queries in my application. It is a very basic query, but for some reason EF uses multiple inner subqueries, which seem to perform horribly in the DB, instead of using joins.
Here's my LINQ code:
Projects.Select(proj => new ProjectViewModel()
{
Name = proj.Name,
Id = proj.Id,
Total = proj.Subvalue.Where(subv =>
subv.Created >= startDate
&& subv.Created <= endDate
&&
(subv.StatusId == 1 ||
subv.StatusId == 2))
.Select(c => c.SubValueSum)
.DefaultIfEmpty()
.Sum()
})
.OrderByDescending(c => c.Total)
.Take(10);
EF generates a really complex query with multiple subqueries, which performs awfully:
SELECT TOP (10)
[Project3].[Id] AS [Id],
[Project3].[Name] AS [Name],
[Project3].[C1] AS [C1]
FROM ( SELECT
[Project2].[Id] AS [Id],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
(SELECT
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent2].[SubValueSum] AS [SubValueSum],
cast(1 as tinyint) AS [C1]
FROM [dbo].[Subvalue] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
) AS [Join1]) AS [C1]
FROM [dbo].[Project] AS [Extent1]
WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
) AS [Project2]
) AS [Project3]
ORDER BY [Project3].[C1] DESC;
The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:
select
TOP (10)
Proj.Id,
Proj.Name,
SUM(Subv.SubValueSum) AS Total
from
SubValue as Subv
left join
Project as Proj on Proj.Id = Subv.ProjectId
where
Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
Proj.Id,
Proj.Name
order by
Total DESC
The execution time is near instant; below 30ms.
The problem clearly lies in my ability to write good EF queries with LINQ, but no matter what I try (using LINQPad for testing) I just can't write a query in LINQ/EF that performs as well as the one I can write by hand. I've tried querying the SubValue table and the Project table, but the outcome is mostly the same: multiple ineffective nested subqueries instead of a single join doing the work.
How can I write a query that imitates the handwritten SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use joins when I want them to, instead of nested subqueries?
EF generates SQL from the LINQ expression you provide and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that for each project will use a navigation property to sum some subvalues related to the project. This results in nested subqueries as you have discovered.
To improve the generated SQL, you need to avoid navigating from project to subvalue before doing all the operations on subvalue. You can do this by creating a join (which is also what you do in your handcrafted SQL):
var query = from proj in context.Project
join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
from subv in s2.DefaultIfEmpty()
select new { proj, subv } into x
group x by new { x.proj.Id, x.proj.Name } into g
select new {
g.Key.Id,
g.Key.Name,
Total = g.Select(y => y.subv.SubValueSum).Sum()
} into y
orderby y.Total descending
select y;
var result = query.Take(10);
The basic idea is to join projects to subvalues restricted by a where clause. To perform a left join you need DefaultIfEmpty(), but you already know that.
The joined values (x) are then grouped, and SubValueSum is summed within each group.
Finally, the ordering and TOP (10) are applied.
The generated SQL still contains subqueries, but I would expect it to be more efficient than the SQL generated by your query:
SELECT TOP (10)
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Id],
[GroupBy1].[K2] AS [Name]
FROM ( SELECT
[Extent1].[Id] AS [K1],
[Extent1].[Name] AS [K2],
SUM([Extent2].[SubValueSum]) AS [A1]
FROM [dbo].[Project] AS [Extent1]
LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= @p__linq__0) AND ([Extent2].[Created] <= @p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
GROUP BY [Extent1].[Id], [Extent1].[Name]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
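The left-join-then-group pattern described above can also be tried out in memory. The sketch below assumes hypothetical `Project`, `SubValue`, and `ProjectTotal` classes and made-up data, and runs on LINQ to Objects, not EF. One difference from the EF query is that unmatched projects carry a null subvalue in memory, so the sum guards against null explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (illustration only).
public class Project { public int Id; public string Name; }
public class SubValue
{
    public int ProjectId;
    public DateTime Created;
    public int StatusId;
    public decimal SubValueSum;
}
public class ProjectTotal { public int Id; public string Name; public decimal Total; }

public static class TotalsSketch
{
    public static List<ProjectTotal> TopTotals(IEnumerable<Project> projects,
        IEnumerable<SubValue> subValues, DateTime startDate, DateTime endDate, int take)
    {
        // Restrict the subvalues first, then left join them to the projects.
        var filtered = subValues.Where(s => s.Created >= startDate && s.Created <= endDate
                                            && (s.StatusId == 1 || s.StatusId == 2));
        var query =
            from proj in projects
            join s in filtered on proj.Id equals s.ProjectId into s2
            from subv in s2.DefaultIfEmpty()           // left join: subv is null when unmatched
            group subv by new { proj.Id, proj.Name } into g
            select new ProjectTotal
            {
                Id = g.Key.Id,
                Name = g.Key.Name,
                Total = g.Sum(v => v == null ? 0m : v.SubValueSum)
            } into row
            orderby row.Total descending
            select row;
        return query.Take(take).ToList();
    }

    public static void Main()
    {
        var projects = new List<Project>
        {
            new Project { Id = 1, Name = "A" },
            new Project { Id = 2, Name = "B" },
            new Project { Id = 3, Name = "C" }   // no subvalues at all
        };
        var subValues = new List<SubValue>
        {
            new SubValue { ProjectId = 1, Created = new DateTime(2015, 8, 15), StatusId = 1, SubValueSum = 10m },
            new SubValue { ProjectId = 1, Created = new DateTime(2015, 9, 1),  StatusId = 2, SubValueSum = 5m },
            new SubValue { ProjectId = 1, Created = new DateTime(2015, 11, 1), StatusId = 1, SubValueSum = 1000m }, // outside date range
            new SubValue { ProjectId = 2, Created = new DateTime(2015, 8, 20), StatusId = 1, SubValueSum = 40m },
            new SubValue { ProjectId = 2, Created = new DateTime(2015, 8, 21), StatusId = 3, SubValueSum = 100m }   // wrong status
        };
        var top = TopTotals(projects, subValues, new DateTime(2015, 8, 1), new DateTime(2015, 10, 1), 3);
        foreach (var row in top)
            Console.WriteLine(row.Name + ": " + row.Total);
    }
}
```

Because the join starts from the projects, a project with no qualifying subvalues still appears with a total of 0, whereas the handwritten SQL in the question starts from SubValue and would omit it.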

LINQ to SQL, multiple table join, generated SQL missing 2nd INNER JOIN

Can anyone tell me why the generated SQL does not contain a 2nd INNER JOIN? It seems to have been replaced with a NULL check in the WHERE clause? I'm not clear on why the 2nd INNER JOIN is not in the generated SQL.
C# code:
var cycleList = from o in entities.Orders
join c in entities.Cycles on o.Id equals c.OrderId
join calendar in entities.Calendars on c.CalendarId equals calendar.Id
where o.UnitId == unitId && o.CompanyId == companyId
select c.Id;
Generated SQL:
SELECT
[Extent2].[Id] AS [Id]
FROM [dbo].[Orders] AS [Extent1]
INNER JOIN [dbo].[Cycles] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId]
WHERE ([Extent2].[CalendarId] IS NOT NULL) AND ( CAST( [Extent1].[UnitId] AS int) = @p__linq__0) AND ( CAST( [Extent1].[CompanyId] AS int) = @p__linq__1)
It looks like the query generator is optimizing your query.
Since you are not selecting (or using in your where clause) any fields from the Calendars table in your query, only one join is done between the Orders table and the Cycles table. It's likely faster to check for the non-NULL foreign key than it is to join on a table from which no fields will be used.
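The equivalence behind this optimization can be sketched in memory. The example assumes that every non-null `CalendarId` points at an existing calendar and that calendar ids are unique, which is what a foreign key to a primary key would guarantee. The `Cycle` and `Calendar` classes and the data are hypothetical, and the code runs on LINQ to Objects rather than EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities (illustration only).
public class Calendar { public int Id; }
public class Cycle { public int Id; public int? CalendarId; }

public static class JoinEliminationSketch
{
    // Explicit inner join: cycles with a null CalendarId find no partner and drop out.
    public static List<int> WithJoin(IEnumerable<Cycle> cycles, IEnumerable<Calendar> calendars)
    {
        return (from c in cycles
                join cal in calendars on c.CalendarId equals (int?)cal.Id
                select c.Id).ToList();
    }

    // Null check only: same result, provided every non-null key has exactly one match.
    public static List<int> WithNullCheck(IEnumerable<Cycle> cycles)
    {
        return cycles.Where(c => c.CalendarId != null)
                     .Select(c => c.Id)
                     .ToList();
    }

    public static void Main()
    {
        var calendars = new List<Calendar> { new Calendar { Id = 10 }, new Calendar { Id = 20 } };
        var cycles = new List<Cycle>
        {
            new Cycle { Id = 1, CalendarId = 10 },
            new Cycle { Id = 2, CalendarId = null },
            new Cycle { Id = 3, CalendarId = 20 }
        };
        Console.WriteLine(string.Join(",", WithJoin(cycles, calendars))); // 1,3
        Console.WriteLine(string.Join(",", WithNullCheck(cycles)));       // 1,3
    }
}
```

If the query selected a column from Calendars, or if the relationship were not enforced, the two forms would no longer be interchangeable and the join would have to stay.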

Linq : INNER JOIN after WHERE and not before

I have a query like this:
context.Diffusions.Where(x => x.ProgrammeId == programmeID).Include("Chaines").Include("Version").ToList();
The query generated is:
SELECT
[Extent1].[Duree] AS [Duree],
[Extent1].[Id] AS [Id],
[Extent1].[ProgrammeId] AS [ProgrammeId],
[Extent1].[VersionId] AS [VersionId],
[Extent1].[ChaineId] AS [ChaineId],
[Extent1].[Debut] AS [Debut],
[Extent1].[Fin] AS [Fin],
[Extent1].[ReRun] AS [ReRun],
[Extent1].[DateModification] AS [DateModification],
[Extent1].[DateDiffusion] AS [DateDiffusion],
[Extent2].[Id] AS [Id1],
[Extent2].[Nom] AS [Nom],
[Extent2].[Code] AS [Code],
[Extent2].[Abreviation] AS [Abreviation],
[Extent3].[Id] AS [Id2],
[Extent3].[ProgrammeId] AS [ProgrammeId1],
[Extent3].[CleVersion] AS [CleVersion],
[Extent3].[Numero] AS [Numero],
[Extent3].[NumeroModification] AS [NumeroModification],
[Extent3].[VO] AS [VO],
[Extent3].[TitrePresse] AS [TitrePresse],
[Extent3].[Description] AS [Description],
[Extent3].[Remarque] AS [Remarque],
[Extent3].[SousTitre] AS [SousTitre],
[Extent3].[DureeTheorique] AS [DureeTheorique],
[Extent3].[Format] AS [Format],
[Extent3].[Interdit] AS [Interdit],
[Extent3].[LangueId] AS [LangueId],
[Extent3].[TypeCoteDiffusionId] AS [TypeCoteDiffusionId]
FROM [dbo].[Diffusion] AS [Extent1]
INNER JOIN [dbo].[Chaine] AS [Extent2] ON [Extent1].[ChaineId] = [Extent2].[Id]
INNER JOIN [dbo].[Version] AS [Extent3] ON [Extent1].[VersionId] = [Extent3].[Id]
WHERE [Extent1].[ProgrammeId] = 1926475
My problem is that the table has a lot of entries, and it seems to do the inner join for every entry and only then apply the WHERE, so it takes about 6 seconds.
When I run the query without the Include calls it is instant. I would like a LINQ query that applies the WHERE first and then the Include for each row returned, without having to do it manually for each entry (a Programme can have around 1,000 diffusions).
Try this code and use Contains.
For example:
context.Diffusions.Where(x => x.ProgrammeId == programmeID).Contains("Chaines").Include("Version").ToList();
