I've seen this issue come up again and again and again. As I read through the answers, the common-ish answer is summarized well by this statement:
any manual join or projection will change the shape of the query and Include will not be used
-Ladislav Mrnka
Ok, so I decided to make a "hello world" sample EF code-first project with DbContext that would test that statement. I created the following query:
var result =
from c in context.Customers.Include(i => i.Addresses)
from a in c.Accounts
where a.ID > 4
select c;
The Include() statement should work, because I'm clearly meeting the requirements: (1) I am not modifying the projection by using an anonymous type and (2) I am not manually handling the joins.
Nevertheless, it doesn't work. The SQL query generated by this query is this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Customers] AS [Extent1]
INNER JOIN [dbo].[Accounts] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Customer_ID]
WHERE [Extent2].[ID] > 4
If I remove the join and filtering on the Accounts, then the include statement is correctly generated. Why is this happening?
I'm also bothered that EF official documentation doesn't seem to explain the rules of when Include() is or is not honored. Did I simply overlook something?
In a way you are "manually handling a join" by specifying the second from statement with c.Accounts.
Try the below query, whose context is solely based on a Customer entity:
from c in context.Customers.Include( i => i.Addresses )
where c.Accounts.Any( a => a.ID > 4 )
select c
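One practical difference between the two query shapes can be sketched at the SQL level. The snippet below is only an illustration, not EF output: it uses Python's built-in sqlite3 module, and the table contents are made up. The join form returns a customer once per matching account, while the Any() form, which EF typically translates to an EXISTS filter, returns each customer at most once.

```python
import sqlite3

# Hypothetical miniature of the question's schema; the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Accounts  (ID INTEGER PRIMARY KEY, Customer_ID INTEGER REFERENCES Customers(ID));
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Accounts  VALUES (3, 1), (5, 1), (6, 1), (7, 2);
""")

# Shape of "from c in Customers from a in c.Accounts where a.ID > 4 select c":
# an inner join, so a customer appears once per matching account.
join_rows = conn.execute("""
    SELECT c.ID, c.Name
    FROM Customers AS c
    INNER JOIN Accounts AS a ON c.ID = a.Customer_ID
    WHERE a.ID > 4
    ORDER BY c.ID
""").fetchall()

# Shape of "where c.Accounts.Any(a => a.ID > 4)": an EXISTS filter,
# so each customer appears at most once.
exists_rows = conn.execute("""
    SELECT c.ID, c.Name
    FROM Customers AS c
    WHERE EXISTS (SELECT 1 FROM Accounts AS a
                  WHERE a.Customer_ID = c.ID AND a.ID > 4)
    ORDER BY c.ID
""").fetchall()

print(join_rows)    # Ann is listed twice (accounts 5 and 6)
print(exists_rows)  # Ann is listed once
```

So besides keeping the query rooted on Customer, the Any() version also avoids duplicate customers in the result.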
I'm trying to add ROW_NUMBER functionality to EF Core and filter by it.
After adding a custom function, it works fine in Select but not in Where, because the generated SQL is invalid.
Linq:
var query = dbContext.OrderItems
.Select(i => new
{
i.Name,
RowNumber = EF.Functions.RowNumber(i.ProductId)
})
.Where(i => i.RowNumber == 1);
Translates into:
SELECT
i.NAME,
ROW_NUMBER() OVER(ORDER BY i.ProductId) AS RowNumber
FROM
OrderItems AS i
WHERE
ROW_NUMBER() OVER(ORDER BY i.ProductId) = CAST(1 AS bigint)
Error:
Microsoft.Data.SqlClient.SqlException (0x80131904): Windowed functions can only appear in the SELECT or ORDER BY clauses.
To correct this SQL, I need to create a subquery:
SELECT
t.NAME,
t.RowNumber
FROM (
SELECT
i.NAME,
ROW_NUMBER() OVER(ORDER BY i.ProductId) AS RowNumber
FROM
OrderItems AS i
) t
WHERE
t.RowNumber = CAST(1 AS bigint)
I've found an article on how to do this in EF Core 2.
Probably, the easiest way is to introduce a method that gives EF a hint that the previous query should be a sub query. Fortunately, we don't have do much because internally the method AsQueryable (or rather the expression associated with it) does just that.
https://www.thinktecture.com/en/entity-framework-core/making-rownumber-more-useful-in-2-1/
But this approach does nothing in EF Core 3.1
Is there a way to create a subquery?
Looking at the EF Core 3.1 source code, the only way I see to force subquery before applying where filter is to introduce a query limit (i.e. Skip and/or Take).
Of the two possible fake limit operators (Skip(0) and Take(int.MaxValue)), the latter looks like the better choice, because the former also requires some ordering (even a fake one).
So the workaround is to insert
.Take(int.MaxValue)
before .Where(...).
The generated SQL is not perfect (has fake TOP clause), but at least is valid.
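The SQL-level reason the subquery is needed can be reproduced outside EF. The following is a rough sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption; it needs an SQLite build with window-function support, 3.25 or later). The LIMIT in the inner query plays the role of the fake TOP that Take(int.MaxValue) produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderItems (Name TEXT, ProductId INTEGER);
INSERT INTO OrderItems VALUES ('b', 2), ('a', 1), ('c', 3);
""")

def window_in_where_fails():
    """Return True if filtering directly on ROW_NUMBER() in WHERE is rejected."""
    try:
        conn.execute("""
            SELECT i.Name, ROW_NUMBER() OVER (ORDER BY i.ProductId) AS RowNumber
            FROM OrderItems AS i
            WHERE ROW_NUMBER() OVER (ORDER BY i.ProductId) = 1
        """).fetchall()
    except sqlite3.OperationalError:
        return True
    return False

# Wrapping the projection in a derived table makes RowNumber an ordinary column,
# which the outer WHERE is allowed to filter on.
subquery_rows = conn.execute("""
    SELECT t.Name, t.RowNumber
    FROM (
        SELECT i.Name, ROW_NUMBER() OVER (ORDER BY i.ProductId) AS RowNumber
        FROM OrderItems AS i
        LIMIT 2147483647  -- stands in for the fake TOP from Take(int.MaxValue)
    ) AS t
    WHERE t.RowNumber = 1
""").fetchall()

print(window_in_where_fails())
print(subquery_rows)
```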
I'm running a fairly simple query in Microsoft Entity Framework Core 3.0 that looks like this:
var dbProfile = db.Profiles.Where(x => x.SiteId == Int32.Parse(id))
.Include(x => x.Interests)
.Include(x => x.Pets)
.Include(x => x.Networks)
.Include(x => x.PersonalityTraits)
.SingleOrDefault();
It worked fine with EF Core 2.2.6, but after upgrading to EF Core 3.0 the query still runs instantly for 721 profiles and times out for at least one profile:
Microsoft.Data.SqlClient.SqlException: 'Execution Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding.'
I then logged the actual query sent to the database server:
https://stackoverflow.com/a/58348159/3850405
SELECT [t].[Id], [t].[Age], [t].[City], [t].[Country], [t].[County], [t].[DeactivatedAccount], [t].[Gender], [t].[HasPictures], [t].[LastLogin], [t].[MemberSince], [t].[PresentationUpdated], [t].[ProfileName], [t].[ProfilePictureUrl], [t].[ProfileText], [t].[SiteId], [t].[VisitorsCount], [i].[Id], [i].[Name], [i].[ProfileId], [p0].[Id], [p0].[Description], [p0].[Name], [p0].[ProfileId], [n].[Id], [n].[Name], [n].[NetworkId], [n].[ProfileId], [p1].[Id], [p1].[Name], [p1].[ProfileId]
FROM (
SELECT TOP(2) [p].[Id], [p].[Age], [p].[City], [p].[Country], [p].[County], [p].[DeactivatedAccount], [p].[Gender], [p].[HasPictures], [p].[LastLogin], [p].[MemberSince], [p].[PresentationUpdated], [p].[ProfileName], [p].[ProfilePictureUrl], [p].[ProfileText], [p].[SiteId], [p].[VisitorsCount]
FROM [Profiles] AS [p]
WHERE ([p].[SiteId] = '123') AND '123' IS NOT NULL
) AS [t]
LEFT JOIN [Interests] AS [i] ON [t].[Id] = [i].[ProfileId]
LEFT JOIN [Pets] AS [p0] ON [t].[Id] = [p0].[ProfileId]
LEFT JOIN [Networks] AS [n] ON [t].[Id] = [n].[ProfileId]
LEFT JOIN [PersonalityTraits] AS [p1] ON [t].[Id] = [p1].[ProfileId]
ORDER BY [t].[Id], [i].[Id], [p0].[Id], [n].[Id], [p1].[Id]
I then tried to run the actual SQL in SSMS and ended up with the following error:
Msg 1105, Level 17, State 2, Line 1
Could not allocate space for object 'dbo.SORT temporary run storage: 140737692565504' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
My tempdb had now completely filled the database disk. I have tried 10 other ids and the same query runs instantly.
I tried to shrink the tempdb again with the command DBCC SHRINKDATABASE(tempdb, 10); and it worked fine. However, when I tried running the queries again the same thing happened. If I skip including tables everything works out fine.
What could be the trouble here and how do I fix it? Is this a known bug in EF Core 3.0?
Looking at the query in EF Core 2.2.6, it performs individual selects like this for all tables:
SELECT [x.Interests].[Id], [x.Interests].[Name], [x.Interests].[ProfileId]
FROM [Interests] AS [x.Interests]
INNER JOIN (
SELECT TOP(1) [x0].[Id]
FROM [Profiles] AS [x0]
WHERE [x0].[SiteId] = '123'
ORDER BY [x0].[Id]
) AS [t] ON [x.Interests].[ProfileId] = [t].[Id]
ORDER BY [t].[Id]
This is a documented breaking change in EF Core 3: Eager loading of related entities now happens in a single query
The new behavior is similar to the query generation in EF6, where multiple includes can create very large and expensive queries. These queries can also fail due to timeouts, query plan generation costs, or query execution resource exhaustion.
So just like in EF6 you need to refrain from including multiple, unrelated, entity include paths, as these create very expensive queries.
Instead you can use Lazy Loading, or explicitly load parts of the entity graph in separate queries, and let the Change Tracker fix-up the Navigation Properties.
EF Core 5.0 has added an option called Split Queries to turn off the one-big-query generation.
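To see why several unrelated collection includes in one query get so expensive, here is a small sketch in Python with sqlite3 (not EF itself; the table names follow the question, while the row counts are assumptions). One profile with six rows in each of the four child tables yields 6^4 rows from the single joined query, versus 1 + 4 * 6 rows when each collection is loaded separately.

```python
import sqlite3

CHILD_TABLES = ["Interests", "Pets", "Networks", "PersonalityTraits"]
ROWS_PER_CHILD = 6  # assumed number of related rows per collection

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Profiles (Id INTEGER PRIMARY KEY, SiteId TEXT)")
conn.execute("INSERT INTO Profiles VALUES (1, '123')")
for table in CHILD_TABLES:
    conn.execute(f"CREATE TABLE {table} (Id INTEGER PRIMARY KEY, ProfileId INTEGER)")
    conn.executemany(
        f"INSERT INTO {table} (ProfileId) VALUES (?)", [(1,)] * ROWS_PER_CHILD
    )

# Single-query shape: every child table is LEFT JOINed to the same profile row,
# so the result is the cross product of the four collections.
joins = " ".join(f"LEFT JOIN {t} ON {t}.ProfileId = p.Id" for t in CHILD_TABLES)
single_query_rows = conn.execute(
    f"SELECT COUNT(*) FROM Profiles AS p {joins} WHERE p.SiteId = '123'"
).fetchone()[0]

# Split shape: one query for the profile plus one query per collection.
split_query_rows = 1 + sum(
    conn.execute(f"SELECT COUNT(*) FROM {t} WHERE ProfileId = 1").fetchone()[0]
    for t in CHILD_TABLES
)

print(single_query_rows, split_query_rows)
```

The joined result grows multiplicatively with each extra collection, and the ORDER BY over that result is the kind of sort that filled tempdb in the question.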
In addition to @DavidBrowne-Microsoft's answer:
Suppose we have the following entities and navigations in the model:
Customer
Customer.Address (reference nav)
Customer.Orders (collection nav)
Order.OrderDetails (collection nav)
Order.OrderDiscount (reference nav)
You only need to rewrite the query for collection navigations; reference navigations can stay part of the same query.
var baseQuery = db.Customers.Include(c => c.Address).Where(c => c.CustomerName == "John");
var result = baseQuery.ToList(); // Or async method, If doing FirstOrDefault, add Take(1) to base query
baseQuery.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders).Load();
baseQuery.SelectMany(c => c.Orders).SelectMany(o => o.OrderDetails).Load();
This will generate 3 queries to the server. The SQL is slightly more optimized than what EF Core 2.2 generated, and the StateManager will fix up the navigations. It also avoids duplicating any records coming from the server to the client.
Generated SQL:
-- Customer Include Address
SELECT [c].[Id], [c].[CustomerName], [a].[Id], [a].[City], [a].[CustomerId]
FROM [Customers] AS [c]
LEFT JOIN [Address] AS [a] ON [c].[Id] = [a].[CustomerId]
WHERE ([c].[CustomerName] = N'John') AND [c].[CustomerName] IS NOT NULL
-- Order Include Order discount
SELECT [o].[Id], [o].[CustomerId], [o].[OrderDate], [o0].[Id], [o0].[Discount], [o0].[OrderId]
FROM [Customers] AS [c]
INNER JOIN [Order] AS [o] ON [c].[Id] = [o].[CustomerId]
LEFT JOIN [OrderDiscount] AS [o0] ON [o].[Id] = [o0].[OrderId]
WHERE ([c].[CustomerName] = N'John') AND [c].[CustomerName] IS NOT NULL
-- OrderDetails
SELECT [o0].[Id], [o0].[OrderId], [o0].[ProductName]
FROM [Customers] AS [c]
INNER JOIN [Order] AS [o] ON [c].[Id] = [o].[CustomerId]
INNER JOIN [OrderDetail] AS [o0] ON [o].[Id] = [o0].[OrderId]
WHERE ([c].[CustomerName] = N'John') AND [c].[CustomerName] IS NOT NULL
Edit: If you do not have CLR navigation to use in SelectMany then you can use EF.Property to reference the collection navigation.
Source: https://github.com/aspnet/EntityFrameworkCore/issues/18022#issuecomment-537219137
You need to change your code like this; it will then issue several smaller queries and avoid the timeout:
var dbProfile = db.Profiles.SingleOrDefault(x => x.SiteId == Int32.Parse(id));
// Explicitly load each collection of the tracked entity, one query per collection
db.Entry(dbProfile).Collection(x => x.Interests).Load();
db.Entry(dbProfile).Collection(x => x.Pets).Load();
db.Entry(dbProfile).Collection(x => x.Networks).Load();
db.Entry(dbProfile).Collection(x => x.PersonalityTraits).Load();
Remember that the query must be executed in tracking mode.
If it doesn't load the children, you can add AsTracking like this:
db.Profiles.AsTracking().Where(........
I'm using Entity Framework Core 2.1.4 and I wrote a basic example query in C# like the one below.
var myList = context.HastaAdres.OrderBy(p => p.ID).Take(20).GroupBy(p => p.IlKodu).Select(d => d.FirstOrDefault()).Select(p => p.ID).ToList();
But SQL Profiler shows that the SQL below is what actually runs. There is no GROUP BY in the SQL, and it is very different from what classic Entity Framework generates, so the result is different too. I need only one column as a result, but this query returns all columns. The row count also differs from the second query.
SQL Generated By Entity framework Core
SELECT [t].[ID], [t].[IlKodu], [t].[AcikAdres], [t].[BucakAdi], [t].[BucakKodu], [t].[BulvarKodu], [t].[CaddeKodu], [t].[CreatedBy], [t].[CreatedDate]
FROM (
SELECT TOP(20) [p].[ID], [p].[IlKodu], [p].[AcikAdres], [p].[BucakAdi], [p].[BucakKodu], [p].[BulvarKodu], [p].[CaddeKodu], [p].[CreatedBy], [p].[CreatedDate]
FROM [Ortak].[HastaAdres] AS [p]
ORDER BY [p].[ID]
) AS [t]
ORDER BY [t].[IlKodu]
When I tried the same query in classic Entity Framework, it generated exactly the SQL I expected.
SQL Generated By Entity framework
SELECT
(SELECT TOP (1)
[Limit2].[ID] AS [ID]
FROM ( SELECT TOP (20) [Extent2].[ID] AS [ID], [Extent2].[IlKodu] AS [IlKodu]
FROM [Ortak].[HastaAdres] AS [Extent2]
ORDER BY [Extent2].[ID] ASC
) AS [Limit2]
WHERE ([Distinct1].[IlKodu] = [Limit2].[IlKodu]) OR (([Distinct1].[IlKodu] IS NULL) AND ([Limit2].[IlKodu] IS NULL))) AS [C1]
FROM ( SELECT DISTINCT [distinct].[IlKodu] AS [IlKodu]
FROM ( SELECT TOP (20)
[Extent1].[IlKodu] AS [IlKodu]
FROM [Ortak].[HastaAdres] AS [Extent1]
ORDER BY [Extent1].[ID] ASC
) AS [distinct]
) AS [Distinct1]
What is the reason for this difference?
I learned that EF Core does not support a database-level GroupBy followed by taking an element. EF Core 2.1 did, however, add database-level support for GroupBy followed by an aggregate such as Sum, Min, Max or Average. (See https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-2.1#linq-groupby-translation, pointed out in a comment by @jpgrassi.)
So I changed my query to take the minimum value per group and then fetch that row from the database, which solved my issue. I hope Microsoft will soon support this kind of GroupBy at the database level.
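The "take the minimum, then fetch that row" workaround can be sketched in plain SQL. This is only an illustration using Python's sqlite3 module, with made-up data in a cut-down HastaAdres table (two columns only); the poster's actual rewritten LINQ query is not shown in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HastaAdres (ID INTEGER PRIMARY KEY, IlKodu INTEGER)")
# 30 sample rows cycling through 4 IlKodu values; only the first 20 by ID are considered.
conn.executemany(
    "INSERT INTO HastaAdres VALUES (?, ?)", [(i, i % 4) for i in range(1, 31)]
)

# Step 1: GroupBy + Min, the kind of aggregate that can run in the database.
min_ids = [row[0] for row in conn.execute("""
    SELECT MIN(t.ID)
    FROM (SELECT ID, IlKodu FROM HastaAdres ORDER BY ID LIMIT 20) AS t
    GROUP BY t.IlKodu
    ORDER BY MIN(t.ID)
""")]

# Step 2: fetch the rows that carry those minimum IDs.
placeholders = ",".join("?" * len(min_ids))
rows = conn.execute(
    f"SELECT ID, IlKodu FROM HastaAdres WHERE ID IN ({placeholders}) ORDER BY ID",
    min_ids,
).fetchall()

print(min_ids)
print(rows)
```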
Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = {0}",
serverId);
// The ordering and limit below run in memory, after the SQL has executed
var page = results.OrderBy(x => x.StartTime).Take(limit).ToList();
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
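Whether a particular engine treats the two forms alike is easy to check. The sketch below uses Python's sqlite3 module with an invented Network table; it confirms that the nested and flat queries return the same rows and prints SQLite's plan for each. It says nothing about MySQL, where the EXPLAIN output in the question shows a separate DERIVED step, so the plans should always be inspected on the engine actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Network (Id INTEGER PRIMARY KEY, ServerID TEXT, StartTime INTEGER)"
)
conn.execute("CREATE INDEX ix_network_serverid ON Network (ServerID)")
conn.executemany(
    "INSERT INTO Network VALUES (?, ?, ?)",
    [(i, "a" if i % 2 else "b", i * 10) for i in range(1, 21)],
)

# Nested form, shaped like the EF-generated query in the question.
NESTED = """
    SELECT p.Id, p.ServerID, p.StartTime
    FROM (SELECT Id, ServerID, StartTime FROM Network WHERE ServerID = ?) AS p
    ORDER BY p.StartTime DESC LIMIT ?
"""
# Flat form, as one would write it by hand.
FLAT = """
    SELECT Id, ServerID, StartTime
    FROM Network WHERE ServerID = ?
    ORDER BY StartTime DESC LIMIT ?
"""

nested_rows = conn.execute(NESTED, ("a", 5)).fetchall()
flat_rows = conn.execute(FLAT, ("a", 5)).fetchall()

# Print each plan's detail column; the output depends on the SQLite version.
for label, sql in (("nested", NESTED), ("flat", FLAT)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, ("a", 5)).fetchall()
    print(label, [step[-1] for step in plan])
```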
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
I just stumbled upon this post because I suffer from the same problem. I have already spent days tracking this down, and it is simply poor query generation for MySQL.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well, although the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
This is bad because the ORDER BY is in the outer query. That means MySQL has to pull every record in order to perform the sort, which results in a filesort.
I verified that SQL Server (Compact at least) does not generate nested queries for the same code:
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Actually, the queries generated by Entity Framework are a bit ugly; less so than those of LINQ to SQL, but still ugly.
However, your database engine will very probably produce the desired execution plan, and the query will run smoothly.
I have a product entity, which has 0 or 1 "BestSeller" entities. For some reason when I say:
db.Products.OrderBy(p => p.BestSeller.rating).ToList();
the SQL I get has an "extra" outer join (below). And if I add a second 0 or 1 relationship and order by both, then I get 4 outer joins. It seems like each such entity is producing 2 outer joins rather than one. LINQ to SQL behaves exactly as you'd expect, with no extra join.
Has anyone else experienced this, or know how to fix it?
SELECT
[Extent1].[id] AS [id],
[Extent1].[ProductName] AS [ProductName]
FROM [dbo].[Products] AS [Extent1]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent2] ON [Extent1].[id] = [Extent2].[id]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent3] ON [Extent2].[id] = [Extent3].[id]
ORDER BY [Extent3].[rating] ASC
That extra outer join does seem quite superfluous. I think it's best to contact the Entity Framework design team. They may know whether it's a bug and whether it is something that needs to be resolved in a future version. You can contact them at Link
It may be a bug, but it seems like such a simple example that it is strange that the bug has not been caught and fixed.
Could you check your EF model?
Has the BestSeller table been added twice, or is there a duplication in the relationship between the tables?