Too Many Left Outer Joins in Entity Framework 4? [duplicate] - c#

This question already has answers here:
Simple Linq query has duplicated join against same table?
(3 answers)
Closed 3 years ago.
I have a product entity, which has 0 or 1 "BestSeller" entities. For some reason when I say:
db.Products.OrderBy(p => p.BestSeller.rating).ToList();
the SQL I get has an "extra" outer join (below). And if I add on a second 0 or 1 relation ship, and order by both, then I get 4 outer joins. It seems like each such entity is producing 2 outer joins rather than one. LINQ to SQL behaves exactly as you'd expect, with no extra join.
Has anyone else experienced this, or know how to fix it?
SELECT
[Extent1].[id] AS [id],
[Extent1].[ProductName] AS [ProductName]
FROM [dbo].[Products] AS [Extent1]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent2] ON [Extent1].[id] = [Extent2].[id]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent3] ON [Extent2].[id] = [Extent3].[id]
ORDER BY [Extent3].[rating] ASC

That extra outer join does seem quite superfluous. I think it's best to contact the entity framework design team. They may know if it's a bug and see if it something that needs to be resolved in a next version. You can contact them at Link

It may be a bug, but it seems like such a simple example that it is strange that the bug has not been caught and fixed.
Could you check your EF model.
Has the BestSeller table been added twice, or is there a duplication in the relationship between the tables.

Related

Entity Framework: What would cause slow queries after adding a table?

I have an entity framework 5 project hooked up to a SQLite database.
I did the model first approach and I was able to query 30,000 records from Table_A in roughly 3 seconds.
Now all I did was a another Table_B which has 0 to 1 references to a parent record from Table_A. It takes over 3 minutes to run the SAME query on Table_A. Table_B has ZERO records in it.
It's also worth noting that the EDMX added Navigation Properties to Table_A and Table_B. However it only added the foreign key column to Table_B. What would cause Entity Framework to slow down that much? When I revert my changes back to the old model, it runs fast.
Update
For reference the query is a standard linq to sql query.
var matches = Table_A.Where(it => it.UserName == "Waldo" || it.TimeStamp < oneMonthAgo);
I just ran the ToTraceString() to find the generated SQL query that this guy suggested in his answer here:
Turns out Entity Framework tried to be "smart" anticipating that I would use data from the child record. This is actually pretty cool! Just slows down my query a bit, so I might find a faster workaround.
Please note that this query is identical in LINQ syntax. This is just the underlying SQL that is generated as soon as I added another Table into the EDMX diagram.
Here is the FAST query: (abbreviated for clarity)
SELECT *
FROM [Table_A] AS [Extent1]
INNER JOIN (SELECT
[Extent2].[OID] AS [K1],
[Extent2].[C_Column1] AS [K2],
Max([Extent2].[Id]) AS [A1]
FROM [Table_A] AS [Extent2]
GROUP BY [Extent2].[OID], [Extent2].[C_Column1] ) AS [GroupBy1] ON [Extent1].[Id] =
[GroupBy1].[A1]
INNER JOIN [OtherExistingTable] AS [Extent3] ON [Extent1].[C_Column1] = [Extent3].[Id]
After adding Table_B this was the new query that was generated which made things much much slower.
SELECT *
FROM [Table_A] AS [Extent1]
LEFT OUTER JOIN [Table_B] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Table_B_ForeignKey_To_Table_A]
INNER JOIN (SELECT
[Join2].[K1] AS [K1],
[Join2].[K2] AS [K2],
Max([Join2].[A1]) AS [A1]
FROM ( SELECT
[Extent3].[OID] AS [K1],
[Extent3].[C_Column1] AS [K2],
[Extent3].[Id] AS [A1]
FROM [Table_A] AS [Extent3]
LEFT OUTER JOIN [Table_B] AS [Extent4] ON [Extent3].[Id] = [Extent4].[Table_B_ForeignKey_To_Table_A]
) AS [Join2]
GROUP BY [K1], [K2] ) AS [GroupBy1] ON [Extent1].[Id] = [GroupBy1].[A1]
INNER JOIN [FeatureServices] AS [Extent5] ON [Extent1].[C_Column1] = [Extent5].[Id]

EF Ignores Eager-Loading "Include" statement

I've seen this issue come up again and again and again. As I read through the answers, the common-ish answer is summarized well by this statement:
any manual join or projection will change the shape of the query and Include will not be used
-Ladislav Mrnka
Ok, so I decided to make a "hello world" sample EF code-first project with DbContext that would test that statement. I created the following query:
var result =
from c in context.Customers.Include(i => i.Addresses)
from a in c.Accounts
where a.ID > 4
select c;
The Include() statement should work, because I'm clearly meeting the requirements: (1) I am not modifying the projection by using an anonymous type and (2) I am not manually handling the joins.
Nevertheless, it doesn't work. The SQL query generated by this query is this:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Customers] AS [Extent1]
INNER JOIN [dbo].[Accounts] AS [Extent2] ON [Extent1].[ID] = [Extent2].[Customer_ID]
WHERE [Extent2].[ID] > 4
If I remove the join and filtering on the Accounts, then the include statement is correctly generated. Why is this happening?
I'm also bothered that EF official documentation doesn't seem to explain the rules of when Include() is or is not honored. Did I simply overlook something?
In a way you are "manually handling a join" by specifying the second from statement with c.Accounts.
Try the below query, whose context is solely based on a Customer entity:
from c in context.Customers.Include( i => i.Addresses )
where c.Accounts.Any( a => a.ID > 4 )
select c

Why does the Entity Framework generate nested SQL queries?

Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = #p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = #ID",
serverId);
results = results.OrderBy(x=> x.StartTime).Take(limit);
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
I just stumbled upon this post because I suffer from the same problem. I already spend days tracking this down and it it is just a poor query generation in mysql.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now If I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
Which is bad because the order by is in the outer query. Theat means MySQL has to pull every record in order to perform an orderby which results in using filesort
I verified that SQL Server (Comapact at least) does not generate nested queries for the same code
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1],
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Actually the queries generated by Entity Framework are few ugly, less than LINQ 2 SQL but still ugly.
However, very probably you database engine will make the desired execution plan, and the query will run smoothly.

Entity Framework Inefficient generated queries with includes and order bys

1) First issue I'm having is if you do an include and then an order by the SQL generated generates an inner join and an outer join
var query = from l in Lead.Include("Contact")
orderby l.Contact.FirstName
select l;
Which generates the following inner join and outer join on the same table
INNER JOIN [dbo].[Contact] AS [Extent2]
ON [Extent1].[ContactId] = [Extent2].[ContactId]
LEFT OUTER JOIN [dbo].[Contact] AS [Extent3]
ON [Extent1].[ContactId] = [Extent3]. [ContactId]
ORDER BY [Extent2].[FirstName] ASC
Which makes for a slightly inefficient query
2) if I do multiple includes it always does the second one as an outer join so like
Lead.Include("OneToOne").Include("OtherOneToOne") <- in this scenario
OtherOneToOne is an outer
join and OneToOne is an inner
join
Lead.Include("OtherOneToOne").Include("OneToOne") <- in this scenario
OneToOne is an outer join
and OtherOneToOne is an
inner join
is that just how it works?
I found another post where someone was talking about this and they said that it was fixed in the June CTP release
http://blogs.msdn.com/b/adonet/archive/2011/06/30/announcing-the-microsoft-entity-framework-june-2011-ctp.aspx
But I installed and setup that to be used and it still doesn't work..
alright it won't let me answer my own question
so
EDIT:
Alright I setup an isolated test and found that http://blogs.msdn.com/b/adonet/archive/2011/06/30/announcing-the-microsoft-entity-framework-june-2011-ctp.aspx seems to have resolved these
But since I'm using RIA I'm out of luck since the june ctp doesn't support RIA :-/
A solution is to do the include yourself:
var query = from l in Lead
select new { l, l.Contact } into row
orderby row.Contact.FirstName
select row;

Entity Framework many-to-many relationship include extremely slow

I have an Entity Framework 4 model, with 2 entities containing many-to-many relationship, so 3 tables, [Q], [P] and [Q2P]-cross table. Running code like:
context.Q.Include("P");
Results in long time wait (I waited like 5 mins then aborted it). Then I checked SQL generated and found this:
SELECT *
FROM ( SELECT *
FROM [Q] AS [Extent1]
LEFT OUTER JOIN (SELECT *, CASE WHEN ([Join1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Q_P] AS [Extent2]
INNER JOIN [P] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Id] ) AS [Join1] ON [Extent1].[Id] = [Join1].[Id]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C2] ASC
I can't hide my suprise, WTF is this? The usual many-to-many SQL query
select * from [q2p]
join [q] on qId=q.Id
join [p] on pId=p.Id
executes in less than 1ms, while EF query executes forever.
Yep, that's not a secret that it takes long, please vote on the connection I opened about a year ago.
However, 5 minutes is something it's definitely not supposed to take.
Try separating the execution from the query generation, and use ToTraceString to see how long does it take to determine query generation time.
Eager-loading is not the great deal in current version, they said they're planning toreduce the performance cost in future.
Anyway, what you could do is use Stored Procedures or create your own ObjectQueries.
See:
- http://msdn.microsoft.com/en-us/library/bb896241.aspx
- http://msdn.microsoft.com/en-us/library/bb896238.aspx
I have switched to Nhibernate.

Categories

Resources