Entity Framework many-to-many relationship include extremely slow


I have an Entity Framework 4 model with two entities in a many-to-many relationship, so three tables: [Q], [P] and the [Q2P] cross table. Running code like:
context.Q.Include("P");
results in a very long wait (I waited about 5 minutes, then aborted it). I then checked the generated SQL and found this:
SELECT *
FROM ( SELECT *
FROM [Q] AS [Extent1]
LEFT OUTER JOIN (SELECT *, CASE WHEN ([Join1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Q_P] AS [Extent2]
INNER JOIN [P] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Id] ) AS [Join1] ON [Extent1].[Id] = [Join1].[Id]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C2] ASC
I can't hide my surprise, WTF is this? The usual many-to-many SQL query
select * from [q2p]
join [q] on qId=q.Id
join [p] on pId=p.Id
executes in less than 1ms, while EF query executes forever.

Yep, that's not a secret that it takes long, please vote on the connection I opened about a year ago.
However, 5 minutes is something it's definitely not supposed to take.
Try separating query generation from execution: call ToTraceString and time it to see how long query generation alone takes.
Eager loading is not great in the current version; they said they're planning to reduce the performance cost in the future.
Anyway, what you could do is use Stored Procedures or create your own ObjectQueries.
See:
- http://msdn.microsoft.com/en-us/library/bb896241.aspx
- http://msdn.microsoft.com/en-us/library/bb896238.aspx
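
To make the "separate generation from execution" advice concrete, here is a minimal sketch of a timing helper. The QueryTiming class and its Measure method are hypothetical names introduced for illustration; only Stopwatch and ToTraceString are existing APIs, and the commented usage assumes context.Q is an EF4 ObjectSet as in the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of Entity Framework): times a single step
// so that query generation and query execution can be measured separately.
public static class QueryTiming
{
    // Runs 'step' once, hands its result back through 'result',
    // and returns the elapsed wall-clock time.
    public static TimeSpan Measure<T>(Func<T> step, out T result)
    {
        if (step == null) throw new ArgumentNullException("step");
        Stopwatch watch = Stopwatch.StartNew();
        result = step();
        watch.Stop();
        return watch.Elapsed;
    }
}

// Possible usage against the model from the question (needs a real context):
//
//   string sql;
//   TimeSpan generation = QueryTiming.Measure(
//       () => context.Q.Include("P").ToTraceString(), out sql);
//
//   List<Q> rows;
//   TimeSpan total = QueryTiming.Measure(
//       () => context.Q.Include("P").ToList(), out rows);
```

If the first measurement is small and the second is huge, the time is going into running the SQL and materializing the rows rather than into building the query.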

I have switched to NHibernate.

Related

Entity Framework: What would cause slow queries after adding a table?

I have an entity framework 5 project hooked up to a SQLite database.
I did the model first approach and I was able to query 30,000 records from Table_A in roughly 3 seconds.
Now all I did was add another table, Table_B, which has 0 to 1 references to a parent record from Table_A. It takes over 3 minutes to run the SAME query on Table_A. Table_B has ZERO records in it.
It's also worth noting that the EDMX added Navigation Properties to Table_A and Table_B. However it only added the foreign key column to Table_B. What would cause Entity Framework to slow down that much? When I revert my changes back to the old model, it runs fast.
Update
For reference, the query is a standard LINQ query.
var matches = Table_A.Where(it => it.UserName == "Waldo" || it.TimeStamp < oneMonthAgo);

I just ran the ToTraceString() to find the generated SQL query that this guy suggested in his answer here:
Turns out Entity Framework tried to be "smart" anticipating that I would use data from the child record. This is actually pretty cool! Just slows down my query a bit, so I might find a faster workaround.
Please note that this query is identical in LINQ syntax. This is just the underlying SQL that is generated as soon as I added another Table into the EDMX diagram.
Here is the FAST query: (abbreviated for clarity)
SELECT *
FROM [Table_A] AS [Extent1]
INNER JOIN (SELECT
[Extent2].[OID] AS [K1],
[Extent2].[C_Column1] AS [K2],
Max([Extent2].[Id]) AS [A1]
FROM [Table_A] AS [Extent2]
GROUP BY [Extent2].[OID], [Extent2].[C_Column1] ) AS [GroupBy1] ON [Extent1].[Id] =
[GroupBy1].[A1]
INNER JOIN [OtherExistingTable] AS [Extent3] ON [Extent1].[C_Column1] = [Extent3].[Id]
After adding Table_B this was the new query that was generated which made things much much slower.
SELECT *
FROM [Table_A] AS [Extent1]
LEFT OUTER JOIN [Table_B] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Table_B_ForeignKey_To_Table_A]
INNER JOIN (SELECT
[Join2].[K1] AS [K1],
[Join2].[K2] AS [K2],
Max([Join2].[A1]) AS [A1]
FROM ( SELECT
[Extent3].[OID] AS [K1],
[Extent3].[C_Column1] AS [K2],
[Extent3].[Id] AS [A1]
FROM [Table_A] AS [Extent3]
LEFT OUTER JOIN [Table_B] AS [Extent4] ON [Extent3].[Id] = [Extent4].[Table_B_ForeignKey_To_Table_A]
) AS [Join2]
GROUP BY [K1], [K2] ) AS [GroupBy1] ON [Extent1].[Id] = [GroupBy1].[A1]
INNER JOIN [FeatureServices] AS [Extent5] ON [Extent1].[C_Column1] = [Extent5].[Id]

Trouble with Entity Framework Linq Query: runs instantly in SSMS and 8-10s in EF LINQ

I was given the following query in SQL (variable names obfuscated) which is trying to get the values (Ch, Wa, Bu, Hi) resulting in the greatest number (cnt) of Pi entries.
select top 1 Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi, COUNT(1) as cnt
from Product, Si, Pi
where Product.Id = Si.ProductId
and Si.Id = Pi.SiId
and Product.Code = @CodeParameter
group by Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi
order by cnt desc
which runs instantly in SQL management studio on our production database. I've successfully written the code a few ways in C# LINQ and Entity Framework, but every way the code runs in 8 - 10 seconds. One attempt is the following code (doing it without the print as one call gives the same performance results):
using(var context = new MyEntities()){
var query = context.Products
.Where(p => p.Code == codeFromFunctionArgument)
.Join(context.Sis, p => p.Id, s => s.ProductId, (p, s) => new { sId = s.Id })
.Join(context.Pis, ps => ps.sId, pi => pi.SiId, (ps, pi) => new {pi.Ch, pi.Wa, pi.Bu, pi.Hi})
.GroupBy(
pi => pi,
(k, g) => new MostPisResult()
{
Ch = k.Ch,
Wa = k.Wa,
Bu = k.Bu,
Hi = k.Hi,
Count = g.Count()
}
)
.OrderByDescending(x => x.Count);
Console.WriteLine(query.ToString());
return query.First();
}
}
which outputs the following SQL statements:
SELECT
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
I've also tried in query syntax with about the same result. I also tried (but not for very long) executing the original SQL query directly with EF, but couldn't quickly get it working.
Is there some mistake I'm making in translating the query to LINQ? Is there an obvious way I'm missing to improve the query? Is it possible to write the query in EF / LINQ with the same performance as the SQL statements?
====== Update ======
In SQL profiler the output for the original query is exactly the same. For the LINQ query it is very similar to what I posted above.
exec sp_executesql N'SELECT TOP (1)
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'109579'
====== Update 2 ======
Here's the obfuscated XML output of the query execution plan on Snipt.org. Note the variable in question here is named "MagicalCode" in the output and both values "109579" and "2449-268-550" are valid (strings in C#) as in the final line of the XML output.
<ParameterList>
<ColumnReference
Column="@p__linq__0"
ParameterCompiledValue="N'109579'"
ParameterRuntimeValue="N'2449-268-550'" />
</ParameterList>
Plan image with actual row counts displayed
====== Update 3 ======
(hidden in a comment) I ran the EF generated SQL from entity framework in SSMS and it ran instantly. So I might be suffering from some form of parameter sniffing as hinted by this question. I'm not sure how to deal with it in the context of entity framework.
====== Update 4 ======
Updated Entity Framework SQL Execution Plan and SSMS SQL Query Execution Plan that can be opened with Plan Explorer.
====== Update 5 ======
Some workaround attempts
Running the original query using context.Database.SqlQuery<ReturnObject>(...) ran in ~4-5 seconds.
Running the original query using SqlCommand and the connection string obtained from EF context took about 3 seconds (context initialization overhead).
Running the original query using SqlCommand with a hardcoded connection string took about 1.5 seconds.
So I ended up using the last one for now. The last thing I can think of is writing a stored procedure to get closer to the "instant" performance of running the query in SSMS.

You could try using IQueryable.AsNoTracking(); see http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx. It is safe to use AsNoTracking() in cases where you are not going to edit the results and save them back to the database again. It usually makes a big difference when a query returns a large number of rows. Make sure you add System.Data.Entity to your using directives if you want to use .AsNoTracking().

It could be a problem with a cached execution plan. Try running the following command to clear the cached query execution plans:
DBCC FREEPROCCACHE
Also this thread might be helpful: Entity Framework cached query plan performance degrades with different parameters

Refer to temporary table in Entity Framework query

There is a list (named list) in memory of 50,000 Product IDs. I would like to get all these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...), and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and also the execution plan is bad. (I know this from trying to use a temporary table instead).
So I used SQLBulkCopy to insert the IDs to a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products, with their suppliers (multiple for every product). Using a custom SQL command there is no way to get back a complex object that I know of. So how can I get the products with their suppliers, using the temporary table?
(If I can somehow refer to the temporary table in LINQ, then it would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks

I was curious to explore the sql generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the sql generated using Contains (sql1)
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the sql generated using Join:
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the sql creates a big select statement using union all to create the equivalent of your temporary table, then it joins to that table. The sql is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK, now it's a giant UNION ALL instead. Anyway, I hope this contribution is useful to someone.

I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as ProductIDs:
- this will give you all the performance you're after
- it will work for concurrent users
If this filter table is changing a lot then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on this table for longer than necessary.
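
A different workaround from the two answers above, if a filter table is not an option, is to split the in-memory list into smaller batches so that each generated IN clause stays short. The sketch below is only an illustration: IdBatching and InBatches are hypothetical names, the batch size of 1000 and the "Suppliers" navigation property in the commented usage are assumptions, and whether this beats the temporary table would have to be measured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Entity Framework): splits a large list
// of IDs into consecutive batches so each query gets a short IN list.
public static class IdBatching
{
    // Validates arguments eagerly, then returns a lazily evaluated sequence
    // of batches holding at most 'batchSize' IDs each.
    public static IEnumerable<List<int>> InBatches(IEnumerable<int> ids, int batchSize)
    {
        if (ids == null) throw new ArgumentNullException("ids");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return Iterate(ids, batchSize);
    }

    private static IEnumerable<List<int>> Iterate(IEnumerable<int> ids, int batchSize)
    {
        List<int> batch = new List<int>(batchSize);
        foreach (int id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<int>(batchSize);
            }
        }
        // Emit the final, possibly shorter, batch.
        if (batch.Count > 0) yield return batch;
    }
}

// Possible usage (needs a real context; "Suppliers" is an assumed property name):
//
//   var products = new List<Product>();
//   foreach (List<int> batch in IdBatching.InBatches(list, 1000))
//   {
//       List<int> current = batch;
//       products.AddRange(dbContext.Products
//           .Include("Suppliers")
//           .Where(p => current.Contains(p.ID)));
//   }
```

This trades one huge statement for several small ones, so it costs extra round trips, but each query still goes through LINQ and can therefore load the related suppliers.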

How to write this SQL query in Entity Framework?

I have this query that I want translated pretty much 1:1 from SQL to Entity Framework:
SELECT GroupId, ItemId, count(*) as total
FROM [TESTDB].[dbo].[TestTable]
WHERE GroupId = '64025'
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC
This SQL query should sort based on the number occurrence of the same ItemId (for that group).
I have this now:
from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};
But this generates the following SQL code:
SELECT
[GroupBy1].[K1] AS [GroupId],
[GroupBy1].[K2] AS [ItemId],
[GroupBy1].[A2] AS [C1]
FROM ( SELECT
[Extent1].[GroupId] AS [K1],
[Extent1].[ItemId] AS [K2],
COUNT(1) AS [A1],
COUNT(1) AS [A2]
FROM [dbo].[TestTable] AS [Extent1]
WHERE 64025 = [Extent1].[GroupId]
GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
) AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC
This also works but is a factor 2 slower than the SQL I created.
I've been fiddling around with the linq code for a while but I haven't managed to create something similar to my query.
Execution plan (only the last two items, the first two are identical):
FIRST: |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)
SECOND: |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)

The query that Entity Framework generates and your hand crafted query are semantically the same and will give the same plan.
The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
The snippets of SHOWPLAN_TEXT you have posted are the same plan. The only difference is aliases. It looks as though your table definition is something like.
CREATE TABLE [dbo].[TestTable]
(
[GroupId] INT,
[ItemId] INT
)
CREATE NONCLUSTERED INDEX IX_Group ON [dbo].[TestTable] ([GroupId], [ItemId])
And you are getting a plan like this
To all intents and purposes the plans are the same. Your performance testing methodology is probably flawed. Maybe your first query brought pages into cache that then benefited the second query for example.
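
One way to reduce the cache effect described above is to run each query once untimed before measuring, then take the median of several timed runs. This is a minimal sketch, not the answerer's method; FairTiming and MedianMilliseconds are hypothetical names, and the action passed in is assumed to execute one of the two queries being compared.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for comparing two queries on more equal terms.
public static class FairTiming
{
    // Runs 'action' once untimed as a warm-up, then 'runs' timed repetitions,
    // and returns the median elapsed time in milliseconds.
    public static double MedianMilliseconds(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        // Warm-up: lets data pages, cached plans and JIT-compiled code settle
        // before anything is measured.
        action();

        List<double> samples = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            samples.Add(watch.Elapsed.TotalMilliseconds);
        }

        samples.Sort();
        int middle = samples.Count / 2;
        return samples.Count % 2 == 1
            ? samples[middle]
            : (samples[middle - 1] + samples[middle]) / 2.0;
    }
}
```

Measuring the hand-written SQL and the EF query with the same helper, in both orders, should show whether the factor-2 difference survives once both start from a warm cache.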

Too Many Left Outer Joins in Entity Framework 4? [duplicate]

This question already has answers here:
Simple Linq query has duplicated join against same table?
(3 answers)
Closed 3 years ago.
I have a product entity, which has 0 or 1 "BestSeller" entities. For some reason when I say:
db.Products.OrderBy(p => p.BestSeller.rating).ToList();
the SQL I get has an "extra" outer join (below). And if I add a second 0 or 1 relationship, and order by both, then I get 4 outer joins. It seems like each such entity is producing 2 outer joins rather than one. LINQ to SQL behaves exactly as you'd expect, with no extra join.
Has anyone else experienced this, or know how to fix it?
SELECT
[Extent1].[id] AS [id],
[Extent1].[ProductName] AS [ProductName]
FROM [dbo].[Products] AS [Extent1]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent2] ON [Extent1].[id] = [Extent2].[id]
LEFT OUTER JOIN [dbo].[BestSeller] AS [Extent3] ON [Extent2].[id] = [Extent3].[id]
ORDER BY [Extent3].[rating] ASC

That extra outer join does seem quite superfluous. I think it's best to contact the Entity Framework design team. They may know whether it's a bug and whether it needs to be resolved in a future version.

It may be a bug, but it seems like such a simple example that it is strange that the bug has not been caught and fixed.
Could you check your EF model?
Has the BestSeller table been added twice, or is there a duplication in the relationship between the tables?
