'let' in Linq Statement generates Cross Join


Consider the following LINQ statements:
var users = from a in dbContext.Users
select a;
var list = (from a in users
let count = users.Count()
where a.IsActive == true
select new { a.UserId, count }).ToList();
If we check the profiler for this LINQ statement, it shows a cross join that fetches the count for every record.
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
I think the cross join is overhead for the SQL statement and may cause a performance issue when the table holds a huge number of records.
As a solution, I can move users.Count() outside of the LINQ statement and then use the value in the select, but that causes two database operations.
var count = (from a in dbContext.Users
select a).Count();
var list = (from a in dbContext.Users
where a.IsActive == true
select new { a.UserId, count }).ToList();
Looking at the profiler, this generates the two operations below.
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent1]
) AS [GroupBy1]
exec sp_executesql N'SELECT
[Extent1].[UserId] AS [UserId],
@p__linq__0 AS [C1]
FROM [dbo].[Users] AS [Extent1]
WHERE 1 = [Extent1].[IsActive]',N'@p__linq__0 int',@p__linq__0=26
Does anybody have a better solution than this? Or can anybody suggest which is better: keeping the let inside the LINQ query, or fetching the count beforehand?

"I think the cross join is overhead for the SQL statement and may cause a performance issue when the table holds a huge number of records."
Not necessarily. Notice that this joins to a sub-query that returns a single row and column of data (the count). You can write this query in different ways, but in the end it needs a join in order to return {UserId, count}; you can't return that data without one. And the join it's doing right now is pretty efficient. So I would recommend not trying to optimize a problem you don't have (i.e. premature optimization).
UPDATE: adding an actual execution plan for the following query. You can see that it joins to a scalar value (i.e. the Count select query runs only once).
Query:
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
Execution plan: (image not available)

There shouldn't be any performance issues with the generated SQL. The cross-joined subquery returns a single record, and the optimizer only has to calculate it once, regardless of the number of active users in your table.
If you are not convinced, compare the execution plan to that of your alternative. I can only think of using a sub-select, but it doesn't look better to me.
Sub Select
SELECT
[UserId],
(SELECT count(*) FROM [dbo].[Users]) as [Cnt]
FROM [dbo].[Users]
WHERE 1 = [IsActive]
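
To make the equivalence concrete, here is a minimal in-memory sketch (LINQ to Objects, not Entity Framework) showing that the let form and the "count first" form produce the same rows. The User record and the LetCountDemo class are hypothetical stand-ins, not types from the question. One caveat: in LINQ to Objects the let expression is evaluated per element, whereas the SQL shown above computes the count once, so this only illustrates the result shape, not the database cost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetCountDemo
{
    // Hypothetical stand-in for the Users entity in the question.
    public sealed record User(int UserId, bool IsActive);

    // Mirrors the 'let' query: the total count is attached to every active row.
    public static List<(int UserId, int Count)> WithLet(IEnumerable<User> users) =>
        (from a in users
         let count = users.Count()
         where a.IsActive
         select (UserId: a.UserId, Count: count)).ToList();

    // Mirrors the two-step version: count first, then project.
    public static List<(int UserId, int Count)> WithHoistedCount(IEnumerable<User> users)
    {
        int count = users.Count();
        return (from a in users
                where a.IsActive
                select (UserId: a.UserId, Count: count)).ToList();
    }

    public static void Main()
    {
        var users = new List<User> { new(1, true), new(2, false), new(3, true) };
        foreach (var row in WithLet(users))
            Console.WriteLine($"{row.UserId} {row.Count}");
    }
}
```

Both methods return one row per active user, each carrying the count of all users (active or not), which is the same shape the cross join returns.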

Related

Entity Framework: What would cause slow queries after adding a table?

I have an Entity Framework 5 project hooked up to a SQLite database.
I used the model-first approach and was able to query 30,000 records from Table_A in roughly 3 seconds.
Now all I did was add another table, Table_B, which has 0 to 1 references to a parent record from Table_A. It takes over 3 minutes to run the SAME query on Table_A. Table_B has ZERO records in it.
It's also worth noting that the EDMX added navigation properties to both Table_A and Table_B, but it only added the foreign key column to Table_B. What would cause Entity Framework to slow down that much? When I revert my changes back to the old model, it runs fast.
Update
For reference, the query is a standard LINQ query.
var matches = Table_A.Where(it => it.UserName == "Waldo" || it.TimeStamp < oneMonthAgo);

I just used ToTraceString() to get the generated SQL query, as suggested in another answer.
It turns out Entity Framework tried to be "smart", anticipating that I would use data from the child record. This is actually pretty cool! It just slows down my query a bit, so I might look for a faster workaround.
Please note that the LINQ query itself is identical in both cases. What changed is the underlying SQL that was generated as soon as I added another table to the EDMX diagram.
Here is the FAST query (abbreviated for clarity):
SELECT *
FROM [Table_A] AS [Extent1]
INNER JOIN (SELECT
[Extent2].[OID] AS [K1],
[Extent2].[C_Column1] AS [K2],
Max([Extent2].[Id]) AS [A1]
FROM [Table_A] AS [Extent2]
GROUP BY [Extent2].[OID], [Extent2].[C_Column1] ) AS [GroupBy1] ON [Extent1].[Id] =
[GroupBy1].[A1]
INNER JOIN [OtherExistingTable] AS [Extent3] ON [Extent1].[C_Column1] = [Extent3].[Id]
After adding Table_B this was the new query that was generated which made things much much slower.
SELECT *
FROM [Table_A] AS [Extent1]
LEFT OUTER JOIN [Table_B] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Table_B_ForeignKey_To_Table_A]
INNER JOIN (SELECT
[Join2].[K1] AS [K1],
[Join2].[K2] AS [K2],
Max([Join2].[A1]) AS [A1]
FROM ( SELECT
[Extent3].[OID] AS [K1],
[Extent3].[C_Column1] AS [K2],
[Extent3].[Id] AS [A1]
FROM [Table_A] AS [Extent3]
LEFT OUTER JOIN [Table_B] AS [Extent4] ON [Extent3].[Id] = [Extent4].[Table_B_ForeignKey_To_Table_A]
) AS [Join2]
GROUP BY [K1], [K2] ) AS [GroupBy1] ON [Extent1].[Id] = [GroupBy1].[A1]
INNER JOIN [FeatureServices] AS [Extent5] ON [Extent1].[C_Column1] = [Extent5].[Id]

What is the difference between the two ways of writing SingleOrDefault in Entity Framework?

I wrote one line to get a single row from the database using SingleOrDefault:
Context.TableName.SingleOrDefault(x=>x.id==1);
I also wrote the code like this:
Context.TableName.Where(x=>x.id==1).SingleOrDefault();
Both returned the same values, but I want to know: what is the difference?
I have read some articles, but they only explained that
there is no difference; it is only a matter of coding standard.
I think both versions look like standard code.
So please let me know if there is any difference between the two.
I also have the same doubt about these methods:
First()
Single()
FirstOrDefault()

There is no difference in the SQL statements generated. You can log the SQL statements easily by adding this line (Entity Framework 6)...
context.Database.Log = Console.WriteLine;
These 2 lines...
context.Users.SingleOrDefault(u => u.Id == "foo");
context.Users.Where(u => u.Id == "foo").SingleOrDefault();
Both return the exact same SQL statement using the SQL Server provider...
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[Id] AS [Id],
[Limit1].[UserName] AS [UserName],
[Limit1].[PasswordHash] AS [PasswordHash],
[Limit1].[SecurityStamp] AS [SecurityStamp]
FROM ( SELECT TOP (2)
[Extent1].[Id] AS [Id],
[Extent1].[UserName] AS [UserName],
[Extent1].[PasswordHash] AS [PasswordHash],
[Extent1].[SecurityStamp] AS [SecurityStamp],
'0X0X' AS [C1]
FROM [dbo].[AspNetUsers] AS [Extent1]
WHERE ([Extent1].[Discriminator] = N'ApplicationUser') AND (N'foo' = [Extent1].[Id])
) AS [Limit1]
Same thing for FirstOrDefault():
context.Users.FirstOrDefault(u => u.Id == "foo");
context.Users.Where(u => u.Id == "foo").FirstOrDefault();
Both generate the same query:
SELECT
[Limit1].[C1] AS [C1],
[Limit1].[Id] AS [Id],
[Limit1].[UserName] AS [UserName],
[Limit1].[PasswordHash] AS [PasswordHash],
[Limit1].[SecurityStamp] AS [SecurityStamp]
FROM ( SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[UserName] AS [UserName],
[Extent1].[PasswordHash] AS [PasswordHash],
[Extent1].[SecurityStamp] AS [SecurityStamp],
'0X0X' AS [C1]
FROM [dbo].[AspNetUsers] AS [Extent1]
WHERE ([Extent1].[Discriminator] = N'ApplicationUser') AND (N'foo' = [Extent1].[Id])
) AS [Limit1]
You can see that SingleOrDefault() does a TOP (2). This is because it has to ensure that only one record is returned, whereas FirstOrDefault() doesn't care.
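
The same behavioral difference can be seen in memory. The following is a small LINQ to Objects sketch (the SingleVsFirstDemo class and its helper names are made up for illustration): SingleOrDefault throws InvalidOperationException when more than one element matches, which is why the provider needs to fetch up to two rows, while FirstOrDefault simply returns the first match. It also shows that the predicate overload and the Where(...) form behave the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleVsFirstDemo
{
    // Returns the single match, "none" when nothing matches,
    // or "many" when SingleOrDefault throws because of multiple matches.
    public static string DescribeSingle(IEnumerable<string> ids, string wanted)
    {
        try
        {
            return ids.SingleOrDefault(id => id == wanted) ?? "none";
        }
        catch (InvalidOperationException)
        {
            return "many";
        }
    }

    // Same check written as Where(...).SingleOrDefault().
    public static string DescribeSingleViaWhere(IEnumerable<string> ids, string wanted)
    {
        try
        {
            return ids.Where(id => id == wanted).SingleOrDefault() ?? "none";
        }
        catch (InvalidOperationException)
        {
            return "many";
        }
    }

    // FirstOrDefault never complains about additional matches.
    public static string DescribeFirst(IEnumerable<string> ids, string wanted) =>
        ids.FirstOrDefault(id => id == wanted) ?? "none";

    public static void Main()
    {
        var ids = new[] { "foo", "bar", "foo" };
        Console.WriteLine(DescribeSingle(ids, "foo"));
        Console.WriteLine(DescribeSingleViaWhere(ids, "foo"));
        Console.WriteLine(DescribeFirst(ids, "foo"));
        Console.WriteLine(DescribeSingle(ids, "baz"));
    }
}
```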

If you want to know the technical or algorithmic difference between SingleOrDefault and FirstOrDefault, or between SingleOrDefault and Single, there are a lot of questions with pretty good answers. For example:
LINQ: When to use SingleOrDefault vs. FirstOrDefault() with filtering criteria
If you are choosing between writing
Context.TableName.SingleOrDefault(x=>x.id==1);
and
Context.TableName.Where(x=>x.id==1).SingleOrDefault();
the result will, of course, be the same. But in my opinion the second sample contains a redundant Where. It is redundant not only technically but also from the point of view of readability: in the second case it is not clear what you expect from your Where condition, or whether it is there for some special reason.

Trouble with Entity Framework Linq Query: runs instantly in SSMS and 8-10s in EF LINQ

I was given the following query in SQL (variable names obfuscated) which is trying to get the values (Ch, Wa, Bu, Hi) resulting in the greatest number (cnt) of Pi entries.
select top 1 Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi, COUNT(1) as cnt
from Product, Si, Pi
where Product.Id = Si.ProductId
and Si.Id = Pi.SiId
and Product.Code = @CodeParameter
group by Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi
order by cnt desc
which runs instantly in SQL management studio on our production database. I've successfully written the code a few ways in C# LINQ and Entity Framework, but every way the code runs in 8 - 10 seconds. One attempt is the following code (doing it without the print as one call gives the same performance results):
using(var context = new MyEntities()){
var query = context.Products
.Where(p => p.Code == codeFromFunctionArgument)
.Join(context.Sis, p => p.Id, s => s.ProductId, (p, s) => new { sId = s.Id })
.Join(context.Pis, ps => ps.sId, pi => pi.SiId, (ps, pi) => new {pi.Ch, pi.Wa, pi.Bu, pi.Hi})
.GroupBy(
pi => pi,
(k, g) => new MostPisResult()
{
Ch = k.Ch,
Wa = k.Wa,
Bu = k.Bu,
Hi = k.Hi,
Count = g.Count()
}
)
.OrderByDescending(x => x.Count);
Console.WriteLine(query.ToString());
return query.First();
}
}
which outputs the following SQL statements:
SELECT
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
I've also tried in query syntax with about the same result. I also tried (but not for very long) executing the original SQL query directly with EF, but couldn't quickly get it working.
Is there some mistake I'm making in translating the query to LINQ? Is there an obvious way I'm missing to improve the query? Is it possible to write the query in EF / LINQ with the same performance as the SQL statements?
====== Update ======
In SQL profiler the output for the original query is exactly the same. For the LINQ query it is very similar to what I posted above.
exec sp_executesql N'SELECT TOP (1)
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'109579'
====== Update 2 ======
Here's the obfuscated XML output of the query execution plan on Snipt.org. Note the variable in question here is named "MagicalCode" in the output and both values "109579" and "2449-268-550" are valid (strings in C#) as in the final line of the XML output.
<ParameterList>
<ColumnReference
Column="@p__linq__0"
ParameterCompiledValue="N'109579'"
ParameterRuntimeValue="N'2449-268-550'" />
</ParameterList>
Plan image with actual row counts displayed
====== Update 3 ======
(hidden in a comment) I ran the EF generated SQL from entity framework in SSMS and it ran instantly. So I might be suffering from some form of parameter sniffing as hinted by this question. I'm not sure how to deal with it in the context of entity framework.
====== Update 4 ======
Updated Entity Framework SQL Execution Plan and SSMS SQL Query Execution Plan that can be opened with Plan Explorer.
====== Update 5 ======
Some workaround attempts:
- Running the original query using context.Database.SqlQuery<ReturnObject>(...) took ~4-5 seconds.
- Running the original query using SqlCommand and the connection string obtained from the EF context took about 3 seconds (context initialization overhead).
- Running the original query using SqlCommand with a hardcoded connection string took about 1.5 seconds.
So I ended up using the last one for now. The last thing I can think of is writing a stored procedure to get closer to the "instant" performance of running the query in SSMS.

You could try using IQueryable.AsNoTracking(); see http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx. It is safe to use AsNoTracking() in cases where you are not going to edit the results and save them back to the database. It usually makes a big difference when a query returns a large number of rows. Make sure you add a using directive for System.Data.Entity if you want to use .AsNoTracking().

It could be a problem with a cached execution plan. Try running the following command to clear the cached query execution plans:
DBCC FREEPROCCACHE
Also this thread might be helpful: Entity Framework cached query plan performance degrades with different parameters

Refer to temporary table in Entity Framework query

There is a list in memory of 50,000 Product IDs. I would like to get all these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...), and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and also because the execution plan is bad. (I know this from trying to use a temporary table instead.)
So I used SqlBulkCopy to insert the IDs into a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products together with their suppliers (multiple for every product). As far as I know, there is no way to get back a complex object using a custom SQL command. So how can I get the products with their suppliers, using the temporary table?
(If I could somehow refer to the temporary table in LINQ, that would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks

I was curious to explore the sql generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the sql generated using Contains (sql1)
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the sql generated using Join:
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the SQL creates a big select statement using UNION ALL to build the equivalent of your temporary table, then joins to it. The SQL is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK... now it's a giant UNION ALL... Anyway, I hope this contribution is useful to someone.
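
One thing worth checking before swapping Contains for Join is that the two are not always interchangeable. The sketch below uses LINQ to Objects rather than Entity Framework, and the Product record and JoinVsContainsDemo class are hypothetical. It shows that both forms return the same rows for distinct keys, but a key that appears twice in the in-memory list makes Join return the matching product twice, whereas Contains does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinVsContainsDemo
{
    // Hypothetical stand-in for the Product entity.
    public sealed record Product(int ID, string Name);

    // Filter with Contains: each product appears at most once.
    public static List<int> ViaContains(IEnumerable<Product> products, List<int> keys) =>
        products.Where(p => keys.Contains(p.ID)).Select(p => p.ID).ToList();

    // Filter with Join: one output row per matching (product, key) pair.
    public static List<int> ViaJoin(IEnumerable<Product> products, List<int> keys) =>
        products.Join(keys, p => p.ID, k => k, (p, k) => p.ID).ToList();

    public static void Main()
    {
        var products = new List<Product>
        {
            new(2134, "A"), new(5675, "B"), new(9999, "C")
        };
        var distinctKeys = new List<int> { 2134, 5675 };
        var duplicateKeys = new List<int> { 2134, 5675, 5675 };

        Console.WriteLine(string.Join(",", ViaContains(products, distinctKeys)));
        Console.WriteLine(string.Join(",", ViaJoin(products, distinctKeys)));
        Console.WriteLine(string.Join(",", ViaJoin(products, duplicateKeys)));
    }
}
```

If the key list may contain duplicates, calling Distinct() on it before the Join keeps the two forms equivalent.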

I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as ProductIDs:
- this will give you all the performance you're after
- it will work for concurrent users
If this filter table is changing a lot then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on this table for longer than necessary.
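
As a rough illustration of that idea, here is an in-memory sketch of a session-scoped filter table. The FilterRow type, its SessionId column, and the SessionFilterDemo class are assumptions made for this example; in a real model the filter table would be a mapped entity and the same Where/Any shape from the question would run against the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SessionFilterDemo
{
    // Hypothetical stand-ins for the Product entity and the filter table rows.
    public sealed record Product(int ID, string Name);
    public sealed record FilterRow(Guid SessionId, int ProductId);

    // Only filter rows written by the given session take part in the match,
    // so concurrent sessions sharing the table do not see each other's IDs.
    public static List<Product> ProductsForSession(
        IEnumerable<Product> products,
        IEnumerable<FilterRow> filterTable,
        Guid sessionId) =>
        products
            .Where(p => filterTable.Any(f => f.SessionId == sessionId && f.ProductId == p.ID))
            .ToList();

    public static void Main()
    {
        var sessionA = Guid.NewGuid();
        var sessionB = Guid.NewGuid();
        var products = new List<Product> { new(1, "A"), new(2, "B"), new(3, "C") };
        var filterTable = new List<FilterRow>
        {
            new(sessionA, 1), new(sessionA, 3), new(sessionB, 2)
        };

        foreach (var p in ProductsForSession(products, filterTable, sessionA))
            Console.WriteLine(p.Name);
    }
}
```

Rows for a session would need to be deleted once its query has run, otherwise the table grows without bound.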

Entity Framework many-to-many relationship include extremely slow

I have an Entity Framework 4 model with 2 entities in a many-to-many relationship, so 3 tables: [Q], [P] and the [Q2P] cross table. Running code like:
context.Q.Include("P");
results in a very long wait (I waited about 5 minutes, then aborted it). I then checked the generated SQL and found this:
SELECT *
FROM ( SELECT *
FROM [Q] AS [Extent1]
LEFT OUTER JOIN (SELECT *, CASE WHEN ([Join1].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Q_P] AS [Extent2]
INNER JOIN [P] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Id] ) AS [Join1] ON [Extent1].[Id] = [Join1].[Id]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C2] ASC
I can't hide my surprise, WTF is this? The usual many-to-many SQL query
select * from [q2p]
join [q] on qId=q.Id
join [p] on pId=p.Id
executes in less than 1ms, while EF query executes forever.

Yep, it's no secret that it takes long; please vote on the connection I opened about a year ago.
However, 5 minutes is definitely more than it is supposed to take.
Try separating the execution from the query generation, and use ToTraceString to determine how long query generation takes.
Eager loading is not great in the current version; they said they are planning to reduce the performance cost in the future.
Anyway, what you could do is use stored procedures or create your own ObjectQueries.
See:
- http://msdn.microsoft.com/en-us/library/bb896241.aspx
- http://msdn.microsoft.com/en-us/library/bb896238.aspx
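
The advice to time query generation separately from execution can be sketched with a Stopwatch. This is only a LINQ to Objects stand-in (the QueryTimingDemo class is hypothetical): building the deferred query here corresponds loosely to the step where you would call ToTraceString on the Entity Framework query, and enumerating it corresponds to actually running the SQL and materializing the results.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class QueryTimingDemo
{
    // Times the construction of a deferred query and its enumeration separately.
    public static (TimeSpan Build, TimeSpan Execute, int Rows) Time(IEnumerable<int> source)
    {
        var stopwatch = Stopwatch.StartNew();

        // Building the query does not run it; LINQ execution is deferred.
        IEnumerable<int> query = source.Where(x => x % 2 == 0).OrderBy(x => x);
        TimeSpan build = stopwatch.Elapsed;

        stopwatch.Restart();

        // ToList() forces enumeration, so this is where the work happens.
        int rows = query.ToList().Count;
        TimeSpan execute = stopwatch.Elapsed;

        return (build, execute, rows);
    }

    public static void Main()
    {
        var result = Time(Enumerable.Range(1, 1_000_000));
        Console.WriteLine($"build:   {result.Build.TotalMilliseconds} ms");
        Console.WriteLine($"execute: {result.Execute.TotalMilliseconds} ms");
        Console.WriteLine($"rows:    {result.Rows}");
    }
}
```

If the first number dominates, the cost is in generating the query; if the second dominates, the cost is in the database round trip or in materializing the rows.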

I have switched to NHibernate.
