Optimal Count() operation from a DB using LINQ to SQL - C#

DBAs have told me that when using T-SQL:
select count(id) from tableName
is faster than
select count(*) from tableName
if id is the primary key.
Extrapolating that to LINQ to SQL, is the following accurate?
This LINQ-to-SQL statement:
int count = dataContext.TableName.Select(primaryKeyId => primaryKeyId).Count();
is more performant than this one:
int count = dataContext.TableName.Count();

As I understand it, there's no difference between your two SELECT COUNT statements: when counting a non-nullable column such as a primary key, the optimizer treats COUNT(id) the same as COUNT(*).
Using LINQPad we can examine the T-SQL generated by different LINQ statements.
For LINQ to SQL, both
TableName.Select(primaryKeyId => primaryKeyId).Count();
and
TableName.Count();
generate the same SQL:
SELECT COUNT(*) AS [value] FROM [dbo].[TableName] AS [t0]
For LINQ to Entities, they again both generate the same SQL, but now it's:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[TableName] AS [Extent1]
) AS [GroupBy1]
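If you don't have LINQPad to hand, you can also capture the generated SQL from code. A minimal sketch (MyDataContext is a hypothetical LINQ to SQL context name; the second half assumes an ObjectContext-era EF model where queries implement ObjectQuery):

// LINQ to SQL: write every generated command to the console
using (var dataContext = new MyDataContext())
{
    dataContext.Log = Console.Out;
    int count = dataContext.TableName.Count(); // the COUNT(*) statement is logged
}

// LINQ to Entities: ObjectQuery.ToTraceString() returns the store SQL
var objectQuery = context.TableName.Select(t => t.Id) as System.Data.Objects.ObjectQuery;
if (objectQuery != null)
    Console.WriteLine(objectQuery.ToTraceString());

In EF 4.1 and later you can also simply call ToString() on an IQueryable built over a DbContext, as several of the examples below do.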

I know this is an old one, but watch out with SQL Server: COUNT(column) does not count NULL values, so the two statements may not be equivalent if the column you count is nullable (a primary key column cannot be, but a plain unique column can). See:
create table #a(col int null)
insert into #a values (null)
select COUNT(*)
from #a;
select COUNT(col)
from #a;
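Run against SQL Server, the first SELECT returns 1 while the second returns 0, and the second also raises the warning "Null value is eliminated by an aggregate or other SET operation".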

Related

Linq.dynamic.core IQueryable to SQL statement

I'm currently using System.Linq.Dynamic.Core to generate the SQL statements in my application. The problem is that when I try
db.table.Select("new (column1 as a1)").ToString()
the generated SQL string automatically adds another 1 AS [C1], and the column alias a1 doesn't show up in the output string:
SELECT
1 AS [C1],
[Extent1].[column1] AS [column1]
FROM
(SELECT
[table].[column1] AS [column1]
FROM
table AS [table]) AS [Extent1]
My question is how to achieve the outcome below, and why the above behaviour is happening:
SELECT
[Extent1].[column1] AS [c1]
FROM
(SELECT
[table].[column1] AS [column1]
FROM
table AS [table]) AS [Extent1]
Try changing the line of code that gets the data to:
var list = db.table.Select("new (column1 as a1)").ToDynamicList();
The result will be a List<dynamic> of dynamic objects that each have an a1 property.
The actual SQL statement may differ, but the name of the property returned will respect the alias name.
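For example, a quick sketch of consuming the result (db.table and column1 are the names from the question):

var list = db.table.Select("new (column1 as a1)").ToDynamicList();
foreach (dynamic row in list)
{
    Console.WriteLine(row.a1); // the alias is respected on the materialized object
}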

'let' in Linq Statement generates Cross Join

Consider the following LINQ statement:
var users = from a in dbContext.Users
select a;
var list = (from a in users
let count = users.Count()
where a.IsActive == true
select new { a.UserId, count }).ToList();
If we check the profiler for this LINQ statement, it shows a cross join that attaches the count to every record.
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
I think the cross join adds overhead to the SQL statement and may cause a performance issue when the number of records is huge.
As a solution I can move the users.Count() outside of the LINQ statement and then put it in the select, but that causes two DB operations.
var count = (from a in dbContext.Users
select a).Count();
var list = (from a in dbContext.Users
where a.IsActive == true
select new { a.UserId, count }).ToList();
Looking at the profiler, this generates the following two operations:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent1]
) AS [GroupBy1]
exec sp_executesql N'SELECT
[Extent1].[UserId] AS [UserId],
@p__linq__0 AS [C1]
FROM [dbo].[Users] AS [Extent1]
WHERE 1 = [Extent1].[IsActive]',N'@p__linq__0 int',@p__linq__0=26
Does anybody have a better solution than this? Or can anybody suggest which is best: putting the let inside the LINQ statement, or fetching the count beforehand?
I think the cross join adds overhead to the SQL statement and may cause a performance issue when the number of records is huge.
Not necessarily. Notice that this is joining to a sub-query, which is a single row/column of data (the count). You can write this query in different ways, but in the end it needs to join in order to return {UserId, count}; you can't return that data without a join. And the join it's doing right now is pretty efficient. So I would recommend not trying to optimize a problem you don't have (i.e. premature optimization).
UPDATE: adding an actual execution plan (see how to) for the following query. You can see that it joins to a scalar value (i.e. the COUNT sub-query only runs once).
Query:
SELECT
[Extent1].[UserId] AS [UserId],
[GroupBy1].[A1] AS [C1]
FROM [dbo].[Users] AS [Extent1]
CROSS JOIN (SELECT
COUNT(1) AS [A1]
FROM [dbo].[Users] AS [Extent2] ) AS [GroupBy1]
WHERE 1 = [Extent1].[IsActive]
Execution plan:
There shouldn't be any performance issues with the generated SQL. The cross join produces a single record, and the optimizer only has to calculate it once, regardless of the number of active users in your table.
If you are not convinced, compare the execution plan to your alternative. The only other form I can think of is a sub-select, but it doesn't look better to me.
Sub-select:
SELECT
[UserId],
(SELECT count(*) FROM [dbo].[Users]) as [Cnt]
FROM [dbo].[Users]
WHERE 1 = [IsActive]
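For completeness, the sub-select shape can also be requested from LINQ by inlining the Count() call into the projection instead of using let. A sketch (untested; how it translates depends on the EF version, so check the profiler):

var list = (from a in dbContext.Users
            where a.IsActive == true
            select new { a.UserId, Count = dbContext.Users.Count() }).ToList();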

Trouble with Entity Framework Linq Query: runs instantly in SSMS and 8-10s in EF LINQ

I was given the following query in SQL (variable names obfuscated) which is trying to get the values (Ch, Wa, Bu, Hi) resulting in the greatest number (cnt) of Pi entries.
select top 1 Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi, COUNT(1) as cnt
from Product, Si, Pi
where Product.Id = Si.ProductId
and Si.Id = Pi.SiId
and Product.Code = @CodeParameter
group by Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi
order by cnt desc
which runs instantly in SQL Server Management Studio against our production database. I've successfully written the code a few ways in C# LINQ and Entity Framework, but every version runs in 8-10 seconds. One attempt is the following code (doing it without the print, as a single call, gives the same performance results):
using(var context = new MyEntities()){
var query = context.Products
.Where(p => p.Code == codeFromFunctionArgument)
.Join(context.Sis, p => p.Id, s => s.ProductId, (p, s) => new { sId = s.Id })
.Join(context.Pis, ps => ps.sId, pi => pi.SiId, (ps, pi) => new { pi.Ch, pi.Wa, pi.Bu, pi.Hi })
.GroupBy(
pi => pi,
(k, g) => new MostPisResult()
{
Ch = k.Ch,
Wa = k.Wa,
Bu = k.Bu,
Hi = k.Hi,
Count = g.Count()
}
)
.OrderByDescending(x => x.Count);
Console.WriteLine(query.ToString());
return query.First();
}
which outputs the following SQL statements:
SELECT
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC
I've also tried query syntax, with about the same result. I also tried (but not for very long) executing the original SQL query directly with EF, but couldn't quickly get it working.
Is there some mistake I'm doing in translating the query to LINQ? Is there an obvious way I'm missing to improve the query? Is it possible to write the query in EF / LINQ with the same performance as the SQL statements?
====== Update ======
In SQL profiler the output for the original query is exactly the same. For the LINQ query it is very similar to what I posted above.
exec sp_executesql N'SELECT TOP (1)
[Project1].[C2] AS [C1],
[Project1].[Ch] AS [Ch],
[Project1].[Wa] AS [Wa],
[Project1].[Bu] AS [Bu],
[Project1].[Hi] AS [Hi],
[Project1].[C1] AS [C2]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [Ch],
[GroupBy1].[K2] AS [Wa],
[GroupBy1].[K3] AS [Bu],
[GroupBy1].[K4] AS [Hi],
1 AS [C2]
FROM ( SELECT
[Extent3].[Ch] AS [K1],
[Extent3].[Wa] AS [K2],
[Extent3].[Bu] AS [K3],
[Extent3].[Hi] AS [K4],
COUNT(1) AS [A1]
FROM [dbo].[Product] AS [Extent1]
INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[C1] DESC',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'109579'
====== Update 2 ======
Here's the obfuscated XML output of the query execution plan on Snipt.org. Note that the variable in question is named "MagicalCode" in the output, and both values "109579" and "2449-268-550" are valid (they are strings in C#), as seen in the final lines of the XML output.
<ParameterList>
<ColumnReference
Column="@p__linq__0"
ParameterCompiledValue="N'109579'"
ParameterRuntimeValue="N'2449-268-550'" />
</ParameterList>
Plan image with actual row counts displayed
====== Update 3 ======
(hidden in a comment) I ran the EF-generated SQL from Entity Framework in SSMS and it ran instantly, so I might be suffering from some form of parameter sniffing, as hinted by this question. I'm not sure how to deal with it in the context of Entity Framework.
====== Update 4 ======
Updated Entity Framework SQL Execution Plan and SSMS SQL Query Execution Plan that can be opened with Plan Explorer.
====== Update 5 ======
Some workaround attempts:
Running the original query using context.Database.SqlQuery<ReturnObject>(...) ran in ~4-5 seconds.
Running the original query using SqlCommand and the connection string obtained from the EF context took about 3 seconds (context initialization overhead).
Running the original query using SqlCommand with a hardcoded connection string takes about 1.5 seconds.
So I ended up using the last one for now. The last thing I can think of is writing a stored procedure to get closer to the "instant" performance of running the query in SSMS.
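For reference, the context.Database.SqlQuery<T> variant from the first attempt looks roughly like the sketch below. ReturnObject is the result type from the question; the original comma-style joins are rewritten as ANSI joins; and the explicit SqlDbType.VarChar (the length 50 is a guess) is an assumption worth testing, since EF sends string parameters as nvarchar(4000), and the implicit conversion against a varchar Code column can change the plan:

// assumes: using System.Data; using System.Data.SqlClient;
var codeParam = new SqlParameter("@CodeParameter", SqlDbType.VarChar, 50)
{
    Value = codeFromFunctionArgument
};
var result = context.Database.SqlQuery<ReturnObject>(
    @"select top 1 Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi, COUNT(1) as cnt
      from Product
      join Si on Product.Id = Si.ProductId
      join Pi on Si.Id = Pi.SiId
      where Product.Code = @CodeParameter
      group by Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi
      order by cnt desc",
    codeParam).First();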
You could try using IQueryable.AsNoTracking(); see http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx. It is safe to use AsNoTracking() in cases where you are not going to edit the results and save them back to the database. It usually makes a big difference when a query returns a large number of rows. Make sure you have a using directive for System.Data.Entity if you want to use .AsNoTracking().
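A sketch of where the call goes, reusing the names from the question:

var query = context.Products.AsNoTracking()
    .Where(p => p.Code == codeFromFunctionArgument);
// ... joins, grouping and ordering as before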
It could be a problem with a cached execution plan. Try clearing the cached query plans:
DBCC FREEPROCCACHE
Also, this thread might be helpful: Entity Framework cached query plan performance degrades with different parameters
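If it does turn out to be parameter sniffing, one EF6-specific workaround (this assumes EF6, which introduced command interceptors) is to append OPTION (RECOMPILE) to generated queries so SQL Server compiles a fresh plan for the runtime parameter values. A minimal sketch:

using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

public class RecompileInterceptor : DbCommandInterceptor
{
    // Append a recompile hint to every generated SELECT
    public override void ReaderExecuting(DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        if (!command.CommandText.EndsWith("OPTION (RECOMPILE)"))
            command.CommandText += " OPTION (RECOMPILE)";
        base.ReaderExecuting(command, interceptionContext);
    }
}

// Register once at application startup:
// DbInterception.Add(new RecompileInterceptor());

The trade-off is a plan compilation on every execution, so apply it selectively if the query runs frequently.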

Refer to temporary table in Entity Framework query

There is a list in memory of 50,000 Product IDs. I would like to get all of these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...) - and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and partly because the execution plan is bad (I know this from trying to use a temporary table instead).
So I used SQLBulkCopy to insert the IDs to a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)");
This gave good performance. However, now I need the products with their suppliers (multiple for every product). Using a custom SQL command, there is no way I know of to get back a complex object. So how can I get the products with their suppliers, using the temporary table?
(If I can somehow refer to the temporary table in LINQ, then it would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks
I was curious to explore the SQL generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the SQL generated using Contains (sql1):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the SQL generated using Join (sql2):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the SQL creates a big select statement using UNION ALL to build the equivalent of your temporary table, then joins to that table. The SQL is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. Granted, now it's a giant UNION ALL instead, but I hope this contribution is useful to someone.
I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as the ProductIDs:
this will give you all the performance you're after
it will work for concurrent users
If this filter table changes a lot, consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on it for longer than necessary.
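A sketch of what that could look like, assuming a permanent ProductFilter table with SessionId and ProductId columns mapped into the model (all names here are hypothetical):

var sessionId = Guid.NewGuid();
// 1. Bulk-insert the 50,000 (sessionId, productId) rows with SqlBulkCopy, then:
var products = dbContext.Products
    .Where(p => dbContext.ProductFilters
        .Any(f => f.SessionId == sessionId && f.ProductId == p.ID))
    .Include("Suppliers") // eager-load the suppliers in the same query
    .ToList();
// 2. Remove this session's filter rows when done:
dbContext.Database.ExecuteSqlCommand(
    "DELETE FROM ProductFilter WHERE SessionId = {0}", sessionId);

Because each session filters on its own SessionId, concurrent users don't interfere with one another.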

How to write this SQL query in Entity Framework?

I have this SQL query that I want translated pretty much 1:1 into Entity Framework:
SELECT GroupId, ItemId, count(*) as total
FROM [TESTDB].[dbo].[TestTable]
WHERE GroupId = '64025'
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC
This SQL query sorts based on the number of occurrences of the same ItemId (within that group).
I have this now:
from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};
But this generates the following SQL code:
SELECT
[GroupBy1].[K1] AS [GroupId],
[GroupBy1].[K2] AS [ItemId],
[GroupBy1].[A2] AS [C1]
FROM ( SELECT
[Extent1].[GroupId] AS [K1],
[Extent1].[ItemId] AS [K2],
COUNT(1) AS [A1],
COUNT(1) AS [A2]
FROM [dbo].[TestTable] AS [Extent1]
WHERE 64025 = [Extent1].[GroupId]
GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
) AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC
This also works, but it is a factor of 2 slower than the SQL I created.
I've been fiddling around with the LINQ code for a while, but I haven't managed to produce something similar to my query.
Execution plan (only the last two items, the first two are identical):
FIRST: |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)
SECOND: |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)
The query that Entity Framework generates and your hand crafted query are semantically the same and will give the same plan.
The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
The snippets of SHOWPLAN_TEXT you have posted are the same plan; the only difference is the aliases. It looks as though your table definition is something like:
CREATE TABLE [dbo].[TestTable]
(
[GroupId] INT,
[ItemId] INT
)
CREATE NONCLUSTERED INDEX IX_Group ON [dbo].[TestTable] ([GroupId], [ItemId])
And you are getting a plan like this (an index seek feeding a stream aggregate, as in the SHOWPLAN output above):
For all intents and purposes the plans are the same. Your performance testing methodology is probably flawed: maybe your first query brought pages into cache that then benefited the second query, for example.
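If you want to re-run the comparison fairly, clear the plan cache and buffer pool between timings (on a test server only) and look at the I/O statistics rather than wall-clock time alone. A sketch:

DBCC FREEPROCCACHE;      -- drop cached plans
DBCC DROPCLEANBUFFERS;   -- drop cached data pages
SET STATISTICS TIME ON;  -- report parse/compile and execution times
SET STATISTICS IO ON;    -- report logical/physical reads per table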
