How to write this SQL query in Entity Framework? - c#

I have this query that I want translated pretty much 1:1 from SQL to Entity Framework:
SELECT GroupId, ItemId, count(*) as total
FROM [TESTDB].[dbo].[TestTable]
WHERE GroupId = '64025'
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC
This SQL query should sort based on the number of occurrences of the same ItemId (for that group).
I have this now:
from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};
But this generates the following SQL code:
SELECT
[GroupBy1].[K1] AS [GroupId],
[GroupBy1].[K2] AS [ItemId],
[GroupBy1].[A2] AS [C1]
FROM ( SELECT
[Extent1].[GroupId] AS [K1],
[Extent1].[ItemId] AS [K2],
COUNT(1) AS [A1],
COUNT(1) AS [A2]
FROM [dbo].[TestTable] AS [Extent1]
WHERE 64025 = [Extent1].[GroupId]
GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
) AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC
This also works, but it is a factor of 2 slower than the SQL I created.
I've been fiddling around with the LINQ code for a while, but I haven't managed to produce something similar to my query.
Execution plan (only the last two items, the first two are identical):
FIRST: |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)
SECOND: |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)

The query that Entity Framework generates and your hand crafted query are semantically the same and will give the same plan.
The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
The snippets of SHOWPLAN_TEXT you have posted are the same plan. The only difference is the aliases. It looks as though your table definition is something like:
CREATE TABLE [dbo].[TestTable]
(
[GroupId] INT,
[ItemId] INT
)
CREATE NONCLUSTERED INDEX IX_Group ON [dbo].[TestTable] ([GroupId], [ItemId])
And you are getting a plan like the one in your SHOWPLAN_TEXT output: an index seek on IX_Group feeding a stream aggregate.
To all intents and purposes the plans are the same. Your performance testing methodology is probably flawed; maybe your first query brought pages into the cache that then benefited the second query, for example.
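One way to check that is to time both versions under identical conditions. Here is a minimal benchmark sketch, assuming dataContext is an EF6 DbContext and that GroupItemCount is a simple DTO with GroupId, ItemId and Total properties (neither assumption comes from the question): warm both queries once, then time several runs of each so one-off caching effects don't dominate.
// Assumes: using System; using System.Linq; (EF6 DbContext)
// GroupItemCount is assumed to be a simple DTO with GroupId, ItemId and Total properties.
Action runEf = () =>
    (from x in dataContext.TestTable.AsNoTracking()
     where x.GroupId == 64025
     group x by new { x.GroupId, x.ItemId } into g
     orderby g.Key.GroupId, g.Count() descending
     select new { g.Key.GroupId, g.Key.ItemId, Count = g.Count() }).ToList();

Action runSql = () =>
    dataContext.Database.SqlQuery<GroupItemCount>(
        @"SELECT GroupId, ItemId, COUNT(*) AS Total
          FROM dbo.TestTable
          WHERE GroupId = 64025
          GROUP BY GroupId, ItemId
          ORDER BY GroupId, Total DESC").ToList();

runEf(); runSql();   // warm-up pass so buffer-cache effects don't favor whichever runs second

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 20; i++) runEf();
Console.WriteLine($"EF:  {sw.ElapsedMilliseconds} ms");

sw.Restart();
for (int i = 0; i < 20; i++) runSql();
Console.WriteLine($"SQL: {sw.ElapsedMilliseconds} ms");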

Horrifically inefficient query generated by Entity Framework 6

Here's the query I want:
select top 10 *
from vw_BoosterTargetLog
where OrganizationId = 4125
order by Id desc
It executes subsecond.
Here's my Entity Framework (6.1.2) equivalent in C#:
return await db.vw_BoosterTargetLog
.Where(x => x.OrganizationId == organizationId)
.OrderByDescending(x => x.Id)
.Take(numberToRun)
.ToListNolockAsync();
And here's the SQL that it generates:
SELECT TOP (10)
[Project1].[OrganizationId] AS [OrganizationId],
[Project1].[BoosterTriggerId] AS [BoosterTriggerId],
[Project1].[IsAutomatic] AS [IsAutomatic],
[Project1].[C1] AS [C1],
[Project1].[CustomerUserId] AS [CustomerUserId],
[Project1].[SourceUrl] AS [SourceUrl],
[Project1].[TargetUrl] AS [TargetUrl],
[Project1].[ShowedOn] AS [ShowedOn],
[Project1].[ClickedOn] AS [ClickedOn],
[Project1].[BoosterTargetId] AS [BoosterTargetId],
[Project1].[TriggerEventGroup] AS [TriggerEventGroup],
[Project1].[TriggerIgnoreIdentifiedUsers] AS [TriggerIgnoreIdentifiedUsers],
[Project1].[TargetTitle] AS [TargetTitle],
[Project1].[BoosterTargetVersionId] AS [BoosterTargetVersionId],
[Project1].[Version] AS [Version],
[Project1].[CookieId] AS [CookieId],
[Project1].[CoalescedId] AS [CoalescedId],
[Project1].[OrganizationName] AS [OrganizationName],
[Project1].[ShowedOnDate] AS [ShowedOnDate],
[Project1].[SampleGroupSectionName] AS [SampleGroupSectionName],
[Project1].[Selector] AS [Selector],
[Project1].[SelectorStep] AS [SelectorStep]
FROM ( SELECT
[Extent1].[OrganizationId] AS [OrganizationId],
[Extent1].[OrganizationName] AS [OrganizationName],
[Extent1].[BoosterTriggerId] AS [BoosterTriggerId],
[Extent1].[IsAutomatic] AS [IsAutomatic],
[Extent1].[SampleGroupSectionName] AS [SampleGroupSectionName],
[Extent1].[Selector] AS [Selector],
[Extent1].[SelectorStep] AS [SelectorStep],
[Extent1].[BoosterTargetId] AS [BoosterTargetId],
[Extent1].[CookieId] AS [CookieId],
[Extent1].[CustomerUserId] AS [CustomerUserId],
[Extent1].[CoalescedId] AS [CoalescedId],
[Extent1].[SourceUrl] AS [SourceUrl],
[Extent1].[TriggerEventGroup] AS [TriggerEventGroup],
[Extent1].[TriggerIgnoreIdentifiedUsers] AS [TriggerIgnoreIdentifiedUsers],
[Extent1].[TargetTitle] AS [TargetTitle],
[Extent1].[TargetUrl] AS [TargetUrl],
[Extent1].[ShowedOn] AS [ShowedOn],
[Extent1].[ShowedOnDate] AS [ShowedOnDate],
[Extent1].[ClickedOn] AS [ClickedOn],
[Extent1].[BoosterTargetVersionId] AS [BoosterTargetVersionId],
[Extent1].[Version] AS [Version],
CAST( [Extent1].[Id] AS int) AS [C1]
FROM (SELECT
[vw_BoosterTargetLog].[OrganizationId] AS [OrganizationId],
[vw_BoosterTargetLog].[OrganizationName] AS [OrganizationName],
[vw_BoosterTargetLog].[BoosterTriggerId] AS [BoosterTriggerId],
[vw_BoosterTargetLog].[IsAutomatic] AS [IsAutomatic],
[vw_BoosterTargetLog].[SampleGroupSectionName] AS [SampleGroupSectionName],
[vw_BoosterTargetLog].[Selector] AS [Selector],
[vw_BoosterTargetLog].[SelectorStep] AS [SelectorStep],
[vw_BoosterTargetLog].[BoosterTargetId] AS [BoosterTargetId],
[vw_BoosterTargetLog].[CookieId] AS [CookieId],
[vw_BoosterTargetLog].[CustomerUserId] AS [CustomerUserId],
[vw_BoosterTargetLog].[CoalescedId] AS [CoalescedId],
[vw_BoosterTargetLog].[Id] AS [Id],
[vw_BoosterTargetLog].[SourceUrl] AS [SourceUrl],
[vw_BoosterTargetLog].[TriggerEventGroup] AS [TriggerEventGroup],
[vw_BoosterTargetLog].[TriggerIgnoreIdentifiedUsers] AS [TriggerIgnoreIdentifiedUsers],
[vw_BoosterTargetLog].[TargetTitle] AS [TargetTitle],
[vw_BoosterTargetLog].[TargetUrl] AS [TargetUrl],
[vw_BoosterTargetLog].[ShowedOn] AS [ShowedOn],
[vw_BoosterTargetLog].[ShowedOnDate] AS [ShowedOnDate],
[vw_BoosterTargetLog].[ClickedOn] AS [ClickedOn],
[vw_BoosterTargetLog].[BoosterTargetVersionId] AS [BoosterTargetVersionId],
[vw_BoosterTargetLog].[Version] AS [Version]
FROM [dbo].[vw_BoosterTargetLog] AS [vw_BoosterTargetLog]) AS [Extent1]
WHERE [Extent1].[OrganizationId] = 4125
) AS [Project1]
ORDER BY [Project1].[C1] DESC
It's ugly as hell, of course, as all EF queries are: I'm not complaining about that. My gripe is that in my testing, best-case, it executes about 10x slower than the first, and worst-case, about 100x slower.
For a query this simple, that seems way beyond all reasonable expectation.
Obviously I can execute SQL directly, or execute a sproc, or something of that sort. And while I'm waiting for feedback, that's what I'll do. But does anyone have any other suggestions about how to speed this up? Is there any way to encourage EF to generate reasonable SQL in a situation like this?
The queries EF produces, while terrible from a readability perspective, are usually still quite reasonable -- and I say that as someone who does almost all data access through stored procedures with hand-written queries. But in order for that to work, the model EF has of the database needs to match the actual database; otherwise conversions are introduced, and when that happens it's very easy to get horrible performance drops while all the data is converted and no indexes can be used.
If we eliminate some nesting, the EF query can be simplified to
SELECT TOP (10) *
FROM (
SELECT *, CAST(Id AS INT) AS C1
FROM vw_BoosterTargetLog
WHERE OrganizationId = 4125
) _
ORDER BY C1 DESC
(This is not the actual result set because Id isn't part of the final result set in the real query, but pretend I wrote out all the columns just like EF did.)
If vw_BoosterTargetLog.Id is not actually an INT, this forces a conversion of all rows before the ordering takes place, which is much slower. The solution is to figure out the actual type of the column (in this case, BIGINT) and update your model accordingly.
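As a rough sketch of that fix, assuming a code-first style mapping (if the model is an EDMX, the equivalent change is setting the property's Type to Int64 in the designer):
// If vw_BoosterTargetLog.Id is BIGINT in the database, the entity property should be
// long rather than int, so EF no longer emits CAST([Id] AS int) and SQL Server can
// order by the indexed column directly.
public class vw_BoosterTargetLog
{
    public long Id { get; set; }            // was int; BIGINT maps to long (Int64)
    public int OrganizationId { get; set; }
    // ... remaining columns unchanged
}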

Refer to temporary table in Entity Framework query

There is a list in memory of 50,000 Product IDs. I would like to get all these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...), and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and also because the execution plan is bad. (I know this from trying to use a temporary table instead.)
So I used SQLBulkCopy to insert the IDs to a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products with their suppliers (multiple for every product). Using a custom SQL command, there is no way that I know of to get back a complex object. So how can I get the products with their suppliers, using the temporary table?
(If I can somehow refer to the temporary table in LINQ, then it would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks
I was curious to explore the SQL generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the SQL generated using Contains (sql1):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the SQL generated using Join (sql2):
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the SQL creates a big select statement using UNION ALL to create the equivalent of your temporary table, then it joins to that table. The SQL is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK... now it's a giant UNION ALL... anyway, I hope this contribution is useful to someone.
I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as the ProductIDs:
this will give you all the performance you're after
it will work for concurrent users
If this filter table is changing a lot then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on this table for longer than necessary.
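Here is a minimal sketch of that approach. It assumes things the question doesn't specify: a permanent dbo.ProductFilter table with ProductId and SessionId columns, a ProductFilters set on the context, and a connectionString for SqlBulkCopy.
// Assumes: using System; using System.Data; using System.Data.SqlClient;
//          using System.Data.Entity; using System.Linq;
// Bulk-load the 50,000 IDs into a permanent filter table keyed by a per-request SessionId,
// then join to it in LINQ so EF can still materialize Products together with their Suppliers.
var sessionId = Guid.NewGuid();

var filterRows = new DataTable();
filterRows.Columns.Add("ProductId", typeof(int));
filterRows.Columns.Add("SessionId", typeof(Guid));
foreach (var id in list)                          // 'list' is the in-memory list of IDs
    filterRows.Rows.Add(id, sessionId);

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.ProductFilter";
    bulk.WriteToServer(filterRows);
}

var products = dbContext.Products
    .Include(p => p.Suppliers)
    .Where(p => dbContext.ProductFilters
        .Any(f => f.SessionId == sessionId && f.ProductId == p.ID))
    .ToList();

// Clean up this session's rows afterwards (ideally via a separate context/transaction).
dbContext.Database.ExecuteSqlCommand(
    "DELETE FROM dbo.ProductFilter WHERE SessionId = {0}", sessionId);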

Optimal Count() operation from a DB using LINQ to SQL

DBAs have told me that when using T-SQL:
select count(id) from tableName
is faster than
select count(*) from tablenName
if id is the primary key.
Extrapolating that to LINQ to SQL, is the following accurate?
This LINQ-to-SQL statement:
int count = dataContext.TableName.Select(primaryKeyId => primaryKeyId).Count();
is more performant than this one:
int count = dataContext.TableName.Count();
As I understand it, there's no difference between your two select count statements.
Using LINQPad we can examine the T-SQL generated by different LINQ statements.
For LINQ to SQL, both
TableName.Select(primaryKeyId => primaryKeyId).Count();
and
TableName.Count();
generate the same SQL
SELECT COUNT(*) AS [value] FROM [dbo].[TableName] AS [t0]
For LINQ to Entities, they again both generate the same SQL, but now it's
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[TableName] AS [Extent1]
) AS [GroupBy1]
I know this is an old one, but watch out with SQL Server! COUNT(col) does not count NULL values, so the two statements may not be equivalent if the column you count is nullable. See:
create table #a(col int null)
insert into #a values (null)
select COUNT(*)
from #a;
select COUNT(col)
from #a;
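In LINQ terms the same distinction has to be made explicitly; a small sketch, with Col standing in for a hypothetical nullable column:
// Translates to COUNT(*): counts every row.
int totalRows = dataContext.TableName.Count();

// Equivalent to COUNT(Col): counts only rows where the nullable column is not NULL;
// in LINQ this has to be spelled out as a predicate.
int nonNullRows = dataContext.TableName.Count(x => x.Col != null);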

LINQ generating SQL with duplicate nested selects

I'm very new to the .NET Entity Framework, and I think it's awesome, but somehow I'm getting this strange issue (sorry for the Spanish, but my program is in that language; anyway it's not a big deal, just the column and property names): I'm doing a normal LINQ to Entities query to get a list of UltimaConsulta, like this:
var query = from uc in bd.UltimasConsultas
select uc;
UltimasConsultas is a view, btw. The thing is that LINQ is generating this SQL for the query:
SELECT
[Extent1].[IdPaciente] AS [IdPaciente],
[Extent1].[Nombre] AS [Nombre],
[Extent1].[PrimerApellido] AS [PrimerApellido],
[Extent1].[SegundoApellido] AS [SegundoApellido],
[Extent1].[Fecha] AS [Fecha]
FROM (SELECT
[UltimasConsultas].[IdPaciente] AS [IdPaciente],
[UltimasConsultas].[Nombre] AS [Nombre],
[UltimasConsultas].[PrimerApellido] AS [PrimerApellido],
[UltimasConsultas].[SegundoApellido] AS [SegundoApellido],
[UltimasConsultas].[Fecha] AS [Fecha]
FROM [dbo].[UltimasConsultas] AS [UltimasConsultas]) AS [Extent1]
Why is LINQ generating a nested Select? I thought from videos and examples that it generates normal SQL selects for this kind of query. Do I have to configure something (the entity model was generated from a wizard, so it's the default configuration)? Thanks in advance for your answers.
To be clear, LINQ to Entities does not generate the SQL. Instead, it generates an ADO.NET canonical command tree, and the ADO.NET provider for your database, presumably SQL Server in this case, generates the SQL.
So why does it generate this derived table (I think "derived table" is the more correct term for the SQL feature in use here)? Because the code which generates the SQL has to generate SQL for a wide variety of LINQ queries, most of which are not nearly as trivial as the one you show. These queries will often be selecting data for multiple types (many of which might be anonymous, rather than named types), and in order to keep the SQL generation relatively sane, they are grouped into extents for each type.
Another question: Why should you care? It's easy to demonstrate that the use of the derived table in this statement is "free" from a performance point of view.
I selected a table at random from a populated database, and ran the following query:
SELECT [AddressId]
,[Address1]
,[Address2]
,[City]
,[State]
,[ZIP]
,[ZIPExtension]
FROM [VertexRM].[dbo].[Address]
Let's look at the cost:
<StmtSimple StatementCompId="1" StatementEstRows="7900" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.123824" StatementText="/****** Script for SelectTopNRows command from SSMS ******/
SELECT [AddressId]
,[Address1]
,[Address2]
,[City]
,[State]
,[ZIP]
,[ZIPExtension]
FROM [VertexRM].[dbo].[Address]" StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan CachedPlanSize="9" CompileTime="0" CompileCPU="0" CompileMemory="64">
<RelOp AvgRowSize="246" EstimateCPU="0.008847" EstimateIO="0.114977" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="7900" LogicalOp="Clustered Index Scan" NodeId="0" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.123824">
Now let's compare that to the query with the derived table:
SELECT
[Extent1].[AddressId]
,[Extent1].[Address1]
,[Extent1].[Address2]
,[Extent1].[City]
,[Extent1].[State]
,[Extent1].[ZIP]
,[Extent1].[ZIPExtension]
FROM (SELECT [AddressId]
,[Address1]
,[Address2]
,[City]
,[State]
,[ZIP]
,[ZIPExtension]
FROM[VertexRM].[dbo].[Address]) AS [Extent1]
And the cost:
<StmtSimple StatementCompId="1" StatementEstRows="7900" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.123824" StatementText="/****** Script for SelectTopNRows command from SSMS ******/
SELECT
[Extent1].[AddressId]
,[Extent1].[Address1]
,[Extent1].[Address2]
,[Extent1].[City]
,[Extent1].[State]
,[Extent1].[ZIP]
,[Extent1].[ZIPExtension]
FROM (SELECT [AddressId]
,[Address1]
,[Address2]
,[City]
,[State]
,[ZIP]
,[ZIPExtension]
FROM[VertexRM].[dbo].[Address]) AS [Extent1]" StatementType="SELECT">
<StatementSetOptions ANSI_NULLS="false" ANSI_PADDING="false" ANSI_WARNINGS="false" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="false" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="false" />
<QueryPlan CachedPlanSize="9" CompileTime="0" CompileCPU="0" CompileMemory="64">
<RelOp AvgRowSize="246" EstimateCPU="0.008847" EstimateIO="0.114977" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="7900" LogicalOp="Clustered Index Scan" NodeId="0" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.123824">
In both cases, SQL Server simply scans the clustered index. Not surprisingly, the cost is almost precisely the same.
Let's take a look at a slightly more complicated query. I fired up LINQPad, and entered the following query against the same table, plus one related table:
from a in Addresses
select new
{
Id = a.Id,
Address1 = a.Address1,
Address2 = a.Address2,
City = a.City,
State = a.State,
ZIP = a.ZIP,
ZIPExtension = a.ZIPExtension,
PersonCount = a.EntityAddresses.Count()
}
This generates the following SQL:
SELECT
1 AS [C1],
[Project1].[AddressId] AS [AddressId],
[Project1].[Address1] AS [Address1],
[Project1].[Address2] AS [Address2],
[Project1].[City] AS [City],
[Project1].[State] AS [State],
[Project1].[ZIP] AS [ZIP],
[Project1].[ZIPExtension] AS [ZIPExtension],
[Project1].[C1] AS [C2]
FROM ( SELECT
[Extent1].[AddressId] AS [AddressId],
[Extent1].[Address1] AS [Address1],
[Extent1].[Address2] AS [Address2],
[Extent1].[City] AS [City],
[Extent1].[State] AS [State],
[Extent1].[ZIP] AS [ZIP],
[Extent1].[ZIPExtension] AS [ZIPExtension],
(SELECT
COUNT(cast(1 as bit)) AS [A1]
FROM [dbo].[EntityAddress] AS [Extent2]
WHERE [Extent1].[AddressId] = [Extent2].[AddressId]) AS [C1]
FROM [dbo].[Address] AS [Extent1]
) AS [Project1]
Analyzing this, we can see that Project1 is the projection onto the anonymous type. Extent1 is the Address table/entity. And Extent2 is the table for the association. Now there is no derived table for Address, but there is one for the projection.
I don't know if you have ever written a SQL generation system, but it isn't easy. I believe that the general problem of proving that a LINQ to Entities query and a SQL query are equivalent is NP-hard, although certain specific cases are obviously much easier. SQL is intentionally Turing-incomplete, because its designers wanted all SQL queries to execute in bounded time. LINQ, not so.
In short, this is a very difficult problem to solve, and the combination of the Entity Framework and its providers do occasionally sacrifice some readability in favor of consistency over a wide range of queries. But it shouldn't be a performance issue.
Basically it's defining what Extent1 consists of and what variables will relate to each entry. Then it's mapping the actual database table to Extent1 so that it can return all entries for that table.
This is what your query is asking for. It's just that LINQ can't add in a wildcard character as you would if you'd written it by hand.

Why did the following LINQ to SQL query generate a subquery?

I did the following query:
var list = from book in books
where book.price > 50
select book;
list = list.Take(50);
I would expect the above to generate something like:
SELECT top 50 id, title, price, author
FROM Books
WHERE price > 50
but it generates:
SELECT
[Limit1].[C1] as [C1],
[Limit1].[id] as [Id],
[Limit1].[title] as [title],
[Limit1].[price] as [price],
[Limit1].[author]
FROM (SELECT TOP (50)
[Extent1].[id] as [Id],
[Extent1].[title] as [title],
[Extent1].[price] as [price],
[Extent1].[author] as [author]
FROM Books as [Extent1]
WHERE [Extent1].[price] > 50
) AS [Limit1]
Why does the above LINQ query generate a subquery, and where does the C1 come from?
Disclaimer: I've never used LINQ before...
My guess would be paging support? I guess you have some way, such as Skip(50).Take(50), to get 50 records starting at record 50. Take a look at the SQL that query generates and you will probably find that it uses a similar subquery structure to allow it to return any 50 rows of a query in approximately the amount of time that it returns the first 50 rows.
In any case, the nested subquery doesn't add any performance overhead as it's automagically optimised away during compilation of the execution plan.
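For reference, a sketch of what such a paging call looks like in the question's terms (Book properties as in the question); with Skip in play the provider more or less has to wrap the inner query, typically with ROW_NUMBER() OVER (...) or OFFSET/FETCH depending on the provider and SQL Server version:
// Paging: skip the first 50 matches, then take the next 50.
// Skip requires an explicit ordering, and the generated SQL wraps the inner query so row
// numbers can be computed and filtered - the same wrapping pattern as with Take() alone.
var page = (from book in books
            where book.price > 50
            orderby book.id
            select book)
           .Skip(50)
           .Take(50)
           .ToList();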
You could still make it cleaner like this:
var c = (from co in db.countries
where co.regionID == 5
select co).Take(50);
This will result in:
Table(country).Where(co => (co.regionID = Convert(5))).Take(50)
Equivalent to:
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = 5
EDIT: In response to the comments: it's not necessarily because of the separate Take(), because you can still use it like this:
var c = (from co in db.countries
where co.regionID == 5
select co);
var l = c.Take(50).ToList();
And the result would be the same as before.
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = @p0
The fact that you wrote IQueryable = IQueryable.Take(50) is the tricky part here.
The subquery is generated for projection purposes; it makes more sense when you select from multiple tables into a single anonymous object, where the outer query is then used to gather the results.
Try what happens with something like this:
from book in books
where book.price > 50
select new
{
Title = book.title,
Chapters = from chapter in book.Chapters
select chapter.Title
}
Isn't it a case of the first query returning the total number of rows while the second extracts the subset of rows based on the call to the .Take() method?
I agree with @Justin Swartsel. There was no error involved, so this is largely an academic matter.
LINQ to SQL endeavors to generate SQL that runs efficiently (which it did in your case).
But it does not make any effort to generate the conventional SQL that a human would likely write.
The LINQ to SQL implementers likely used the builder pattern to generate the SQL.
If so, it would be easier to append a substring (or a subquery in this case) than it would be to backtrack and insert a 'TOP x' fragment into the SELECT clause.
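A toy sketch of that idea (purely illustrative; this is not how LINQ to SQL is actually implemented): once the inner SELECT has been emitted as text, wrapping it is a simple append, whereas producing the hand-written form would mean splicing TOP into a SELECT list that has already been generated.
// Purely illustrative: composing SQL by wrapping text that has already been generated.
string inner = "SELECT [Extent1].[id], [Extent1].[title], [Extent1].[price], [Extent1].[author] " +
               "FROM Books AS [Extent1] WHERE [Extent1].[price] > 50";

// Applying Take(50) by wrapping is a straightforward append...
string wrapped = $"SELECT TOP (50) * FROM ({inner}) AS [Limit1]";

// ...whereas "SELECT TOP (50) id, title, ..." would require going back and editing
// the SELECT clause that was already emitted.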
