Using both LINQ query syntax and LINQ method syntax [duplicate] - c#

I read this article:
Logical Processing Order of the SELECT statement
in end of article has been write ON and JOIN clause consider before WHERE.
Consider we have a master table that has 10 million records and a detail table (that has reference to master table(FK)) with 50 million record. We have a query that want just 100 record of detail table according a PK in master table.
In this situation ON and JOIN execute before WHERE?I mean that do we have 500 million record after JOIN and then WHERE apply to it?or first WHERE apply and then JOIN and ON Consider? If second answer is true do it has incoherence with top article?
thanks

In the case of an INNER JOIN or a table on the left in a LEFT JOIN, in many cases, the optimizer will find that it is better to perform any filtering first (highest selectivity) before actually performing whatever type of physical join - so there are obviously physical order of operations which are better.
To some extent you can sometimes control this (or interfere with this) with your SQL, for instance, with aggregates in subqueries.
The logical order of processing the constraints in the query can only be transformed according to known invariant transformations.
So:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is still logically equivalent to:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and they will generally have the same execution plan.
On the other hand:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is NOT equivalent to:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and so the optimizer isn't going to transform them into the same execution plan.
The optimizer is very smart and is able to move things around pretty successfully, including collapsing views and inline table-valued functions as well as even pushing things down through certain kinds of aggregates fairly successfully.
Typically, when you write SQL, it needs to be understandable, maintainable and correct. As far as efficiency in execution, if the optimizer is having difficulty turning the declarative SQL into an execution plan with acceptable performance, the code can sometimes be simplified or appropriate indexes or hints added or broken down into steps which should perform more quickly - all in successive orders of invasiveness.

It doesn't matter
Logical processing order is always honoured: regardless of actual processing order
INNER JOINs and WHERE conditions are effectively associative and commutative (hence the ANSI-89 "join in the where" syntax) so actual order doesn't matter
Logical order becomes important with outer joins and more complex queries: applying WHERE on an OUTER table changes the logic completely.
Again, it doesn't matter how the optimiser does it internally so long as the query semantics are maintained by following logical processing order.
And the key word here is "optimiser": it does exactly what it says

Just re-reading Paul White's excellent series on the Query Optimiser and remembered this question.
It is possible to use an undocumented command to disable specific transformation rules and get some insight into the transformations applied.
For (hopefully!) obvious reasons only try this on a development instance and remember to re-enable them and remove any suboptimal plans from the cache.
USE AdventureWorks2008;
/*Disable the rules*/
DBCC RULEOFF ('SELonJN');
DBCC RULEOFF ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
You can see with those two rules disabled it does a cartesian join and filter after.
/*Re-enable them*/
DBCC RULEON ('SELonJN');
DBCC RULEON ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
With them enabled the predicate is pushed right down into the index seek and so reduces the number of rows processed by the join operation.

There is no defined order. The SQL engine determines what order to perform the operations based on the execution strategy chosen by its optimizer.

I think you have misread ON as IN in the article.
However, the order it is showing in the article is correct (obviously it is msdn anyway). The ON and JOIN are executed before WHERE naturally because WHERE has to be applied as a filter on the temporary resultset obtained due to JOINS
The article just says it is logical order of execution and also at end of the paragraph it adds this line too ;)
"Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list."

Related

.netcore EF linq - this is a BUG? Very strange behavior

I have two table in sql. Document and User. Document have relation to User and I want to get users that I sent document recently.
I need to sort by the date document was sent and get unique (distinct) user with relation to this document
This is my linq queries
var recentClients = documentCaseRepository.Entities
.Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
.OrderByDescending(userWithDate => userWithDate.LastUpdateDate)
.Take(1000) // I need this because if I comment this line then EF generate completely different sql query.
.Select(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
.Distinct()
.Take(configuration.MaxRecentClientsResults)
.ToList();
and generated sql query is:
SELECT DISTINCT TOP(5) [t].*
FROM (
SELECT TOP(1000) [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
ORDER BY [docCase].[LastUpdateDate] DESC
)
AS [t]
Every thing is correct for now. But if I delete this line
.Take(1000) // I need this because...
EF generated completely different query such as:
SELECT DISTINCT TOP(5)
[docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
My question is: why EF not generated orderby clause and subquery with distinct?
This is a BUG EF or I'm doing something wrong? And what I must do to generate in linq this sql query ()
SELECT DISTINCT TOP 5 [t].*
FROM ( SELECT [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON [docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id]
WHERE [docCase].[AssignedByAgentId] = 1
ORDER BY [docCase].[LastUpdateDate] DESC
) AS [t]
OrderBy information not always retained across other operators such as Distinct. Entity Framework does not document (to my knowledge) how exactly OrderBy is propagated.
This kind of makes sense because some operators have undefined output order. The fact that ordering is retained in many situations is a convenience for the developer.
Move the OrderBy to the end of the query (or at least past the Distinct).
The reason for the difference in queries is that Distinct messes up result order. So when you first execute OrderBy and then Distinct, you can just es well not execute OrderBy, because this order is lost anyway. So EF can just optimize it away.
Calling Take in between causes the result set to be semantically different: You first order the items, take the first 1000 items of that order and then call Distinct on them.
What you can change in your query depends mainly on the result you want to achieve. Maybe you want to first make the result set distinct then order by date and finally take the amount of items. Other options are also thinkable based on your requirements.

IQueryable Count() method issue with Inner Join created by .Include() related entities

IQueryable<EntityOne> query = entities.EntityOne
.Include(t => t.EntityRelated1)
.Include(t => t.EntityRelated2)
.AsQueryable();
The query generated in "query" variable :
SELECT
[Extent1].[Id] AS [IdEntityOne],
...
[Extent2].[Id] AS [IdEntityRelatedOne],
...
[Extent3].[Id] AS [IdEntityRelatedTwo],
...
FROM [dbo].[EntityOne] AS [Extent1]
INNER JOIN [dbo].[EntityRelatedOne] AS [Extent2]
ON [Extent1].[IdRelatedOne] = [Extent2].[Id]
INNER JOIN [dbo].[EntityRelatedTwo] AS [Extent3]
ON [Extent1].[IdRelatedTwo] = [Extent3].[Id]
After that, on C# code those are the result of counting:
var wrongCount = query.Count(); // Returns 295
var correctCount = query.AsEnumerable().Count(); // Returns 10
The 295 count is the full EntityOne set numbers of registers. (wrong)
The 10 Count is the desired count after Inner Join.
It sounds like the IQueryable.Count() is counting before executing the InnerJoin on database. I don't want to generate the IEnumerable since I hope the count to be executed on Sql Server together with the inner join.
UPDATE 1
Trying to manually execute the inner join:
IQueryable<EntityOne> query2 = entities.EntityOne.Join(entities.EntityTwo,
eone=> eone.IdRelatedOne, en => en.Id,
(x, y) => x);
The SQL code generated in "query2" is :
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[EntityOne] AS [Extent1]
As you can see, the related entity is not included on Inner Join forced by linq Join statement.
Update 2
I dont know if it matters but, the IdEntityRelated1 on EntityOne is a required property, plus its not a foreign key on database, just a Integer field that stores the Id of the related entity. (Im working with POCO classes with Database First)
I have another working sources where fields but they're nullable integers instead of required. Maybe should I not try to do an Include to force Inner Join between required relationships?
You have a required association, but the expected objects are not present in the database.
But let's first see what EF does.
In the first count...
var wrongCount = query.Count();
...the Includes are ignored. There's no reason to execute them because EF has been told that the referred objects (EntityRelated1 and EntityRelated2 are mandatory, so inner joins are expected to find the related records. If they do, EF figures it may as well just count entities.EntityOne and skip the rest. The Includes are only going to make the query more expensive and they don't affect the result.
You can check that by monitoring the SQL that's executed for the count. That's not the SQL generated when you look at query only! It's probably something that simply boils down to
SELECT COUNT(*) FROM [dbo].[EntityOne]
So the first count returns a correct count of all EntityOne records in the database.
For the second count you force execution of the entire query that's stored in the query variable, the SQL statement that you show. Then you count its results in memory – and it returns 10. This means that the query with the inner joins does actually return 10 records. That, in turn, can only mean one thing: there are 285 EntityOne.IdRelatedOne values that don't point to an existing EntityRelatedOne record. But you mapped the association as required, so EF generates an inner join. An outer join would also return 295.
Include is not a LINQ method proper, is an EntityFramework extension designed to do eager loading and not much else. Includes are lost if the query shape changes:
When you call the Include method, the query path is only valid on the returned instance of the IQueryable of T. Other instances of IQueryable of T and the context itself are not affected.
Specifically this means that, for instance aggregates on top of the Included IQueryable<T> are going to loose the Include (which is exactly what you see).
See Tip 22 - How to make Include really Include, IQueryable.Include() gets ignored, Include in following query does not include really and many more.

"Linq to sql" Query Optimization [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am Working with Linq queries of select at the time of debugging I found some Unusual thing.
This is the Linq query for selecting particular columns from table
var result = from p in Context.Accounts
join a in _Context.People on p.PersonId equals a.PersonId
join b in _Context.BusinessTypes on p.BusinessTypeId equals
b.BusinessTypeId where b.Name == Operator
&& p.AccountId == AccId && a.PersonId == Pid && p.IsDelete == false
select new
{
AccountId = p.AccountId,
PersonId = p.PersonId,
AccountName = p.AccountName,
Active = p.Active,
};
This is the Linq To SQL Conversion which is of same query
SELECT
[Filter1].[AccountId] AS [AccountId],
[Filter1].[PersonId1] AS [PersonId],
[Filter1].[FirstName] AS [FirstName],
[Filter1].[LastName] AS [LastName],
[Filter1].[AccountName] AS [AccountName],
[Filter1].[Active1] AS [Active]
FROM (SELECT [Extent1].[AccountId] AS [AccountId], [Extent1].[BusinessTypeId] AS [BusinessTypeId],
[Extent1].[PersonId] AS [PersonId1], [Extent1].[Active] AS [Active1], [Extent1].[AccountName] AS [AccountName],
[Extent2].[PersonId] AS [PersonId2], [Extent2].[FirstName] AS [FirstName], [Extent2].[LastName] AS [LastName]
FROM [ysmgr].[Account] AS [Extent1]
INNER JOIN [ysmgr].[Person] AS [Extent2] ON [Extent1].[PersonId] = [Extent2].[PersonId]
WHERE 0 = [Extent1].[IsDelete] ) AS [Filter1]
INNER JOIN [ysmgr].[BusinessType] AS [Extent3] ON [Filter1].[BusinessTypeId] = [Extent3].[BusinessTypeId]
WHERE ([Extent3].[Name] = Operator) AND ([Filter1].[AccountId] = AccId ) AND ([Filter1].[PersonId2] = Pid)
We can see that in conversion it in second select query(i.e Extent1) it selects the all columns and after that it selects the particular colums (first select i.e. Filter1) .
Does anyone know why it happens?
I consider choosing a tool as signing an MoU: I do my job, you do your job.
Your job is to write correct application code. Entity Framework's job (or one of it) is to convert LINQ into correct SQL. With each version of EF, SQL generation has improved and this is an ongoing process. This means that any tweaks in your code may prove unnecessary, or even counter-productive, with newer versions of EF. Also, a minute modification in a LINQ query that produces the SQL you like, may produce completely different SQL.
That said, it's always sensible to keep an eye on the generated SQL, as you do. If for some reason EF fails badly in producing efficient SQL, you should know.
And then there are known cases where choosing the right LINQ constructs makes a difference. To list a few:
Forcing inner joins. Navigation properties or Includes can produce outer joins. An explicit join in LINQ will always be an inner join in SQL (and this won't change with upgrades).
Avoiding constructs that are notoriously bad. This depends on the query provider. For instance, with Sql Server, performance deteriorates rapidly when Contains is called with "many" items. Also, using set operators like Except, Intersect, Any, or All can produce horrible, non-scalable SQL queries. But later versions may be better at this.
Avoiding forced execution. Or: defer execution as long as possible. Take these two statements:
var name = context.Companies.Single(c => c.CompanyId == id).Name;
var name = context.Companies.Where(c => c.CompanyId == id)
.Select(c => c.Name).Single();
The first query is shorter, better readable, so let's go for it! Or...? Single is one of the LINQ statements that forces query execution. Here it draws the whole record into memory, and then only Name is actually used. This can be very inefficient with large records. The second query only fetches Name from the database.
As for your query, I wouldn't worry about the generated SQL. It's longer and clunkier than what a human being would produce, but it's not too bad. And, as said in the comments, the database engine's query optimizer will probably be able to turn it into an efficient query plan. You may even consider using navigation properties in stead of explicit joins.
Also, as said, your first and main concern is correctness. Tackle performance when it becomes an issue. So far, with EF against Sql Server I've never found incorrect results from LINQ statements and I think that's a tremendous achievement from the EF team.

OrderBy a Many To Many relationship with Entity Sql

I'm trying to better utilize the resources of the Entity Sql in the following scenario: I have a table Book which has a Many-To-Many relationship with the Author table. Each book may have from 0 to N authors. I would like to sort the books by the first author name, ie the first record found in this relationship (or null when no authors are linked to a book).
With T-SQL it can be done without difficulty:
SELECT
b.*
FROM
Book AS b
JOIN BookAuthor AS ba ON b.BookId = ba.BookId
JOIN Author AS a ON ba.AuthorId = a.AuthorId
ORDER BY
a.AuthorName;
But I cannot think of how to adapt my code bellow to achieve it. Indeed I don't know how to write something equivalent directly with Entity Sql too.
Entities e = new Entities();
var books = e.Books;
var query = books.Include("Authors");
if (sorting == null)
query = query.OrderBy("it.Title asc");
else
query = query.OrderBy("it.Authors.Name asc"); // This isn't it.
return query.Skip(paging.Skip).Take(paging.Take).ToList();
Could someone explain me how to modify my code to generate the Entity Sql for the desired result? Or even explain me how to write by hand a query using CreateQuery<Book>() to achieve it?
EDIT
Just to elucidate, I'll be working with a very large collection of books (around 100k). Sorting them in memory would be very impactful on the performance. I wish the answers would focus on how to generate the desired ordering using Entity Sql, so the orderby will happens on the database.
The OrderBy method expects you to give it a lambda expression (well, actually a Func delegate, but most people would use lambdas to make them) that can be run to select the field to sort by. Also, OrderBy always orders ascending; if you want descending order there is an OrderByDescending method.
var query = books
.Include("Authors")
.OrderBy(book => book.Authors.Any()
? book.Authors.FirstOrDefault().Name
: string.Empty);
This is basically telling the OrderBy method: "for each book in the sequence, if there are any authors, select the first one's name as my sort key; otherwise, select the empty string. Then return me the books sorted by the sort key."
You could put anything in place of the string.Empty, including for example book.Title or any other property of the book to use in place of the last name for sorting.
EDIT from comments:
As long as the sorting behavior you ask for isn't too complex, the Entity Framework's query provider can usually figure out how to turn it into SQL. It will try really, really hard to do that, and if it can't you'll get a query error. The only time the sorting would be done in client-side objects is if you forced the query to run (e.g. .AsEnumerable()) before the OrderBy was called.
In this case, the EF outputs a select statement that includes the following calculated field:
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[BookAuthor] AS [Extent4]
WHERE [Extent1].[Id] = [Extent4].[Books_Id]
)) THEN [Limit1].[Name] ELSE #p__linq__0 END AS [C1],
Then orders by that.
#p__linq__0 is a parameter, passed in as string.Empty, so you can see it converted the lambda expression into SQL pretty directly. Extent and Limit are just aliases used in the generated SQL for the joined tables etc. Extent1 is [Books] and Limit1 is:
SELECT TOP (1) -- Field list goes here.
FROM [dbo].[BookAuthor] AS [Extent2]
INNER JOIN [dbo].[Authors] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Authors_Id]
WHERE [Extent1].[Id] = [Extent2].[Books_Id]
If you don't care where the sorting is happening (i.e. SQL vs In Code), you can retrieve your result set, and sort it using your own sorting code after the query results have been returned. In my experience, getting specialized sorting like this to work with Entity Framework can be very difficult, frustrating and time consuming.

Choosing better query out of 2 queries, both returning same results

I am having two different sql queries one written by me and one automatically generated by C# when used with linq, both are giving same results.
I am not sure which one to choose, Iam looking for
Whats the best way to choose one query out of many, when all returns same result (most optimized query).
Out of my queries (below written), which one should i choose.
Hand Written
select * from People P
inner join SubscriptionItemXes S
on
P.Id=S.Person_Id
inner join FoodTagXFoods T1
on T1.FoodTagX_Id = S.Tag2
inner join FoodTagXFoods T2
on T2.FoodTagX_Id = S.Tag1
inner join Foods F
on
F.Id= T1.Food_Id and F.Id= T2.Food_Id
where p.id='1'
Automatically Generated by LINQ
SELECT
[Distinct1].[Id] AS [Id],
[Distinct1].[Item] AS [Item]
FROM ( SELECT DISTINCT
[Extent2].[Id] AS [Id],
[Extent2].[Item] AS [Item]
FROM [dbo].[People] AS [Extent1]
CROSS JOIN [dbo].[Foods] AS [Extent2]
INNER JOIN [dbo].[FoodTagXFoods] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[Food_Id]
INNER JOIN [dbo].[SubscriptionItemXes] AS [Extent4]
ON [Extent1].[Id] = [Extent4].[Person_Id]
WHERE (N'rusi' = [Extent1].[Name]) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent5]
WHERE ([Extent2].[Id] = [Extent5].[Food_Id])
AND ([Extent5].[FoodTagX_Id] = [Extent4].[Tag1])
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent6]
WHERE ([Extent2].[Id] = [Extent6].[Food_Id])
AND ([Extent6].[FoodTagX_Id] = [Extent4].[Tag2])
))
) AS [Distinct1]
Execution Plan Results
Hand Written: Query Cost (relative to batch):33%
Linq Generated: Query Cost (relative to batch):67%
I have found that two different queries, one hand-written and one generated by Linq might look wildly different but, actually, when you analyse the query plan in SSMS, you find that actually they are almost identical.
You need to actually run these queries in SSMS with Display Actual Execution Plan switched on, and analyse the different plans. It's the only way to correctly analyse the two and find out which is better.
In general, Linq is actually very good at generating efficient queries; even if the actual SQL itself is pretty ugly (in some cases, it's the kind of SQL that a human would write if they had the time!). If course, that said, it can also generate some pigs!
Additionally, asking SO to help with performance of a query over so many tables is fraught with problems for us, since it will be governed so much by your indexes :)
But they aren't quite returning the same thing... The first query grabs everyting (SELECT *) while LINQ is going to extract what you really want (id and item). Trivial you may say but streaming back lots of data that's never used is a good waste of bandwidth and will make your application appear sluggish. Additionally, the LINQ query seems to be doing a lot more which may or may not be the correct solution especially as data is populated into FoodTagXFoods
As for which performs better, I couldn't tell you without something like the actual query plans and/or results of statistics io from both queries. My money is on hand-written but maybe because I like my hands.
Examining SQL Server Query Execution Plan is the best way to choose the better query.
Hope below tutorials help.
SQL Tuning Tutorial - Understanding a Database Execution Plan (1)
Examining Query Execution Plans
SQL Server Query Execution Plan Analysis
SQL SERVER – Actual Execution Plan vs. Estimated Execution Plan

Categories

Resources