"Linq to sql" Query Optimization [closed]

"Linq to sql" Query Optimization [closed] - c#

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am Working with Linq queries of select at the time of debugging I found some Unusual thing.
This is the Linq query for selecting particular columns from table
var result = from p in Context.Accounts
join a in _Context.People on p.PersonId equals a.PersonId
join b in _Context.BusinessTypes on p.BusinessTypeId equals
b.BusinessTypeId where b.Name == Operator
&& p.AccountId == AccId && a.PersonId == Pid && p.IsDelete == false
select new
{
AccountId = p.AccountId,
PersonId = p.PersonId,
AccountName = p.AccountName,
Active = p.Active,
};
This is the Linq To SQL Conversion which is of same query
SELECT
[Filter1].[AccountId] AS [AccountId],
[Filter1].[PersonId1] AS [PersonId],
[Filter1].[FirstName] AS [FirstName],
[Filter1].[LastName] AS [LastName],
[Filter1].[AccountName] AS [AccountName],
[Filter1].[Active1] AS [Active]
FROM (SELECT [Extent1].[AccountId] AS [AccountId], [Extent1].[BusinessTypeId] AS [BusinessTypeId],
[Extent1].[PersonId] AS [PersonId1], [Extent1].[Active] AS [Active1], [Extent1].[AccountName] AS [AccountName],
[Extent2].[PersonId] AS [PersonId2], [Extent2].[FirstName] AS [FirstName], [Extent2].[LastName] AS [LastName]
FROM [ysmgr].[Account] AS [Extent1]
INNER JOIN [ysmgr].[Person] AS [Extent2] ON [Extent1].[PersonId] = [Extent2].[PersonId]
WHERE 0 = [Extent1].[IsDelete] ) AS [Filter1]
INNER JOIN [ysmgr].[BusinessType] AS [Extent3] ON [Filter1].[BusinessTypeId] = [Extent3].[BusinessTypeId]
WHERE ([Extent3].[Name] = Operator) AND ([Filter1].[AccountId] = AccId ) AND ([Filter1].[PersonId2] = Pid)
We can see that in conversion it in second select query(i.e Extent1) it selects the all columns and after that it selects the particular colums (first select i.e. Filter1) .
Does anyone know why it happens?

I consider choosing a tool as signing an MoU: I do my job, you do your job.
Your job is to write correct application code. Entity Framework's job (or one of it) is to convert LINQ into correct SQL. With each version of EF, SQL generation has improved and this is an ongoing process. This means that any tweaks in your code may prove unnecessary, or even counter-productive, with newer versions of EF. Also, a minute modification in a LINQ query that produces the SQL you like, may produce completely different SQL.
That said, it's always sensible to keep an eye on the generated SQL, as you do. If for some reason EF fails badly in producing efficient SQL, you should know.
And then there are known cases where choosing the right LINQ constructs makes a difference. To list a few:
Forcing inner joins. Navigation properties or Includes can produce outer joins. An explicit join in LINQ will always be an inner join in SQL (and this won't change with upgrades).
Avoiding constructs that are notoriously bad. This depends on the query provider. For instance, with Sql Server, performance deteriorates rapidly when Contains is called with "many" items. Also, using set operators like Except, Intersect, Any, or All can produce horrible, non-scalable SQL queries. But later versions may be better at this.
Avoiding forced execution. Or: defer execution as long as possible. Take these two statements:
var name = context.Companies.Single(c => c.CompanyId == id).Name;
var name = context.Companies.Where(c => c.CompanyId == id)
.Select(c => c.Name).Single();
The first query is shorter, better readable, so let's go for it! Or...? Single is one of the LINQ statements that forces query execution. Here it draws the whole record into memory, and then only Name is actually used. This can be very inefficient with large records. The second query only fetches Name from the database.
As for your query, I wouldn't worry about the generated SQL. It's longer and clunkier than what a human being would produce, but it's not too bad. And, as said in the comments, the database engine's query optimizer will probably be able to turn it into an efficient query plan. You may even consider using navigation properties in stead of explicit joins.
Also, as said, your first and main concern is correctness. Tackle performance when it becomes an issue. So far, with EF against Sql Server I've never found incorrect results from LINQ statements and I think that's a tremendous achievement from the EF team.

Related

Using both LINQ query syntax and LINQ method syntax [duplicate]

I read this article:
Logical Processing Order of the SELECT statement
in end of article has been write ON and JOIN clause consider before WHERE.
Consider we have a master table that has 10 million records and a detail table (that has reference to master table(FK)) with 50 million record. We have a query that want just 100 record of detail table according a PK in master table.
In this situation ON and JOIN execute before WHERE?I mean that do we have 500 million record after JOIN and then WHERE apply to it?or first WHERE apply and then JOIN and ON Consider? If second answer is true do it has incoherence with top article?
thanks

In the case of an INNER JOIN or a table on the left in a LEFT JOIN, in many cases, the optimizer will find that it is better to perform any filtering first (highest selectivity) before actually performing whatever type of physical join - so there are obviously physical order of operations which are better.
To some extent you can sometimes control this (or interfere with this) with your SQL, for instance, with aggregates in subqueries.
The logical order of processing the constraints in the query can only be transformed according to known invariant transformations.
So:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is still logically equivalent to:
SELECT *
FROM a
INNER JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and they will generally have the same execution plan.
On the other hand:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
WHERE a.something = something
AND b.something = something
is NOT equivalent to:
SELECT *
FROM a
LEFT JOIN b
ON a.id = b.id
AND a.something = something
AND b.something = something
and so the optimizer isn't going to transform them into the same execution plan.
The optimizer is very smart and is able to move things around pretty successfully, including collapsing views and inline table-valued functions as well as even pushing things down through certain kinds of aggregates fairly successfully.
Typically, when you write SQL, it needs to be understandable, maintainable and correct. As far as efficiency in execution, if the optimizer is having difficulty turning the declarative SQL into an execution plan with acceptable performance, the code can sometimes be simplified or appropriate indexes or hints added or broken down into steps which should perform more quickly - all in successive orders of invasiveness.

It doesn't matter
Logical processing order is always honoured: regardless of actual processing order
INNER JOINs and WHERE conditions are effectively associative and commutative (hence the ANSI-89 "join in the where" syntax) so actual order doesn't matter
Logical order becomes important with outer joins and more complex queries: applying WHERE on an OUTER table changes the logic completely.
Again, it doesn't matter how the optimiser does it internally so long as the query semantics are maintained by following logical processing order.
And the key word here is "optimiser": it does exactly what it says

Just re-reading Paul White's excellent series on the Query Optimiser and remembered this question.
It is possible to use an undocumented command to disable specific transformation rules and get some insight into the transformations applied.
For (hopefully!) obvious reasons only try this on a development instance and remember to re-enable them and remove any suboptimal plans from the cache.
USE AdventureWorks2008;
/*Disable the rules*/
DBCC RULEOFF ('SELonJN');
DBCC RULEOFF ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
You can see with those two rules disabled it does a cartesian join and filter after.
/*Re-enable them*/
DBCC RULEON ('SELonJN');
DBCC RULEON ('BuildSpool');
SELECT P.ProductNumber,
P.ProductID,
I.Quantity
FROM Production.Product P
JOIN Production.ProductInventory I
ON I.ProductID = P.ProductID
WHERE I.ProductID < 3
OPTION (RECOMPILE)
With them enabled the predicate is pushed right down into the index seek and so reduces the number of rows processed by the join operation.

There is no defined order. The SQL engine determines what order to perform the operations based on the execution strategy chosen by its optimizer.

I think you have misread ON as IN in the article.
However, the order it is showing in the article is correct (obviously it is msdn anyway). The ON and JOIN are executed before WHERE naturally because WHERE has to be applied as a filter on the temporary resultset obtained due to JOINS
The article just says it is logical order of execution and also at end of the paragraph it adds this line too ;)
"Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list."

OleDB JOIN Syntax Not Correct

I recently asked a question on StackOverflow (MySQL Returns All Rows When field = 0) regarding a query statement not working in MySQL. I now have a very similar problem, this time using OleDB where I am trying to use a join to include fields that have 0 as an entry, but not select every field in the table as a result.
The new look MySQL query posted in the above question as the accepted answer works without a hitch. However the OleDB counterpart I have written to do almost the same does not. It's a bit messy as the tables are not named very well (I didn't create this database) and I'm getting a simple syntax error;
myQuery.CommandText = "SELECT s.scm_num, s.scm_name, c.cr3_text, q.qsp_value, s.scm_bprefix, s.scm_nxbnum FROM qspreset q INNER JOIN sdccomp s LEFT OUTER JOIN cntref3 c ON s.scm_cntref3=c.cr3_id AND q.qsp_relat=s.scm_invtype AND q.qsp_what=13";
I'm querying another table here as well as the two involved in the LEFT OUTER JOIN and I believe that is where I am making the mistake.

Join conditions need to be with the join
myQuery.CommandText =
"SELECT s.scm_num, s.scm_name, c.cr3_text, q.qsp_value, s.scm_bprefix, s.scm_nxbnum
FROM qspreset q
INNER JOIN sdccomp s
on q.qsp_relat = s.scm_invtype AND q.qsp_what = 13
LEFT OUTER JOIN cntref3 c
ON s.scm_cntref3 = c.cr3_id";
q.qsp_what = 13 can be moved to a where
I happen to like this style
In the case of MSSQL T-SQL and some queries with a lot of joins I have gotten more efficient query plan by moving a where condition up into a join. The filter happened early rather that last.
If you don't believe you can put a hard value in a join see SQLfiddle

Why does the Entity Framework generate nested SQL queries?

Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = #p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.

It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = #ID",
serverId);
results = results.OrderBy(x=> x.StartTime).Take(limit);

My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.

Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.

If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...

I just stumbled upon this post because I suffer from the same problem. I already spend days tracking this down and it it is just a poor query generation in mysql.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now If I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
Which is bad because the order by is in the outer query. Theat means MySQL has to pull every record in order to perform an orderby which results in using filesort
I verified that SQL Server (Comapact at least) does not generate nested queries for the same code
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1],
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC

Actually the queries generated by Entity Framework are few ugly, less than LINQ 2 SQL but still ugly.
However, very probably you database engine will make the desired execution plan, and the query will run smoothly.

OrderBy a Many To Many relationship with Entity Sql

I'm trying to better utilize the resources of the Entity Sql in the following scenario: I have a table Book which has a Many-To-Many relationship with the Author table. Each book may have from 0 to N authors. I would like to sort the books by the first author name, ie the first record found in this relationship (or null when no authors are linked to a book).
With T-SQL it can be done without difficulty:
SELECT
b.*
FROM
Book AS b
JOIN BookAuthor AS ba ON b.BookId = ba.BookId
JOIN Author AS a ON ba.AuthorId = a.AuthorId
ORDER BY
a.AuthorName;
But I cannot think of how to adapt my code bellow to achieve it. Indeed I don't know how to write something equivalent directly with Entity Sql too.
Entities e = new Entities();
var books = e.Books;
var query = books.Include("Authors");
if (sorting == null)
query = query.OrderBy("it.Title asc");
else
query = query.OrderBy("it.Authors.Name asc"); // This isn't it.
return query.Skip(paging.Skip).Take(paging.Take).ToList();
Could someone explain me how to modify my code to generate the Entity Sql for the desired result? Or even explain me how to write by hand a query using CreateQuery<Book>() to achieve it?
EDIT
Just to elucidate, I'll be working with a very large collection of books (around 100k). Sorting them in memory would be very impactful on the performance. I wish the answers would focus on how to generate the desired ordering using Entity Sql, so the orderby will happens on the database.

The OrderBy method expects you to give it a lambda expression (well, actually a Func delegate, but most people would use lambdas to make them) that can be run to select the field to sort by. Also, OrderBy always orders ascending; if you want descending order there is an OrderByDescending method.
var query = books
.Include("Authors")
.OrderBy(book => book.Authors.Any()
? book.Authors.FirstOrDefault().Name
: string.Empty);
This is basically telling the OrderBy method: "for each book in the sequence, if there are any authors, select the first one's name as my sort key; otherwise, select the empty string. Then return me the books sorted by the sort key."
You could put anything in place of the string.Empty, including for example book.Title or any other property of the book to use in place of the last name for sorting.
EDIT from comments:
As long as the sorting behavior you ask for isn't too complex, the Entity Framework's query provider can usually figure out how to turn it into SQL. It will try really, really hard to do that, and if it can't you'll get a query error. The only time the sorting would be done in client-side objects is if you forced the query to run (e.g. .AsEnumerable()) before the OrderBy was called.
In this case, the EF outputs a select statement that includes the following calculated field:
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[BookAuthor] AS [Extent4]
WHERE [Extent1].[Id] = [Extent4].[Books_Id]
)) THEN [Limit1].[Name] ELSE #p__linq__0 END AS [C1],
Then orders by that.
#p__linq__0 is a parameter, passed in as string.Empty, so you can see it converted the lambda expression into SQL pretty directly. Extent and Limit are just aliases used in the generated SQL for the joined tables etc. Extent1 is [Books] and Limit1 is:
SELECT TOP (1) -- Field list goes here.
FROM [dbo].[BookAuthor] AS [Extent2]
INNER JOIN [dbo].[Authors] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Authors_Id]
WHERE [Extent1].[Id] = [Extent2].[Books_Id]

If you don't care where the sorting is happening (i.e. SQL vs In Code), you can retrieve your result set, and sort it using your own sorting code after the query results have been returned. In my experience, getting specialized sorting like this to work with Entity Framework can be very difficult, frustrating and time consuming.

Choosing better query out of 2 queries, both returning same results

I am having two different sql queries one written by me and one automatically generated by C# when used with linq, both are giving same results.
I am not sure which one to choose, Iam looking for
Whats the best way to choose one query out of many, when all returns same result (most optimized query).
Out of my queries (below written), which one should i choose.
Hand Written
select * from People P
inner join SubscriptionItemXes S
on
P.Id=S.Person_Id
inner join FoodTagXFoods T1
on T1.FoodTagX_Id = S.Tag2
inner join FoodTagXFoods T2
on T2.FoodTagX_Id = S.Tag1
inner join Foods F
on
F.Id= T1.Food_Id and F.Id= T2.Food_Id
where p.id='1'
Automatically Generated by LINQ
SELECT
[Distinct1].[Id] AS [Id],
[Distinct1].[Item] AS [Item]
FROM ( SELECT DISTINCT
[Extent2].[Id] AS [Id],
[Extent2].[Item] AS [Item]
FROM [dbo].[People] AS [Extent1]
CROSS JOIN [dbo].[Foods] AS [Extent2]
INNER JOIN [dbo].[FoodTagXFoods] AS [Extent3]
ON [Extent2].[Id] = [Extent3].[Food_Id]
INNER JOIN [dbo].[SubscriptionItemXes] AS [Extent4]
ON [Extent1].[Id] = [Extent4].[Person_Id]
WHERE (N'rusi' = [Extent1].[Name]) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent5]
WHERE ([Extent2].[Id] = [Extent5].[Food_Id])
AND ([Extent5].[FoodTagX_Id] = [Extent4].[Tag1])
)) AND ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[FoodTagXFoods] AS [Extent6]
WHERE ([Extent2].[Id] = [Extent6].[Food_Id])
AND ([Extent6].[FoodTagX_Id] = [Extent4].[Tag2])
))
) AS [Distinct1]
Execution Plan Results
Hand Written: Query Cost (relative to batch):33%
Linq Generated: Query Cost (relative to batch):67%

I have found that two different queries, one hand-written and one generated by Linq might look wildly different but, actually, when you analyse the query plan in SSMS, you find that actually they are almost identical.
You need to actually run these queries in SSMS with Display Actual Execution Plan switched on, and analyse the different plans. It's the only way to correctly analyse the two and find out which is better.
In general, Linq is actually very good at generating efficient queries; even if the actual SQL itself is pretty ugly (in some cases, it's the kind of SQL that a human would write if they had the time!). If course, that said, it can also generate some pigs!
Additionally, asking SO to help with performance of a query over so many tables is fraught with problems for us, since it will be governed so much by your indexes :)

But they aren't quite returning the same thing... The first query grabs everyting (SELECT *) while LINQ is going to extract what you really want (id and item). Trivial you may say but streaming back lots of data that's never used is a good waste of bandwidth and will make your application appear sluggish. Additionally, the LINQ query seems to be doing a lot more which may or may not be the correct solution especially as data is populated into FoodTagXFoods
As for which performs better, I couldn't tell you without something like the actual query plans and/or results of statistics io from both queries. My money is on hand-written but maybe because I like my hands.

Examining SQL Server Query Execution Plan is the best way to choose the better query.
Hope below tutorials help.
SQL Tuning Tutorial - Understanding a Database Execution Plan (1)
Examining Query Execution Plans
SQL Server Query Execution Plan Analysis
SQL SERVER – Actual Execution Plan vs. Estimated Execution Plan

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.