I converted a LINQ query to SQL using LINQPad 4, but I am confused by the converted SQL query. I have a Job table that is related to AppliedJob, AppliedJob is related to JobOffer, and JobOffer is related to Contract. The Contract table has a CompletedDate field that is initially set to null when a job contract starts; when a job is completed, the field is updated with the current date. I want to get the list of jobs that have CompletedDate != null (if a row is found in the Contract table). That means a contract related to the job is not completed yet, or no row is found in the Contract table at all. "Not found" means no contract has been started for the job.
My LINQ:
from j in Jobs
join jobContract in
    (
        from appliedJob in AppliedJobs.DefaultIfEmpty()
        from offer in appliedJob.JobOffers.DefaultIfEmpty()
        from contract in Contracts.DefaultIfEmpty()
        select new { appliedJob, offer, contract }
    ).DefaultIfEmpty()
    on j.JobID equals jobContract.appliedJob.JobID into jobContracts
where jobContracts.Any(jobContract => jobContract.contract.CompletedDate != null)
select j.JobTitle
The SQL query that LINQPad generated:
SELECT [t0].[JobTitle]
FROM [Job] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT NULL AS [EMPTY]
) AS [t1]
LEFT OUTER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t2]
LEFT OUTER JOIN ([AppliedJob] AS [t3]
LEFT OUTER JOIN [JobOffer] AS [t4] ON [t4].[AppliedJobID] = [t3].[AppliedJobID]
LEFT OUTER JOIN [Contract] AS [t5] ON 1=1 ) ON 1=1 ) ON 1=1
WHERE ([t5].[CompletedDate] IS NOT NULL) AND ([t0].[JobID] = [t3].[JobID])
)
My question is: why does it generate so many SELECT NULL AS [EMPTY] clauses and LEFT OUTER JOINs?
Can I turn this into a simpler, more understandable query, or is it OK as it is?
DefaultIfEmpty() translates to left outer join. See LEFT OUTER JOIN in LINQ
There are so many "NULL as [Empty]" because NULL != NULL in SQL. See Why does NULL = NULL evaluate to false in SQL server
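If the intent is simply "jobs that have a related contract with a non-null CompletedDate", the existence check can usually be expressed without the manual DefaultIfEmpty() chain at all. A minimal sketch, assuming Job, AppliedJob and JobOffer expose navigation properties named AppliedJobs, JobOffers and Contracts (names assumed here, not taken from the question):

var titles =
    from j in Jobs
    where j.AppliedJobs.Any(aj =>
              aj.JobOffers.Any(o =>
                  o.Contracts.Any(c => c.CompletedDate != null)))
    select j.JobTitle;

Nested Any() calls like this should still translate to EXISTS subqueries, but without the dummy derived tables and ON 1=1 left outer joins that the DefaultIfEmpty() calls introduce.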
It's been a while since I've touched C# and LINQ, but this is my take.
The reason for the multiple left outer joins and nulls is that you have several (deferred?) calls to DefaultIfEmpty().
No pun intended, but what is the default return value of Enumerable.DefaultIfEmpty()? It is null. And they are all evaluated and gathered before you get to the point of evaluating the join criteria in the LINQ code snippet.
And that code snippet represents the non-null right side of the equation. And the whole thing can return an empty set.
So a compatible SQL statement must create a left outer join against an empty set, recursively, all the way down to the actual SQL join criteria.
It's almost algebraic. Try to understand what both the LINQ and SQL statements are doing. Work them both out, backwards from the end all the way to the beginning of each, and you'll see the equivalence.
The reason for all the SELECT NULL AS [EMPTY] clauses is that these subqueries are not being used to return data, only to verify that data is there. In other words, the LINQ provider is optimizing the query so that it doesn't actually bring back any column data, since that is completely unnecessary for the purposes of these subqueries.
Hello, I have a LINQ query that I created for a left outer join. I am wondering why LINQ creates the SQL it does, and how to make it a better query.
Here's the C# query:
var query =
    (
        from subject in subjects
        join statement in statements.DefaultIfEmpty() on subject.Id equals statement.SubjectId
        select subject
    );
query.Take(100).Dump();
and the SQL that it sends:
SELECT TOP (100)
--some fields here
FROM [Subject] AS [t0]
INNER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t1]
LEFT OUTER JOIN [SubjectStatement] AS [t2] ON 1=1 ) ON [t0].[id] = [t2].[SubjectId]
What I would like to see sent is
SELECT TOP(100)
--some fields here
FROM Subject
LEFT OUTER JOIN SubjectStatement ON Subject.Id = SubjectStatement.SubjectId
Is there a way to control the SQL that is being passed to SQL Server?
You are using the syntax of an inner join, and while that might work out sometimes, you would normally create a left join using the following syntax:
var query =
(
from subject in subjects
join statement in statements on subject.Id equals statement.SubjectId into ljStatement
from statement in ljStatement.DefaultIfEmpty()
select subject
);
query.Take(100).Dump();
This would result in:
SELECT TOP (100) [t0].[Id]
FROM [Subject] AS [t0]
LEFT OUTER JOIN [SubjectStatement] AS [t1] ON [t0].[Id] = [t1].[SubjectId]
into (C# Reference)
The into contextual keyword can be used to create a temporary
identifier to store the results of a group, join or select clause into
a new identifier.
join clause (C# Reference)
A join clause with an into expression is called a group join.
...
A group join produces a hierarchical result sequence, which associates elements in the left source sequence with one or more matching elements in the right side source sequence. A group join has no equivalent in relational terms; it is essentially a sequence of object arrays.
If no elements from the right source sequence are found to match an element in the left source, the join clause will produce an empty array for that item. Therefore, the group join is still basically an inner-equijoin except that the result sequence is organized into groups.
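As a small illustration of that hierarchical result (reusing the Subject/SubjectStatement names from the question above; the loop is only there to show the shape of the data):

var groups =
    from subject in subjects
    join statement in statements on subject.Id equals statement.SubjectId into subjectStatements
    select new { subject, subjectStatements };

foreach (var g in groups)
    Console.WriteLine("{0}: {1} statement(s)", g.subject.Id, g.subjectStatements.Count());

Each subjectStatements is a (possibly empty) sequence of matching statements; adding from statement in subjectStatements.DefaultIfEmpty() is what flattens it back into the left outer join shown above.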
I am working with LINQ select queries, and while debugging I found something unusual.
This is the LINQ query for selecting particular columns from a table:
var result = from p in Context.Accounts
             join a in _Context.People on p.PersonId equals a.PersonId
             join b in _Context.BusinessTypes on p.BusinessTypeId equals b.BusinessTypeId
             where b.Name == Operator
                   && p.AccountId == AccId && a.PersonId == Pid && p.IsDelete == false
             select new
             {
                 AccountId = p.AccountId,
                 PersonId = p.PersonId,
                 AccountName = p.AccountName,
                 Active = p.Active,
             };
This is the SQL that the same query is converted to:
SELECT
[Filter1].[AccountId] AS [AccountId],
[Filter1].[PersonId1] AS [PersonId],
[Filter1].[FirstName] AS [FirstName],
[Filter1].[LastName] AS [LastName],
[Filter1].[AccountName] AS [AccountName],
[Filter1].[Active1] AS [Active]
FROM (SELECT [Extent1].[AccountId] AS [AccountId], [Extent1].[BusinessTypeId] AS [BusinessTypeId],
[Extent1].[PersonId] AS [PersonId1], [Extent1].[Active] AS [Active1], [Extent1].[AccountName] AS [AccountName],
[Extent2].[PersonId] AS [PersonId2], [Extent2].[FirstName] AS [FirstName], [Extent2].[LastName] AS [LastName]
FROM [ysmgr].[Account] AS [Extent1]
INNER JOIN [ysmgr].[Person] AS [Extent2] ON [Extent1].[PersonId] = [Extent2].[PersonId]
WHERE 0 = [Extent1].[IsDelete] ) AS [Filter1]
INNER JOIN [ysmgr].[BusinessType] AS [Extent3] ON [Filter1].[BusinessTypeId] = [Extent3].[BusinessTypeId]
WHERE ([Extent3].[Name] = Operator) AND ([Filter1].[AccountId] = AccId ) AND ([Filter1].[PersonId2] = Pid)
We can see that in the conversion, the inner select (i.e., Extent1) selects all the columns, and only then does the outer select (i.e., Filter1) pick out the particular columns.
Does anyone know why it happens?
I consider choosing a tool as signing an MoU: I do my job, you do your job.
Your job is to write correct application code. Entity Framework's job (or one of it) is to convert LINQ into correct SQL. With each version of EF, SQL generation has improved and this is an ongoing process. This means that any tweaks in your code may prove unnecessary, or even counter-productive, with newer versions of EF. Also, a minute modification in a LINQ query that produces the SQL you like, may produce completely different SQL.
That said, it's always sensible to keep an eye on the generated SQL, as you do. If for some reason EF fails badly in producing efficient SQL, you should know.
And then there are known cases where choosing the right LINQ constructs makes a difference. To list a few:
Forcing inner joins. Navigation properties or Includes can produce outer joins. An explicit join in LINQ will always be an inner join in SQL (and this won't change with upgrades).
Avoiding constructs that are notoriously bad. This depends on the query provider. For instance, with SQL Server, performance deteriorates rapidly when Contains is called with "many" items (see the sketch after the code example below). Also, using set operators like Except, Intersect, Any, or All can produce horrible, non-scalable SQL queries. But later versions may be better at this.
Avoiding forced execution. Or: defer execution as long as possible. Take these two statements:
var name = context.Companies.Single(c => c.CompanyId == id).Name;
var name = context.Companies.Where(c => c.CompanyId == id)
.Select(c => c.Name).Single();
The first query is shorter and more readable, so let's go for it! Or...? Single is one of the LINQ statements that forces query execution. Here it draws the whole record into memory, and then only Name is actually used. This can be very inefficient with large records. The second query only fetches Name from the database.
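To illustrate the Contains point from the list above, here is a hedged sketch with a hypothetical Orders set (not code from the question):

// With a large local list, the provider expands Contains into one IN clause
// with an entry per item, which is slow to translate and can scale badly.
var ids = Enumerable.Range(1, 10000).ToList();
var orders = context.Orders                    // hypothetical DbSet<Order>
    .Where(o => ids.Contains(o.Id))            // roughly: WHERE [Id] IN (1, 2, ..., 10000)
    .ToList();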
As for your query, I wouldn't worry about the generated SQL. It's longer and clunkier than what a human being would produce, but it's not too bad. And, as said in the comments, the database engine's query optimizer will probably be able to turn it into an efficient query plan. You may even consider using navigation properties instead of explicit joins.
Also, as said, your first and main concern is correctness. Tackle performance when it becomes an issue. So far, with EF against SQL Server I've never seen incorrect results from LINQ statements, and I think that's a tremendous achievement by the EF team.
Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe there is something else going on; I'm asking because I'm not that into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server, since I didn't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
IEnumerable<Network> results = db.ExecuteStoreQuery<Network>(
    "SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = {0}",
    serverId);
// The ordering and Take now run client-side on the materialized entities.
results = results.OrderBy(x => x.StartTime).Take(limit);
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
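A small sketch of that composition, using the names from the question: each operator wraps the previous IQueryable's expression, which is why the provider tends to emit one (sub)query per node.

IQueryable<Network> filtered = db.Network.Where(x => x.ServerID == serverId); // Where node
IQueryable<Network> ordered  = filtered.OrderBy(x => x.StartTime);            // OrderBy(Where(...))
IQueryable<Network> limited  = ordered.Take(limit);                           // Take(OrderBy(Where(...)))

Console.WriteLine(limited.Expression); // prints the nested method-call tree before SQL generation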
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all the other LINQ methods so that they first traverse the expression tree and convert all variables, method calls, etc. into Expression.Constant nodes. I did it just because of this issue in Entity Framework...
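A minimal sketch of that idea (my own hypothetical helper, not the answerer's actual code): an ExpressionVisitor that evaluates closure-variable accesses and bakes the values in as Expression.Constant before handing the predicate to the provider.

using System;
using System.Linq;
using System.Linq.Expressions;

static class QueryableConstantExtensions
{
    // Hypothetical helper: inlines captured variables, then defers to Queryable.Where.
    public static IQueryable<T> WhereInlined<T>(this IQueryable<T> source,
        Expression<Func<T, bool>> predicate)
    {
        var inlined = (Expression<Func<T, bool>>)new ConstantInliner().Visit(predicate);
        return source.Where(inlined);
    }

    private sealed class ConstantInliner : ExpressionVisitor
    {
        protected override Expression VisitMember(MemberExpression node)
        {
            // Captured locals appear as field accesses on a compiler-generated
            // closure object (a ConstantExpression), so they can be evaluated now.
            if (node.Expression is ConstantExpression)
            {
                var value = Expression.Lambda(node).Compile().DynamicInvoke();
                return Expression.Constant(value, node.Type);
            }
            return base.VisitMember(node);
        }
    }
}

With something like this, db.Network.WhereInlined(x => x.ServerID == serverId) presents the provider with a literal value instead of a variable, which is what the suggestion above relies on to avoid the subselect.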
I just stumbled upon this post because I suffer from the same problem. I have already spent days tracking this down, and it is simply poor query generation in MySQL.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
Which is bad, because the ORDER BY is in the outer query. That means MySQL has to pull every record in order to perform the ORDER BY, which results in using filesort.
I verified that SQL Server (Compact, at least) does not generate nested queries for the same code:
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Actually, the queries generated by Entity Framework are a bit ugly, less so than LINQ to SQL's, but still ugly.
However, your database engine will very probably come up with the desired execution plan, and the query will run smoothly.
I'm trying to make better use of Entity SQL in the following scenario: I have a Book table which has a many-to-many relationship with the Author table. Each book may have from 0 to N authors. I would like to sort the books by the first author name, i.e. the first record found in this relationship (or null when no authors are linked to a book).
With T-SQL it can be done without difficulty:
SELECT
b.*
FROM
Book AS b
JOIN BookAuthor AS ba ON b.BookId = ba.BookId
JOIN Author AS a ON ba.AuthorId = a.AuthorId
ORDER BY
a.AuthorName;
But I cannot work out how to adapt my code below to achieve it. Indeed, I don't know how to write something equivalent directly in Entity SQL either.
Entities e = new Entities();
var books = e.Books;
var query = books.Include("Authors");
if (sorting == null)
query = query.OrderBy("it.Title asc");
else
query = query.OrderBy("it.Authors.Name asc"); // This isn't it.
return query.Skip(paging.Skip).Take(paging.Take).ToList();
Could someone explain to me how to modify my code to generate the Entity SQL for the desired result? Or even explain how to write a query by hand using CreateQuery<Book>() to achieve it?
EDIT
Just to clarify: I'll be working with a very large collection of books (around 100k). Sorting them in memory would hurt performance badly. I would like the answers to focus on how to generate the desired ordering with Entity SQL, so that the ordering happens in the database.
The OrderBy method expects you to give it a lambda expression (strictly speaking an Expression<Func<TSource, TKey>> for an IQueryable, but most people would write it as a lambda) that can be run to select the field to sort by. Also, OrderBy always orders ascending; if you want descending order there is an OrderByDescending method.
var query = books
.Include("Authors")
.OrderBy(book => book.Authors.Any()
? book.Authors.FirstOrDefault().Name
: string.Empty);
This is basically telling the OrderBy method: "for each book in the sequence, if there are any authors, select the first one's name as my sort key; otherwise, select the empty string. Then return me the books sorted by the sort key."
You could put anything in place of the string.Empty, including for example book.Title or any other property of the book, to use as the fallback sort key when a book has no authors.
EDIT from comments:
As long as the sorting behavior you ask for isn't too complex, the Entity Framework's query provider can usually figure out how to turn it into SQL. It will try really, really hard to do that, and if it can't you'll get a query error. The only time the sorting would be done in client-side objects is if you forced the query to run (e.g. .AsEnumerable()) before the OrderBy was called.
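To make that last point concrete, here is a hedged sketch contrasting the two, using the same query as above:

// Ordering stays in SQL: the whole expression is translated by the provider.
var sortedInSql = books.Include("Authors")
    .OrderBy(b => b.Authors.Any() ? b.Authors.FirstOrDefault().Name : string.Empty);

// Ordering runs client-side: AsEnumerable() ends the translatable part of the
// query, so the unordered rows are fetched and OrderBy sorts Book objects in memory.
var sortedInMemory = books.Include("Authors")
    .AsEnumerable()
    .OrderBy(b => b.Authors.Any() ? b.Authors.FirstOrDefault().Name : string.Empty);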
In this case, the EF outputs a select statement that includes the following calculated field:
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[BookAuthor] AS [Extent4]
WHERE [Extent1].[Id] = [Extent4].[Books_Id]
)) THEN [Limit1].[Name] ELSE @p__linq__0 END AS [C1],
Then orders by that.
@p__linq__0 is a parameter, passed in as string.Empty, so you can see it converted the lambda expression into SQL pretty directly. Extent and Limit are just aliases used in the generated SQL for the joined tables etc. Extent1 is [Books] and Limit1 is:
SELECT TOP (1) -- Field list goes here.
FROM [dbo].[BookAuthor] AS [Extent2]
INNER JOIN [dbo].[Authors] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Authors_Id]
WHERE [Extent1].[Id] = [Extent2].[Books_Id]
If you don't care where the sorting is happening (i.e. SQL vs In Code), you can retrieve your result set, and sort it using your own sorting code after the query results have been returned. In my experience, getting specialized sorting like this to work with Entity Framework can be very difficult, frustrating and time consuming.
The setup is simple: a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their latest chronological child value (updated, accessed, etc.). A query would look like this (for example):
var masters = from m in Master
let mc = m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
A potential problem occurs if a master has no children: the subquery will yield NULL, and the conversion from NULL to DateTime will fail with:
InvalidOperationException: The null value cannot be assigned to a member with type System.DateTime which is a non-nullable value type.
The solution to the exception is to cast mc to DateTime?, but I want the masters that have some children and just want to bypass the few which have no children yet.
Solution #1: add where m.Childs.Count() > 0.
This one kicked me hard and unexpectedly: the generated SQL was just plain awful (as was its execution plan) and ran almost twice as slow:
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ((
SELECT COUNT(*)
FROM [Child] AS [t3]
WHERE [t3].[masterId] = [t2].[id]
)) > @p0
Solution #2, with where mc != null, is even worse. It gives a shorter script, but it executes far longer than the one above (it takes roughly the same time as the two above together):
SELECT [t2].[Name] AS [MasterName], [t2].[value] AS [cm]
FROM (
SELECT [t0].[id], [t0].[Name], (
SELECT MAX([t1].[CreatedOn])
FROM [Child] AS [t1]
WHERE [t1].[masterId] = [t0].[id]
) AS [value]
FROM [Master] AS [t0]
) AS [t2]
WHERE ([t2].[value]) IS NOT NULL
All in all, a lot of wasted SQL time just to eliminate a few rows out of tens of thousands or more. This led me to Solution #3: get everything and eliminate the empty ones client side, but to do that I had to kiss IQueryable goodbye:
var masters = from m in Master
let mc = (DateTime?)m.Childs.Max(c => c.CreatedOn)
select new { m, mc };
var mastersNotNull = masters.AsEnumerable().Where(m => m.mc != null);
and this works. However, I am trying to figure out whether there are any downsides to this. Will it behave in any way fundamentally differently than the full-monty IQueryable version? I imagine this also means I cannot (or should not) use masters as a building block in a different IQueryable? Any input/observation/alternative is welcome.
Just based on this requirement:
a Master table and a linked Child table (one master, many children). Let's say we want to extract all masters and their latest chronological child value
SELECT [m].[Name] AS [MasterName]
     , MAX([c].[CreatedOn]) AS [cm]
FROM [Master] AS [m]
LEFT OUTER JOIN [Child] AS [c] ON [m].[id] = [c].[masterId]
GROUP BY [m].[Name]
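For completeness, a hedged LINQ counterpart of that SQL (names taken from the question) that stays IQueryable, so the MAX is computed in the database and childless masters simply come back with a null cm:

var masters =
    from m in Master
    select new
    {
        MasterName = m.Name,
        cm = m.Childs.Select(c => (DateTime?)c.CreatedOn).Max()
    };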