I did the following query:
var list = from book in books
where book.price > 50
select book;
list = list.Take(50);
I would expect the above to generate something like:
SELECT top 50 id, title, price, author
FROM Books
WHERE price > 50
but it generates:
SELECT
[Limit1].[C1] as [C1],
[Limit1].[id] as [Id],
[Limit1].[title] as [title],
[Limit1].[price] as [price],
[Limit1].[author] as [author]
FROM (SELECT TOP (50)
[Extent1].[id] as [Id],
[Extent1].[title] as [title],
[Extent1].[price] as [price],
[Extent1].[author] as [author]
FROM Books as [Extent1]
WHERE [Extent1].[price] > 50
) AS [Limit1]
Why does the above LINQ query generate a subquery, and where does the C1 come from?
Disclaimer: I've never used LINQ before...
My guess would be paging support? I guess you have some sort of Take(50, 50) method that gets 50 records, starting at record 50. Take a look at the SQL that query generates and you will probably find that it uses a similar subquery structure, allowing it to return any 50 rows of a query in approximately the time it takes to return the first 50 rows.
In any case, the nested sub query doesn't add any performance overhead as it's automagically optimised away during compilation of the execution plan.
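For reference, the LINQ paging pattern is Skip plus Take rather than a two-argument Take; a minimal sketch (the orderby on book.id is an assumption here, since deterministic paging needs a stable sort key):

var page = (from book in books
            where book.price > 50
            orderby book.id
            select book)
           .Skip(50)   // skip the first 50 matches
           .Take(50);  // then return the next 50

It is exactly this Skip support that leads providers to emit the ROW_NUMBER()-style nesting you can see further down this page.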
You could still make it cleaner like this:
var c = (from co in db.countries
where co.regionID == 5
select co).Take(50);
This will result in:
Table(country).Where(co => (co.regionID == Convert(5))).Take(50)
Equivalent to:
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = 5
EDIT: In response to the comments: it's not necessarily the separate Take() that causes it, because you can still use it like this:
var c = (from co in db.countries
where co.regionID == 5
select co);
var l = c.Take(50).ToList();
And the Result would be the same as before.
SELECT TOP (50) [t0].[countryID], [t0].[regionID], [t0].[countryName], [t0].[code]
FROM [dbo].[countries] AS [t0]
WHERE [t0].[regionID] = @p0
The fact that you wrote IQueryable = IQueryable.Take(50) is the tricky part here.
The subquery is generated for projection purposes. It makes more sense when you select from multiple tables into a single anonymous object; the outer query is then used to gather the results.
Try what happens with something like this:
from book in books
where book.price > 50
select new
{
Title = book.title,
Chapters = from chapter in book.Chapters
select chapter.Title
}
Isn't it a case of the first query returning the total number of rows while the second extracts the subset of rows based on the call to the .Take() method?
I agree with @Justin Swartsel. There was no error involved, so this is largely an academic matter.
Linq-to-SQL endeavors to generate SQL that runs efficiently (which it did in your case).
But it does not make any effort to generate conventional SQL that a human would likely create.
The Linq-to-SQL implementers likely used the builder pattern to generate the SQL.
If so, it would be easier to append a substring (or a subquery in this case) than it would be to backtrack and insert a 'TOP x' fragment into the SELECT clause.
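A hypothetical sketch of that builder-style trade-off (illustrative only, not EF's actual internals):

// Wrapping appends a new outer query and leaves the inner SQL untouched:
static string ApplyTake(string innerSql, int n)
{
    return "SELECT TOP (" + n + ") * FROM (" + innerSql + ") AS [Limit1]";
}
// e.g. ApplyTake("SELECT id, title FROM Books WHERE price > 50", 50)
// Splicing "TOP (50)" into the SELECT clause would instead mean editing a
// string (or buffer) that was already emitted.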
Related
There is a list in memory of 50,000 Product IDs. I would like to get all these Products from the DB. Using dbContext.Products.Where(p => list.Contains(p.ID)) generates a giant IN in the SQL - WHERE ID IN (2134,1324543,5675,32451,45735...) - and it takes forever. This is partly because it takes time for SQL Server to parse such a large string, and also because the execution plan is bad. (I know this from trying to use a temporary table instead.)
So I used SQLBulkCopy to insert the IDs to a temporary table, and then ran
dbContext.Set<Product>().SqlQuery("SELECT * FROM Products WHERE ID IN (SELECT ID FROM #tmp)")
This gave good performance. However, now I need the products, with their suppliers (multiple for every product). Using a custom SQL command there is no way to get back a complex object that I know of. So how can I get the products with their suppliers, using the temporary table?
(If I can somehow refer to the temporary table in LINQ, then it would be OK - I could just do dbContext.Products.Where(p => dbContext.TempTable.Any(t => t.ID==p.ID)). If I could refer to it in a UDF that would also be good - but you can't. I cannot use a real table, since concurrent users would leave it in an inconsistent state.)
Thanks
I was curious to explore the sql generated using Join syntax rather than Contains. Here is the code for my test:
IQueryable<Product> queryable = Uow.ProductRepository.All;
List<int> inMemKeys = new int[] { 2134, 1324543, 5675, 32451, 45735 }.ToList();
string sql1 = queryable.Where(p => inMemKeys.Contains(p.ID)).ToString();
string sql2 = queryable.Join(inMemKeys, t => t.ID, pk => pk, (t, pk) => t).ToString();
This is the sql generated using Contains (sql1)
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
WHERE ([extent1].[id] IN (2134, 1324543, 5675, 32451, 45735))
This is the sql generated using Join:
SELECT
[extent1].[id] AS [id],...etc
FROM [dbo].[products] AS [extent1]
INNER JOIN (SELECT
[unionall3].[c1] AS [c1]
FROM (SELECT
[unionall2].[c1] AS [c1]
FROM (SELECT
[unionall1].[c1] AS [c1]
FROM (SELECT
2134 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable1] UNION ALL SELECT
1324543 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable2]) AS [unionall1] UNION ALL SELECT
5675 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable3]) AS [unionall2] UNION ALL SELECT
32451 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable4]) AS [unionall3] UNION ALL SELECT
45735 AS [c1]
FROM (SELECT
1 AS x) AS [singlerowtable5]) AS [unionall4]
ON [extent1].[id] = [unionall4].[c1]
So the SQL creates a big SELECT statement that uses UNION ALL to build the equivalent of your temporary table, then joins to that table. The SQL is more verbose, but it may well be efficient - I'm afraid I'm not qualified to say.
While it doesn't answer the question as set out in the heading, it does show a way to avoid the giant IN. OK... now it's a giant UNION ALL... anyway, I hope this contribution is useful to someone.
I suggest you extend the filter table (TempTable in the code above) to store something like a UserId or SessionId as well as the ProductIDs:
- this will give you all the performance you're after
- it will work for concurrent users
If this filter table is changing a lot then consider updating it in a separate transaction (i.e. a different instance of dbContext) to avoid holding a write lock on this table for longer than necessary.
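A minimal sketch of that suggestion, assuming a hypothetical ProductFilter entity mapped to the extended filter table (SessionId plus ProductId columns):

var sessionId = Guid.NewGuid();

// 1. Bulk-insert the 50,000 (sessionId, productId) pairs with SqlBulkCopy,
//    just as the question already does for the temp table.

// 2. Query against only this session's filter rows; because ProductFilter is
//    a mapped entity, Include() still works and the suppliers come back too.
var products = dbContext.Products
    .Where(p => dbContext.ProductFilters
        .Any(f => f.SessionId == sessionId && f.ProductId == p.ID))
    .Include("Suppliers")
    .ToList();

// 3. Delete this session's filter rows when done (ideally in a separate
//    transaction, as noted above).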
I'm not sure when, but I read an article which indicates that using Skip(1).Any() is better than a Count() comparison when using Entity Framework (I may be remembering wrong). I'm not sure about this after seeing the generated T-SQL code.
Here is the first option:
int userConnectionCount = _dbContext.HubConnections.Count(conn => conn.UserId == user.Id);
bool isAtSingleConnection = (userConnectionCount == 1);
This generates the following T-SQL code which is reasonable:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[HubConnections] AS [Extent1]
WHERE [Extent1].[UserId] = @p__linq__0
) AS [GroupBy1]
Here is the other option which is the suggested query as far as I remember:
bool isAtSingleConnection = !_dbContext
.HubConnections.OrderBy(conn => conn.Id)
.Skip(1).Any(conn => conn.UserId == user.Id);
Here is the generated T-SQL for the above LINQ query:
SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[UserId] AS [UserId]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[UserId] AS [UserId], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[HubConnections] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 1
) AS [Skip1]
WHERE [Skip1].[UserId] = @p__linq__0
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM ( SELECT [Extent2].[Id] AS [Id], [Extent2].[UserId] AS [UserId]
FROM ( SELECT [Extent2].[Id] AS [Id], [Extent2].[UserId] AS [UserId], row_number() OVER (ORDER BY [Extent2].[Id] ASC) AS [row_number]
FROM [dbo].[HubConnections] AS [Extent2]
) AS [Extent2]
WHERE [Extent2].[row_number] > 1
) AS [Skip2]
WHERE [Skip2].[UserId] = @p__linq__0
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1];
Which one is the proper way here? Is there a big performance difference between these two?
Query performance depends on a lot of things, like the indexes that are present, the actual data, and how stale the statistics about that data are. The SQL query plan optimizer looks at these different metrics to come up with an efficient query plan. So, any straightforward answer that says query 1 is always better than query 2, or the opposite, would be incorrect.
That said, my answer below tries to explain the article's stance and how Skip(1).Any() could be better (marginally) than doing a Count() > 1. The second query, though bigger in size and mostly unreadable, can still be executed efficiently. Again, this depends on the things aforementioned. The idea is that the number of rows the database has to look at to figure out the result is larger in the Count() case. Assuming the required indexes are there (a clustered index on Id to make the OrderBy in the second case efficient), to compute the count the database has to go through every matching row, whereas in the second case it has to go through at most two rows to arrive at the answer.
Let's get more scientific in our analysis and see if my theory above holds any ground. For this, I am creating a dummy database of customers. The Customer type looks like this:
public class Customer
{
public int ID { get; set; }
public string Name { get; set; }
public int Age { get; set; }
}
I am seeding the database with some 100K random rows (I really do have to prove this) using this code:
for (int j = 0; j < 100; j++)
{
using (CustomersContext db = new CustomersContext())
{
Random r = new Random();
for (int i = 0; i < 1000; i++)
{
Customer c = new Customer
{
Name = Guid.NewGuid().ToString(),
Age = r.Next(0, 100)
};
db.Customers.Add(c);
}
db.SaveChanges();
}
}
Sample code here.
Now, the queries that I am going to use are as follows,
db.Customers.Where(c => c.Age == 26).Count() > 1; // scenario 1
db.Customers.Where(c => c.Age == 26).OrderBy(c => c.ID).Skip(1).Any(); // scenario 2
I have started SQL profiler to catch the query plans. The captured plans look as follows,
Scenario 1: (captured query plan screenshot omitted; note the estimated cost and actual row count.)
Scenario 2: (captured query plan screenshot omitted; note the estimated cost and actual row count.)
As per the initial guess, the estimated cost and the number of rows are lower in the Skip/Any case than in the Count case.
Conclusion:
All this analysis aside, as many others have commented earlier, these are not the kind of performance optimizations you should try to do in your code. Things like these hurt readability for very minimal (I would say non-existent) perf benefit. I did this analysis just for fun and would never use it as a basis for choosing scenario 2. I would measure first, and only change the code from Count() to Skip(1).Any() if the Count() were actually hurting.
I read an article on this which indicates that the usage of Skip(1).Any() is better than Count().
That statement is quite true on a LINQ to objects query. On a LINQ to objects query Skip(1).Any() only needs to try to get the first two items of the sequence, and it can ignore all of the items that come after it. If the sequence involves rather expensive operations (and properly defers execution) or even more importantly, if the sequence is infinite, this could be a big deal. For most queries it will matter a bit, but often not a lot.
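A small LINQ to Objects illustration of that point, using an infinite sequence:

using System.Collections.Generic;
using System.Linq;

static IEnumerable<int> Numbers()
{
    for (int i = 0; ; i++) yield return i;   // infinite, lazily produced
}

// Skip(1).Any() pulls at most two elements and returns immediately:
bool moreThanOne = Numbers().Skip(1).Any(); // true

// Count() would have to drain the sequence; here it would never terminate:
// int n = Numbers().Count();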
For a LINQ query that is based on a query provider instead, it's unlikely to make a significant difference. Particularly with EF, as you have seen, the generated query is not noticeably different. Is it possible for there to be a difference? Sure. One case could be handled better than the other by the query provider; a particular query might be optimized better with the particular shape produced by one refactoring or the other; etc.
If someone is suggesting that there's a major difference in the EF query between these two, odds are they're mistakenly applying the guideline that was designed to just apply to a LINQ to objects query.
It will definitely depend on the record count in your table/data set. If you have a lot of records, then a count over an indexed identity column will be very fast, but skipping one record and then checking for the next would be faster still.
Granted, either could complete in under a millisecond. Unless you have a record count that exceeds 10,000 or so, it really won't matter, unless you need it to return under a specific threshold. Don't forget that SQL Server caches query execution plans. If you re-run the same query, you may not see a difference after running it the first time, unless the data changes pretty significantly beneath it.
I have this query that I want translated pretty much 1:1 from Entity Framework to SQL:
SELECT GroupId, ItemId, count(*) as total
FROM [TESTDB].[dbo].[TestTable]
WHERE GroupId = '64025'
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC
This SQL query should sort based on the number occurrence of the same ItemId (for that group).
I have this now:
from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};
But this generates the following SQL code:
SELECT
[GroupBy1].[K1] AS [GroupId],
[GroupBy1].[K2] AS [ItemId],
[GroupBy1].[A2] AS [C1]
FROM ( SELECT
[Extent1].[GroupId] AS [K1],
[Extent1].[ItemId] AS [K2],
COUNT(1) AS [A1],
COUNT(1) AS [A2]
FROM [dbo].[TestTable] AS [Extent1]
WHERE 64025 = [Extent1].[GroupId]
GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
) AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC
This also works, but is a factor of 2 slower than the SQL I created.
I've been fiddling around with the LINQ code for a while, but I haven't managed to produce something similar to my query.
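For reference, a method-syntax equivalent of the query above (whether it avoids the duplicated COUNT(1) may depend on the EF version - this is an untested sketch):

var query2 = dataContext.TestTable.AsNoTracking()
    .Where(x => x.GroupId == 64025)
    .GroupBy(x => new { x.GroupId, x.ItemId })
    .Select(g => new { g.Key.GroupId, g.Key.ItemId, Count = g.Count() })
    .OrderBy(r => r.GroupId)
    .ThenByDescending(r => r.Count);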
Execution plan (only the last two items, the first two are identical):
FIRST: |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)
SECOND: |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
|--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)
The query that Entity Framework generates and your hand crafted query are semantically the same and will give the same plan.
The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
The snippets of SHOWPLAN_TEXT you have posted are the same plan. The only difference is aliases. It looks as though your table definition is something like:
CREATE TABLE [dbo].[TestTable]
(
[GroupId] INT,
[ItemId] INT
)
CREATE NONCLUSTERED INDEX IX_Group ON [dbo].[TestTable] ([GroupId], [ItemId])
And you are getting a plan like this (plan screenshot omitted).
To all intents and purposes the plans are the same. Your performance testing methodology is probably flawed. Maybe your first query brought pages into cache that then benefited the second query for example.
Why does the Entity Framework generate nested SQL queries?
I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms): 05:34:13.000
Average Time (hh:mm:ss.ms): 25:42.000
Max Time (hh:mm:ss.ms): 51:54.000
Count: 13
First Seen: Nov 6, 12 19:48:19
Last Seen: Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime` FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe something else is going on; I'm asking because I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server, as I didn't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = #ID",
serverId);
results = results.OrderBy(x=> x.StartTime).Take(limit);
My initial impression was that doing it this way would actually be more efficient, although in testing against a MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
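To illustrate, here is a rough sketch using the question's own query: each operator wraps the tree built so far, so a naive rendering produces one SELECT per node.

// node 1: Where   -> SELECT ... FROM Network WHERE ServerID = @p
var filtered = db.Network.Where(x => x.ServerID == serverId);
// node 2: OrderBy -> wraps node 1 in another SELECT with an ORDER BY
var ordered = filtered.OrderBy(x => x.StartTime);
// node 3: Take    -> wraps node 2 and appends the LIMIT
var limited = ordered.Take(limit);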
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want to get the EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
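A minimal sketch of that idea (one possible approach, not necessarily the poster's actual code): an ExpressionVisitor that evaluates captured variables and embeds their values as constants before the provider sees the tree.

using System.Linq.Expressions;

class ConstantFolder : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        // Captured locals appear as field accesses on a compiler-generated
        // closure object (a ConstantExpression); evaluate the access here
        // and replace it with the literal value.
        if (node.Expression is ConstantExpression)
        {
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

// Usage sketch:
// Expression<Func<Network, bool>> pred = x => x.ServerID == serverId;
// var folded = (Expression<Func<Network, bool>>)new ConstantFolder().Visit(pred);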
I just stumbled upon this post because I suffer from the same problem. I've already spent days tracking this down, and it is simply poor query generation in MySQL.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Anyway, the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
Which is bad, because the ORDER BY is in the outer query. That means MySQL has to pull every record in order to perform the ORDER BY, which results in using filesort.
I verified that SQL Server (Compact at least) does not generate nested queries for the same code:
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Admittedly, the queries generated by Entity Framework are a bit ugly - less ugly than the ones from LINQ to SQL, but still ugly.
However, your database engine will very probably produce the desired execution plan anyway, and the query will run smoothly.
I'm trying to make better use of Entity SQL in the following scenario: I have a table Book which has a many-to-many relationship with the Author table. Each book may have from 0 to N authors. I would like to sort the books by the first author name, i.e. the first record found in this relationship (or null when no authors are linked to a book).
With T-SQL it can be done without difficulty:
SELECT
b.*
FROM
Book AS b
JOIN BookAuthor AS ba ON b.BookId = ba.BookId
JOIN Author AS a ON ba.AuthorId = a.AuthorId
ORDER BY
a.AuthorName;
But I cannot think of how to adapt my code below to achieve it. Indeed, I don't know how to write something equivalent directly in Entity SQL either.
Entities e = new Entities();
var books = e.Books;
var query = books.Include("Authors");
if (sorting == null)
query = query.OrderBy("it.Title asc");
else
query = query.OrderBy("it.Authors.Name asc"); // This isn't it.
return query.Skip(paging.Skip).Take(paging.Take).ToList();
Could someone explain me how to modify my code to generate the Entity Sql for the desired result? Or even explain me how to write by hand a query using CreateQuery<Book>() to achieve it?
EDIT
Just to elucidate: I'll be working with a very large collection of books (around 100k), so sorting them in memory would hurt performance badly. I wish the answers would focus on how to generate the desired ordering using Entity SQL, so the ORDER BY happens in the database.
The OrderBy method expects you to give it a lambda expression (well, technically an expression tree that the compiler builds from the lambda) that can be run to select the field to sort by. Also, OrderBy always orders ascending; if you want descending order there is an OrderByDescending method.
var query = books
.Include("Authors")
.OrderBy(book => book.Authors.Any()
? book.Authors.FirstOrDefault().Name
: string.Empty);
This is basically telling the OrderBy method: "for each book in the sequence, if there are any authors, select the first one's name as my sort key; otherwise, select the empty string. Then return me the books sorted by the sort key."
You could put anything in place of the string.Empty, including for example book.Title or any other property of the book to use in place of the author name for sorting.
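For instance, a variant that falls back on the title instead (same pattern, different fallback key):

var byAuthorOrTitle = books
    .Include("Authors")
    .OrderBy(book => book.Authors.Any()
        ? book.Authors.FirstOrDefault().Name
        : book.Title);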
EDIT from comments:
As long as the sorting behavior you ask for isn't too complex, the Entity Framework's query provider can usually figure out how to turn it into SQL. It will try really, really hard to do that, and if it can't you'll get a query error. The only time the sorting would be done in client-side objects is if you forced the query to run (e.g. .AsEnumerable()) before the OrderBy was called.
In this case, the EF outputs a select statement that includes the following calculated field:
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[BookAuthor] AS [Extent4]
WHERE [Extent1].[Id] = [Extent4].[Books_Id]
)) THEN [Limit1].[Name] ELSE @p__linq__0 END AS [C1],
Then orders by that.
@p__linq__0 is a parameter, passed in as string.Empty, so you can see it converted the lambda expression into SQL pretty directly. Extent and Limit are just aliases used in the generated SQL for the joined tables etc. Extent1 is [Books] and Limit1 is:
SELECT TOP (1) -- Field list goes here.
FROM [dbo].[BookAuthor] AS [Extent2]
INNER JOIN [dbo].[Authors] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Authors_Id]
WHERE [Extent1].[Id] = [Extent2].[Books_Id]
If you don't care where the sorting is happening (i.e. SQL vs In Code), you can retrieve your result set, and sort it using your own sorting code after the query results have been returned. In my experience, getting specialized sorting like this to work with Entity Framework can be very difficult, frustrating and time consuming.
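A minimal sketch of that client-side fallback (keeping in mind the question's EDIT: with ~100k books this is only reasonable for much smaller result sets, since AsEnumerable() pulls everything back before sorting):

var sorted = e.Books
    .Include("Authors")
    .AsEnumerable()   // the query executes here; everything after runs in memory
    .OrderBy(b => b.Authors.Select(a => a.Name).FirstOrDefault() ?? string.Empty)
    .ToList();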