OrderBy a Many To Many relationship with Entity Sql - c#

I'm trying to make better use of Entity SQL in the following scenario: I have a table Book which has a many-to-many relationship with the Author table. Each book may have from 0 to N authors. I would like to sort the books by the first author's name, i.e. the first record found in this relationship (or null when no authors are linked to a book).
With T-SQL it can be done without difficulty:
SELECT
b.*
FROM
Book AS b
JOIN BookAuthor AS ba ON b.BookId = ba.BookId
JOIN Author AS a ON ba.AuthorId = a.AuthorId
ORDER BY
a.AuthorName;
But I cannot think of how to adapt my code below to achieve it. Nor do I know how to write the equivalent directly in Entity SQL.
Entities e = new Entities();
var books = e.Books;
var query = books.Include("Authors");
if (sorting == null)
query = query.OrderBy("it.Title asc");
else
query = query.OrderBy("it.Authors.Name asc"); // This isn't it.
return query.Skip(paging.Skip).Take(paging.Take).ToList();
Could someone explain how to modify my code to generate the Entity SQL for the desired result? Or explain how to write such a query by hand using CreateQuery<Book>()?
EDIT
Just to clarify, I'll be working with a very large collection of books (around 100k). Sorting them in memory would hurt performance badly. I'd like the answers to focus on how to generate the desired ordering using Entity SQL, so that the ORDER BY happens in the database.

The OrderBy method expects you to give it a lambda expression that selects the field to sort by. (Strictly speaking, on an IQueryable such as an EF query, the lambda is captured as an Expression<Func<TSource, TKey>> expression tree rather than a plain Func delegate; that is what lets the query provider translate it into SQL.) Also, OrderBy always orders ascending; if you want descending order there is an OrderByDescending method.
var query = books
.Include("Authors")
.OrderBy(book => book.Authors.Any()
? book.Authors.FirstOrDefault().Name
: string.Empty);
This is basically telling the OrderBy method: "for each book in the sequence, if there are any authors, select the first one's name as my sort key; otherwise, select the empty string. Then return me the books sorted by the sort key."
You could put anything in place of string.Empty, for example book.Title or any other property of the book, to use as the sort key for books that have no authors.
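Here is a minimal LINQ-to-Objects sketch of the same sort key that runs without a database. The Book and Author classes and the FirstAuthorSort helper are hypothetical stand-ins for the entities in the question. Against an EF query, the identical lambda is translated into SQL instead of being run in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Book and Author entities in the question.
public class Author
{
    public string Name { get; set; } = "";
}

public class Book
{
    public string Title { get; set; } = "";
    public List<Author> Authors { get; set; } = new List<Author>();
}

public static class FirstAuthorSort
{
    // Sort key: the first author's name, or the empty string for books without authors
    // (the empty string sorts before any non-empty name with the default comparer).
    public static List<Book> ByFirstAuthor(IEnumerable<Book> books) =>
        books.OrderBy(book => book.Authors.Any()
                ? book.Authors.FirstOrDefault().Name
                : string.Empty)
             .ToList();
}
```

"First" means whatever order the Authors collection happens to yield. The TOP (1) subquery that EF generates has no ORDER BY of its own, so the database is free to pick any of a book's authors. If you need a deterministic key, one option is to order the authors inside the lambda, for example `book.Authors.OrderBy(a => a.Name).FirstOrDefault().Name`. The SQL this produces is not shown here.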
EDIT from comments:
As long as the sorting behavior you ask for isn't too complex, the Entity Framework's query provider can usually figure out how to turn it into SQL. It will try really, really hard to do that, and if it can't you'll get a query error. The only time the sorting would be done in client-side objects is if you forced the query to run (e.g. .AsEnumerable()) before the OrderBy was called.
In this case, the EF outputs a select statement that includes the following calculated field:
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[BookAuthor] AS [Extent4]
WHERE [Extent1].[Id] = [Extent4].[Books_Id]
)) THEN [Limit1].[Name] ELSE @p__linq__0 END AS [C1],
Then orders by that.
@p__linq__0 is a parameter, passed in as string.Empty, so you can see it converted the lambda expression into SQL pretty directly. Extent and Limit are just aliases used in the generated SQL for the joined tables etc. Extent1 is [Books] and Limit1 is:
SELECT TOP (1) -- Field list goes here.
FROM [dbo].[BookAuthor] AS [Extent2]
INNER JOIN [dbo].[Authors] AS [Extent3] ON [Extent3].[Id] = [Extent2].[Authors_Id]
WHERE [Extent1].[Id] = [Extent2].[Books_Id]
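As the comment above says, sorting only moves to the client when the query is forced before OrderBy. The reason is visible in which OrderBy the compiler binds to. This sketch uses an in-memory IQueryable (from AsQueryable()) in place of an EF query, and the WhereDoesOrderByRun helper is hypothetical. The binding behaviour it shows is ordinary C# overload resolution on Queryable versus Enumerable.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class WhereDoesOrderByRun
{
    // IQueryable source: OrderBy binds to Queryable.OrderBy, so the lambda is recorded
    // as an expression-tree node that a query provider (such as EF) can translate to SQL.
    public static IQueryable<string> OrderedQuery(IQueryable<string> names) =>
        names.OrderBy(n => n);

    // After AsEnumerable(), OrderBy binds to Enumerable.OrderBy: the lambda is compiled
    // to a delegate and the sort runs in memory over every row the source returns.
    public static IEnumerable<string> OrderedInMemory(IQueryable<string> names) =>
        names.AsEnumerable().OrderBy(n => n);

    // Name of the outermost method call recorded in a query's expression tree.
    public static string RootCall(IQueryable query) =>
        query.Expression is MethodCallExpression call ? call.Method.Name : "";
}
```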

If you don't care where the sorting is happening (i.e. SQL vs In Code), you can retrieve your result set, and sort it using your own sorting code after the query results have been returned. In my experience, getting specialized sorting like this to work with Entity Framework can be very difficult, frustrating and time consuming.

Related

.netcore EF linq - this is a BUG? Very strange behavior

I have two tables in SQL: Document and User. Document has a relation to User, and I want to get the users to whom I recently sent documents.
I need to sort by the date the document was sent and get unique (distinct) users related to those documents.
This is my LINQ query:
var recentClients = documentCaseRepository.Entities
.Where(docCase => docCase.AssignedByAgentId == WC.UserContext.UserId)
.OrderByDescending(userWithDate => userWithDate.LastUpdateDate)
.Take(1000) // I need this because if I comment this line out, EF generates a completely different SQL query.
.Select(doc => new { doc.AssignedToClient.Id, doc.AssignedToClient.FirstName, doc.AssignedToClient.LastName })
.Distinct()
.Take(configuration.MaxRecentClientsResults)
.ToList();
and generated sql query is:
SELECT DISTINCT TOP(5) [t].*
FROM (
SELECT TOP(1000) [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
ORDER BY [docCase].[LastUpdateDate] DESC
)
AS [t]
Everything is correct so far. But if I delete this line
.Take(1000) // I need this because...
EF generates a completely different query:
SELECT DISTINCT TOP(5)
[docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON ([docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id])
WHERE [docCase].[AssignedByAgentId] = 3
My question is: why doesn't EF generate the ORDER BY clause and the subquery with DISTINCT?
Is this an EF bug, or am I doing something wrong? And what must I do to generate this SQL query with LINQ?
SELECT DISTINCT TOP 5 [t].*
FROM ( SELECT [docCase.AssignedToClient].[Id]
FROM [DocumentCase] AS [docCase]
INNER JOIN [User] AS [docCase.AssignedToClient]
ON [docCase].[AssignedToClientId] = [docCase.AssignedToClient].[Id]
WHERE [docCase].[AssignedByAgentId] = 1
ORDER BY [docCase].[LastUpdateDate] DESC
) AS [t]
OrderBy information is not always retained across other operators such as Distinct. Entity Framework does not document (to my knowledge) exactly how OrderBy is propagated.
This kind of makes sense because some operators have undefined output order. The fact that ordering is retained in many situations is a convenience for the developer.
Move the OrderBy to the end of the query (or at least past the Distinct).
The reason for the difference in queries is that Distinct does not preserve result order. So when you execute OrderBy first and then Distinct, you might as well not execute the OrderBy at all, because the order is lost anyway, and EF can simply optimize it away.
Calling Take in between causes the result set to be semantically different: You first order the items, take the first 1000 items of that order and then call Distinct on them.
What you can change in your query depends mainly on the result you want to achieve. Maybe you want to first make the result set distinct then order by date and finally take the amount of items. Other options are also thinkable based on your requirements.
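Below is a sketch of the "distinct first, then order, then take" idea, assuming the goal is the N most recently contacted clients. Distinct on the projected client columns would throw away the date, so this version groups by client and keeps each client's latest date instead. That GroupBy formulation is a substitution, not code from the answers. DocumentCase here is a hypothetical in-memory record, and whether a particular EF Core version translates this into a single SQL statement is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the DocumentCase rows in the question.
public record DocumentCase(int AssignedByAgentId, int ClientId, DateTime LastUpdateDate);

public static class RecentClients
{
    // Distinct clients (via GroupBy), each with its most recent date,
    // ordered newest first, then limited to 'max' results.
    public static List<int> Query(IQueryable<DocumentCase> cases, int agentId, int max) =>
        cases.Where(c => c.AssignedByAgentId == agentId)
             .GroupBy(c => c.ClientId)
             .Select(g => new { ClientId = g.Key, LastSent = g.Max(c => c.LastUpdateDate) })
             .OrderByDescending(x => x.LastSent)
             .Take(max)
             .Select(x => x.ClientId)
             .ToList();
}
```

Because the ordering is applied after the grouping, no later operator can discard it. This is the property the original query lost when Distinct followed OrderBy.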

IQueryable Count() method issue with Inner Join created by .Include() related entities

IQueryable<EntityOne> query = entities.EntityOne
.Include(t => t.EntityRelated1)
.Include(t => t.EntityRelated2)
.AsQueryable();
The query generated for the "query" variable:
SELECT
[Extent1].[Id] AS [IdEntityOne],
...
[Extent2].[Id] AS [IdEntityRelatedOne],
...
[Extent3].[Id] AS [IdEntityRelatedTwo],
...
FROM [dbo].[EntityOne] AS [Extent1]
INNER JOIN [dbo].[EntityRelatedOne] AS [Extent2]
ON [Extent1].[IdRelatedOne] = [Extent2].[Id]
INNER JOIN [dbo].[EntityRelatedTwo] AS [Extent3]
ON [Extent1].[IdRelatedTwo] = [Extent3].[Id]
Then, in C# code, these are the results of counting:
var wrongCount = query.Count(); // Returns 295
var correctCount = query.AsEnumerable().Count(); // Returns 10
The 295 count is the total number of records in the EntityOne set (wrong).
The 10 count is the desired count after the inner join.
It seems that IQueryable.Count() counts before executing the inner join on the database. I don't want to materialize an IEnumerable, since I want the count to be executed on SQL Server together with the inner join.
UPDATE 1
Trying to manually execute the inner join:
IQueryable<EntityOne> query2 = entities.EntityOne.Join(entities.EntityTwo,
eone=> eone.IdRelatedOne, en => en.Id,
(x, y) => x);
The SQL code generated in "query2" is :
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[EntityOne] AS [Extent1]
As you can see, the related table is not included in an inner join, even though the LINQ Join statement should force one.
Update 2
I don't know if it matters, but IdEntityRelated1 on EntityOne is a required property. Moreover, it's not a foreign key in the database, just an integer field that stores the Id of the related entity. (I'm working with POCO classes with Database First.)
I have other working sources with similar fields, but they're nullable integers instead of required. Should I perhaps not use Include to force an inner join on required relationships?
You have a required association, but the expected objects are not present in the database.
But let's first see what EF does.
In the first count...
var wrongCount = query.Count();
...the Includes are ignored. There's no reason to execute them, because EF has been told that the referenced objects (EntityRelated1 and EntityRelated2) are mandatory, so the inner joins are expected to find the related records. Assuming they do, EF figures it may as well just count entities.EntityOne and skip the rest. The Includes would only make the query more expensive without affecting the result.
You can check that by monitoring the SQL that's executed for the count. That's not the SQL generated when you look at query only! It's probably something that simply boils down to
SELECT COUNT(*) FROM [dbo].[EntityOne]
So the first count returns a correct count of all EntityOne records in the database.
For the second count you force execution of the entire query stored in the query variable, i.e. the SQL statement you show. Then you count its results in memory, and it returns 10. This means that the query with the inner joins really does return only 10 records. That, in turn, can only mean one thing: 285 EntityOne records have an IdRelatedOne (or IdRelatedTwo) value that doesn't point to an existing related record. But you mapped the associations as required, so EF generates inner joins. An outer join would also return 295.
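The answer's arithmetic can be reproduced in memory. With no FK constraint, some IdRelatedOne values point at nothing, so an inner join drops those rows while a left join keeps them. The records and the OrphanCheck helper below are hypothetical. The OrphanIds query is an addition for locating the offending rows, not something the answer shows.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: IdRelatedOne is a plain integer column with no FK constraint.
public record EntityOne(int Id, int IdRelatedOne);
public record EntityRelatedOne(int Id);

public static class OrphanCheck
{
    // Inner join: EntityOne rows whose IdRelatedOne has no match are dropped.
    public static int InnerJoinCount(IEnumerable<EntityOne> ones, IEnumerable<EntityRelatedOne> related) =>
        ones.Join(related, o => o.IdRelatedOne, r => r.Id, (o, r) => o).Count();

    // Left outer join (GroupJoin + DefaultIfEmpty): every EntityOne row survives.
    public static int LeftJoinCount(IEnumerable<EntityOne> ones, IEnumerable<EntityRelatedOne> related) =>
        ones.GroupJoin(related, o => o.IdRelatedOne, r => r.Id, (o, rs) => new { o, rs })
            .SelectMany(x => x.rs.DefaultIfEmpty(), (x, r) => x.o)
            .Count();

    // Ids of EntityOne rows whose IdRelatedOne points at no existing related row.
    public static List<int> OrphanIds(IEnumerable<EntityOne> ones, IEnumerable<EntityRelatedOne> related) =>
        ones.Where(o => !related.Any(r => r.Id == o.IdRelatedOne))
            .Select(o => o.Id)
            .ToList();
}
```

The gap between the two counts equals the number of orphaned rows. That is the 295 versus 10 situation in the question.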
Include is not a LINQ method proper; it is an Entity Framework extension designed for eager loading and not much else. Includes are lost if the query shape changes:
When you call the Include method, the query path is only valid on the returned instance of the IQueryable of T. Other instances of IQueryable of T and the context itself are not affected.
Specifically, this means that aggregates on top of the Included IQueryable<T>, for instance, are going to lose the Include (which is exactly what you see).
See Tip 22 - How to make Include really Include, IQueryable.Include() gets ignored, Include in following query does not include really and many more.

Forcing an Entity Framework 6 query to use the correct index

I have a C# application that uses SQLite as the database and the SQLite Entity Framework 6 provider to generate queries based on user input.
The database contains the following tables and indexes:
CREATE TABLE Lists (
ListRowId INTEGER NOT NULL PRIMARY KEY,
ListId GUID NOT NULL,
ListName TEXT NOT NULL
);
CREATE UNIQUE INDEX [IX_Lists_ListId] ON [Lists] ( [ListId] );
-- One to many relationship: Lists => ListDetails
CREATE TABLE ListDetails (
ListDetailRowId INTEGER NOT NULL PRIMARY KEY,
ListDetailId GUID NOT NULL,
ListId GUID NOT NULL,
Plate TEXT
);
CREATE INDEX [IX_ListDetails_Plate] ON [ListDetails] ( [Plate] ASC );
CREATE TABLE Reads (
ReadRowId INTEGER NOT NULL PRIMARY KEY,
ReadId GUID NOT NULL,
Plate TEXT
);
-- 1 To many relationship: Reads => Alarms.
-- There may be rows in Reads that have no related rows in Alarms.
CREATE TABLE Alarms (
AlarmRowId INTEGER NOT NULL PRIMARY KEY,
AlarmId GUID NOT NULL,
ListId GUID NOT NULL,
ListDetailId GUID NOT NULL,
ReadRowId INTEGER NOT NULL
);
CREATE INDEX [IX_Alarms_ListId_ListDetailId] ON [Alarms] ([ListId], [ListDetailId]);
CREATE INDEX [IX_Alarms_ReadId] ON [Alarms] ([ReadRowId]);
Please note that the DDL above only includes the relevant columns and indexes. For reasons of speed and the large number of rows in the ListDetails table, there is no index on the ListDetailId GUID column; nor can I create one. In fact, I cannot change the database's schema at all.
The database does not have any foreign key relationships defined between any of these tables. The reason is internal to our system. I repeat, I cannot change the schema.
Using the SQLite EF6 provider, I've built an entity model from the database. It is a database first model as the application was originally written using a different database and EF 4. We upgraded it to EF 6 and replaced the database with SQLite.
While processing user input, I have to put together a query that joins these tables. Here's the basic EF expression I've built.
from read in context.Reads
join alrm in context.Alarms on read.ReadRowId equals alrm.ReadRowId into alarmJoin
from alarm in alarmJoin.DefaultIfEmpty()
join e in context.ListDetails on alarm.ListPlate equals e.Plate into entryJoin
from entry in entryJoin.DefaultIfEmpty()
join l in context.Lists on alarm.ListId equals l.ListId into listJoin
from list in listJoin.DefaultIfEmpty()
where alarm.ListDetailId == entry.ListDetailId
select new {
alarm,
list.ListName,
read
};
I've used the debugger to take that expression and generate the SQL. I've reduced the output for brevity, as the only part I'm interested in is the join on the ListDetails table:
SELECT *
FROM [Reads] AS [Extent1]
LEFT OUTER JOIN [Alarms] AS [Extent2] ON [Extent1].[ReadRowId] = [Extent2].[ReadRowId]
LEFT OUTER JOIN [ListDetails] AS [Extent3] ON ([Extent2].[ListPlate] = [Extent3].[Plate]) OR (([Extent2].[ListPlate] IS NULL) AND ([Extent3].[Plate] IS NULL))
LEFT OUTER JOIN [Lists] AS [Extent4] ON [Extent2].[ListId] = [Extent4].[ListId]
WHERE ([Extent2].[ListDetailId] = [Extent3].[ListDetailId]) OR (([Extent2].[ListDetailId] IS NULL) AND ([Extent3].[ListDetailId] IS NULL))
Executing EXPLAIN QUERY PLAN on this shows that the query will perform a table scan of the ListDetails table. I do not want that to happen; I want the query to use the index on the Plate column.
If I remove the where clause, the SQL that's generated is different:
SELECT *
FROM [Reads] AS [Extent1]
LEFT OUTER JOIN [Alarms] AS [Extent2] ON [Extent1].[ReadRowId] = [Extent2].[ReadRowId]
LEFT OUTER JOIN [ListDetails] AS [Extent3] ON ([Extent2].[ListPlate] = [Extent3].[Plate]) OR (([Extent2].[ListPlate] IS NULL) AND ([Extent3].[Plate] IS NULL))
LEFT OUTER JOIN [Lists] AS [Extent4] ON [Extent2].[ListId] = [Extent4].[ListId]
EXPLAIN QUERY PLAN on this query shows that the database does indeed use the index on the ListDetails table's Plate column. This is what I want to happen. But, there may be multiple rows in the ListDetails table that have the same Plate; it is not a unique field. I need to return the one and only row that matches the information available to me in the Alarms table.
How do I make my query use the index on the Plate column?
Specifying an index requires a query hint. SQLite uses the INDEXED BY clause. Example:
LEFT OUTER JOIN ListDetails as Extent3 INDEXED BY IX_ListDetails_Plate ON Extent2.ListPlate = Extent3.Plate
LINQ does not provide a method to pass a query hint to the database. LINQ's design philosophy is the developer shouldn't worry about the SQL: that is the DBA's job.
So there probably won't be a .With() LINQ extension coming anytime soon.
However, there are several options / workarounds:
1. The "Proper" Way
The "proper" way per LINQ's design philosophy is for a DBA to create a sproc that uses query hints.
The developer will call the sproc with Entity, and get a strongly typed sproc result.
using (ApplicationDbContext db = new ApplicationDbContext())
{
var myStronglyTypedResult = db.MySprocMethod(); // function import generated on the context
}
This is the easiest way, with Entity Framework handling the translation and class generation. However, you will need permission to create the sproc, and it sounds like you do not have that option.
2. Old School
If LINQ doesn't want to use query hints, then don't use LINQ for the query. Simple enough. Back to DataAdapters and hard coded SQL queries. You already have the query designed, might as well use it.
3. DbContext's SQL Interface
DbContext has a SQL interface built in: the Database.SqlQuery<TElement>(string sql, params object[] parameters) method.
public class ListDetail
{
public int ListDetailRowId {get; set;}
public Guid ListDetailId {get; set;}
public Guid ListId {get; set;}
public string Plate {get; set;}
}
using(ApplicationDbContext db = new ApplicationDbContext())
{
List<ListDetail> results = db.Database.SqlQuery<ListDetail>("SELECT * FROM ListDetails INDEXED BY IX_my_index WHERE ListDetailRowId = @p0", new object[] {50}).ToList();
return results;
}
This gives you the flexibility of straight SQL, you don't have to mess with connection strings nor DataAdapters, and Entity / DataContext handles the translation from DataTable to Entity for you.
However, you will need to manually create the Entity class. It is the same as any other Entity class, it just won't be automatically created like the sproc method does.
This will probably be your best bet.
While this was a while ago, and I am no longer in that job, I wanted to take a minute to describe how we got around this problem. The problems are:
SQLite does not support Stored Procedures, so there's no way to work around the problem from the database side,
You can't embed the INDEXED BY hint into the LINQ query.
The way we ended up fixing this was by implementing a custom user function in the entity model which added the required INDEXED BY hint to the SQL generated by EF. We also implemented a couple of other user functions for a few other SQL hints supported by SQLite. This allowed us to put the join condition that required the hint inside our user function, and EF did the rest.
As I said, I'm no longer in that position, so I can't include any code, but it's just a matter of adding some XML to the entity model file that defines the user functions and defining a class that has placeholder functions. This is all documented in the EF documents.

Why does the Entity Framework generate nested SQL queries?

I have this code
var db = new Context();
var result = db.Network.Where(x => x.ServerID == serverId)
.OrderBy(x=> x.StartTime)
.Take(limit);
Which generates this! (Note the double select statement)
SELECT
`Project1`.`Id`,
`Project1`.`ServerID`,
`Project1`.`EventId`,
`Project1`.`StartTime`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`ServerID`,
`Extent1`.`EventId`,
`Extent1`.`StartTime`
FROM `Networkes` AS `Extent1`
WHERE `Extent1`.`ServerID` = @p__linq__0) AS `Project1`
ORDER BY
`Project1`.`StartTime` DESC LIMIT 5
What should I change so that it results in one select statement? I'm using MySQL and Entity Framework with Code First.
Update
I have the same result regardless of the type of the parameter passed to the OrderBy() method.
Update 2: Timed
Total Time (hh:mm:ss.ms) 05:34:13.000
Average Time (hh:mm:ss.ms) 25:42.000
Max Time (hh:mm:ss.ms) 51:54.000
Count 13
First Seen Nov 6, 12 19:48:19
Last Seen Nov 6, 12 20:40:22
Raw query:
SELECT `Project?`.`Id`, `Project?`.`ServerID`, `Project?`.`EventId`, `Project?`.`StartTime` FROM (SELECT `Extent?`.`Id`, `Extent?`.`ServerID`, `Extent?`.`EventId`, `Extent?`.`StartTime`, FROM `Network` AS `Extent?` WHERE `Extent?`.`ServerID` = ?) AS `Project?` ORDER BY `Project?`.`Starttime` DESC LIMIT ?
I used a program to take snapshots from the current process in MySQL.
Other queries were executed at the same time, but when I change it to just one SELECT statement, it NEVER goes over one second. Maybe I have something else that's going on; I'm asking 'cause I'm not so into DBs...
Update 3: The explain statement
The Entity Framework generated
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '46', 'Using filesort'
'2', 'DERIVED', 'Extent?', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', '', '45', 'Using where'
One liner
'1', 'SIMPLE', 'network', 'ref', 'serveridneventid,serverid', 'serveridneventid', '109', 'const', '45', 'Using where; Using filesort'
This is from my QA environment, so the timing I pasted above is not related to the rowcount explain statements. I think that there are about 500,000 records that match one server ID.
Solution
I switched from MySQL to SQL Server. I don't want to end up completely rewriting the application layer.
It's the easiest way to build the query logically from the expression tree. Usually the performance will not be an issue. If you are having performance issues you can try something like this to get the entities back:
var results = db.ExecuteStoreQuery<Network>(
"SELECT Id, ServerID, EventId, StartTime FROM Network WHERE ServerID = {0}",
serverId);
var page = results.OrderBy(x => x.StartTime).Take(limit).ToList(); // sorted in memory, after all matching rows are fetched
My initial impression was that doing it this way would actually be more efficient, although in testing against an MSSQL server, I got <1 second responses regardless.
With a single select statement, it sorts all the records (Order By), and then filters them to the set you want to see (Where), and then takes the top 5 (Limit 5 or, for me, Top 5). On a large table, the sort takes a significant portion of the time. With a nested statement, it first filters the records down to a subset, and only then does the expensive sort operation on it.
Edit: I did test this, but I realized I had an error in my test which invalidated it. Test results removed.
Why does Entity Framework produce a nested query? The simple answer is because Entity Framework breaks your query expression down into an expression tree and then uses that expression tree to build your query. A tree naturally generates nested query expressions (i.e. a child node generates a query and a parent node generates a query on that query).
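The nesting is visible directly in the expression tree. This sketch builds the question's operator chain over an in-memory IQueryable and walks the tree from the root. The Network record is a hypothetical stand-in for the entity. Each operator holds the previous one as its first argument, i.e. Take(OrderBy(Where(source))). A generator that emits one SELECT per node naturally produces nested statements, although EF's real SQL generation is more involved than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the Network entity in the question.
public record Network(int Id, int ServerID, int EventId, DateTime StartTime);

public static class QueryTreeShape
{
    // Same operator chain as the question, over any IQueryable source.
    public static IQueryable<Network> Build(IQueryable<Network> source, int serverId, int limit) =>
        source.Where(x => x.ServerID == serverId)
              .OrderBy(x => x.StartTime)
              .Take(limit);

    // Walks from the root call down through each call's first argument (its source),
    // returning operator names from outermost to innermost.
    public static List<string> OperatorChain(IQueryable query)
    {
        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0];
        }
        return names;
    }
}
```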
Why doesn't Entity Framework simplify the query down and write it as you would? The simple answer is because there is a limited amount of work that can go into the query generation engine, and while it's better now than it was in earlier versions it's not perfect and probably never will be.
All that said there should be no significant speed difference between the query you would write by hand and the query EF generated in this case. The database is clever enough to generate an execution plan that applies the WHERE clause first in either case.
If you want EF to generate the query without the subselect, use a constant within the query, not a variable.
I have previously created my own .Where and all other LINQ methods that first traverse the expression tree and convert all variables, method calls etc. into Expression.Constant. It was done just because of this issue in Entity Framework...
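The answer does not include its code, so the following is a minimal sketch of the same idea using only the standard ExpressionVisitor class. It rewrites reads of captured variables into Expression.Constant nodes. The C# compiler turns a captured variable into a field access on a hidden closure object. The class name is hypothetical, and whether the MySQL provider then drops the subselect is not verified here.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: replaces captured-variable reads with constant values.
public sealed class CapturedVariableInliner : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        // Visit the target first so chains like closure.settings.ServerId collapse fully.
        Expression target = Visit(node.Expression);

        // A captured local appears as MemberExpression(ConstantExpression(closure), field);
        // evaluate it now and substitute the value as a constant.
        if (target is ConstantExpression constant && constant.Value != null)
        {
            object value = node.Member switch
            {
                FieldInfo field => field.GetValue(constant.Value),
                PropertyInfo property => property.GetValue(constant.Value),
                _ => throw new NotSupportedException(node.Member.MemberType.ToString()),
            };
            return Expression.Constant(value, node.Type);
        }

        // Parameter accesses (e.g. x.StartTime) and static members are left as they are.
        return node.Update(target);
    }

    public static Expression<TDelegate> Inline<TDelegate>(Expression<TDelegate> lambda) =>
        (Expression<TDelegate>)new CapturedVariableInliner().Visit(lambda);
}
```

Inlining literals has a trade-off. Each distinct value produces distinct SQL text, which can reduce the server's ability to reuse cached execution plans.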
I just stumbled upon this post because I suffer from the same problem. I have already spent days tracking this down, and it is simply poor query generation in the MySQL provider.
I already filed a bug at mysql.com http://bugs.mysql.com/bug.php?id=75272
To summarize the problem:
This simple query
context.products
.Include(x => x.category)
.Take(10)
.ToList();
gets translated into
SELECT
`Limit1`.`C1`,
`Limit1`.`id`,
`Limit1`.`name`,
`Limit1`.`category_id`,
`Limit1`.`id1`,
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`
and performs pretty well. Still, the outer query is pretty much useless. Now if I add an OrderBy
context.products
.Include(x => x.category)
.OrderBy(x => x.id)
.Take(10)
.ToList();
the query changes to
SELECT
`Project1`.`C1`,
`Project1`.`id`,
`Project1`.`name`,
`Project1`.`category_id`,
`Project1`.`id1`,
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`,
`Extent1`.`name`,
`Extent1`.`category_id`,
`Extent2`.`id` AS `id1`,
`Extent2`.`name` AS `name1`,
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
ORDER BY
`Project1`.`id` ASC LIMIT 10
This is bad because the ORDER BY is in the outer query. That means MySQL has to pull every record to perform the ORDER BY, which results in a filesort.
I verified that the SQL Server provider (Compact, at least) does not generate nested queries for the same code:
SELECT TOP (10)
[Extent1].[id] AS [id],
[Extent1].[name] AS [name],
[Extent1].[category_id] AS [category_id],
[Extent2].[id] AS [id1],
[Extent2].[name] AS [name1]
FROM [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
Admittedly, the queries generated by Entity Framework are a bit ugly: less so than LINQ to SQL's, but still ugly.
However, your database engine will very probably arrive at the desired execution plan, and the query will run smoothly.

Does LINQ with a scalar result trigger the lazy loading

I read the Loading Related Entities post by the Entity Framework team and got a bit confused by the last paragraph:
Sometimes it is useful to know how many entities are related to another entity in the database without actually incurring the cost of loading all those entities. The Query method with the LINQ Count method can be used to do this. For example:
using (var context = new BloggingContext())
{
var blog = context.Blogs.Find(1);
// Count how many posts the blog has
var postCount = context.Entry(blog)
.Collection(b => b.Posts)
.Query()
.Count();
}
Why are the Query + Count methods needed here?
Can't we simply use LINQ's Count method instead?
var blog = context.Blogs.Find(1);
var postCount = blog.Posts.Count();
Will that trigger lazy loading, so that the whole collection is loaded into memory and only then do I get my desired scalar value?
You will get your desired scalar value in both cases. But consider the difference in what's happening.
With .Query().Count() you run a query on the database of the form SELECT COUNT(*) FROM Posts and assign that value to your integer variable.
With .Posts.Count, you run (something like) SELECT * FROM Posts on the database (much more expensive already). Each row of the result is then mapped field-by-field into your C# object type as the collection is enumerated to find your count. By asking for the count in this way, you are forcing all of the data to be loaded so that C# can count how much there is.
Hopefully it's obvious that asking the database for the count of rows (without actually returning all of those rows) is much more efficient!
The first method is not loading all rows since the Count method is invoked from an IQueryable but the second method is loading all rows since it is invoked from an ICollection.
I did some tests to verify it, using Table1 and Table2, where Table1 has the PK "Id" and Table2 has the FK "Id1" (1:N). I used the EF profiler from http://efprof.com/.
First method:
var t1 = context.Table1.Find(1);
var count1 = context.Entry(t1)
.Collection(t => t.Table2)
.Query()
.Count();
No Select * From Table2:
SELECT TOP (2) [Extent1].[Id] AS [Id]
FROM [dbo].[Table1] AS [Extent1]
WHERE [Extent1].[Id] = 1 /* @p0 */
SELECT [GroupBy1].[A1] AS [C1]
FROM (SELECT COUNT(1) AS [A1]
FROM [dbo].[Table2] AS [Extent1]
WHERE [Extent1].[Id1] = 1 /* @EntityKeyValue1 */) AS [GroupBy1]
Second method:
var t1 = context.Table1.Find(1);
var count2 = t1.Table2.Count();
Table2 is loaded into memory:
SELECT TOP (2) [Extent1].[Id] AS [Id]
FROM [dbo].[Table1] AS [Extent1]
WHERE [Extent1].[Id] = 1 /* @p0 */
SELECT [Extent1].[Id] AS [Id],
[Extent1].[Id1] AS [Id1]
FROM [dbo].[Table2] AS [Extent1]
WHERE [Extent1].[Id1] = 1 /* @EntityKeyValue1 */
Why is this happening?
The result of Collection(t => t.Table2) is a class that implements ICollection but it is not loading all rows and has a property named IsLoaded. The result of the Query method is an IQueryable and this allows calling Count without preloading rows.
The result of t1.Table2 is an ICollection and it is loading all rows to get the count.
By the way, even if you use only t1.Table2 without asking for the count, rows are loaded into memory.
The first solution doesn't trigger lazy loading because it most probably never accesses the collection property directly. The Collection method accepts an Expression, not just a delegate. It is used only to get the name of the property, which is then used to access mapping information and build the correct query.
Even if it would access the collection property it could use the same strategy as other internal parts of EF (for example validation) which turns off lazy loading temporarily before accessing navigation properties to avoid unexpected lazy loading.
Btw. this is a huge improvement in contrast to ObjectContext API where building query required accessing the navigation property and thus it could trigger lazy loading.
There is one more difference between those two approaches:
The first always executes query to database and returns count of items in the database
The second executes query to database only once to load all items and then returns counts of items in the application without checking state in the database
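The difference between the two approaches can be modelled with a toy store that counts how many rows it hands out. ToyPostStore is hypothetical and is not EF. CountQuery plays the role of .Query().Count(), and LoadPosts plays the role of lazily loading blog.Posts.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy "database" that tracks how many rows it materializes.
public sealed class ToyPostStore
{
    private readonly List<(int BlogId, string Title)> rows = new();

    public int RowsMaterialized { get; private set; }

    public void Insert(int blogId, string title) => rows.Add((blogId, title));

    // Like .Query().Count(): the store counts and returns a single number; no rows leave it.
    public int CountQuery(int blogId) => rows.Count(r => r.BlogId == blogId);

    // Like lazy-loading blog.Posts: every matching row is copied out into a snapshot list.
    public List<string> LoadPosts(int blogId)
    {
        var loaded = rows.Where(r => r.BlogId == blogId).Select(r => r.Title).ToList();
        RowsMaterialized += loaded.Count;
        return loaded;
    }
}
```

After a new row is inserted, CountQuery sees it, while a previously loaded list does not. This mirrors the second difference described above.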
As a third, quite interesting option, you can use extra-lazy loading. The implementation by Arthur Vickers shows how to use a navigation property to get the count from the database without lazy loading the items.
