How to get an EF query to compile the most optimised SQL?

I'm fairly new to EF, and this is something that has been bugging me for a couple of days now:
I have a User entity. It has a parent WorkSpace, which has a collection of Users.
Each User also has a collection of child Schedules, in a User.Schedules property.
I'm navigating through the objects like this:
var query = myUser.WorkSpace.Users.SelectMany(u => u.Schedules);
When enumerating the results of query (myUser is an instance of User that has been loaded previously using .Find(userid)), I noticed that EF makes one query to the db for each User in WorkSpace.Users
How come EF doesn't get the results in one single query, starting from the primary key of myUser, and joining on this with all the tables involved?
If I do something else directly from the context such as this, it works fine though:
context.Users.Where(u => u.ID == userid).SelectMany(u => u.WorkSpace.Users.SelectMany(w => w.Schedules))
Is there something I'm doing wrong?

Let's take the first query:
var query = myUser.WorkSpace.Users.SelectMany(u => u.Schedules);
If you look at the type of the query variable, you'll see that it is IEnumerable<Schedule>, which means this is a regular LINQ to Objects query. Why? Because it starts from a materialized object and then accesses another object/collection, and so on. Combined with EF's lazy loading feature, this leads to the multiple database queries you are seeing.
If you do the same for the second query:
var query = context.Users.Where(u => u.ID == userid)
    .SelectMany(u => u.WorkSpace.Users.SelectMany(w => w.Schedules))
you'll notice that the type of the query is now IQueryable<Schedule>, which means you now have a LINQ to Entities query. This is because neither context.Users nor the other objects/collections used inside the query are real objects: they are just metadata used to build, execute and materialize the query.
To recap, you are not doing anything wrong. This is how lazy loading works. If you don't care about the so-called N+1 query issue, you can use the first approach. If you do care, use the second.
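To see why the first approach costs one query per user, here is a toy model in plain C# (no EF; the data and the LoadSchedules helper are hypothetical). LoadSchedules stands in for a lazy-loaded navigation property, and SelectMany over an already-materialized list calls it once per user while the query is enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy data: user id -> names of that user's schedules.
var schedulesTable = new Dictionary<int, string[]>
{
    [1] = new[] { "Mon", "Tue" },
    [2] = new[] { "Wed" },
    [3] = new[] { "Thu", "Fri" },
};

int roundTrips = 0;

// Simulates a lazy-loaded navigation property such as u.Schedules:
// every access costs one "database round trip".
string[] LoadSchedules(int userId)
{
    roundTrips++;
    return schedulesTable[userId];
}

// Stands in for myUser.WorkSpace.Users, which is already in memory.
var workspaceUserIds = new List<int> { 1, 2, 3 };

// Plain LINQ to Objects: SelectMany invokes the selector once per user.
IEnumerable<string> query = workspaceUserIds.SelectMany(id => LoadSchedules(id));

int roundTripsBeforeEnumeration = roundTrips; // 0: nothing has run yet
List<string> schedules = query.ToList();      // one "query" per user happens here

Console.WriteLine($"{schedules.Count} schedules, {roundTrips} round trips"); // 5 schedules, 3 round trips
```

With N users you get N extra round trips on top of the query that loaded the users, which is the N+1 pattern; the IQueryable version is translated into a single SQL statement instead.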

Related

SelectMany query with Where produces many SQL queries

I'm using the following in a GetAppRolesForUser function (and have tried variations of it based on answers here):
private AuthContext db = new AuthContext();
...
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.SingleOrDefault(u => u.InternetId == username)
.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
In SQL Profiler I end up with this query once for every single RolesId:
exec sp_executesql N'SELECT
[Extent2].[GroupId] AS [GroupId],
[Extent2].[GroupName] AS [GroupName]
FROM [Auth].[Permissions] AS [Extent1]
INNER JOIN [Auth].[Groups] AS [Extent2] ON [Extent1].[GroupId] = [Extent2].[GroupId]
WHERE [Extent1].[RolesId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=6786
How do I refactor so EF produces a single query for userRoles and doesn't take 18 seconds to run?

I think the problem is you're lazy loading the groups and roles.
One solution is to eager load them before you call SingleOrDefault:
var user = db.Users.Include(x => x.Groups.Select(y => y.Roles))
.SingleOrDefault(u => u.InternetId == username);
var groups = user.Groups.SelectMany(
g => g.Roles.Where(r => r.Asset.AssetName == application));
var userRoles = Mapper.Map<List<RoleApi>>(groups);
Also note: there is no null check here (SingleOrDefault returns null when no user matches).

TheGeneral's answer covers why you are getting caught out with lazy loading. You may also need to include Asset to get AssetName.
With AutoMapper you can avoid the need to eager load the entities by applying .ProjectTo<T>() to the IQueryable, provided that a User is accessible from Group.
For instance:
var roles = db.Groups.Where(g => g.User.InternetId == username)
.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application))
.ProjectTo<RoleApi>()
.ToList();
This should leverage the deferred execution where AutoMapper will effectively project in the .Select() needed to populate the RoleApi instance based on your mapping/inspection.

Here is another way of avoiding lazy loading. You can also look at projection and select only the fields you need rather than loading all the columns.
var userRoles = Mapper.Map<List<RoleApi>>(
    db.Users.Where(u => u.InternetId == username)
        .SelectMany(u => u.Groups)
        .SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application))
        .Select(../* projection */ )
        .ToList());
EF also comes with Include (EF6 syntax; Include takes navigation paths only, so the AssetName filter has to be applied after loading):
var users = db.Users.Where(u => u.InternetId == username)
    .Include(u => u.Groups.Select(g => g.Roles.Select(r => r.Asset)))
    .ToList();
Then you can iterate the collection using nested foreach loops.

You have to be aware of two differences:
The difference between IEnumerable and IQueryable
The difference between functions that return IQueryable<TResult> (lazy) and functions that return TResult (executing)
Difference between Enumerable and Queryable
An IEnumerable LINQ statement is meant to be processed in your local process. It contains all the code and all the calls needed to execute the statement. The statement is executed as soon as GetEnumerator and MoveNext are called, either explicitly, or implicitly via foreach or via LINQ methods that don't return IEnumerable<...>, like ToList, FirstOrDefault, and Any.
In contrast, an IQueryable is not meant to be processed in your process (however it can be done if you want). It is usually meant to be processed by a different process, usually a database management system.
For this, an IQueryable holds an Expression and a Provider. The Expression represents the query that must be executed. The Provider knows who must execute the query (the DBMS) and which language this executor uses (usually SQL). When GetEnumerator and MoveNext are called, the Provider takes the Expression and translates it into the language of the executor. The query is sent to the executor. The returned data is presented as an IEnumerable, on which GetEnumerator and MoveNext are called.
Because of this translation into SQL, an IQueryable can't do all the things that an IEnumerable can do. The main thing is that it can't call your local functions. It can't even execute all LINQ functions. The better the quality of the Provider the more it can do. See supported and unsupported LINQ methods
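A minimal sketch of the Expression/Provider split, using LINQ's built-in in-memory provider (EnumerableQuery<T>, which AsQueryable creates) instead of EF, so no SQL is involved and the numbers are made up. Building the query only produces an expression tree; executing it goes through the provider:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

int[] numbers = { 1, 2, 3, 4, 5 };

// AsQueryable wraps the array in LINQ's in-memory provider.
// With EF, DbSet<T> plays this role and its provider translates to SQL instead.
IQueryable<int> query = numbers.AsQueryable().Where(n => n > 2);

// Building the query only produced data: an expression tree and a provider.
Expression expression = query.Expression;
IQueryProvider provider = query.Provider;

// The outermost node records the call to Queryable.Where (not Enumerable.Where).
var whereCall = (MethodCallExpression)expression;
Console.WriteLine(whereCall.Method.DeclaringType); // System.Linq.Queryable
Console.WriteLine(whereCall.Method.Name);          // Where

// An executing method hands the expression to the provider to run.
int count = query.Count();
Console.WriteLine(count); // 3
```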
Lazy LINQ methods and executing LINQ methods
There are two groups of LINQ methods: those that return IQueryable<...>/IEnumerable<...> and those that do not.
The first group is lazy (deferred execution). This means that at the end of the LINQ statement the query has been created, but it has not been executed yet. Only GetEnumerator and MoveNext make the Provider translate the Expression and order the DBMS to execute the query.
Concatenating IQueryables only changes the Expression, which is a fairly fast procedure. Hence there is no performance gain in writing one big LINQ statement instead of concatenating several smaller ones before you execute the query.
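The deferred behaviour of the first group can be observed with a counter. In this minimal in-memory sketch (IsEven and the data are hypothetical), composing Where and Select runs nothing, and ToList runs the whole pipeline once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;

// Counts how often the filter actually runs.
bool IsEven(int n)
{
    predicateCalls++;
    return n % 2 == 0;
}

int[] source = { 1, 2, 3, 4, 5, 6 };

// Lazy operators: each call only wraps the previous query in a new one.
IEnumerable<int> query = source.Where(IsEven);
query = query.Select(n => n * 10);
int callsBeforeExecution = predicateCalls; // 0: nothing has been evaluated

// Executing operator: ToList enumerates and runs the whole pipeline once.
List<int> result = query.ToList();

Console.WriteLine(string.Join(", ", result)); // 20, 40, 60
Console.WriteLine(predicateCalls);            // 6
```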
Usually the DBMS is smarter and better prepared to do selections than your process. The transfer of selected data to your local process is one of the slower parts of your query.
Advice: Structure your LINQ statements so that the executing statement comes last and everything before it can be executed by the DBMS. Make sure that you only select the properties that you actually plan to use. For example, don't transfer foreign keys if you don't use them.
Back to your question
Leaving the mapper out of the question you start with:
db.Users.SingleOrDefault(...)
SingleOrDefault is a non-lazy function. It doesn't return IQueryable<...>; it executes the query and transports one complete User to your local process. When you then access its Groups and their Roles, lazy loading fetches them with additional queries.
Advice: postpone the SingleOrDefault to the last statement:
var result = myDbcontext.Users
    .Where(user => user.InternetId == username)
    .SelectMany(user => user.Groups)
    .SelectMany(group => group.Roles.Where(role => role.Asset.AssetName == application))
    // until here, the query is not executed yet, execute it now:
    .SingleOrDefault();
In words: From the sequence of Users, keep only those Users with an InternetId that equals username. From all remaining Users (which you hope to be only one), select their Groups, and from each Group select its Roles. However, we don't want all Roles; we only keep the Roles with an AssetName equal to application. SelectMany puts all remaining Roles into one collection (the "many" part), and SingleOrDefault selects the zero or one remaining Role that you expect.
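The two SelectMany steps can be tried on an in-memory model (hypothetical tuples standing in for the User, Group and Role entities; no EF involved). Against EF, the same operators applied to myDbcontext.Users stay one IQueryable and run as a single SQL query. ToList() is used here because alice has two matching roles, and SingleOrDefault() throws an InvalidOperationException when more than one element remains:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of Users -> Groups -> Roles.
var users = new[]
{
    (InternetId: "alice", Groups: new[]
    {
        (Name: "Admins", Roles: new[] { (Name: "Edit", AssetName: "Portal"), (Name: "Audit", AssetName: "Reports") }),
        (Name: "Staff",  Roles: new[] { (Name: "View", AssetName: "Portal") }),
    }),
    (InternetId: "bob", Groups: new[]
    {
        (Name: "Staff",  Roles: new[] { (Name: "View", AssetName: "Portal") }),
    }),
};

string username = "alice";
string application = "Portal";

// Users -> that user's Groups -> each group's Roles filtered by asset, flattened twice.
List<string> roles = users
    .Where(user => user.InternetId == username)
    .SelectMany(user => user.Groups)
    .SelectMany(group => group.Roles.Where(role => role.AssetName == application))
    .Select(role => role.Name)
    .ToList();

Console.WriteLine(string.Join(", ", roles)); // Edit, View
```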

What is the difference between Joining two different DB Context using ToList() and .AsQueryable()?

Case 1:
I joined two different DB contexts by calling ToList() on both.
Case 2:
I also tried joining the first DbContext with ToList() and the second with AsQueryable().
Both worked for me. All I want to know is the difference between these joins regarding performance and functionality. Which one is better?
var users = (from usr in dbContext.User.AsNoTracking()
select new
{
usr.UserId,
usr.UserName
}).ToList();
var logInfo= (from log in dbContext1.LogInfo.AsNoTracking()
select new
{
log.UserId,
log.LogInformation
}).AsQueryable();
var finalQuery= (from usr in users
join log in logInfo on usr.UserId equals log.UserId
select new
{
usr.UserName,
log.LogInformation
}).ToList();

I'll elaborate on the answer Jehof gave in his comment. It is true that this join will be executed in memory, and there are two reasons why.
Firstly, this join cannot be performed in the database because you are joining an object in memory (users) with a deferred query (logInfo). It is not possible to generate a query from that which could be sent to a database. This means that before the actual join is performed, the deferred query is executed and all logs are retrieved from the database. To sum up, in this scenario two queries are executed in the database and the join happens in memory. It doesn't matter whether you use ToList + AsQueryable or ToList + ToList in this case.
Secondly, in your scenario this join can be performed ONLY in memory. Even if you use AsQueryable with both the first and the second context, it will not work. You will get a System.NotSupportedException with the message:
The specified LINQ expression contains references to queries that are associated with different contexts.
I wonder why you're using two DB contexts. Is it really needed? As I explained, because of that you lose the possibility to take full advantage of deferred queries (lazy evaluation).
If you really have to use two DB contexts, I'd consider adding some filters (WHERE conditions) to the queries responsible for reading users and logs from the DB. Why? For a small number of records there is no problem. However, for large amounts of data it is not efficient to perform joins in memory. That is what databases were created for.

It hasn't been explained yet why the statements actually work and why EF doesn't throw an exception that you can only use sequences of primitive types in a LINQ statement.
If you swap both lists ...
var finalQuery= (from log in logInfo
join usr in users on log.UserId equals usr.UserId
...
EF will throw
Unable to create a constant value of type 'User'. Only primitive types or enumeration types are supported in this context.
So why does your code work?
That will become clear if we convert your statement to method syntax (which the compiler does under the hood):
users.Join(logInfo, usr => usr.UserId, log => log.UserId,
    (usr, log) => new
    {
        usr.UserName,
        log.LogInformation
    });
Since users is an IEnumerable, the extension method Enumerable.Join is resolved as the appropriate method. This method accepts an IEnumerable as the second list to be joined. Therefore, logInfo is implicitly cast to IEnumerable, so it runs as a separate SQL statement before it partakes in the join.
In the version from log in logInfo join usr ..., Queryable.Join is used. Now users is embedded in the query's expression tree (as a constant), which turns the whole statement into one expression that EF unsuccessfully tries to translate into one SQL statement.
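The overload resolution described above can be checked without a database. In this sketch (hypothetical data; AsQueryable stands in for the second context's query), the static type of the outer sequence decides between Enumerable.Join and Queryable.Join:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical data: users is already materialized (like the ToList() result),
// logInfo is an IQueryable (like the second context's query).
var users = new[]
{
    new { UserId = 1, UserName = "ann" },
    new { UserId = 2, UserName = "ben" },
}.ToList();

var logInfo = new[]
{
    new { UserId = 1, LogInformation = "login" },
    new { UserId = 2, LogInformation = "logout" },
    new { UserId = 1, LogInformation = "edit" },
}.AsQueryable();

// Outer sequence is IEnumerable: Enumerable.Join is chosen and logInfo is simply enumerated.
var inMemoryJoin = users.Join(logInfo, usr => usr.UserId, log => log.UserId,
    (usr, log) => new { usr.UserName, log.LogInformation });

// Outer sequence is IQueryable: Queryable.Join is chosen and users becomes
// part of the expression tree (the part EF cannot translate).
var queryableJoin = logInfo.Join(users, log => log.UserId, usr => usr.UserId,
    (log, usr) => new { usr.UserName, log.LogInformation });

Console.WriteLine(inMemoryJoin is IQueryable); // False
Console.WriteLine(((MethodCallExpression)queryableJoin.Expression).Method.DeclaringType); // System.Linq.Queryable
```

Unlike EF's provider, the in-memory provider can execute both versions, so both joins here return the same three rows; the difference only shows up as the type of query that gets built.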
Now a few words on
Which one is better?
The best option is the one that does just enough to make it work. That means that
You can remove AsQueryable(), because logInfo already is an IQueryable and it is cast to IEnumerable anyway.
You can replace ToList() by AsEnumerable(), because ToList() builds a redundant intermediate result, while AsEnumerable() only changes the compile-time type of users, without triggering its execution yet.
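A small sketch of the AsEnumerable() vs ToList() difference (ReadRows is a hypothetical iterator standing in for a query whose rows are read during enumeration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int rowsRead = 0;

// Hypothetical stand-in for a query result that is read row by row when enumerated.
IEnumerable<int> ReadRows()
{
    foreach (int row in new[] { 1, 2, 3 })
    {
        rowsRead++;
        yield return row;
    }
}

// AsEnumerable only changes the static type; no row is read yet.
IEnumerable<int> deferred = ReadRows().AsEnumerable();
int readAfterAsEnumerable = rowsRead; // 0

// ToList reads every row immediately into a new list.
List<int> buffered = ReadRows().ToList();
int readAfterToList = rowsRead;       // 3

// The deferred sequence is read only when something enumerates it.
int sum = deferred.Sum();             // reads the 3 rows, so rowsRead becomes 6
Console.WriteLine($"{readAfterAsEnumerable} {readAfterToList} {rowsRead}"); // 0 3 6
```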

ToList()
Execute the query immediately
You will get all the elements ready in memory
AsQueryable()
lazy (execute the query later)
Parameter: Expression<Func<TSource, bool>>
Converts the Expression into T-SQL (with a specific provider), queries remotely and loads the result into your application memory.
That's why DbSet (in Entity Framework) also implements IQueryable: to get efficient queries.
It does not load every record. E.g. with Take(5), it will generate a SELECT TOP 5 SQL query in the background.

Explicitly loading all navigation properties in a list of previously fetched objects in Entity Framework?

Suppose I have something like
var remoteData = query.Where(s => <conditions here>).ToArray();
and every object in the array has a navigation property called Department.
Is there a way of explicitly loading all the Department properties in a single query to the SQL server?
Doing something like this results in numerous queries
Array.ForEach(remoteData, rd =>
{
rd.DepartmentReference.Load();
});
I know about Include but that is too slow. I want to load everything after the filtering takes place.

No. If you want to get the data faster, you should select the related entities in a separate query, like this:
var deps = dbContext.Departments.Where(o => ...).ToDictionary(o => o.DataID);
Here you can join this query with your original query so that you don't repeat the filter condition.
And then set the values:
Array.ForEach(remoteData, rd =>
{
    rd.Department = deps[rd.ID];
});
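The batch-then-lookup idea can be sketched in memory. This version assumes each item carries a DepartmentId foreign key (hypothetical, like the rest of the data) and collects the needed keys instead of repeating the original filter; with EF, Where(d => neededIds.Contains(d.Id)) on a DbSet is translated into a single query with an IN (...) clause:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: already-fetched items with a DepartmentId foreign key,
// and a Departments "table".
var remoteData = new[] { (Id: 10, DepartmentId: 1), (Id: 11, DepartmentId: 2), (Id: 12, DepartmentId: 1) };
var departmentsTable = new[] { (Id: 1, Name: "Sales"), (Id: 2, Name: "IT"), (Id: 3, Name: "HR") };

// 1. Collect the distinct foreign keys that are actually needed.
List<int> neededIds = remoteData.Select(r => r.DepartmentId).Distinct().ToList();

// 2. Fetch those departments in one batch (one query instead of one per item).
Dictionary<int, string> deps = departmentsTable
    .Where(d => neededIds.Contains(d.Id))
    .ToDictionary(d => d.Id, d => d.Name);

// 3. Resolve each item through the dictionary by its foreign key.
var withDepartments = remoteData
    .Select(r => (r.Id, Department: deps[r.DepartmentId]))
    .ToList();

foreach (var item in withDepartments)
    Console.WriteLine($"{item.Id}: {item.Department}"); // 10: Sales, then 11: IT, then 12: Sales
```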

Does Entity Framework Selects all rows in memory when applying a Where filter?

There is one thing that confuses me. I think EF selects all rows (all records) in the table.
Let me show you a sample.
public Category GetByID(int Id)
{
return context.Categories.Find(Id);
}
There are a lot of records in the table, and when I check them with a breakpoint I can see all the records, not only the one with that Id. What if there are 10k records in the table? I tested this: I copied records manually in the database until I had 30k records.
An expression like this,
IEnumerable<Category> categories = categoryRepository
.Where(x => x.Published == true)
.ToList();
I saw 30k records at the breakpoint, even though at least 10k of them have Published = false.
Does Entity Framework first fetch all of the records into memory and then filter them?

TL;DR
It's likely your categoryRepository has broken EF's IQueryable<> expression tree, and is materializing the entire Category table in order to run the .Where predicate. See the examples below.
More Detail
The short answer is no: provided that Entity Framework is able to parse the IQueryable<> expression (which includes the .Where predicates you specify), it will convert the associated expression tree into native Sql using the appropriate query provider for the RDBMS you are targeting, thus allowing all the benefits of Sql, e.g. use of indexes.
As per my comment, one of the common reasons why EF would not do this is if the IQueryable mechanism has been tampered with, for instance, if your Repository pattern implementation uses the IEnumerable<T> overload of the Where predicate and not the IQueryable overload.
As a result, EF has no other option but to fetch the table and execute every row against your predicate function to determine whether the row matches your predicate or not.
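A minimal sketch of how the parameter type of a repository method decides which Where is bound (WhereFunc, WhereExpression and the category data are hypothetical; AsQueryable stands in for context.Categories):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for context.Categories: an in-memory IQueryable.
var categories = new[]
{
    new { Id = 1, Published = true },
    new { Id = 2, Published = false },
    new { Id = 3, Published = true },
}.AsQueryable();

// Takes a Func<T, bool>: only Enumerable.Where accepts a delegate, so the query
// leaves IQueryable. With EF, every row would be fetched and filtered in memory.
IEnumerable<T> WhereFunc<T>(IQueryable<T> source, Func<T, bool> predicate)
    => source.Where(predicate);

// Takes an Expression<Func<T, bool>>: Queryable.Where keeps the predicate in the
// expression tree, so EF can translate it into a SQL WHERE clause.
IQueryable<T> WhereExpression<T>(IQueryable<T> source, Expression<Func<T, bool>> predicate)
    => source.Where(predicate);

var viaFunc = WhereFunc(categories, c => c.Published);
var viaExpression = WhereExpression(categories, c => c.Published);

Console.WriteLine(viaFunc is IQueryable);                          // False
Console.WriteLine(viaExpression.Expression.NodeType);              // Call
Console.WriteLine($"{viaFunc.Count()} / {viaExpression.Count()}"); // 2 / 2
```

Both return the same rows in memory; with EF, only the Expression-based version would put the Published filter into the SQL.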
As an aside, there is some debate whether there is merit in wrapping a DbContext in a Repository and / or Unit Of Work wrapper, as a DbContext is Transactional, performs caching, can be mocked during unit testing, and now supports a wide range of databases.
Examples of where materialization happens and how this affects performance
(The point at which the Sql query is actually executed is often referred to as materialization)
I've excluded the OP's repository - i.e. we're using the DbContext directly.
Best:
var miniFoos = myDbContext.MyFooSet
.Where(f => f.SomeProperty == "foo")
.Select(f => new {...})
.ToList();
This is good because we've applied both the predicate and a projection of just the columns we need in SQL, before we materialize the data into a List (of an anonymous type).
OK:
var foos = myDbContext.MyFooSet
.Where(f => f.SomeProperty == "foo")
.ToList() // Or .AsEnumerable(), or other materialization methods
.Select(f => new {...}); // Subset of fields
This isn't ideal, because although we've applied the .Where clause before we materialize, we're returning all columns of the Foo table, just to discard the unnecessary ones. This means unnecessary I/O, and also, Sql might have been able to use just an index to perform the query.
Bad - Never do this
var foos = myDbContext.MyFooSet
.AsEnumerable() // (or `ToList()`, same problem)
.Where(f => f.SomeProperty == "foo")
.Select(f => new {...}); // Subset of fields
This seems to be what the OP is experiencing - since the table is materialized BEFORE any .Where filtering takes place, the whole table IS retrieved to memory and filtering happens with Linq to Objects, instead.
This problem can also happen when applying custom .Where predicate builders which don't use Expressions, or which use IEnumerable<T> instead of IQueryable<T> - IEnumerable has no associated expression tree and can't be parsed to Sql.

optimize linq query with related entities

I am new to LINQ; I started writing this query:
var dProjects = Projects
.Select(p => new Models.Project {
ProjectID = p.ProjectID,
Status = p.Status,
ExpiresOn = p.ExpiresOn,
LatestComments = p.ProjectComments
.OrderByDescending(pc => pc.CreatedOn)
.Select(pc => pc.Comments)
.FirstOrDefault(),
ProjectFileIDs = p.ProjectFiles
.Select(pf => pf.BinaryFileID)
.AsQueryable()
})
.AsQueryable<Models.Project>();
I already know this query will perform really slowly because related entities like ProjectComments and ProjectFiles will create nested selects, though it works and gives me the right results that I need.
How can I optimize this query and get the same results? One of my guesses would be using an inner join, but ProjectComments and ProjectFiles already have a relationship in the database through keys, so I'm not sure what we can achieve by setting up the relationship again.
Basically, I need to know which is the best approach to take here from a performance perspective. One thing to note is that I am sorting ProjectComments and only taking the most recent one. Should I be using a combination of join and group ... into? Help will be much appreciated. Thanks.
UPDATED:
Sorry if I wasn't clear enough about what I am trying to do. Basically, in the front end I have a grid which shows a list of projects with the latest project comments and a list of all the files associated with each project, so users can click on those links and open those documents. The query I have above works, and it shows the following in the grid:
Project ID (From Project table)
Status (From Project table)
ExpiresOn (From Project table)
LatestComments (latest entry from ProjectComments table which has project ID as foreign key)
ProjectFileIDs (list of file IDs from the ProjectFiles table, which has Project ID as a foreign key; I use those file IDs to create links so users can open those files).
So everything is working and I have it all set up, but the query is a little slow. Right now we have very little data (only test data), but once this is launched I am expecting a lot of users/data, so I want to optimize this query as much as possible before it goes live. The goal here is basically optimization. I am pretty sure this is not the best approach, because it will create nested selects.

In Entity Framework, you can drastically improve the performance of queries by returning the objects as an object graph instead of a projection. Entity Framework is extremely efficient at optimizing all but the most complex SQL queries, and can take advantage of "eager" loading (loading related items in the same query) vs. "lazy" loading (not loading related items from the db until they are actually accessed). This MSDN reference is a good place to start.
As far as your specific query is concerned, you could use this technique something like the following:
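var dbProjects = yourContext.Projects
    // Include expects a navigation path; OrderByDescending/Select/FirstOrDefault inside it is not supported
    .Include(p => p.ProjectComments)
    .Include(p => p.ProjectFiles);
// The latest comment per project can then be picked after loading, e.g.
// p.ProjectComments.OrderByDescending(pc => pc.CreatedOn).FirstOrDefault()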
Note the .Include() calls being used to request eager loading.
From the MSDN reference on Loading Related Objects:
Performance Considerations
When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.
Explicit and lazy loading enables you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded.

Do you get any boost in performance if you add Include statements before the Select?
Example:
var dProjects = Projects
.Include(p => p.ProjectComments)
.Include(p => p.ProjectFiles)
Include allows all matching ProjectComments and ProjectFiles to be eagerly loaded. See Loading Related Entities for more details.
