SelectMany query with Where produces many SQL queries - c#

I'm using this for a GetAppRolesForUser function (and have tried variations based on answers here):
private AuthContext db = new AuthContext();
...
var userRoles = Mapper.Map<List<RoleApi>>(
    db.Users.SingleOrDefault(u => u.InternetId == username)
        .Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
I end up with this in SQL Profiler for every single RolesId each time:
exec sp_executesql N'SELECT
[Extent2].[GroupId] AS [GroupId],
[Extent2].[GroupName] AS [GroupName]
FROM [Auth].[Permissions] AS [Extent1]
INNER JOIN [Auth].[Groups] AS [Extent2] ON [Extent1].[GroupId] = [Extent2].[GroupId]
WHERE [Extent1].[RolesId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=6786
How do I refactor so EF produces a single query for userRoles and doesn't take 18 seconds to run?

I think the problem is that you're lazy loading the groups and roles.
One solution is to eager load them before you call SingleOrDefault:
var user = db.Users.Include(x => x.Groups.Select(y => y.Roles))
.SingleOrDefault(u => u.InternetId == username);
var groups = user.Groups.SelectMany(
g => g.Roles.Where(r => r.Asset.AssetName == application));
var userRoles = Mapper.Map<List<RoleApi>>(groups);
Also note: there is no null checking here.

TheGeneral's answer covers why you are getting caught out with lazy loading. You may also need to include Asset to get AssetName.
With AutoMapper you can avoid the need to eager load the entities by applying .ProjectTo<T>() to the IQueryable, provided there is a User navigation accessible from Group.
For instance:
var roles = db.Groups.Where(g => g.User.InternetId == username)
    .SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application))
    .ProjectTo<RoleApi>()
    .ToList();
This should leverage deferred execution: AutoMapper effectively injects the .Select() needed to populate the RoleApi instances, based on your mapping configuration.
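For completeness, a minimal sketch of the mapping configuration ProjectTo relies on. This assumes the static Mapper API implied by the question, and Role is an assumed entity name:
using AutoMapper;
using AutoMapper.QueryableExtensions;

// Hypothetical setup: map the Role entity to the RoleApi DTO.
// ProjectTo<RoleApi>() uses this map to build the SELECT that EF translates to SQL,
// so only the mapped columns are fetched.
Mapper.Initialize(cfg =>
{
    cfg.CreateMap<Role, RoleApi>();
    // Flattened members (e.g. Asset.AssetName) would need explicit ForMember calls
    // if the RoleApi property names don't follow AutoMapper's conventions.
});
In newer AutoMapper versions you would build a MapperConfiguration instance and pass it to ProjectTo instead of using the static API.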

Here is another way of avoiding lazy loading. You can also look at projection and select only the fields you need rather than loading all columns.
var userRoles = Mapper.Map<List<RoleApi>>(
    db.Users.Where(u => u.InternetId == username)
        .SelectMany(u => u.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)))
        .Select(r => /* projection */));
EF also comes with Include:
var users = db.Users.Where(u => u.InternetId == username)
    .Include(u => u.Groups.Select(g => g.Roles.Select(r => r.Asset)));
Then you can iterate the loaded collections with foreach loops, filter the roles by AssetName in memory, and map them with Mapper.Map<List<RoleApi>>.

You have to be aware of two differences:
The difference between IEnumerable and IQueryable
The difference between functions that return IQueryable<TResult> (lazy) and functions that return TResult (executing)
Difference between Enumerable and Queryable
A LINQ statement against an IEnumerable is meant to be processed in your local process. It contains all the code and all the calls needed to execute the statement. The statement is executed as soon as GetEnumerator and MoveNext are called, either explicitly, or implicitly by foreach or by LINQ methods that don't return IEnumerable<...>, like ToList, FirstOrDefault, and Any.
In contrast, an IQueryable is not meant to be processed in your process (however it can be done if you want). It is usually meant to be processed by a different process, usually a database management system.
For this an IQueryable holds an Expression and a Provider. The Expression represents the query that must be executed. The Provider knows who must execute the query (the DBMS) and which language this executor uses (usually SQL). When GetEnumerator and MoveNext are called, the Provider takes the Expression and translates it into the language of the executor. The query is sent to the executor. The returned data is presented as an IEnumerable, on which GetEnumerator and MoveNext are called.
Because of this translation into SQL, an IQueryable can't do all the things that an IEnumerable can do. The main thing is that it can't call your local functions. It can't even execute all LINQ functions. The better the quality of the Provider the more it can do. See supported and unsupported LINQ methods
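For example, a small sketch of that limitation (db.Roles as a DbSet and the entity names are assumptions based on the question above):
// A local helper method cannot be translated by the query provider:
static bool IsApplicationRole(Role r) => r.Asset.AssetName == "MyApp";

// Throws NotSupportedException in EF6: the Expression contains a call to
// IsApplicationRole that the provider cannot turn into SQL.
// var bad = db.Roles.Where(r => IsApplicationRole(r)).ToList();

// Works: the whole condition lives inside the expression tree, so it can be
// translated into a SQL WHERE clause.
var good = db.Roles.Where(r => r.Asset.AssetName == "MyApp").ToList();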
Lazy LINQ methods and executing LINQ methods
There are two groups of LINQ methods: those that return IQueryable<...>/IEnumerable<...> and those that do not.
The first group uses lazy loading. This means that at the end of the LINQ statement the query has been created, but it has not been executed yet. Only GetEnumerator and MoveNext will make the Provider translate the Expression and order the DBMS to execute the query.
Concatenating IQueryables only changes the Expression. This is a fairly fast procedure. Hence there is no performance gain in writing one big LINQ expression instead of concatenating several before you execute the query.
Usually the DBMS is smarter and better prepared to do selections than your process. The transfer of selected data to your local process is one of the slower parts of your query.
Advice: Try to create your LINQ statements such that the executing statement is the last one, so the whole query can be executed by the DBMS. Make sure that you only select the properties that you actually plan to use. So, for example, don't transfer foreign keys if you don't use them.
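Applied to the question, a sketch of such a projection might look like this (RoleId and RoleName are assumed property names):
var roleNames = db.Users
    .Where(user => user.InternetId == username)
    .SelectMany(user => user.Groups)
    .SelectMany(group => group.Roles)
    .Where(role => role.Asset.AssetName == application)
    // select only what you need; no foreign keys, no unused columns
    .Select(role => new { role.RoleId, role.RoleName })
    .ToList();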
Back to your question
Leaving the mapper aside, you start with:
db.Users.SingleOrDefault(...)
SingleOrDefault is a non-lazy function. It doesn't return IQueryable<...>. It will execute the query. It will transport one complete User to your local process, including its Roles.
Advice: postpone the SingleOrDefault to the last statement:
var result = myDbcontext.Users
    .Where(user => user.InternetId == username)
    .SelectMany(user => user.Groups
        .SelectMany(group => group.Roles)
        .Where(role => role.Asset.AssetName == application))
    // until here, the query is not executed yet, execute it now:
    .SingleOrDefault();
In words: From the sequence of Users, keep only those Users with an InternetId that equals userName. From all remaining Users (which you hope to be only one), select the sequence of Roles of the Groups of each User. However, we don't want to select all Roles, we only keep the Roles with an AssetName equal to application. Now put all remaining Roles into one collection (the many part in SelectMany), and select the zero or one remaining Role that you expect.

Related

How to deconstruct a LINQ IQueryable / Expression into the constituent parts?

I am writing an extension method which is meant to help prepare a LINQ query before it is actually executed in EF Core.
For example:
var context = new SchoolContext();
var studentWithGrade = context.Students
.Where(s => s.FirstName == "Bill")
.Include(s => s.Grade)
.Prepare()
.FirstOrDefault();
I am building the Prepare() extension on the IQueryable<> interface. In the method, I need to know things about the resulting query to be executed, such as:
The predicate (s.FirstName == "Bill")
The selected fields (from the Students table, as well as any fields from s.Grade)
The limit (an implied 1 in this case)
... and so on.
AFAICT, the Expression on the IQueryable is transformed into a SQL query (and executed) when FirstOrDefault() is called, so this transformation is obviously possible. But I would like to examine the data in a structured way, rather than inspecting a SQL query string. My hope was that I could simply inspect the queryable (e.g. query.Limit < 0), but the actual storage of these constraints seems buried behind reflection in the Expression object, based on MethodCallExpression.cs's source code.
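One way to get at that information is to walk query.Expression with an ExpressionVisitor before the executing call. A minimal sketch that pulls out the Where predicate (the query shape is assumed from the example above):
using System.Linq.Expressions;

class WhereFinder : ExpressionVisitor
{
    public LambdaExpression Predicate { get; private set; }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.Name == "Where")
        {
            // Queryable.Where(source, predicate): the predicate is the second
            // argument, wrapped in a Quote node.
            var quote = (UnaryExpression)node.Arguments[1];
            Predicate = (LambdaExpression)quote.Operand;
        }
        return base.VisitMethodCall(node);
    }
}

// Usage (query stands for the IQueryable<Student> built before FirstOrDefault()):
// var finder = new WhereFinder();
// finder.Visit(query.Expression);
// Console.WriteLine(finder.Predicate);   // s => (s.FirstName == "Bill")
Constraints such as the implied limit of 1 from FirstOrDefault only exist as method calls in the tree, so you would have to recognize those calls (FirstOrDefault, Take, ...) yourself in the same way.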

How to get an EF query to compile the most optimised SQL?

I'm fairly new to EF, and this is something that has been bugging me for a couple of days now:
I have a User entity. It has a parent WorkSpace, which has a collection of Users.
Each User also has a collection of children Schedule, in a User.Schedules property.
I'm navigating through the objects like this:
var query = myUser.WorkSpace.Users.SelectMany(u => u.Schedules);
When enumerating the results of query (myUser is an instance of User that has been loaded previously using .Find(userid)), I noticed that EF makes one query to the db for each User in WorkSpace.Users
How come EF doesn't get the results in one single query, starting from the primary key of myUser, and joining on this with all the tables involved?
If I do something else directly from the context such as this, it works fine though:
context.Users.Where(u => u.ID == userid).SelectMany(u => u.WorkSpace.Users.SelectMany(x => x.Schedules))
Is there something I'm doing wrong?
Let take the first query:
var query = myUser.WorkSpace.Users.SelectMany(u => u.Schedules);
If you look at the type of the query variable, you'll see that it is IEnumerable<Schedule>, which means this is a regular LINQ to Objects query. Why? Because it starts from a materialized object and then accesses another object/collection, etc. This, combined with the EF lazy loading feature, leads to the multiple-database-query behavior.
If you do the same for the second query:
var query = context.Users.Where(u => u.ID == userid)
    .SelectMany(u => u.WorkSpace.Users.SelectMany(x => x.Schedules))
you'll notice that the type of the query is now IQueryable<Schedule>, which means now you have a LINQ to Entities query. This is because neither context.Users nor other object/collections used inside the query are real objects - they are just metadata used to build, execute and materialize the query.
To recap, you are not doing anything wrong. Lazy loading works this way. If you don't care about the so-called N+1 query issue, you can use the first approach. If you do care, then use the second.
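If you only have the materialized myUser in hand, a sketch of what the second approach can look like (WorkSpaceId is an assumed foreign key property):
// One SQL query for all schedules of all users in myUser's workspace.
var schedules = context.Users
    .Where(u => u.WorkSpaceId == myUser.WorkSpaceId)
    .SelectMany(u => u.Schedules)
    .ToList();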

What is the difference between Joining two different DB Context using ToList() and .AsQueryable()?

Case 1:
I joined two different DB contexts by calling ToList() on both queries.
Case 2:
I also tried joining the first DB context query with ToList() and the second with AsQueryable().
Both worked for me. All I want to know is the difference between those joins regarding performance and functionality. Which one is better?
var users = (from usr in dbContext.User.AsNoTracking()
             select new
             {
                 usr.UserId,
                 usr.UserName
             }).ToList();

var logInfo = (from log in dbContext1.LogInfo.AsNoTracking()
               select new
               {
                   log.UserId,
                   log.LogInformation
               }).AsQueryable();

var finalQuery = (from usr in users
                  join log in logInfo on usr.UserId equals log.UserId
                  select new
                  {
                      usr.UserName,
                      log.LogInformation
                  }).ToList();
I'll elaborate answer that was given by Jehof in his comment. It is true that this join will be executed in the memory. And there are 2 reasons why it happens.
Firstly, this join cannot be performed in the database because you are joining an object in memory (users) with a deferred query (logInfo). Because of that, it is not possible to generate a single query that could be sent to the database. It means that before performing the actual join, the deferred query is executed and all logs are retrieved from the database. To sum up, in this scenario 2 queries are executed against the database and the join happens in memory. It doesn't matter whether you use ToList + AsQueryable or ToList + ToList in this case.
Secondly, in your scenario this join can be performed ONLY in memory. Even if you use AsQueryable with both contexts it will not work. You will get a System.NotSupportedException with the message:
The specified LINQ expression contains references to queries that are associated with different contexts.
I wonder why you're using 2 DB contexts. Is it really needed? As I explained, because of that you lose the possibility to take full advantage of deferred queries (lazy evaluation).
If you really have to use 2 DB contexts, I'd consider adding some filters (WHERE conditions) to the queries responsible for reading users and logs from the DB. Why? For a small number of records there is no problem. However, for large amounts of data it is not efficient to perform joins in memory; that is what databases were created for.
It hasn't been explained yet why the statements actually work and why EF doesn't throw an exception that you can only use sequences of primitive types in a LINQ statement.
If you swap both lists ...
var finalQuery= (from log in logInfo
join usr in users on log.UserId equals usr.UserId
...
EF will throw
Unable to create a constant value of type 'User'. Only primitive types or enumeration types are supported in this context.
So why does your code work?
That will become clear if we convert your statement to method syntax (which the compiler does under the hood):
users.Join(logInfo,
    usr => usr.UserId,
    log => log.UserId,
    (usr, log) => new
    {
        usr.UserName,
        log.LogInformation
    });
Since users is an IEnumerable, the extension method Enumerable.Join is resolved as the appropriate method. This method accepts an IEnumerable as the second list to be joined. Therefore, logInfo is implicitly cast to IEnumerable, so it runs as a separate SQL statement before it partakes in the join.
In the version from log in logInfo join usr ..., Queryable.Join is used. Now the users list has to become part of the IQueryable's expression. This turns the whole statement into one expression that EF unsuccessfully tries to translate into a single SQL statement.
Now a few words on
Which one is better?
The best option is the one that does just enough to make it work. That means that
You can remove AsQueryable(), because logInfo already is an IQueryable and it is cast to IEnumerable anyway.
You can replace ToList() with AsEnumerable(), because ToList() builds a redundant intermediate result, while AsEnumerable() only changes the static type of users, without triggering its execution yet.
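Putting both points together, a sketch of the trimmed-down version (same names as above):
var users = (from usr in dbContext.User.AsNoTracking()
             select new { usr.UserId, usr.UserName })
            .AsEnumerable();   // no intermediate list; executes when the join enumerates it

var logInfo = from log in dbContext1.LogInfo.AsNoTracking()
              select new { log.UserId, log.LogInformation };   // still an IQueryable

var finalQuery = (from usr in users
                  join log in logInfo on usr.UserId equals log.UserId
                  select new { usr.UserName, log.LogInformation })
                 .ToList();   // both context queries run here; the join itself runs in memory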
ToList()
Executes the query immediately.
You get all the elements ready in memory.
AsQueryable()
Lazy (executes the query later).
Parameter: Expression<Func<TSource, bool>>
Converts the Expression into T-SQL (with the specific provider), queries the database remotely and loads the results into your application's memory.
That's why DbSet (in Entity Framework) also implements IQueryable, to get efficient queries.
It does not load every record. E.g. with Take(5), it will generate SELECT TOP 5 SQL in the background (see the sketch below).
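A small sketch of that last point (the User entity type is assumed from dbContext.User above):
// Nothing runs while the query is being composed:
IQueryable<User> firstFive = dbContext.User.AsNoTracking().Take(5);

// Enumeration triggers the translation, roughly: SELECT TOP (5) ... FROM [User]
var result = firstFive.ToList();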

Does Entity Framework Selects all rows in memory when applying a Where filter?

There is one thing that confuses me. I think EF selects all rows (all records) in the table.
Let me show you a sample.
public Category GetByID(int Id)
{
return context.Categories.Find(Id);
}
There are a lot of records in the table, and when I check them with a breakpoint I can see all the records, not only the one with that Id. What if there are 10k records in the table? I tested this: I copied all the records manually in the database to make 30k records.
An expression like this,
IEnumerable<Category> categories = categoryRepository
.Where(x => x.Published == true)
.ToList();
I saw 30k records at the breakpoint, but at least 10k of them have Published set to false in the table.
Does Entity Framework first fetch all of the records into memory and then filter them?
TL;DR
It's likely your categoryRepository has broken EF's IQueryable<> expression tree, and is materializing the entire Category table in order to run the .Where predicate. See the examples below.
More Detail
The short answer is no: provided that Entity Framework is able to parse the IQueryable<> expression (which includes the .Where predicates you specify), it will convert the associated expression tree into native SQL using the appropriate query provider for the RDBMS you are targeting, thus allowing all the benefits of SQL, e.g. use of indexes.
As per my comment, one of the common reasons why EF would not do this is if the IQueryable mechanism has been tampered with, for instance, if your Repository pattern implementation uses the IEnumerable<T> overload of the Where predicate and not the IQueryable overload.
As a result, EF has no other option but to fetch the table and execute every row against your predicate function to determine whether the row matches your predicate or not.
As an aside, there is some debate whether there is merit in wrapping a DbContext in a Repository and / or Unit Of Work wrapper, as a DbContext is Transactional, performs caching, can be mocked during unit testing, and now supports a wide range of databases.
Examples of where materialization happens and how this affects performance
(The point at which the Sql query is actually executed is often referred to as materialization)
I've excluded the OP's repository - i.e. we're using the DbContext directly.
Best:
var miniFoos = myDbContext.MyFooSet
    .Where(f => f.SomeProperty == "foo")
    .Select(f => new {...})
    .ToList();
This is good because we've applied both the predicate and a projection of just the columns we need in SQL, before we materialize the data into a List (of an anonymous type).
OK:
var foos = myDbContext.MyFooSet
    .Where(f => f.SomeProperty == "foo")
    .ToList() // Or .AsEnumerable(), or other materialization methods
    .Select(f => new {...}); // Subset of fields
This isn't ideal, because although we've applied the .Where clause before we materialize, we're returning all columns of the Foo table just to discard the unnecessary ones. This means unnecessary I/O, and SQL might have been able to satisfy the query using just an index.
Bad - Never do this
var foos = myDbContext.MyFooSet
    .AsEnumerable() // (or `ToList()`, same problem)
    .Where(f => f.SomeProperty == "foo")
    .Select(f => new {...}); // Subset of fields
This seems to be what the OP is experiencing - since the table is materialized BEFORE any .Where filtering takes place, the whole table IS retrieved to memory and filtering happens with Linq to Objects, instead.
This problem can also happen when applying custom .Where predicate builders which don't use Expressions, or which use IEnumerable<T> instead of IQueryable<T> - IEnumerable has no associated expression tree and can't be parsed to Sql.
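As an illustration of that last point, a sketch of the two repository shapes (the class and context names here are hypothetical):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

class CategoryRepository
{
    private readonly MyDbContext context = new MyDbContext();   // hypothetical context

    // Breaks translation: the Func overload resolves to Enumerable.Where, so the
    // whole Categories table is materialized and the predicate runs in memory.
    public IEnumerable<Category> WhereInMemory(Func<Category, bool> predicate)
    {
        return context.Categories.Where(predicate);
    }

    // Keeps translation: the Expression overload resolves to Queryable.Where, so
    // the predicate reaches EF's provider and becomes a SQL WHERE clause.
    public IQueryable<Category> WhereInSql(Expression<Func<Category, bool>> predicate)
    {
        return context.Categories.Where(predicate);
    }
}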

Using the result of a SQL LINQ query leads to the database being queried twice

I query a database and push the result out to the console and a file using two methods like this:
var result = from p in _db.Pages
             join r in rank on p.PkId equals r.Key
             orderby r.NumPages descending
             select new KeyNameCount
             {
                 PageID = p.PkId,
                 PageName = p.Name,
                 Count = r.NumPages
             };
WriteFindingsToFile(result, parentId);
WriteFindingsToConsole(result, parentId);
IEnumerable<T>, not IQueryable<T>, is used as the parameter type for result in the two method calls.
In both calls the result is iterated in a foreach. This leads to two identical calls against the database, one for each method.
I could refactor the two methods into one and only use one foreach, but that would quickly become very hard to maintain (adding write to html, write to xml, etc.).
I am pretty sure this is a fairly common question, but using Google has not made me any wiser, so I turn to you guys :-)
Every time you access a LINQ query it will requery the database to refresh the data. To stop this happening use .ToArray() or .ToList().
e.g.
var result = (from p in _db.Pages
              select p).ToList(); // will query now
Write(result); // will not requery
Write(result); // will not requery
It's important to understand that a raw LINQ query is run when it is used, not when it is written in the code (e.g. don't dispose of your _db before then).
It can be surprising when you realise how it really works. This allows method chaining, and later modifications of the query are reflected in the final query run on the DB. It is important to always keep in mind, as it can cause run-time bugs that will not be caught at compile time, usually because the DB connection is closed before the list is used, as you are passing around what appears to be a simple IEnumerable.
EDIT: Changed to remove my opinion and reflect the discussion in the comments. Personally I think the compiler should assume that the end result of chained queries is immediately run unless you explicitly say that it'll be further modified later. Just to avoid the run-time bugs it inevitably causes.
If you look at the definition of IQueryable, you will see that it also implements IEnumerable. So you can pass an IQueryable as a parameter to a function expecting an IEnumerable without actually enumerating it.
But because of LINQ's deferred execution pattern, the IQueryable will only be executed against the database when you iterate over it (with a foreach loop, for example, as in your case, or with ToList() or functions like First/Single).
Here is a blog post that explains how this works.
If you change your code to the following you will hit the database only once and then pass the result in memory to your functions:
var result = (from p in _db.Pages
              join r in rank on p.PkId equals r.Key
              orderby r.NumPages descending
              select new KeyNameCount
              {
                  PageID = p.PkId,
                  PageName = p.Name,
                  Count = r.NumPages
              }).ToList();
WriteFindingsToFile(result, parentId);
WriteFindingsToConsole(result, parentId);
Any time you iterate over your LINQ result, the database will be queried again.
This is called deferred query execution. You may take a deeper look at one (of many) corresponding MSDN articles:
Execution of the query is deferred until the query variable is iterated over in a foreach or For Each loop
