How to deconstruct a LINQ IQueryable / Expression into the constituent parts?

How to deconstruct a LINQ IQueryable / Expression into the constituent parts? - c#

I am writing an extension method which is meant to help prepare a LINQ query before it is actually executed in EF Core.
For example:
var context = new SchoolContext();
var studentWithGrade = context.Students
.Where(s => s.FirstName == "Bill")
.Include(s => s.Grade)
.Prepare()
.FirstOrDefault();
I am building the Prepare() extension on the IQueryable<> interface. In the method, I need to know things about the resulting query to be executed, such as:
The predicate (s.FirstName == "Bill")
The selected fields (from the Students table, as well as any fields from s.Grade)
The limit (an implied 1 in this case)
... and so on.
AFAICT, the Expression on the IQueryable is transformed into a SQL query (and executed) when FirstOrDefault() is called. So this transformation is obviously possible. But I would like to examine the data in a structured way, rather than inspecting a SQL query string. My hope was that I could simply inspect the queryable (query.Limit < 0) but the actual storage of these constraints seems buried with reflection in Expression object (?) based upon MethodCallExpression.cs's source code.

Related

using AsQueryable after ToList [duplicate]

I know some differences of LINQ to Entities and LINQ to Objects which the first implements IQueryable and the second implements IEnumerable and my question scope is within EF 5.
My question is what's the technical difference(s) of those 3 methods? I see that in many situations all of them work. I also see using combinations of them like .ToList().AsQueryable().
What do those methods mean, exactly?
Is there any performance issue or something that would lead to the use of one over the other?
Why would one use, for example, .ToList().AsQueryable() instead of .AsQueryable()?

There is a lot to say about this. Let me focus on AsEnumerable and AsQueryable and mention ToList() along the way.
What do these methods do?
AsEnumerable and AsQueryable cast or convert to IEnumerable or IQueryable, respectively. I say cast or convert with a reason:
When the source object already implements the target interface, the source object itself is returned but cast to the target interface. In other words: the type is not changed, but the compile-time type is.
When the source object does not implement the target interface, the source object is converted into an object that implements the target interface. So both the type and the compile-time type are changed.
Let me show this with some examples. I've got this little method that reports the compile-time type and the actual type of an object (courtesy Jon Skeet):
void ReportTypeProperties<T>(T obj)
{
Console.WriteLine("Compile-time type: {0}", typeof(T).Name);
Console.WriteLine("Actual type: {0}", obj.GetType().Name);
}
Let's try an arbitrary linq-to-sql Table<T>, which implements IQueryable:
ReportTypeProperties(context.Observations);
ReportTypeProperties(context.Observations.AsEnumerable());
ReportTypeProperties(context.Observations.AsQueryable());
The result:
Compile-time type: Table`1
Actual type: Table`1
Compile-time type: IEnumerable`1
Actual type: Table`1
Compile-time type: IQueryable`1
Actual type: Table`1
You see that the table class itself is always returned, but its representation changes.
Now an object that implements IEnumerable, not IQueryable:
var ints = new[] { 1, 2 };
ReportTypeProperties(ints);
ReportTypeProperties(ints.AsEnumerable());
ReportTypeProperties(ints.AsQueryable());
The results:
Compile-time type: Int32[]
Actual type: Int32[]
Compile-time type: IEnumerable`1
Actual type: Int32[]
Compile-time type: IQueryable`1
Actual type: EnumerableQuery`1
There it is. AsQueryable() has converted the array into an EnumerableQuery, which "represents an IEnumerable<T> collection as an IQueryable<T> data source." (MSDN).
What's the use?
AsEnumerable is frequently used to switch from any IQueryable implementation to LINQ to objects (L2O), mostly because the former does not support functions that L2O has. For more details see What is the effect of AsEnumerable() on a LINQ Entity?.
For example, in an Entity Framework query we can only use a restricted number of methods. So if, for example, we need to use one of our own methods in a query we would typically write something like
var query = context.Observations.Select(o => o.Id)
.AsEnumerable().Select(x => MySuperSmartMethod(x))
ToList – which converts an IEnumerable<T> to a List<T> – is often used for this purpose as well. The advantage of using AsEnumerable vs. ToList is that AsEnumerable does not execute the query. AsEnumerable preserves deferred execution and does not build an often useless intermediate list.
On the other hand, when forced execution of a LINQ query is desired, ToList can be a way to do that.
AsQueryable can be used to make an enumerable collection accept expressions in LINQ statements. See here for more details: Do i really need use AsQueryable() on collection?.
Note on substance abuse!
AsEnumerable works like a drug. It's a quick fix, but at a cost and it doesn't address the underlying problem.
In many Stack Overflow answers, I see people applying AsEnumerable to fix just about any problem with unsupported methods in LINQ expressions. But the price isn't always clear. For instance, if you do this:
context.MyLongWideTable // A table with many records and columns
.Where(x => x.Type == "type")
.Select(x => new { x.Name, x.CreateDate })
...everything is neatly translated into a SQL statement that filters (Where) and projects (Select). That is, both the length and the width, respectively, of the SQL result set are reduced.
Now suppose users only want to see the date part of CreateDate. In Entity Framework you'll quickly discover that...
.Select(x => new { x.Name, x.CreateDate.Date })
...is not supported (at the time of writing). Ah, fortunately there's the AsEnumerable fix:
context.MyLongWideTable.AsEnumerable()
.Where(x => x.Type == "type")
.Select(x => new { x.Name, x.CreateDate.Date })
Sure, it runs, probably. But it pulls the entire table into memory and then applies the filter and the projections. Well, most people are smart enough to do the Where first:
context.MyLongWideTable
.Where(x => x.Type == "type").AsEnumerable()
.Select(x => new { x.Name, x.CreateDate.Date })
But still all columns are fetched first and the projection is done in memory.
The real fix is:
context.MyLongWideTable
.Where(x => x.Type == "type")
.Select(x => new { x.Name, DbFunctions.TruncateTime(x.CreateDate) })
(But that requires just a little bit more knowledge...)
What do these methods NOT do?
Restore IQueryable capabilities
Now an important caveat. When you do
context.Observations.AsEnumerable()
.AsQueryable()
you will end up with the source object represented as IQueryable. (Because both methods only cast and don't convert).
But when you do
context.Observations.AsEnumerable().Select(x => x)
.AsQueryable()
what will the result be?
The Select produces a WhereSelectEnumerableIterator. This is an internal .Net class that implements IEnumerable, not IQueryable. So a conversion to another type has taken place and the subsequent AsQueryable can never return the original source anymore.
The implication of this is that using AsQueryable is not a way to magically inject a query provider with its specific features into an enumerable. Suppose you do
var query = context.Observations.Select(o => o.Id)
.AsEnumerable().Select(x => x.ToString())
.AsQueryable()
.Where(...)
The where condition will never be translated into SQL. AsEnumerable() followed by LINQ statements definitively cuts the connection with entity framework query provider.
I deliberately show this example because I've seen questions here where people for instance try to 'inject' Include capabilities into a collection by calling AsQueryable. It compiles and runs, but it does nothing because the underlying object does not have an Include implementation anymore.
Execute
Both AsQueryable and AsEnumerable don't execute (or enumerate) the source object. They only change their type or representation. Both involved interfaces, IQueryable and IEnumerable, are nothing but "an enumeration waiting to happen". They are not executed before they're forced to do so, for example, as mentioned above, by calling ToList().
That means that executing an IEnumerable obtained by calling AsEnumerable on an IQueryable object, will execute the underlying IQueryable. A subsequent execution of the IEnumerable will again execute the IQueryable. Which may be very expensive.
Specific Implementations
So far, this was only about the Queryable.AsQueryable and Enumerable.AsEnumerable extension methods. But of course anybody can write instance methods or extension methods with the same names (and functions).
In fact, a common example of a specific AsEnumerable extension method is DataTableExtensions.AsEnumerable. DataTable does not implement IQueryable or IEnumerable, so the regular extension methods don't apply.

ToList()
Execute the query immediately
AsEnumerable()
lazy (execute the query later)
Parameter: Func<TSource, bool>
Load EVERY record into application memory, and then handle/filter them. (e.g. Where/Take/Skip, it will select * from table1, into the memory, then select the first X elements) (In this case, what it did: Linq-to-SQL + Linq-to-Object)
AsQueryable()
lazy (execute the query later)
Parameter: Expression<Func<TSource, bool>>
Convert Expression into T-SQL (with the specific provider), query remotely and load result to your application memory.
That’s why DbSet (in Entity Framework) also inherits IQueryable to get the efficient query.
Do not load every record, e.g. if Take(5), it will generate select top 5 * SQL in the background. This means this type is more friendly to SQL Database, and that is why this type usually has higher performance and is recommended when dealing with a database.
So AsQueryable() usually works much faster than AsEnumerable() as it generate T-SQL at first, which includes all your where conditions in your Linq.

ToList() will being everything in memory and then you will be working on it.
so, ToList().where ( apply some filter ) is executed locally.
AsQueryable() will execute everything remotely i.e. a filter on it is sent to the database for applying.
Queryable doesn't do anything til you execute it. ToList, however executes immediately.
Also, look at this answer Why use AsQueryable() instead of List()?.
EDIT :
Also, in your case once you do ToList() then every subsequent operation is local including AsQueryable(). You can't switch to remote once you start executing locally.
Hope this makes it a little bit more clearer.

Encountered a bad performance on below code.
void DoSomething<T>(IEnumerable<T> objects){
var single = objects.First(); //load everything into memory before .First()
...
}
Fixed with
void DoSomething<T>(IEnumerable<T> objects){
T single;
if (objects is IQueryable<T>)
single = objects.AsQueryable().First(); // SELECT TOP (1) ... is used
else
single = objects.First();
}
For an IQueryable, stay in IQueryable when possible, try not be used like IEnumerable.
Update. It can be further simplified in one expression, thanks Gert Arnold.
T single = objects is IQueryable<T> q?
q.First():
objects.First();

SelectMany query with Where produces many SQL queries

I'm using for a GetAppRolesForUser function (and have tried variations of based on answers here):
private AuthContext db = new AuthContext();
...
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.SingleOrDefault(u => u.InternetId == username)
.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
I end up with this in SQL Profiler for every single RolesId each time:
exec sp_executesql N'SELECT
[Extent2].[GroupId] AS [GroupId],
[Extent2].[GroupName] AS [GroupName]
FROM [Auth].[Permissions] AS [Extent1]
INNER JOIN [Auth].[Groups] AS [Extent2] ON [Extent1].[GroupId] = [Extent2].[GroupId]
WHERE [Extent1].[RolesId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=6786
How do I refactor so EF produces a single query for userRoles and doesn't take 18 seconds to run?

I think the problem is you're lazy loading the groups and roles.
One solution is eager load them before you call SingleOrDefault
var user = db.Users.Include(x => x.Groups.Select(y => y.Roles))
.SingleOrDefault(u => u.InternetId == username);
var groups = user.Groups.SelectMany(
g => g.Roles.Where(r => r.Asset.AssetName == application));
var userRoles = Mapper.Map<List<RoleApi>>(groups);
Also note : there is no sanity checking for null here.

TheGeneral's answer covers why you are getting caught out with lazy loading. You may also need to include Asset to get AssetName.
With AutoMapper you can avoid the need to Eager Load the entities by employing .ProjectTo<T>() to the IQueryable, provided there is a User accessible in Group.
For instance:
var roles = db.Groups.Where(g => g.User.Internetid == username)
.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application))
.ProjectTo<RoleApi>()
.ToList();
This should leverage the deferred execution where AutoMapper will effectively project in the .Select() needed to populate the RoleApi instance based on your mapping/inspection.

Here is another way of avoiding lazy loading. You can also look at projection and have only those fields which you need rather than loading the entire columns.
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.Where(u => u.InternetId == username).Select(../* projection */ )
.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
EF also comes with Include:
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.Where(u => u.InternetId == username).Select(../* projection */ )
.Include(g => g.Roles.Where(r => r.Asset.AssetName == application)));
Then can iterate the collection using multiple for loops.

You have to be aware of two differences:
The difference between IEnumerable and IQueryable
The difference between functions that return IQueryable<TResult> (lazy) and functions that return TResult (executing)
Difference between Enumerable and Queryable
. A LINQ statement that is AsEnumerable is meant to be processed in your local process. It contains all code and all calls to execute the statement. This statement is executed as soon as GetEnumerator and MoveNext are called, either explicitly, or implicitly using foreach or LINQ statements that don't return IEnumerable<...>, like ToList, FirstOrDefault, and Any.
In contrast, an IQueryable is not meant to be processed in your process (however it can be done if you want). It is usually meant to be processed by a different process, usually a database management system.
For this an IQueryable holds an Expression and a Provider. The Expression represents the query that must be executed. The Provider knows who must execute the query (the DBMS), and which language this executor uses (usually SQL). When GetEnumerator and MoveNext are called, the Provider takes the Expression and translates it into the language of the Executor. The query is sen't to the executor. The returned data is presented AsEnumerable, where GetEnumerator and MoveNext are called.
Because of this translation into SQL, an IQueryable can't do all the things that an IEnumerable can do. The main thing is that it can't call your local functions. It can't even execute all LINQ functions. The better the quality of the Provider the more it can do. See supported and unsupported LINQ methods
Lazy LINQ methods and executing LINQ methods
There are two groups of LINQ methods. Those that return `IQueryable<...>/IEnumerable<...> and those that do not.
The first group use lazy loading. This means that at the end of the LINQ statement the query has been created, but it is not executed yet. Only 'GetEnumeratorandMoveNextwill make that theProviderwill translate theExpression` and order the DBMS to execute the query.
Concatenating IQueryables will only change the Expression. This is a fairly fast procedure. Hence there is no performance gain if you make one big LINQ expression instead of concatenate them before you execute the query.
Usually the DBMS is smarter and better prepared to do selections than your process. The transfer of selected data to your local process is one of the slower parts of your query.
Advice: Try to create your LINQ statements such, that the executing
statement is the last one that can be executed by the DBMS. Make sure
that you only select the properties that you actually plan to use.
So for example, don't transfer foreign keys if you don't use them.
Back to your question
Leaving the mapper out of the question you start with:
db.Users.SingleOrDefault(...)
SingleOrDefault is a non-lazy function. It doesn't return IQueryable<...>. It will execute the query. It will transport one complete User to your local process, including its Roles.
Advice postpone the SingleOrDefault to the last statement:
var result = myDbcontext.Users
.Where(user => user.InternetId == username)
.SelectMany(user => user.Groups.Roles.Where(role => role.Asset.AssetName == application))
// until here, the query is not executed yet, execute it now:
.SingleOrDefault();
In words: From the sequence of Users, keep only those Users with an InternetId that equals userName. From all remaining Users (which you hope to be only one), select the sequence of Roles of the Groups of each User. However, we don't want to select all Roles, we only keep the Roles with an AssetName equal to application. Now put all remaining Roles into one collection (the many part in SelectMany), and select the zero or one remaining Role that you expect.

Does Entity Framework Selects all rows in memory when applying a Where filter?

There is one thing makes me confused. I think EF selects all rows (all records) in table.
Let me show you a sample.
public Category GetByID(int Id)
{
return context.Categories.Find(Id);
}
there are a lot of records in table and when i check them with break point i can see all the records not only the Id numbered one. What if there are 10k records in table? I test this. I copied all record manually in database and i make 30k records.
An expression like this,
IEnumerable<Category> categories = categoryRepository
.Where(x => x.Published == true)
.ToList();
I saw 30k records with break point. But at least 10k Published False in table.
Is Entity framework firstly fetches all of the records to memory and after filters them?

TL;DR
It's likely your categoryRepository has broken EF's IQueryable<> expression tree, and is materializing the entire Category table in order to run the .Where predicate. See the examples below.
More Detail
The short answer is no, provided that Entity Framework is able to parse the IQueryable<> expression (which includes the .Where predicates you specify) it will convert the associated expression tree into native Sql using the appropriate query provider for the RDBMS you are targetting, thus allowing all the benefits of Sql, e.g. use of indexes.
As per my comment, one of the common reasons why EF would not do this is if the IQueryable mechanism has been tampered with, for instance, if your Repository pattern implementation uses the IEnumerable<T> overload of the Where predicate and not the IQueryable overload.
As a result, EF has no other option but to fetch the table and execute every row against your predicate function to determine whether the row matches your predicate or not.
As an aside, there is some debate whether there is merit in wrapping a DbContext in a Repository and / or Unit Of Work wrapper, as a DbContext is Transactional, performs caching, can be mocked during unit testing, and now supports a wide range of databases.
Examples of where materialization happens and how this affects performance
(The point at which the Sql query is actually executed is often referred to as materialization)
I've excluded the OP's repository - i.e. we're using the DbContext directly.
Best:
var miniFoos = myDbContext.MyFooSet
.Where(f => f.SomeProperty = "foo")
.Select(f => new {...})
.ToList();
This is good, because, we've applied both the predicate and a projection of just the columns we need in SQL, before we materialize the data into a List (of an anonymous type)
OK:
var foos = myDbContext.MyFooSet
.Where(f => f.SomeProperty = "foo")
.ToList() // Or .AsEnumerable(), or other materialization methods
.Select(f => new {...}); // Subset of fields
This isn't ideal, because although we've applied the .Where clause before we materialize, we're returning the full columns in of Foo table, just to discard unnecessary columns. This means unnecessary I/O, and also, Sql might have been able to use just an index to perform the query.
Bad - Never do this
var foos = myDbContext.MyFooSet
.AsEnumerable() // (or `ToList()`, same problem)
.Where(f => f.SomeProperty = "foo")
.Select(f => new {...}); // Subset of fields
This seems to be what the OP is experiencing - since the table is materialized BEFORE any .Where filtering takes place, the whole table IS retrieved to memory and filtering happens with Linq to Objects, instead.
This problem can also happen when applying custom .Where predicate builders which don't use Expressions, or which use IEnumerable<T> instead of IQueryable<T> - IEnumerable has no associated expression tree and can't be parsed to Sql.

How to Parameterize Linq Query or Lambda Query in Entity Framework?

I write some Parameterize Lambda Queries
//Method 1:
Func<SalesOrderLineEntity, bool> func01 = (o => o.SOLNumber == "123456");
var q01 = _context.SalesOrderLineEntities.Where(func01).ToList();
//Got the result, but SQLServer Read All Records to memory before "where"
//Method 2:
Expression<Func<SalesOrderLineEntity, bool>> exp02 = (o => o.SOLNumber == "123456");
var q02 = _context.SalesOrderLineEntities.Where(exp02).ToList();
//Got the result,Exec "Where" in SQLServer
//Method 3:
Expression<Func<SalesOrderLineEntity, bool>> exp03 = (o => func01(o));
var q03 = _context.SalesOrderLineEntities.Where(exp03.Compile()).ToList();
//Same to Method 1,because Compile() result is Func<SalesOrderLineEntity, bool>
//Method 4:
var q04 = _context.SalesOrderLineEntities.Where(exp03).ToList();
//Error:The LINQ expression node type 'Invoke' is not supported in LINQ to Entities
Method 1 and 3：Efficiency is very low
Method 4:Error
Method 2:Need I Build a Expression through the Lambda. I feel it is very difficult, because i will use many "if,else".it easier to create a function.
What is the correct way to do that?

Variations
Method 1: EF reads all the records from the DB because you pass a Func into the Where clause, which is not the right candidate: EF cannot extract the needed information from it to build the query, it can only use that function on an in-memory collection.
Method 2: this is the correct way to do EF queries because EF builds up the actual query based on the Expression tree. It may look like the same as Method 1 when you write .Where but this is different.
IQueryable extension methods are using Expression trees, so you can (or EF can) evaluate that information at runtime.
Method 3: this is essentially the same as Method 1, because you compile the expression. This is a key difference while you use them: an expression contains the informations to build the actual operation but that's not the operation itself. You need to compile it before (or for example you can build SQL queries based on them, that's how EF works).
Method 4: EF cannot translate your func01() call to any SQL function. It cannot translate any kind of code because it needs an equivalent SQL operation. You can try to use a general method, you will get the same result, it's not about the Func.
What happens here?
If we simplify the underlying process then the answer above might by more clear.
//Method 2:
Expression<Func<SalesOrderLineEntity, bool>> exp02 = (o => o.SOLNumber == "123456");
var q02 = _context.SalesOrderLineEntities.Where(exp02).ToList();
//Got the result,Exec "Where" in SQLServer
EF can read the following (via the expressions):
the user want to filter with Where
here is an expression for that, let's get some information
well, it needs SalesOrderLineEntity and I have a mapping for that type
the expression tells that the property SOLNumber must be equal to "123456"
ok, I have a mapping for SOLNumber so it's good
and I can translate the equal operator to an equivalent SQL operator
everything okay, so we can build the SQL query
Of course, you cannot do this with a Func for example because that object doesn't contain these informations.

Not sure if this is applicable but have you look at compiled queries: Compiled Queries (LINQ to Entities) which should result in a more efficient SQL statement

Why the order of LINQ to objects methods counts

I read this question's answers that explain the order of the LINQ to objects methods makes a difference. My question is why?
If I write a LINQ to SQL query, it doesn't matter the order of the LINQ methods-projections for example:
session.Query<Person>().OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
The expression tree will be transformed to a rational SQL like this:
SELECT *
FROM Persons
WHERE Name = 'gdoron'
ORDER BY Id;
When I Run the query, SQL query will built according to the expression tree no matter how weird the order of the methods.
Why it doesn't work the same with LINQ to objects?
when I enumerate an IQueryable all the projections can be placed in a rational order(e.g. Order By after Where) just like the Data Base optimizer does.

Why it doesn't work this way with LINQ to objects?
LINQ to Objects doesn't use expression trees. The statement is directly turned into a series of method calls, each of which runs as a normal C# method.
As such, the following in LINQ to Objects:
var results = collection.OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
Gets turned into direct method calls:
var results = Enumerable.ToList(
Enumerable.Where(
Enumerable.OrderBy(collection, x => x.Id),
x => x.Name = "gdoron"
)
);
By looking at the method calls, you can see why ordering matters. In this case, by placing OrderBy first, you're effectively nesting it into the inner-most method call. This means the entire collection will get ordered when the resutls are enumerated. If you were to switch the order:
var results = collection
.Where(x => x.Name == "gdoron")
.OrderBy(x => x.Id)
.ToList();
Then the resulting method chain switches to:
var results = Enumerable.ToList(
Enumerable.OrderBy(
Enumerable.Where(collection, x => x.Name = "gdoron"),
x => x.Id
)
);
This, in turn, means that only the filtered results will need to be sorted as OrderBy executes.

Linq to objects's deferred execution works differently than linq-to-sql's (and EF's).
With linq-to-objects, the method chain will be executed in the order that the methods are listed—it doesn't use expression trees to store and translate the whole thing.
Calling OrderBy then Where with linq-to-objects will, when you enumerate the results, sort the collection, then filter it. Conversely, filtering results with a call to Where before sorting it with OrderBy will, when you enumerate, first filter, then sort. As a result the latter case can make a massive difference, since you'd potentially be sorting many fewer items.

Because, with LINQ for SQL, the SQL grammar for SELECT mandates that the different clauses occur in a particular sequence. The compiler must generate grammatically correct SQL.
Applying LINQ for objects on an IEnumerable involves iterating over the IEnumerable and applying a sequence of actions to each object in the IEnumerable. Order matters: some actions may transform the object (or the stream of objects itself), others may throw objects away (or inject new objects into the stream).
The compiler can't divine your intent. It builds code that does what you said to do in the order in which you said to do it.

It's perfectly legal to use side-effecting operations. Compare:
"crabapple"
.OrderBy(c => { Console.Write(c); return c; })
.Where(c => { Console.Write(c); return c > 'c'; })
.Count();
"crabapple"
.Where(c => { Console.Write(c); return c > 'c'; })
.OrderBy(c => { Console.Write(c); return c; })
.Count();

Linq to Objects does not reorder to avoid a would-be run-time step to do something that should be optimized at coding time. The resharpers of the world may at some point introduce code analysis tools to smoke out optimization opportunities like this, but it is definitely not a job for the runtime.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.