Why the order of LINQ to objects methods counts

Why the order of LINQ to objects methods counts - c#

I read this question's answers that explain the order of the LINQ to objects methods makes a difference. My question is why?
If I write a LINQ to SQL query, it doesn't matter the order of the LINQ methods-projections for example:
session.Query<Person>().OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
The expression tree will be transformed to a rational SQL like this:
SELECT *
FROM Persons
WHERE Name = 'gdoron'
ORDER BY Id;
When I Run the query, SQL query will built according to the expression tree no matter how weird the order of the methods.
Why it doesn't work the same with LINQ to objects?
when I enumerate an IQueryable all the projections can be placed in a rational order(e.g. Order By after Where) just like the Data Base optimizer does.

Why it doesn't work this way with LINQ to objects?
LINQ to Objects doesn't use expression trees. The statement is directly turned into a series of method calls, each of which runs as a normal C# method.
As such, the following in LINQ to Objects:
var results = collection.OrderBy(x => x.Id)
.Where(x => x.Name == "gdoron")
.ToList();
Gets turned into direct method calls:
var results = Enumerable.ToList(
Enumerable.Where(
Enumerable.OrderBy(collection, x => x.Id),
x => x.Name = "gdoron"
)
);
By looking at the method calls, you can see why ordering matters. In this case, by placing OrderBy first, you're effectively nesting it into the inner-most method call. This means the entire collection will get ordered when the resutls are enumerated. If you were to switch the order:
var results = collection
.Where(x => x.Name == "gdoron")
.OrderBy(x => x.Id)
.ToList();
Then the resulting method chain switches to:
var results = Enumerable.ToList(
Enumerable.OrderBy(
Enumerable.Where(collection, x => x.Name = "gdoron"),
x => x.Id
)
);
This, in turn, means that only the filtered results will need to be sorted as OrderBy executes.

Linq to objects's deferred execution works differently than linq-to-sql's (and EF's).
With linq-to-objects, the method chain will be executed in the order that the methods are listed—it doesn't use expression trees to store and translate the whole thing.
Calling OrderBy then Where with linq-to-objects will, when you enumerate the results, sort the collection, then filter it. Conversely, filtering results with a call to Where before sorting it with OrderBy will, when you enumerate, first filter, then sort. As a result the latter case can make a massive difference, since you'd potentially be sorting many fewer items.

Because, with LINQ for SQL, the SQL grammar for SELECT mandates that the different clauses occur in a particular sequence. The compiler must generate grammatically correct SQL.
Applying LINQ for objects on an IEnumerable involves iterating over the IEnumerable and applying a sequence of actions to each object in the IEnumerable. Order matters: some actions may transform the object (or the stream of objects itself), others may throw objects away (or inject new objects into the stream).
The compiler can't divine your intent. It builds code that does what you said to do in the order in which you said to do it.

It's perfectly legal to use side-effecting operations. Compare:
"crabapple"
.OrderBy(c => { Console.Write(c); return c; })
.Where(c => { Console.Write(c); return c > 'c'; })
.Count();
"crabapple"
.Where(c => { Console.Write(c); return c > 'c'; })
.OrderBy(c => { Console.Write(c); return c; })
.Count();

Linq to Objects does not reorder to avoid a would-be run-time step to do something that should be optimized at coding time. The resharpers of the world may at some point introduce code analysis tools to smoke out optimization opportunities like this, but it is definitely not a job for the runtime.

Related

IN Operator between 2 collection objects

In Sql, I know we have the IN operator and the equivalent of that is .Contains() in LINQ but i am stuck on one problem here
Consider the following linq query:
_dbContext.Offices.Where(o => o.Id == policy.OfficeId).FirstOrDefaultAsync()
Suppose i were to introduce a policies (plural) object which is a collection object each policy in the collection has its own officeId, how do I check if the offices collection consists of the officesId from policies collection object? I would like to do this in method syntax if possible. Thanks in advance.

Without having access to your data model, your answer does seem plausible; however, you need to have a call to .ToListAsync() at the end for this to compile. Also, it's more efficient to use Any() (rather than .Select.Contains):
await _dbContext.Offices.Where(o => policies.Any(p => p.OfficeId == o.Id)).ToListAsync();
I'm not sitting in front of a compiler, but if you just want the first one, the following is even better IIRC:
await _dbContext.Offices.FirstOrDefaultAsync(o => policies.Any(p => p.OfficeId == o.Id));

I think the following might work:
await _dbContext.Offices.Where(o => policies.Select(p => p.OfficeId).Contains(o.Id))

How to deconstruct a LINQ IQueryable / Expression into the constituent parts?

I am writing an extension method which is meant to help prepare a LINQ query before it is actually executed in EF Core.
For example:
var context = new SchoolContext();
var studentWithGrade = context.Students
.Where(s => s.FirstName == "Bill")
.Include(s => s.Grade)
.Prepare()
.FirstOrDefault();
I am building the Prepare() extension on the IQueryable<> interface. In the method, I need to know things about the resulting query to be executed, such as:
The predicate (s.FirstName == "Bill")
The selected fields (from the Students table, as well as any fields from s.Grade)
The limit (an implied 1 in this case)
... and so on.
AFAICT, the Expression on the IQueryable is transformed into a SQL query (and executed) when FirstOrDefault() is called. So this transformation is obviously possible. But I would like to examine the data in a structured way, rather than inspecting a SQL query string. My hope was that I could simply inspect the queryable (query.Limit < 0) but the actual storage of these constraints seems buried with reflection in Expression object (?) based upon MethodCallExpression.cs's source code.

SelectMany query with Where produces many SQL queries

I'm using for a GetAppRolesForUser function (and have tried variations of based on answers here):
private AuthContext db = new AuthContext();
...
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.SingleOrDefault(u => u.InternetId == username)
.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
I end up with this in SQL Profiler for every single RolesId each time:
exec sp_executesql N'SELECT
[Extent2].[GroupId] AS [GroupId],
[Extent2].[GroupName] AS [GroupName]
FROM [Auth].[Permissions] AS [Extent1]
INNER JOIN [Auth].[Groups] AS [Extent2] ON [Extent1].[GroupId] = [Extent2].[GroupId]
WHERE [Extent1].[RolesId] = #EntityKeyValue1',N'#EntityKeyValue1 int',#EntityKeyValue1=6786
How do I refactor so EF produces a single query for userRoles and doesn't take 18 seconds to run?

I think the problem is you're lazy loading the groups and roles.
One solution is eager load them before you call SingleOrDefault
var user = db.Users.Include(x => x.Groups.Select(y => y.Roles))
.SingleOrDefault(u => u.InternetId == username);
var groups = user.Groups.SelectMany(
g => g.Roles.Where(r => r.Asset.AssetName == application));
var userRoles = Mapper.Map<List<RoleApi>>(groups);
Also note : there is no sanity checking for null here.

TheGeneral's answer covers why you are getting caught out with lazy loading. You may also need to include Asset to get AssetName.
With AutoMapper you can avoid the need to Eager Load the entities by employing .ProjectTo<T>() to the IQueryable, provided there is a User accessible in Group.
For instance:
var roles = db.Groups.Where(g => g.User.Internetid == username)
.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application))
.ProjectTo<RoleApi>()
.ToList();
This should leverage the deferred execution where AutoMapper will effectively project in the .Select() needed to populate the RoleApi instance based on your mapping/inspection.

Here is another way of avoiding lazy loading. You can also look at projection and have only those fields which you need rather than loading the entire columns.
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.Where(u => u.InternetId == username).Select(../* projection */ )
.Groups.SelectMany(g => g.Roles.Where(r => r.Asset.AssetName == application)));
EF also comes with Include:
var userRoles = Mapper.Map<List<RoleApi>>(
db.Users.Where(u => u.InternetId == username).Select(../* projection */ )
.Include(g => g.Roles.Where(r => r.Asset.AssetName == application)));
Then can iterate the collection using multiple for loops.

You have to be aware of two differences:
The difference between IEnumerable and IQueryable
The difference between functions that return IQueryable<TResult> (lazy) and functions that return TResult (executing)
Difference between Enumerable and Queryable
. A LINQ statement that is AsEnumerable is meant to be processed in your local process. It contains all code and all calls to execute the statement. This statement is executed as soon as GetEnumerator and MoveNext are called, either explicitly, or implicitly using foreach or LINQ statements that don't return IEnumerable<...>, like ToList, FirstOrDefault, and Any.
In contrast, an IQueryable is not meant to be processed in your process (however it can be done if you want). It is usually meant to be processed by a different process, usually a database management system.
For this an IQueryable holds an Expression and a Provider. The Expression represents the query that must be executed. The Provider knows who must execute the query (the DBMS), and which language this executor uses (usually SQL). When GetEnumerator and MoveNext are called, the Provider takes the Expression and translates it into the language of the Executor. The query is sen't to the executor. The returned data is presented AsEnumerable, where GetEnumerator and MoveNext are called.
Because of this translation into SQL, an IQueryable can't do all the things that an IEnumerable can do. The main thing is that it can't call your local functions. It can't even execute all LINQ functions. The better the quality of the Provider the more it can do. See supported and unsupported LINQ methods
Lazy LINQ methods and executing LINQ methods
There are two groups of LINQ methods. Those that return `IQueryable<...>/IEnumerable<...> and those that do not.
The first group use lazy loading. This means that at the end of the LINQ statement the query has been created, but it is not executed yet. Only 'GetEnumeratorandMoveNextwill make that theProviderwill translate theExpression` and order the DBMS to execute the query.
Concatenating IQueryables will only change the Expression. This is a fairly fast procedure. Hence there is no performance gain if you make one big LINQ expression instead of concatenate them before you execute the query.
Usually the DBMS is smarter and better prepared to do selections than your process. The transfer of selected data to your local process is one of the slower parts of your query.
Advice: Try to create your LINQ statements such, that the executing
statement is the last one that can be executed by the DBMS. Make sure
that you only select the properties that you actually plan to use.
So for example, don't transfer foreign keys if you don't use them.
Back to your question
Leaving the mapper out of the question you start with:
db.Users.SingleOrDefault(...)
SingleOrDefault is a non-lazy function. It doesn't return IQueryable<...>. It will execute the query. It will transport one complete User to your local process, including its Roles.
Advice postpone the SingleOrDefault to the last statement:
var result = myDbcontext.Users
.Where(user => user.InternetId == username)
.SelectMany(user => user.Groups.Roles.Where(role => role.Asset.AssetName == application))
// until here, the query is not executed yet, execute it now:
.SingleOrDefault();
In words: From the sequence of Users, keep only those Users with an InternetId that equals userName. From all remaining Users (which you hope to be only one), select the sequence of Roles of the Groups of each User. However, we don't want to select all Roles, we only keep the Roles with an AssetName equal to application. Now put all remaining Roles into one collection (the many part in SelectMany), and select the zero or one remaining Role that you expect.

How to select items from IEnumerable?

My ordersInfo variable has type IEnumerable<Order>. Order is a custom object. I try to make selection like this:
var s = ordersInfo.Select(x => x.Customer.Email == user.Email && x.Status == OrderStatus.Paid).ToList();
But it returns a collection of 161 elements (it is initial count of collection) and each item has value false. It is not Order object either. What is wrong?

It sounds like you want the Where statement, not the Select statement. Select is used to transform one object into another or select only specific parts of a given object. Where is used to filter.
var s = ordersInfo.Where(x => x.Customer.Email == user.Email
&& x.Status == OrderStatus.Paid).ToList();

I think you are looking for Where instead of Select.
var s = ordersInfo
.Where(x => x.Customer.Email == user.Email &&
x.Status == OrderStatus.Paid)
.ToList();

Select is a projection or a conversion for each item. Think like the SELECT clause of SQL - it changes the output. You want to use Where which is a deferred filtering.

From MSDN - IEnumerable.Select
Projects each element of a sequence into a new form.
Remarks
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
This projection method requires the transform function, selector, to produce one value for each value in the source sequence, source. If selector returns a value that is itself a collection, it is up to the consumer to traverse the subsequences manually. In such a situation, it might be better for your query to return a single coalesced sequence of values. To achieve this, use the SelectMany method instead of Select. Although SelectMany works similarly to Select, it differs in that the transform function returns a collection that is then expanded by SelectMany before it is returned.
In query expression syntax, a select (Visual C#) or Select (Visual Basic) clause translates to an invocation of Select.
IEnumerable<int> squares = Enumerable.Range(1, 10).Select(x => x * x);
squares.ToList().ForEach(num => Console.WriteLine(num));
The output will be:
1
4
9
16
25
36
49
64
81
100
You may also use the IEnumerable.Select to only select a fewer properties as well from an object, which will cause the creation of an anonymous type.
What you want is to use the IEnumerable.Where() method.
From MSDN - IEnumerable.Where
Filters a sequence of values based on a predicate.
Remarks
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
In query expression syntax, a where (Visual C#) or Where (Visual Basic) clause translates to an invocation of Where<TSource>(IEnumerable<TSource>, Func<TSource, Boolean>).
To answer your question
Use the Where method like so.
var s = ordersInfo.Where(x => x.Customer.Email == user.Email
&& x.Status == OrderStatus.Paid)
.ToList();
Which will actually filter the list based on the criterion given as the predicate, and the list shall get filtered upon the call of the ToList() method, as Where<T> is deferred filtering.

Equivalent sorting with LINQ orderby keyword vs. OrderBy extension method

I'm trying to come up with a set of chained OrderBy/ThenBy extension method calls that is equivalent to a LINQ statement using the orderby keyword.
My LINQ statement using the orderby keyword looks like this:
List<Summary> sortedSummaries = new List<Summary>(
from summary in unsortedSummaries
orderby summary.FieldA ascending,
summary.FieldB ascending,
summary.FieldC ascending,
summary.FieldD ascending
select summary);
Now, I would guess that an equivalent version, using chained OrderBy/ThenBy extension method calls would look like this:
List<Summary> sortedSummaries = new List<Summary>(unsortedSummaries);
sortedSummaries.OrderBy(x => x.FieldA).ThenBy(x => x.FieldB).ThenBy(x => x.FieldC).ThenBy(x => x.FieldD);
However, this gives me quite different results than using my LINQ statement using the orderby keyword.
What might I be doing wrong here?
The reason I'm trying to convert to using chained OrderBy/ThenBy extension method calls is that I need to use a custom comparer on FieldD, like so:
.ThenBy(x => x.FieldD, new NaturalSortComparer())
I can't figure how to do that using LINQ statements with the orderby keyword, so I thought using extension methods might get me farther, but since I'm not able to get my extension method version to give me the same results as my orderby keyword version, I'm back to the drawing board, for the moment.
Any ideas?

Your dot notation call isn't assigning the result to anything. It ought to be:
var sortedSummaries = unsortedSummaries.OrderBy(x => x.FieldA)
.ThenBy(x => x.FieldB)
.ThenBy(x => x.FieldC)
.ThenBy(x => x.FieldD);
Don't forget that LINQ operators never change the existing sequence - they return a new sequence with the appropriate differences (in this case, ordering). If you want to sort in-place, use List<T>.Sort.
The query above is exactly what the LINQ part of your original code does. However, if I wanted to construct a list from it, I wouldn't pass it to the List<T> constructor - I'd use the ToList extension method:
var sortedSummaries = unsortedSummaries.OrderBy(x => x.FieldA)
.ThenBy(x => x.FieldB)
.ThenBy(x => x.FieldC)
.ThenBy(x => x.FieldD)
.ToList();
Check that that does what you expect it to (it really should) - and then you can put your custom comparison in.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Why the order of LINQ to objects methods counts - c#

Linq to Objects does not reorder to avoid a would-be run-time step to do something that should be optimized at coding time. The resharpers of the world may at some point introduce code analysis tools to smoke out optimization opportunities like this, but it is definitely not a job for the runtime.

Related

IN Operator between 2 collection objects

How to deconstruct a LINQ IQueryable / Expression into the constituent parts?

SelectMany query with Where produces many SQL queries

How to select items from IEnumerable?

Equivalent sorting with LINQ orderby keyword vs. OrderBy extension method

Categories

Resources