Am I misunderstanding LINQ to SQL .AsEnumerable()?

Am I misunderstanding LINQ to SQL .AsEnumerable()? - c#

Consider this code:
var query = db.Table
.Where(t => SomeCondition(t))
.AsEnumerable();
int recordCount = query.Count();
int totalSomeNumber = query.Sum();
decimal average = query.Average();
Assume query takes a very long time to run. I need to get the record count, total SomeNumber's returned, and take an average at the end. I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL, then use LINQ-to-Objects for the Count, Sum, and Average. Instead, when I do this in LINQPad, I see the same query is run three times. If I replace .AsEnumerable() with .ToList(), it only gets queried once.
Am I missing something about what AsEnumerable is/does?

Calling AsEnumerable() does not execute the query, enumerating it does.
IQueryable is the interface that allows LINQ to SQL to perform its magic. IQueryable implements IEnumerable so when you call AsEnumerable(), you are changing the extension-methods being called from there on, ie from the IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects in this particular case). But you are not executing the actual query, just changing how it is going to be executed in its entirety.
To force query execution, you must call ToList().

Yes. All that AsEnumerable will do is cause the Count, Sum, and Average functions to be executed client-side (in other words, it will bring back the entire result set to the client, then the client will perform those aggregates instead of creating COUNT() SUM() and AVG() statements in SQL).

Justin Niessner's answer is perfect.
I just want to quote a MSDN explanation here: .NET Language-Integrated Query for Relational Data
The AsEnumerable() operator, unlike ToList() and ToArray(), does not cause execution of the query. It is still deferred. The AsEnumerable() operator merely changes the static typing of the query, turning a IQueryable into an IEnumerable, tricking the compiler into treating the rest of the query as locally executed.
I hope this is what is meant by:
IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects
Once it is LINQ to Objects we can apply object's methods (e.g. ToString()). This is the explanation for one of the frequently asked questions about LINQ - Why LINQ to Entities does not recognize the method 'System.String ToString()?
According to ASENUMERABLE - codeblog.jonskeet, AsEnumerable can be handy when:
some aspects of the query in the database, and then a bit more manipulation in .NET – particularly if there are aspects you basically can’t implement in LINQ to SQL (or whatever provider you’re using).
It also says:
All we’re doing is changing the compile-time type of the sequence which is propagating through our query from IQueryable to IEnumerable – but that means that the compiler will use the methods in Enumerable (taking delegates, and executing in LINQ to Objects) instead of the ones in Queryable (taking expression trees, and usually executing out-of-process).
Finally, also see this related question: Returning IEnumerable vs. IQueryable

Well, you are on the right track. The problem is that an IQueryable (what the statement is before the AsEnumerable call) is also an IEnumerable, so that call is, in effect, a nop. It will require forcing it to a specific in-memory data structure (e.g., ToList()) to force the query.

I would presume that ToList forces Linq to fetch the records from the database. When you then perform the proceeding calculations they are done against the in memory objects rather than involving the database.
Leaving the return type as an Enumerable means that the data is not fetched until it is called upon by the code performing the calculations. I guess the knock on of this is that the database is hit three times - one for each calculation and the data is not persisted to memory.

Just adding a little more clarification:
I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL
It will not execute the query right away, as Justin's answer explains. It only will be materialized (hit the database) later on.
Instead, when I do this in LINQPad, I see the same query is run three times.
Yes, and note that all three queries are exact the same, basically fetching all rows from the given condition into memory and then computing the count/sum/avg locally.
If I replace .AsEnumerable() with .ToList(), it only gets queried once.
But still getting all data into memory, with the advantage that now it run only once.
If performance improvement is a concern, just remove .AsEnumerable() and then the count/sum/avg will be translated correctly to their SQL correspondents. Doing so three queries will run (probably faster if there are index satisfying the conditions) but with a lot less memory footprint.

Related

When is data really retrieved from database using EF Core?

To retrieve data, first I wrote LINQ query, that I expect not to executed on database until I call the FirstAsync() method.
var query =
from tkn in Db.Set<TableA>()
where tkn.IsActive == true
where tkn.Token == token
select tkn.RelatedObjectMappedToTableB;
var retrievedObject = await clientQuery.FirstAsync();
The problem is while debugging I can see the values of the related object in the Watch of visual studio, before reaching the call to FirstAsync()
This means to me that database is already queried and EF has not waited until I ask it to do so.
Why is it doing so? Is my opinion about when a linq query is executed wrong?

This means to me that database is already queried and ef has not waited until I ask it to do so.
No, the database is being queried exactly because you asked it to do so.
Is my opinion about when a linq query is executed wrong?
No, it is not, however,
Why is it doing so?
LINQ retrieves the values in a lazy way, which means that it waits until you perform some enumeration on it. Using Visual Studio's Watch Window is asking the query to be evaluated so you can see the actual details.

LINQ support two execution behaviors deferred execution and immediate execution. Deferred execution means that the evaluation of an expression is delayed until its realized value is actually required. It greatly improves performance by avoiding unnecessary execution.
Deferred execution is applicable to any in-memory collection as well as remote LINQ providers like LINQ-to-SQL, LINQ-to-Entities, LINQ-to-XML, etc.
In the provided example in question, it seems that query executed at where but that's not happening actually. The query executed when you call
.FirstAsync(), It means you really need data and C# execute the query, extract data and store data into memory.
If you want immediate execution behavior you can use .ToList() it will extract the complete projected data and save into memory.
var result =(from tkn in Db.Set<TableA>()
where tkn.IsActive == true
& tkn.Token == token
select tkn.RelatedObjectMappedToTableB).ToList();
For more details see article

Using .Where() on a List

Assuming the two following possible blocks of code inside of a view, with a model passed to it using something like return View(db.Products.Find(id));
List<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date).ToList();
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id).ToList();
IEnumerable<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date);
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id);
What are the performance implications between the two? By this point, is all of the product related data in memory, including its navigation properties? Does using ToList() in this instance matter at all? If not, is there a better approach to using Linq queries on a List without having to call ToList() every time, or is this the typical way one would do it?

Read http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx
Deferred execution is one of the many marvels intrinsic to linq. The short version is that your data is never touched (it remains idle in the source be that in-memory, or in-database, or wherever). When you construct a linq query all you are doing is creating an IEnumerable class that is 'capable' of enumerating your data. The work doesn't begin until you actually start enumerating and then each piece of data comes all the way from the source, through the pipeline, and is serviced by your code one item at a time. If you break your loop early, you have saved some work - later items are never processed. That's the simple version.
Some linq operations cannot work this way. Orderby is the best example. Orderby has to know every piece of data because it possible that the last piece retrieved from the source very well could be the first piece that you are supposed to get. So when an operation such as orderby is in the pipe, it will actually cache your dataset internally. So all data has been pulled from the source, and has gone through the pipeline, up to the orderby, and then the orderby becomes your new temporary data source for any operations that come afterwards in the expression. Even so, orderby tries as much as possible to follow the deferred execution paradigm by waiting until the last possible moment to build its cache. Including orderby in your query still doesn't do any work, immediately, but the work begins once you start enumerating.
To answer your question directly, your call to ToList is doing exactly that. OrderByDescending is caching the data from your datasource => ToList additionally persists it into a variable that you can actually touch (reviews) => where starts pulling records one at a time from reviews, and if it matches then your final ToList call is storing the results into yet another list in memory.
Beyond the memory implications, ToList is additionally thwarting deferred execution because it also STOPS the processing of your view at the time of the call, to entirely process that entire linq expression, in order to build its in-memory representation of the results.
Now none of this is a real big deal if the number of records we're talking about are in the dozens. You'll never notice the difference at runtime because it happens so quick. But when dealing with large scale datasets, deferring as much as possible for as long as possible in hopes that something will happen allowing you to cancel a full enumeration... in addition to the memory savings... gold.
In your version without ToList: OrderByDescending will still cache a copy of your dataset as processed through the pipeline up to that point, internally, sorted of course. That's ok, you gotta do what you gotta do. But that doesn't happen until you actually try to retrieve your first record later in your view. Once that cache is complete, you get your first record, and for every next record you are then pulling from that cache, checking against the where clause, you get it or you don't based upon that where and have saved a couple of in memory copies and a lot of work.
Magically, I bet even your lead-in of db.Products.Find(id) doesn't even start spinning until your view starts enumerating (if not using ToList). If db.Products is a Linq2SQL datasource, then every other element you've specified will reduce into SQL verbiage, and your entire expression will be deferred.
Hope this helps! Read further on Deferred Execution. And if you want to know 'how' that works, look into c# iterators (yield return). There's a page somewhere on MSDN that I'm having trouble finding that contains the common linq operations, and whether they defer execution or not. I'll update if I can track that down.
/*edit*/ to clarify - all of the above is in the context of raw linq, or Linq2Objects. Until we find that page, common sense will tell you how it works. If you close your eyes and imagine implementing orderby, or any other linq operation, if you can't think of a way to implement it with 'yield return', or without caching, then execution is not likely deferred and a cache copy is likely and/or a full enumeration... orderby, distinct, count, sum, etc... Linq2SQL is a whole other can of worms. Even in that context, ToList will still stop and process the whole expression and store the results because a list, is a list, and is in memory. But Linq2SQL is uniquely capable of deferring many of those aforementioned clauses, and then some, by incorporating them into the generated SQL that is sent to the SQL server. So even orderby can be deferred in this way because the clause will be pushed down into your original datasource and then ignored in the pipe.
Good luck ;)

Not enough context to know for sure.
But ToList() guarantees that the data has been copied into memory, and your first example does that twice.
The second example could involve queryable data or some other on-demand scenario. Even if the original data was all already in memory and even if you only added a call to ToList() at the end, that would be one less copy in-memory than the first example.
And it's entirely possible that in the second example, by the time you get to the end of that little snippet, no actual processing of the original data has been done at all. In that case, the database might not even be queried until some code actually enumerates the final reviews value.
As for whether there's a "better" way to do it, not possible to say. You haven't defined "better". Personally, I tend to prefer the second example...why materialize data before you need it? But in some cases, you do want to force materialization. It just depends on the scenario.

When to force LINQ query evaluation?

What's the accepted practice on forcing evaluation of LINQ queries with methods like ToArray() and are there general heuristics for composing optimal chains of queries? I often try to do everything in a single pass because I've noticed in those instances that AsParallel() does a really good job in speeding up the computation. In cases where the queries perform computations with no side-effects but several passes are required to get the right data out is forcing the computation with ToArray() the right way to go or is it better to leave the query in lazy form?

If you are not averse to using an 'experimental' library, you could use the EnumerableEx.Memoize extension method from the Interactive Extensions library.
This method provides a best-of-both-worlds option where the underlying sequence is computed on-demand, but is not re-computed on subequent passes. Another small benefit, in my opinion, is that the return type is not a mutable collection, as it would be with ToArray or ToList.

Keep the queries in lazy form until you start to evaluate the query multiple times, or even earlier if you need them in another form or you are in danger of variables captured in closures changing their values.
You may want to evaluate when the query contains complex projections which you want to avoid performing multiple times (e.g. constructing complex objects for sequences with lots of elements). In this case evaluating once and iterating many times is much saner.
You may need the results in another form if you want to return them or pass them to another API that expects a specific type of collection.
You may want or need to prevent accessing modified closures if the query captures variables which are not local in scope. Until the query is actually evaluated, you are in danger of other code changing their values "behind your back"; when the evaluation happens, it will use these values instead of those present when the query was constructed. (However, this can be worked around by making a copy of those values in another variable that does have local scope).

You would normally only use ToArray() when you need to use an array, like with an API that expects an array. As long as you don't need to access the results of a query, and you're not confined to some kind of connection context (like the case may be in LINQ to SQL or LINQ to Entities), then you might as well just keep the query in lazy form.

optimizing Where clause in linq

What's better?
1) several Where clauses with one filter per clause
2) 1 Where clause with lots of && between filters
I'm using linq-to-sql
Thanks.

In linq-to-objects multiple && is most likely faster since it incurs the delegate invocation overhead only once.
For most IQueryable based linq implementations it's probably almost the same for both of them, since they most likely will be optimized to the same internal query. The amount of work done by the optimizer might differ slightly.

For linq-to-sql it doesn't matter because the query is evaluated once and executed when you need it.
However, you still need to worry about the performance of the query itself. You can get into trouble by not having the correct indexes in place.

LINQ - Does .Cast<T>() Select Records?

I've noticed that certain command cause LINQtoSQL to connect to the database and download the records that are part of the query, for example, .ToArray().
Does the command .Cast() cause a query to execute (and how can I tell these things in the future?). For example...
IRevision<T> current = context.GetTable(typeof(T))
.Cast<IRevision<T>>()
.SingleOrDefault(o => o.ID == recordId);
I know there is a command for .GetTable that allows you to specify a generic type, but for strange and unexplainable reasons, it cannot be used in this situation.

From Enumerable.Cast()'s remarks:
This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.
All of the LINQ operators will let you know if they are deferred execution or immediate query execution. Additionally, here are the standard LINQ operators which are NOT deferred:
Aggregate
All
Any
Average
Contains
Count
ElementAt
ElementAtOrDefault
First
FirstOrDefault
Last
LastOrDefault
LongCount
Max
Min
SequenceEqual
Single
SingleOrDefault
Sum
ToArray
ToDictionary
ToList
ToLookup

No, it does not. It simply will perform a cast when you iterate through the IEnumerable.
There isn't any definitive way (in code) to know whether or not a method will use deferred execution or not. The documentation is going to be your best friend here as it will tell you if it defers execution or not.
However, that doesn't mean that you can't make some assumptions if the documentation is unclear.
If you have a method that returns another list/structure (like ToList, ToArray), then it will have to execute the query in order to populate the new data structure.
If the method returns a scalar value, then it will have to execute the query to generate that scalar value.
Other than that, if it simply returns IEnumerable<T>, then it more-than-likely is deferring execution. However, that doesn't mean that it's guaranteed, it just means it is more-than-likely.

What you are looking for is called "Deferred Execution". Statements that defer execution only run when you attempt to access the data. Statements such as ToList execute immediately, as the data is needed to transform it into a list.
Cast can wait until you actually access it, so it is a deferred statement.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.