optimizing Where clause in linq

What's better?
1) several Where clauses with one filter per clause
2) 1 Where clause with lots of && between filters
I'm using linq-to-sql
Thanks.

In linq-to-objects multiple && is most likely faster since it incurs the delegate invocation overhead only once.
For most IQueryable based linq implementations it's probably almost the same for both of them, since they most likely will be optimized to the same internal query. The amount of work done by the optimizer might differ slightly.
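
For illustration, here is a minimal LINQ to Objects sketch of the two forms from the question. The class and method names (WhereForms, Chained, Combined) and the sample filters are made up for this example. Both forms return the same items; the difference is only in how many delegates each element passes through.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereForms
{
    // Form 1: one filter per Where call. Elements that pass the first
    // filter are tested by a second delegate.
    public static List<int> Chained(IEnumerable<int> source) =>
        source.Where(n => n % 2 == 0).Where(n => n > 10).ToList();

    // Form 2: a single Where call with the filters joined by &&.
    // Each element is tested by one delegate.
    public static List<int> Combined(IEnumerable<int> source) =>
        source.Where(n => n % 2 == 0 && n > 10).ToList();

    public static void Main()
    {
        var numbers = Enumerable.Range(1, 20);
        Console.WriteLine(string.Join(",", Chained(numbers)));  // 12,14,16,18,20
        Console.WriteLine(string.Join(",", Combined(numbers))); // 12,14,16,18,20
    }
}
```

Whether the extra delegate call is measurable depends on the data and the filters, so benchmark before choosing one form for speed alone.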

For linq-to-sql it doesn't matter, because the chained Where calls are composed into a single query that is translated to SQL once and executed only when you enumerate the results.
However, you still need to worry about the performance of the query itself. You can get into trouble by not having the correct indexes in place.

Related

Is there a benefit in using Expressions to build dynamic LINQ queries compared to chaining Funcs if I am not using SQL?

I need to build a dynamic query that can query a large list of objects and get the objects which satisfy a complex predicate known at runtime. I know I want to do it upfront and pass it into the collection to filter on, rather than create some complex switch case on the collection itself.
Everything points me to Expressions and Predicate Builder, which I'm happy to use to chain together expressions in a loop like:
Expression<Func<MyObject, bool>> query = PredicateBuilder.True<MyObject>();
query = query.And(x => x.Field == passedInSearchCriterion);
but I could also do that with:
Func<MyObject, bool> query = x => true;
var previous = query; // copy first; a lambda that referenced query itself would recurse forever
query = x => previous(x) && x.Field == passedInSearchCriterion;
I know the first is better in the case of LINQ to SQL converting it to SQL to execute in the database etc when given to entity framework or something.
But say they were both run locally, not in a database, on a large list, is there any performance difference then in terms of how the resulting function is executed?

I know the first is better because of LINQ to SQL converting it to SQL to execute in the database etc when given to entity framework or something.
No, you don't "know" it's better because you don't understand the difference between expressions and delegates.
The main difference is that expressions are effectively descriptions of a piece of code, and can be inspected to find out information like parameter names - this is why ORMs use them, to map POCOs to SQL columns - while delegates are nothing more than pointers to a method to be executed. As such, there are optimizations the C# compiler can perform on delegates, which it cannot do for expressions. Further details here.
So yes, there will be a performance difference, almost certainly in favour of delegates. Whether that difference is quantifiable and/or relevant to your use-case is something only you can determine via benchmarks.
But any performance difference is irrelevant anyway, because you don't need expressions for your use-case. Just use delegates, which will always be faster.
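
As a rough sketch of the delegate-only approach, the filters known at runtime can be collected in a list and combined into one Func. The helper name AllOf and the sample data are hypothetical, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateChain
{
    // Combines the filters with logical AND. All(...) stops at the
    // first filter that rejects the item, like && would.
    public static Func<T, bool> AllOf<T>(IEnumerable<Func<T, bool>> filters)
    {
        var list = filters.ToList(); // snapshot, so later changes to the source list are ignored
        return item => list.All(f => f(item));
    }

    public static void Main()
    {
        var filters = new List<Func<string, bool>>
        {
            s => s.StartsWith("a", StringComparison.Ordinal),
            s => s.Length > 3,
        };
        Func<string, bool> query = AllOf(filters);

        var words = new[] { "apple", "ant", "banana", "avocado" };
        Console.WriteLine(string.Join(",", words.Where(query))); // apple,avocado
    }
}
```

Taking a snapshot of the filter list also sidesteps the self-referencing lambda problem that appears when a Func variable is reassigned in terms of itself.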

Is there a "correct" way between these two statements that filter and return a boolean using LINQ to Objects? [duplicate]

Possible Duplicate:
LINQ extension methods - Any() vs. Where() vs. Exists()
Given a list of objects in memory I ran the following two expressions:
myList.where(x => x.Name == "bla").Any()
vs
myList.Any(x => x.Name == "bla")
The latter was always fastest; I believe this is due to the Where enumerating all items. But this also happens when there are no matches.
I'm not sure of the exact WHY though. Are there any cases where this viewed performance difference wouldn't be the case, like if it was querying Nhib?
Cheers.

The Any() with the predicate can perform its task without an iterator (yield return). Using a Where() creates an iterator, which has a performance impact (albeit very small).
Thus, performance-wise (by a bit), you're better off using the form of Any() that takes the predicate (x => x.Name == "bla"). Which, personally, I find more readable as well...
On a side note, Where() does not necessarily enumerate over all elements, it just creates an iterator that will travel over the elements as they are requested, thus the call to Any() after the Where() will drive the iteration, which will stop at the first item it finds that matches the condition.
So the performance difference is not that Where() iterates over all the items (in linq-to-objects) because it really doesn't need to (unless, of course, it doesn't find one that satisfies it), it's that the Where() clause has to set up an iterator to walk over the elements, whereas Any() with a predicate does not.
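
One way to check this yourself is to count how often the predicate runs. The sketch below is only an illustration; AnyDemo and its method names are invented for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    // Number of predicate calls made by Where(predicate).Any().
    public static int CallsForWhereThenAny(List<int> source, int target)
    {
        int calls = 0;
        _ = source.Where(x => { calls++; return x == target; }).Any();
        return calls;
    }

    // Number of predicate calls made by Any(predicate).
    public static int CallsForAny(List<int> source, int target)
    {
        int calls = 0;
        _ = source.Any(x => { calls++; return x == target; });
        return calls;
    }

    public static void Main()
    {
        var list = Enumerable.Range(1, 1000).ToList();
        Console.WriteLine(CallsForWhereThenAny(list, 3));  // 3
        Console.WriteLine(CallsForAny(list, 3));           // 3
        Console.WriteLine(CallsForWhereThenAny(list, -1)); // 1000
    }
}
```

Both forms stop at the third element when a match exists, and both test every element when there is none. Any remaining difference comes from the extra iterator, not from extra predicate calls.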

Assuming you correct where to Where and = to ==, I'd expect the "Any with a predicate" version to execute very slightly faster. However, I would expect the situations in which the difference was significant to be few and far between, so you should aim for readability first.
As it happens, I would normally prefer the "Any with a predicate" version in terms of readability too, so you win on both fronts - but you should really go with what you find more readable first. Measure the performance in scenarios you actually care about, and if a section of code isn't performing as you need it to, then consider micro-optimizing it - measuring at every step, of course.

I believe this is due to the Where enumerating all items.
If myList is a collection in memory, it doesn't. The Where method uses deferred execution, so it will only enumerate as many items as needed to determine the result. In that case you would not see any significant difference between .Any(...) and .Where(...).Any().
Are there any cases where this viewed performance difference wouldn't be the case, like if it was querying Nhib?
Yes, if myList is a data source that will take the expression generated by the methods and translate to a query to run elsewhere (e.g. LINQ To SQL), you may see a difference. The code that translates the expression simply does a better job at translating one of the expressions.

When to force LINQ query evaluation?

What's the accepted practice on forcing evaluation of LINQ queries with methods like ToArray(), and are there general heuristics for composing optimal chains of queries? I often try to do everything in a single pass because I've noticed in those instances that AsParallel() does a really good job in speeding up the computation. In cases where the queries perform computations with no side effects but several passes are required to get the right data out, is forcing the computation with ToArray() the right way to go, or is it better to leave the query in lazy form?

If you are not averse to using an 'experimental' library, you could use the EnumerableEx.Memoize extension method from the Interactive Extensions library.
This method provides a best-of-both-worlds option where the underlying sequence is computed on demand, but is not recomputed on subsequent passes. Another small benefit, in my opinion, is that the return type is not a mutable collection, as it would be with ToArray or ToList.

Keep the queries in lazy form until you start to evaluate the query multiple times, or even earlier if you need them in another form or you are in danger of variables captured in closures changing their values.
You may want to evaluate when the query contains complex projections which you want to avoid performing multiple times (e.g. constructing complex objects for sequences with lots of elements). In this case evaluating once and iterating many times is much saner.
You may need the results in another form if you want to return them or pass them to another API that expects a specific type of collection.
You may want or need to prevent accessing modified closures if the query captures variables which are not local in scope. Until the query is actually evaluated, you are in danger of other code changing their values "behind your back"; when the evaluation happens, it will use these values instead of those present when the query was constructed. (However, this can be worked around by making a copy of those values in another variable that does have local scope).
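
The modified-closure problem can be reproduced with a small sketch like the one below. ClosureDemo, threshold, and the sample numbers are illustrative names and values only.

```csharp
using System;
using System.Linq;

public static class ClosureDemo
{
    public static (int[] Lazy, int[] Snapshot) Run()
    {
        int threshold = 5;
        var numbers = new[] { 1, 4, 6, 8, 10 };

        // Deferred: the lambda reads threshold when the query is enumerated.
        var lazy = numbers.Where(n => n > threshold);

        // Forced: evaluated right now, while threshold is still 5.
        var snapshot = numbers.Where(n => n > threshold).ToArray();

        threshold = 8; // changed "behind the back" of the lazy query

        return (lazy.ToArray(), snapshot);
    }

    public static void Main()
    {
        var (lazy, snapshot) = Run();
        Console.WriteLine(string.Join(",", lazy));     // 10
        Console.WriteLine(string.Join(",", snapshot)); // 6,8,10
    }
}
```

The lazy query sees the updated value because it runs only when ToArray() is finally called on it.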

You would normally only use ToArray() when you need to use an array, like with an API that expects an array. As long as you don't need to access the results of a query, and you're not confined to some kind of connection context (as may be the case in LINQ to SQL or LINQ to Entities), then you might as well just keep the query in lazy form.

Does LINQ enhance the performance by eliminating looping?

I've used Linq against some collection objects (Dictionary, List). If I want to select items based on a criterion, I write a Linq query and then enumerate the resulting object. So my question is: does Linq avoid looping over the main collection, and so improve performance?

Absolutely not. LINQ to Objects loops internally - how else could it work?
On the other hand, LINQ is more efficient than some approaches you could take, by streaming the data only when it's required etc.
On the third hand, it involves extra layers of indirection (all the iterators etc) which will have some marginal effect on performance.

Probably not. LINQ lends itself to terse (hopefully) readable code.
Under the covers it's looping, unless the backing data structure supports a more efficient searching algorithm than scanning.

When you use the query directly, then you still loop over the whole collection.
You just don't see everything, because the query will only return elements that match your filter.
The overall performance will probably even take a hit, simply because of all those nested iterators that are involved.
If you call ToList() on your query result and then use that result several times, you'll be better off performance-wise.

No. In fact, if you are using LINQ to SQL, the performance will be a little worse, because LINQ is after all an additional layer on top of the ADO.NET stack.
If you are using LINQ over objects, there are optimizations done by LINQ. The most important one is yield, which starts returning results from an IEnumerable as they are generated. That is better than the standard approach, which has to wait for a List to be filled and returned by the function before it can be iterated.
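
To make the streaming point concrete, here is a small sketch that compares an iterator method using yield return with one that fills a List first. All names (StreamingDemo, SquaresLazy, SquaresEager, Produced) are invented for the example, and the static counter exists only to show how much work was done.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingDemo
{
    public static int Produced; // items generated so far (illustration only)

    // Streams squares one at a time; nothing runs until the caller iterates.
    public static IEnumerable<int> SquaresLazy(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Produced++;
            yield return i * i;
        }
    }

    // Builds the whole list before the caller sees the first item.
    public static List<int> SquaresEager(int count)
    {
        var result = new List<int>();
        for (int i = 1; i <= count; i++)
        {
            Produced++;
            result.Add(i * i);
        }
        return result;
    }

    // Reads only the first `take` items and reports how many were generated.
    public static int ProducedWhenTaking(Func<int, IEnumerable<int>> source, int count, int take)
    {
        Produced = 0;
        int seen = 0;
        foreach (var square in source(count))
        {
            if (++seen == take) break;
        }
        return Produced;
    }

    public static void Main()
    {
        Console.WriteLine(ProducedWhenTaking(SquaresLazy, 1000, 2));  // 2
        Console.WriteLine(ProducedWhenTaking(SquaresEager, 1000, 2)); // 1000
    }
}
```

Both versions still loop internally. The lazy one simply does no more of the loop than the consumer asks for.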

Am I misunderstanding LINQ to SQL .AsEnumerable()?

Consider this code:
var query = db.Table
    .Where(t => SomeCondition(t))
    .AsEnumerable();
int recordCount = query.Count();
int totalSomeNumber = query.Sum(t => t.SomeNumber);
double average = query.Average(t => t.SomeNumber);
Assume query takes a very long time to run. I need to get the record count, total SomeNumber's returned, and take an average at the end. I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL, then use LINQ-to-Objects for the Count, Sum, and Average. Instead, when I do this in LINQPad, I see the same query is run three times. If I replace .AsEnumerable() with .ToList(), it only gets queried once.
Am I missing something about what AsEnumerable is/does?

Calling AsEnumerable() does not execute the query, enumerating it does.
IQueryable is the interface that allows LINQ to SQL to perform its magic. IQueryable implements IEnumerable so when you call AsEnumerable(), you are changing the extension-methods being called from there on, ie from the IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects in this particular case). But you are not executing the actual query, just changing how it is going to be executed in its entirety.
To force query execution and keep the results in memory, call a materializing method such as ToList() or ToArray().
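
The same behaviour can be imitated without a database. In the sketch below, SlowQuery is a hypothetical stand-in for the LINQ to SQL query: its body runs every time the sequence is enumerated, much as the SQL would run every time the query is enumerated. It is an in-memory analogy, not LINQ to SQL itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    public static int Runs; // how many times the simulated query executed

    // Stand-in for a slow database query (assumption for this example).
    private static IEnumerable<int> SlowQuery()
    {
        Runs++;
        foreach (var n in new[] { 10, 20, 30 })
        {
            yield return n;
        }
    }

    // AsEnumerable() leaves the query deferred, so each aggregate re-runs it.
    public static int RunsWithAsEnumerable()
    {
        Runs = 0;
        var query = SlowQuery().AsEnumerable();
        _ = query.Count();
        _ = query.Sum();
        _ = query.Average();
        return Runs;
    }

    // ToList() runs the query once; the aggregates then read the list.
    public static int RunsWithToList()
    {
        Runs = 0;
        var results = SlowQuery().ToList();
        _ = results.Count();
        _ = results.Sum();
        _ = results.Average();
        return Runs;
    }

    public static void Main()
    {
        Console.WriteLine(RunsWithAsEnumerable()); // 3
        Console.WriteLine(RunsWithToList());       // 1
    }
}
```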

Yes. All that AsEnumerable will do is cause the Count, Sum, and Average functions to be executed client-side (in other words, it will bring back the entire result set to the client, then the client will perform those aggregates instead of creating COUNT(), SUM(), and AVG() statements in SQL).

Justin Niessner's answer is perfect.
I just want to quote an MSDN explanation here: .NET Language-Integrated Query for Relational Data
The AsEnumerable() operator, unlike ToList() and ToArray(), does not cause execution of the query. It is still deferred. The AsEnumerable() operator merely changes the static typing of the query, turning a IQueryable into an IEnumerable, tricking the compiler into treating the rest of the query as locally executed.
I hope this is what is meant by:
IQueryable-methods to the IEnumerable-methods (ie changing from LINQ to SQL to LINQ to Objects
Once it is LINQ to Objects we can apply object's methods (e.g. ToString()). This is the explanation for one of the frequently asked questions about LINQ - Why LINQ to Entities does not recognize the method 'System.String ToString()?
According to ASENUMERABLE - codeblog.jonskeet, AsEnumerable can be handy when:
some aspects of the query in the database, and then a bit more manipulation in .NET – particularly if there are aspects you basically can’t implement in LINQ to SQL (or whatever provider you’re using).
It also says:
All we’re doing is changing the compile-time type of the sequence which is propagating through our query from IQueryable to IEnumerable – but that means that the compiler will use the methods in Enumerable (taking delegates, and executing in LINQ to Objects) instead of the ones in Queryable (taking expression trees, and usually executing out-of-process).
Finally, also see this related question: Returning IEnumerable vs. IQueryable

Well, you are on the right track. The problem is that an IQueryable (what the statement is before the AsEnumerable call) is also an IEnumerable, so that call is, in effect, a nop. It will require forcing it to a specific in-memory data structure (e.g., ToList()) to force the query.

I would presume that ToList forces Linq to fetch the records from the database. When you then perform the subsequent calculations, they are done against the in-memory objects rather than involving the database.
Leaving the return type as an Enumerable means that the data is not fetched until it is called upon by the code performing the calculations. I guess the knock-on effect of this is that the database is hit three times, once for each calculation, and the data is not persisted to memory.

Just adding a little more clarification:
I thought based on my reading that .AsEnumerable() would execute the query using LINQ-to-SQL
It will not execute the query right away, as Justin's answer explains. It will only be materialized (hit the database) later on.
Instead, when I do this in LINQPad, I see the same query is run three times.
Yes, and note that all three queries are exactly the same, basically fetching all rows matching the given condition into memory and then computing the count/sum/avg locally.
If I replace .AsEnumerable() with .ToList(), it only gets queried once.
It still loads all the data into memory, with the advantage that the query now runs only once.
If performance is a concern, just remove .AsEnumerable() and the count/sum/avg will be translated correctly to their SQL equivalents. Three queries will still run (probably faster if there are indexes that satisfy the conditions), but with a much smaller memory footprint.
