Does LINQ enhance performance by eliminating looping? - C#

I've used LINQ against some collection objects (Dictionary, List). When I want to select items based on some criteria, I write a LINQ query and then enumerate the result. So my question is: does LINQ eliminate looping over the main collection, and as a result improve performance?

Absolutely not. LINQ to Objects loops internally - how else could it work?
On the other hand, LINQ is more efficient than some approaches you could take, by streaming the data only when it's required etc.
On the third hand, it involves extra layers of indirection (all the iterators etc) which will have some marginal effect on performance.
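To make that concrete, here is a minimal sketch (not the actual BCL source, just the idea) of how a Where-style operator both loops and streams:
public static IEnumerable<T> WhereSketch<T>(
    this IEnumerable<T> source, Func<T, bool> predicate)
{
    // Still visits elements one by one, but hands each match
    // to the caller immediately (deferred execution).
    foreach (var item in source)
    {
        if (predicate(item))
            yield return item;
    }
}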

Probably not. LINQ lends itself to terse, (hopefully) readable code.
Under the covers it's looping, unless the backing data structure supports a more efficient searching algorithm than scanning.

When you use the query directly, you still loop over the whole collection.
You just don't see everything, because the query will only return elements that match your filter.
The overall performance will probably even take a hit, simply because of all those nested iterators that are involved.
If you call ToList() on your query result and then use that result several times, you'll be better off performance-wise.
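For example (a sketch; items and Slow are hypothetical placeholders):
var filtered = items.Where(i => Slow(i)); // lazy: re-runs Slow(i) on every enumeration
var cached = filtered.ToList();           // runs the filter exactly once
var count = cached.Count;                 // reuses the materialized results
var first = cached.FirstOrDefault();      // no re-filtering here either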

No. In fact, if you are using LINQ to SQL, the performance will be a little worse, because LINQ is, after all, an additional layer on top of the ADO.NET stack.
If you are using LINQ to Objects, there are optimizations done by LINQ. The most important one is deferred execution via yield: results are yielded from an IEnumerable as they are generated, which is better than the standard approach of waiting for a List to be filled and returned by the function before iterating over it.
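A sketch of that contrast (hypothetical helper methods):
// Eager: the caller waits until the entire list has been built.
static List<int> EvensEager(IEnumerable<int> source)
{
    var result = new List<int>();
    foreach (var n in source)
        if (n % 2 == 0) result.Add(n);
    return result;
}

// Deferred (what LINQ to Objects does): each match is handed to the
// caller as soon as it is found, via yield return.
static IEnumerable<int> EvensLazy(IEnumerable<int> source)
{
    foreach (var n in source)
        if (n % 2 == 0) yield return n;
}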

Related

Fastest way to check whether a single element in common exists between two enumerables

I have a method I'm writing where I want to be able to filter orders based on whether they have one or more ordered products in them that exist in the selection of products made by the user. Currently I'm doing this with:
SelectedProductIDs.Intersect(orderProductIDs).Any()
executed on each order (~20,000 orders total in the database, and expected to grow quickly), where both SelectedProductIDs and orderProductIDs are string[]. I've also attempted to use pre-generated HashSets for both SelectedProductIDs and orderProductIDs, but this made no appreciable difference in the speed of comparison.
However, both of these are unpleasantly slow - ~300ms per selection change - particularly given that the dates made available to the sliders within the UI are predicated entirely on the results of this query, so user interaction has to halt in some fashion. Is there a (very) significantly faster way to do this?
Edit: May not have been clear enough - order objects are materialized from SQL data at launch-time and these queries are performed later, in a secondary window of the overall application. SQL is irrelevant to the specifics of this question; this is a LINQ-to-Objects question.
LINQ's Intersect is going to construct a new HashSet from the input no matter what you do, even if the input is already a HashSet. Its implementation mutates that hash set internally (which is how it avoids yielding duplicate values), so it has to make a copy of the input sequence, even if it's already a HashSet.
You can create your own Intersect method that accepts an existing HashSet instead of populating a new one. To avoid mutating it, though, you'll have to settle for a bag-based Intersect rather than a set-based Intersect (i.e., duplicates in the sequence will all be yielded). Clearly that's not a problem in your case:
public static IEnumerable<T> IntersectAll<T>(
    this HashSet<T> set, IEnumerable<T> sequence)
{
    // Probe the caller's existing set directly instead of copying it.
    // Duplicates in 'sequence' will all be yielded (bag semantics).
    foreach (var item in sequence)
    {
        if (set.Contains(item))
            yield return item;
    }
}
Now you can write:
SelectedProductIDs.IntersectAll(orderProductIDs).Any();
And the hashset won't need to be re-constructed each time.
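As an aside, if all you need is the yes/no answer (as with .Any() here), HashSet<T>.Overlaps does the same membership probing without going through an iterator at all, assuming SelectedProductIDs is your pre-built HashSet<string>:
bool hasCommon = SelectedProductIDs.Overlaps(orderProductIDs);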
It sounds like you are reading all the values from the database into memory and then querying. If you instead use LINQ to EF, it will translate the LINQ query into a SQL query that gets run on the database, which could be significantly faster.

When to force LINQ query evaluation?

What's the accepted practice on forcing evaluation of LINQ queries with methods like ToArray(), and are there general heuristics for composing optimal chains of queries? I often try to do everything in a single pass, because I've noticed in those instances that AsParallel() does a really good job of speeding up the computation. In cases where the queries perform computations with no side effects, but several passes are required to get the right data out, is forcing the computation with ToArray() the right way to go, or is it better to leave the query in lazy form?
If you are not averse to using an 'experimental' library, you could use the EnumerableEx.Memoize extension method from the Interactive Extensions library.
This method provides a best-of-both-worlds option where the underlying sequence is computed on demand, but is not re-computed on subsequent passes. Another small benefit, in my opinion, is that the return type is not a mutable collection, as it would be with ToArray or ToList.
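A sketch of how that might look, assuming the System.Interactive (Ix) package is referenced and expensiveQuery is some deferred LINQ query:
var memoized = expensiveQuery.Memoize(); // nothing computed yet
var max = memoized.Max();                // computes and caches elements as they are consumed
var min = memoized.Min();                // replays the cached elements; no recomputation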
Keep the queries in lazy form until you start to evaluate the query multiple times, or even earlier if you need them in another form or you are in danger of variables captured in closures changing their values.
You may want to evaluate when the query contains complex projections which you want to avoid performing multiple times (e.g. constructing complex objects for sequences with lots of elements). In this case evaluating once and iterating many times is much saner.
You may need the results in another form if you want to return them or pass them to another API that expects a specific type of collection.
You may want or need to prevent accessing modified closures if the query captures variables which are not local in scope. Until the query is actually evaluated, you are in danger of other code changing their values "behind your back"; when the evaluation happens, it will use these values instead of those present when the query was constructed. (However, this can be worked around by making a copy of those values in another variable that does have local scope).
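A minimal sketch of that pitfall (numbers is a hypothetical sequence):
int threshold = 10;
var query = numbers.Where(n => n > threshold); // captures the variable, not the value 10
threshold = 100;                               // silently changes what the query will return...
var results = query.ToList();                  // ...because evaluation happens here, with threshold == 100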
You would normally only use ToArray() when you need to use an array, like with an API that expects an array. As long as you don't need to access the results of a query, and you're not confined to some kind of connection context (like the case may be in LINQ to SQL or LINQ to Entities), then you might as well just keep the query in lazy form.

optimizing Where clause in linq

What's better?
1) several Where clauses with one filter per clause
2) 1 Where clause with lots of && between filters
I'm using LINQ to SQL.
Thanks.
In LINQ to Objects, a single Where with multiple &&-combined filters is most likely faster, since it incurs the delegate invocation overhead only once per element.
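For illustration (orders and its properties are hypothetical):
// 1) several Where clauses: two predicate delegates, each invoked per element
var chained = orders.Where(o => o.Total > 100).Where(o => o.Shipped);

// 2) one Where clause with &&: a single predicate delegate per element
var combined = orders.Where(o => o.Total > 100 && o.Shipped);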
For most IQueryable-based LINQ implementations it's probably almost the same for both, since they will most likely be optimized to the same internal query. The amount of work done by the optimizer might differ slightly.
For LINQ to SQL it doesn't matter, because the whole expression is translated into a single SQL query and executed only when you need the results.
However, you still need to worry about the performance of the query itself. You can get into trouble by not having the correct indexes in place.

Indexed properties with Linq?

In a database we create indexes on columns that we want to query with joins.
Does Linq to objects facilitate this in any way?
I imagine that search performance could be (much) improved when somehow List's can be supported by binary trees (indexes) in memory that are mapped to specific properties of T.
I am thinking of Lists that do not have to be optimized for inserts or deletes.
One could turn off the index for another optimization.
No; LINQ does not use indexes.
Instead, you can use i4o, which does.
Note that many LINQ operations, such as Distinct, Join, GroupBy, and others, will build a hash table (or hash set, as appropriate) internally to avoid O(n²) performance.
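To illustrate with Join (customers and orders are hypothetical):
// Join builds a hash-based lookup over one input internally, so this is
// roughly O(n + m) rather than the O(n * m) of a nested scan:
var pairs = customers.Join(orders,
                           c => c.Id,
                           o => o.CustomerId,
                           (c, o) => new { c.Name, o.Total });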
For more information, see Jon Skeet's EduLINQ series.

In-memory LINQ performance

More than about LINQ to [insert your favorite provider here], this question is about searching or filtering in-memory collections.
I know LINQ (or the searching/filtering extension methods) works on objects implementing IEnumerable or IEnumerable<T>. The question is: because of the nature of enumeration, is every query at least O(n) in complexity?
For example:
var result = list.FirstOrDefault(o => o.something > n);
In this case, any algorithm will take at least O(n), unless list is ordered with respect to 'something', in which case the search should only take O(log n): it could be a binary search. However, if I understand correctly, this query will be resolved through enumeration, so it will take O(n) even if the list was previously ordered.
Is there something I can do to solve a query in O(log(n))?
If I want performance, should I use Array.Sort and Array.BinarySearch?
Even with parallelisation, it's still O(n). The constant factor would be different (depending on your number of cores) but as n varied the total time would still vary linearly.
Of course, you could write your own implementations of the various LINQ operators over your own data types, but they'd only be appropriate in very specific situations - you'd have to know for sure that the predicate only operated on the optimised aspects of the data. For instance, if you've got a list of people that's ordered by age, it's not going to help you with a query which tries to find someone with a particular name :)
To examine the predicate, you'd have to use expression trees instead of delegates, and life would become a lot harder.
I suspect I'd normally add new methods which make it obvious that you're using the indexed/ordered/whatever nature of the data type, and which will always work appropriately. You couldn't easily invoke those extra methods from query expressions, of course, but you can still use LINQ with dot notation.
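For instance, such a method over a list known to be sorted might look like this (a hypothetical sketch; the caller must guarantee the ordering):
public static T FirstAtLeast<T>(this List<T> sorted, T value, IComparer<T> comparer)
{
    // Assumes 'sorted' is ordered consistently with 'comparer'.
    int index = sorted.BinarySearch(value, comparer); // O(log n)
    if (index < 0)
        index = ~index; // complement = index of the first element greater than 'value'
    return index < sorted.Count ? sorted[index] : default(T);
}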
Yes, the generic case is always O(n), as Sklivvz said.
However, many LINQ methods special-case for when the object implementing IEnumerable actually implements e.g. ICollection. (I've seen this for IEnumerable.Contains at least.)
In practice this means that LINQ IEnumerable.Contains calls the fast HashSet.Contains for example if the IEnumerable actually is a HashSet.
IEnumerable<int> mySet = new HashSet<int>();
// calls the fast HashSet.Contains because HashSet implements ICollection.
if (mySet.Contains(10)) { /* code */ }
You can use Reflector to check exactly how the LINQ methods are defined; that's how I figured this out.
Also, LINQ provides the methods ToDictionary (maps a key to a single value) and ToLookup (maps a key to multiple values). Such a dictionary/lookup table can be created once and used many times, which can speed up some LINQ-intensive code by orders of magnitude.
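A sketch of that build-once, probe-many pattern (customers and orders are hypothetical):
var byId = customers.ToDictionary(c => c.Id);        // key -> single value
var byCustomer = orders.ToLookup(o => o.CustomerId); // key -> multiple values
var customer = byId[42];          // O(1) probe instead of an O(n) scan
var theirOrders = byCustomer[42]; // likewise; empty sequence if the key is absent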
Yes, it has to be, because the only way of accessing the members of an IEnumerable is by enumerating them, which means O(n).
It seems like a classic case in which the language designers decided to trade performance for generality.
