What is the difference between these two Linq queries:
var result = ResultLists().Where( c=> c.code == "abc").FirstOrDefault();
// vs.
var result = ResultLists().FirstOrDefault( c => c.code == "abc");
Are the semantics exactly the same?
Iff sematically equal, does the predicate form of FirstOrDefault offer any theoretical or practical performance benefit over Where() plus plain FirstOrDefault()?
Either is fine.
They both run lazily - if the source list has a million items, but the tenth item matches then both will only iterate 10 items from the source.
Performance should be almost identical and any difference would be totally insignificant.
The second one. All other things being equal, the iterator in the second case can stop as soon as it finds a match, where the first one must find all that match, and then pick the first of those.
Nice discussion, all the above answers are correct.
I didn't run any performance test, whereas on the bases of my experience FirstOrDefault() sometimes faster and optimize as compare to Where().FirstOrDefault().
I recently fixed the memory overflow/performance issue ("neural-network algorithm") and fix was changing Where(x->...).FirstOrDefault() to simply FirstOrDefault(x->..).
I was ignoring the editor's recommendation to change Where(x->...).FirstOrDefault() to simply FirstOrDefault(x->..).
So I believe the correct answer to the above question is
The second option is the best approach in all cases
Where is actually a deferred execution - it means, the evaluation of an expression is delayed until its realized value is actually required. It greatly improves performance by avoiding unnecessary execution.
Where looks kind of like this, and returns a new IEnumerable
foreach (var item in enumerable)
{
if (condition)
{
yield return item;
}
}
FirstOrDefault() returns <T> and not throw any exception or return null when there is no result
Related
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
LINQ extension methods - Any() vs. Where() vs. Exists()
Given a list of objects in memory I ran the following two expressions:
myList.where(x => x.Name == "bla").Any()
vs
myList.Any(x => x.Name == "bla")
The latter was fastest always, I believe this is due to the Where enumerating all items. But this also happens when there's no matches.
Im not sure of the exact WHY though. Are there any cases where this viewed performance difference wouldn't be the case, like if it was querying Nhib?
Cheers.
The Any() with the predicate can perform its task without an iterator (yield return). Using a Where() creates an iterator, which adds has a performance impact (albeit very small).
Thus, performance-wise (by a bit), you're better off using the form of Any() that takes the predicate (x => x.Name == "bla"). Which, personally, I find more readable as well...
On a side note, Where() does not necessarily enumerate over all elements, it just creates an iterator that will travel over the elements as they are requested, thus the call to Any() after the Where() will drive the iteration, which will stop at the first item it finds that matches the condition.
So the performance difference is not that Where() iterates over all the items (in linq-to-objects) because it really doesn't need to (unless, of course, it doesn't find one that satisfies it), it's that the Where() clause has to set up an iterator to walk over the elements, whereas Any() with a predicate does not.
Assuming you correct where to Where and = to ==, I'd expect the "Any with a predicate" version to execute very slightly faster. However, I would expect the situations in which the difference was significant to be few and far between, so you should aim for readability first.
As it happens, I would normally prefer the "Any with a predicate" version in terms of readability too, so you win on both fronts - but you should really go with what you find more readable first. Measure the performance in scenarios you actually care about, and if a section of code isn't performing as you need it to, then consider micro-optimizing it - measuring at every step, of course.
I believe this is due to the Where enumerating all items.
If myList is a collection in memory, it doesn't. The Where method uses deferred execution, so it will only enumerate as many items as needed to determine the result. In that case you would not see any significant difference between .Any(...) and .Where(...).Any().
Are there any cases where this viewed performance difference wouldn't
be the case, like if it was querying Nhib?
Yes, if myList is a data source that will take the expression generated by the methods and translate to a query to run elsewhere (e.g. LINQ To SQL), you may see a difference. The code that translates the expression simply does a better job at translating one of the expressions.
This question already has an answer here:
And difference between FirstOrDefault(func) & Where(func).FirstOrDefault()?
(1 answer)
Closed 9 years ago.
Can someone explain if these two Linq statements are same or do they differ in terms of execution. I am guessing the result of their execution is the same but please correct me if I am wrong.
var k = db.MySet.Where(a => a.Id == id).SingleOrDefault().Username;
var mo = db.MySet.SingleOrDefault(a => a.Id == id).Username;
Yes, both instructions are functionally equivalent and return the same result. The second is just a shortcut.
However, I wouldn't recommend writing it like this, because SingleOrDefault will return null if there is no item with the specified Id. This will result in a NullReferenceException when you access Username. If you don't expect it to return null, use Single, not SingleOrDefault, because it will give you a more useful error message if your expectation is not met. If you're not sure that a user with that Id exists, use SingleOrDefault, but check the result before accessing its members.
yes. these two linq statements are same.
but i suggest you wirte code like this:
var mo = db.MySet.SingleOrDefault(a => a.Id == id);
if(mo !=null)
{
string username=mo.Username;
}
var k = db.MySet.Where(a => a.Id == id).SingleOrDefault().Username;
var mo = db.MySet.SingleOrDefault(a => a.Id == id).Username;
You asked if they are equivalent...
Yes, they will return the same result, both in LINQ-to-Objects and in LINQ-to-SQL/Entity-Framework
No, they aren't equal, equal in LINQ-to-Objects. Someone had benchmarked them and discovered that the first one is a little faster (because the .Where() has special optimizations based on the type of db.MySet) Reference: https://stackoverflow.com/a/8664387/613130
They're different in the terms of actual code they will execute, but I can't see a situation in which they will give different results. In fact, if you've got Resharper installed, it will recommend you change the former into the latter.
However, I'd in general question why you ever want to do SingleOrDefault() without immediately following it with a null check.
Instead of checking for null I always check for default(T) as the name of the LINQ function implies too. In my opinion its a bit more maintainable code in case the type is changed into a struct or class.
They will both return the same result (or both will throw a NULL reference exception). However, the second one may be more efficient.
The first version will need to enumerate all values that meet the where condition, and then it will check to see if this returned only 1 value. Thus this version might need to enumerate over 100s of values.
The second version will check for only one value that meets the condition at the start. As soon as that version finds 2 values, it will thrown an exception, so it does not have the overhead of enumerating (possibly) 100s of values that will never be used.
If I know there is only one matching item in a collection, is there any way to tell Linq about this so that it will abort the search when it finds it?
I am assuming that both of these search the full collection before returning one item?
var fred = _people.Where((p) => p.Name == "Fred").First();
var bill = _people.Where((p) => p.Name == "Bill").Take(1);
EDIT: People seem fixated on FirstOrDefault, or SingleOrDefault. These have nothing to do with my question. They simply provide a default value if the collection is empty. As I stated, I know that my collection has a single matching item.
AakashM's comment is of most interest to me. I would appear my assumption is wrong but I'm interested why.
For instance, when linq to objects is running the Where() function in my example code, how does it know that there are further operations on its return value?
Your assumption is wrong. LINQ uses deferred execution and lazy evaluation a lot. What this means is that, for example, when you call Where(), it doesn't actually iterate the collection. Only when you iterate the object it returns, will it iterate the original collection. And it will do it in a lazy manner: only as much as is necessary.
So, in your case, neither query will iterate the whole collection: both will iterate it only up to the point where they find the first element, and then stop.
Actually, the second query (with Take()) won't do even that, it will iterate the source collection only if you iterate the result.
This all applies to LINQ to objects. Other providers (LINQ to SQL and others) can behave differently, but at least the principle of deferred execution usually still holds.
I think First() will not scan the whole collection. It will return immediatelly after the first match. But I suggest to use FirstOrDefault() instead.
EDIT:
Difference between First() and FirstOrDefault() (from MSDN):
The First() method throws an exception if source contains no elements. To instead return a default value when the source sequence is empty, use the FirstOrDefault() method.
Enumerable.First
Substitue .Where( by .SingleorDefault(
This will find the first and only item for you.
But you can't do this for any given number. If you need 2 items, you'll need to get the entire collection.
However, you shouldn't worry about time. The most effort is used in opening a database connection and establishing a query. Executing the query doesn't take that much time, so there's no real reason to stop a query halfway :-)
I have a huge IEnumerable(suppose the name is myItems), which way is more effective?
Solution 1: Filter it first then ForEach.
Array.ForEach(myItems.Where(FILTER-IT-HERE).ToArray(),MY-ACTION);
Solution 2: Do RETURN in MY-ACTION if the item is not up to the mustard.
Array.ForEach(myItems.ToArray(),MY-ACTION-WITH-FILTER);
Is one of them always better than another? Or any other good suggestions? Thanks in advance.
Did you do any measurements? Since WE can't measure the run time of My-Action then only you can. Measure and decide.
Sometimes one has to create benchmark's because similar looking activities could produce radically different and unexpected results.
You do not say what your data source is so I'm going to assume it may be data on an SQL server in which case filtering at the server side will likely always be the best approach because you have minimized the amount of data transfer. Memory access is always faster than data transfer from disk to memory so whenever you can transfer fewer records, you are likely to have better performance.
Well, both times, you're converting to an array, which might not be so efficient if the IEnumerable is very large (like you said). You could create a generic extension method for IEnumerable, like:
public static void ForEach<T>(this IEnumerable<T> current, Action<T> action) {
foreach (var i in current) {
action(i);
}
}
and then you could do this:
IEnumerable<int> ints = new List<int>();
ints.Where(i => i == 5).ForEach(i => Console.WriteLine(i));
If performance is a concern, it's unclear to me why you'd be bothering to construct an entire array in the first place. Why not just this?
foreach (var item in myItems.Where(FILTER-IT-HERE))
MY-ACTION;
Or:
foreach (var item in myItems)
MY-ACTION-WITH-FILTER;
I ask because, while the others are right that you can't really know without testing, I wouldn't expect there to be much difference between the above two options. I would expect there to be a difference, on the other hand, between creating/populating an array (seemingly for no reason) and not creating an array.
Everything else being equal, calling ToArray() first will impart a greater performance hit than when calling it last. Although, as has been stated by others before me,
Why use ToArray() and Array.ForEach() at all?
We don't know that everything else actually is equal since you do not reveal the implementation details of your filter and action.
The idea of LINQ is to work on enumerable collections, so the best LINQ query is the one where you don't use Array.ForEach() and .ToArray() at all.
I would say that this falls into the category of premature optimization. If, after establishing benchmarks, you find that the code is too slow, you can always try each approach and pick the result that works better for you.
Since we don't know how the IEnumerable<> is produced it's hard to say which approach will perform better. We also don't know how many items will remain after you apply your predicate - nor do we know whether the action or iteration steps are going to be the dominant factor in the execution of your code. The only way to know for sure is to try it both ways, profile the results, and pick the best.
Performance aside, I would choose the version that is most clear - which (for me) is to first filter and then apply the projection to the result.
I'm looking through a generic list to find items based on a certain parameter.
In General, what would be the best and fastest implementation?
1. Looping through each item in the list and saving each match to a new list and returning that
foreach(string s in list)
{
if(s == "match")
{
newList.Add(s);
}
}
return newList;
Or
2. Using the FindAll method and passing it a delegate.
newList = list.FindAll(delegate(string s){return s == "match";});
Don't they both run in ~ O(N)? What would be the best practice here?
Regards,
Jonathan
You should definitely use the FindAll method, or the equivalent LINQ method. Also, consider using the more concise lambda instead of your delegate if you can (requires C# 3.0):
var list = new List<string>();
var newList = list.FindAll(s => s.Equals("match"));
I would use the FindAll method in this case, as it is more concise, and IMO, has easier readability.
You are right that they are pretty much going to both perform in O(N) time, although the foreach statement should be slightly faster given it doesn't have to perform a delegate invocation (delegates incur a slight overhead as opposed to directly calling methods).
I have to stress how insignificant this difference is, it's more than likely never going to make a difference unless you are doing a massive number of operations on a massive list.
As always, test to see where the bottlenecks are and act appropriately.
Jonathan,
A good answer you can find to this is in chapter 5 (performance considerations) of Linq To Action.
They measure a for each search that executes about 50 times and that comes up with foreach = 68ms per cycle / List.FindAll = 62ms per cycle. Really, it would probably be in your interest to just create a test and see for yourself.
List.FindAll is O(n) and will search the entire list.
If you want to run your own iterator with foreach, I'd recommend using the yield statement, and returning an IEnumerable if possible. This way, if you end up only needing one element of your collection, it will be quicker (since you can stop your caller without exhausting the entire collection).
Otherwise, stick to the BCL interface.
Any perf difference is going to be extremely minor. I would suggest FindAll for clarity, or, if possible, Enumerable.Where. I prefer using the Enumerable methods because it allows for greater flexibility in refactoring the code (you don't take a dependency on List<T>).
Yes, they both implementations are O(n). They need to look at every element in the list to find all matches. In terms of readability I would also prefer FindAll. For performance considerations have a look at LINQ in Action (Ch 5.3). If you are using C# 3.0 you could also apply a lambda expression. But that's just the icing on the cake:
var newList = aList.FindAll(s => s == "match");
Im with the Lambdas
List<String> newList = list.FindAll(s => s.Equals("match"));
Unless the C# team has improved the performance for LINQ and FindAll, the following article seems to suggest that for and foreach would outperform LINQ and FindAll on object enumeration: LINQ on Objects Performance.
This artilce was dated back to March 2009, just before this post originally asked.