What are the advantages and drawbacks of each approach?
return items.Select(item => DoSomething(item));
versus
foreach(var item in items)
{
yield return DoSomething(item);
}
EDIT: Since the two compile to roughly equivalent MSIL, which one do you find more readable?
The yield return technique causes the C# compiler to generate an enumerator class "behind the scenes", while the Select call uses a standard enumerator class parameterized with a delegate. In practice, there shouldn't be a whole lot of difference between the two, other than possibly an extra call frame in the Select case, for the delegate.
For what it's worth, wrapping a lambda around DoSomething is sort of pointless as well; just pass a delegate for it directly.
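To make the comparison concrete, here is a minimal sketch of the three forms side by side. DoSomething is a hypothetical stand-in (it just doubles an int), and the class and method names are invented for the illustration. All three are lazy: nothing runs until the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionDemo
{
    // Hypothetical stand-in for the DoSomething in the question.
    public static int DoSomething(int item) => item * 2;

    // Lambda wrapper: Select calls the lambda, which in turn calls DoSomething.
    public static IEnumerable<int> WithLambda(IEnumerable<int> items) =>
        items.Select(item => DoSomething(item));

    // Method group: the delegate points at DoSomething directly.
    public static IEnumerable<int> WithMethodGroup(IEnumerable<int> items) =>
        items.Select(DoSomething);

    // Iterator block: the compiler generates the enumerator class.
    public static IEnumerable<int> WithYield(IEnumerable<int> items)
    {
        foreach (var item in items)
        {
            yield return DoSomething(item);
        }
    }
}
```

All three return the same sequence for the same input, so the choice is mostly about readability and which framework version you can target.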
In the slow-moving corporate world where I currently spend more time than I'd wish, yield return has the enormous advantage that it doesn't need that brand new .NET 3.5 Framework that won't be installed for at least another 2 years.
Select only allows you to return one object for each item in your "items" collection.
Using an additional .Where(x => DoIReallyWantThis(x)) allows you to weed out unwanted items, but still only allows you to return one object per item.
If you want potentially more than one object per item, you can use .SelectMany but it is easy to wind up with a single long line that is less than easy to read.
"yield return" has the potential to make your code more readable if you are looking through a complex data structure and picking out bits of information here and there. The best example of this that I have seen was where there were around a dozen separate conditions which would result in a returned object, and in each case the returned object was constructed differently.
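As a rough illustration of that point, the sketch below returns zero, one, or two results per item, each built differently. The Order type, its properties, and the two conditions are all invented for the example. The SelectMany version is included only for comparison and is one of several ways it could be written.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ProblemFinder
{
    // Hypothetical record type, invented for this illustration.
    public sealed class Order
    {
        public int Id { get; set; }
        public decimal Total { get; set; }
        public string Address { get; set; }
    }

    // Iterator block: each condition yields its own, differently built result.
    public static IEnumerable<string> FindProblems(IEnumerable<Order> orders)
    {
        foreach (var order in orders)
        {
            if (order.Total < 0)
                yield return $"Order {order.Id}: negative total";

            if (string.IsNullOrEmpty(order.Address))
                yield return $"Order {order.Id}: missing address";
        }
    }

    // The same logic squeezed into SelectMany plus a null filter.
    public static IEnumerable<string> FindProblemsLinq(IEnumerable<Order> orders) =>
        orders.SelectMany(order => new[]
        {
            order.Total < 0 ? $"Order {order.Id}: negative total" : null,
            string.IsNullOrEmpty(order.Address) ? $"Order {order.Id}: missing address" : null,
        }).Where(problem => problem != null);
}
```

With only two conditions the difference is small; with a dozen, the iterator version stays a flat list of if statements while the LINQ version keeps growing sideways.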
I am maintaining some code at work and the original author is gone so thought I would ask here to see if I can satisfy my curiosity.
Below is a bit of code (anonymized) where yield is being used. As far as I can tell it does not add any benefit and just returning a list would be sufficient, maybe more readable as well (for me at least). Just wondering if I am missing something because this pattern is repeated in a couple of places in the code base.
public virtual IEnumerable<string> ValidProductTypes
{
get
{
yield return ProductTypes.Type1;
yield return ProductTypes.Type2;
yield return ProductTypes.Type3;
}
}
This property is used as a parameter for some class which just uses it to populate a collection:
var productManager = new ProductManager(ValidProductTypes);
public ProductManager(IEnumerable<string> validProductTypes)
{
var myFilteredList = GetFilteredTypes(validProductTypes);
}
public ObservableCollection<ValidType> GetFilteredTypes(IEnumerable<string> validProductTypes)
{
var filteredList = validProductTypes
.Where(type => TypeIsValid); //TypeIsValid returns a ValidType
return new ObservableCollection<ValidType>(filteredList);
}
I'd say that returning an IEnumerable<T> and implementing that using yield return is the simplest option.
If you see that a method returns an IEnumerable<T>, there really is only one thing you can do with it: iterate it. Any more complicated operations on it (like using LINQ) are just encapsulated specific ways of iterating it.
If a method returns an array or list, you also gain the ability to mutate it and you might start wondering if that's an acceptable use of the API. For example, what happens if you do ValidProductTypes.Add("a new product")?
If you're talking just about the implementation, then the difference becomes much smaller. But the caller would still be able to cast the returned array or list from IEnumerable<T> to its concrete type and mutate that. The chance that anyone would actually think this was the intended use of the API is small, but with yield return, the chance is zero, because it's not possible.
Considering that I'd say the syntax has roughly the same complexity and ease of understanding, I think yield return is a reasonable choice. Though with C# 6.0 expression-bodied properties, the syntax for arrays might get the upper hand:
public virtual IEnumerable<string> ValidProductTypes =>
new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 };
The above answer is assuming that this is not performance-critical code, so fairly small differences in performance won't matter. If this is performance-critical code, then you should measure which option is better. And you might also want to consider getting rid of allocations (probably by caching the array in a field, or something like that), which might be the dominant factor.
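The cast-and-mutate concern can be checked with a small sketch. The class, the backing list, and the CanBeMutated helper are all hypothetical names made up for this illustration. The idea is only that a List<string> handed out as IEnumerable<string> is still a list underneath, while a compiler-generated iterator is not.

```csharp
using System.Collections.Generic;

public static class ExposureDemo
{
    private static readonly List<string> backing = new List<string> { "Type1", "Type2" };

    // Returns the list itself, merely typed as IEnumerable<string>.
    public static IEnumerable<string> AsList => backing;

    // Returns a compiler-generated iterator over the same data.
    public static IEnumerable<string> AsIterator
    {
        get
        {
            foreach (var type in backing)
                yield return type;
        }
    }

    // True when a caller could downcast the sequence to a writable collection.
    public static bool CanBeMutated(IEnumerable<string> sequence) =>
        sequence is ICollection<string> collection && !collection.IsReadOnly;
}
```

Under these assumptions, CanBeMutated(ExposureDemo.AsList) is true and CanBeMutated(ExposureDemo.AsIterator) is false, because the generated iterator class does not implement ICollection<string>.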
I use the yield return keyword quite a bit, but I find it lacking when I want to add a range to the IEnumerable. Here's a quick example of what I would like to do:
IEnumerable<string> SomeRecursiveMethod()
{
// some code
// ...
yield return SomeRecursiveMethod();
}
Naturally this results in an error, which can be resolved by doing a simple loop. Is there a better way to do this? A loop feels a bit clunky.
No, there isn't, I'm afraid. F# does support this with yield!, but there's no equivalent in C# - you have to use the loop, basically. Sorry... I feel your pain. I mentioned it in one of my Edulinq blog posts, where it would have made things simpler.
Note that using yield return recursively can be expensive - see Wes Dyer's post on iterators for more information (and mentioning a "yield foreach" which was under consideration four years ago...)
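For reference, here is what the loop looks like in a recursive iterator, next to one common way of avoiding the nested iterators altogether. This is a sketch over a made-up Node type (the question does not say what SomeRecursiveMethod walks), and the explicit-stack version is a substituted technique, not something from the posts above.

```csharp
using System.Collections.Generic;

// Hypothetical tree node, invented for this illustration.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class TreeWalker
{
    // Recursive iterator: the inner foreach re-yields every item from the
    // nested call, so a deep tree means a chain of live iterators.
    public static IEnumerable<string> NamesRecursive(Node node)
    {
        yield return node.Name;
        foreach (var child in node.Children)
        {
            foreach (var name in NamesRecursive(child))
                yield return name;
        }
    }

    // Explicit stack: a single iterator regardless of depth, same pre-order.
    public static IEnumerable<string> NamesIterative(Node root)
    {
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            yield return node.Name;
            // Push children in reverse so the first child is visited first.
            for (int i = node.Children.Count - 1; i >= 0; i--)
                pending.Push(node.Children[i]);
        }
    }
}
```

Both methods visit the nodes in the same order; the second trades the clunky nested loop for a bit of manual bookkeeping.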
If you already have an IEnumerable to loop over, and the return type is IEnumerable (as is the case for functions that could use yield return), you can simply return that enumeration.
If you have cases where you need to combine results from multiple IEnumerables, you can use the Enumerable.Concat extension method.
In your recursive example, though, you need to terminate the enumeration/concatenation based on the contents of the enumeration. I don't think my method will support this.
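A minimal sketch of the Concat approach, for the non-recursive case. Headers and Body are hypothetical sources made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // Hypothetical sources, invented for this illustration.
    public static IEnumerable<string> Headers()
    {
        yield return "header";
    }

    public static IEnumerable<string> Body() => new[] { "line 1", "line 2" };

    // No iterator block needed here: just return the combined sequence.
    public static IEnumerable<string> Document() => Headers().Concat(Body());
}
```

Concat is lazy as well, so neither source is enumerated until the caller iterates the result.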
The yield keyword is indeed very nice. But nesting it in a for loop will cause more glue code to be generated and executed.
If you can live with a less functional style of programming, you can pass a List around to which you append:
void GenerateList(List<string> result)
{
result.Add("something");
// more code.
GenerateList(result);
}
I got into a discussion with two colleagues about how to set up an iteration over an IEnumerable (the contents of which will not be altered in any way during the operation). There are three conflicting theories on which approach is optimal. Each of us is very certain, and that made me unsure, so for the sake of clarity I want to check with an external source.
The scenario is as follows. We had the code below as a starting point and discovered that some of the hazaas need not be acted upon. So, starting from the code below, we began adding a guard around the action.
foreach(Hazaa hazaa in hazaas) ;
My suggestion is as follows.
foreach(Hazaa hazaa in hazaas.Where(element => condition)) ;
One of the guys wants to resolve it with a more explicit form, claiming that LINQ is not appropriate in this case (I'm not sure why, but he seems very convinced). His solution is this.
foreach(Hazaa hazaa in hazaas) ;
if(condition) ;
The other counter-suggestion is supported by the claim that Where risks repeating the filtering process needlessly, and that it is safer to minimize the computational workload by picking the appropriate elements once and for all with Select.
foreach(Hazaa hazaa in hazaas.Select(element => condition)) ;
I argue that the first is obsolete, since LINQ can handle data objects quite well.
I also believe that Select-ing is in this case equivalently fast to Where-ing and no needless steps will be taken (e.g. the evaluation of the condition on the elements will only be performed once). If anything, it should be faster using Where because we won't be creating an extra instance of anything.
Who's right?
Select is inappropriate. It doesn't filter anything.
if is a possible solution, but Where is just as explicit.
Where executes the condition exactly once per item, just as the if. Additionally, it is important to note that the call to Where doesn't iterate the list. So, using Where you iterate the list exactly once, just like when using if.
I think you are arguing with one person who doesn't understand LINQ (the guy who wants to use Select) and one who doesn't like the functional aspect of LINQ.
I would go with Where.
The .Where() and the if(condition) approaches will behave the same.
But since LINQ is nicely readable, I'd prefer that.
The approach with .Select() is nonsense, since it will not return the Hazaa objects but an IEnumerable<Boolean>.
To be clear about the functions:
myEnumerable.Where(a => isTrueFor(a)) //This is filtering
myEnumerable.Select(a => a.b) //This is projection
Where() will run a function that returns a Boolean for each item of the enumerable, and return the item only if that function returns true.
Select() will run a function for every item in the list and return the result of the function without doing any filtering.
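If the "Where might filter twice" worry needs settling with evidence, a counter on the condition does it. The sketch below uses a hypothetical IsEven predicate over ints in place of the real condition on Hazaa; the class and member names are invented.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FilterDemo
{
    // Counts how many times the condition has been evaluated.
    public static int Evaluations;

    // Hypothetical condition standing in for the real one.
    public static bool IsEven(int n)
    {
        Evaluations++;
        return n % 2 == 0;
    }

    // Where: yields only the items for which the condition is true.
    public static List<int> FilterWithWhere(IEnumerable<int> source)
    {
        var result = new List<int>();
        foreach (var n in source.Where(IsEven))
            result.Add(n);
        return result;
    }

    // Select: yields the condition's result for every item, filtering nothing.
    public static List<bool> ProjectWithSelect(IEnumerable<int> source) =>
        source.Select(IsEven).ToList();
}
```

For a four-element input, FilterWithWhere leaves Evaluations at 4, one evaluation per item, and merely calling Where without iterating leaves it at 0. ProjectWithSelect returns four booleans rather than the filtered items.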
Does there exist a standard pattern for yield returning all the items within an Enumerable?
More often than I like I find some of my code reflecting the following pattern:
public IEnumerable<object> YieldReturningFunction()
{
...
[logic and various standard yield return]
...
foreach(object obj in methodReturningEnumerable(x,y,z))
{
yield return obj;
}
}
The explicit usage of a foreach loop solely to return the results of an Enumerable reeks of code smell to me.
Obviously I could abandon yield return and increase the complexity of my code by explicitly building an Enumerable, adding the result of each standard yield return to it as well as adding the range of results from methodReturningEnumerable. That would be unfortunate, so I was hoping there is a better way to manage the yield return pattern.
No, there is no way around that.
It's a feature that's been requested, and it's not a bad idea (a yield foreach or equivalent exists in other languages).
At this point Microsoft simply hasn't allocated the time and money to implement it. They may or may not implement it in the future; I would guess (with no factual basis) that it's somewhere on the to do list; it's simply a question of if/when it gets high enough on that list to actually get implemented.
The only possible change that I could see would be to refactor all of the individual yield returns from the top of the method into their own enumerable-returning method, and then add a new method that returns the concatenation of that method and methodReturningEnumerable(x,y,z). Would it be better? No, probably not. The Concat would add back in just as much as you would have saved, if not more.
Can't be done. It's not that bad though. You can shorten it to a single line:
foreach (var o in otherEnumerator) yield return o;
Unrelated note: you should be careful about what logic you include in your generators; none of the method body executes until MoveNext() is first called on the enumerator, which in practice means when the caller starts iterating the returned IEnumerable. I catch myself throwing ArgumentNullExceptions too late this way so often that I thought it was worth mentioning. :)
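The usual workaround is to split the method in two: an ordinary method that validates its arguments right away, and a private iterator that does the yielding. A minimal sketch, with method names made up for the example:

```csharp
using System;
using System.Collections.Generic;

public static class ValidationDemo
{
    // Iterator block: the null check does not run until the first MoveNext().
    public static IEnumerable<string> UpperLazy(IEnumerable<string> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        foreach (var s in source)
            yield return s.ToUpperInvariant();
    }

    // Ordinary method: validates immediately, then hands off to the iterator.
    public static IEnumerable<string> UpperEager(IEnumerable<string> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return UpperIterator(source);
    }

    private static IEnumerable<string> UpperIterator(IEnumerable<string> source)
    {
        foreach (var s in source)
            yield return s.ToUpperInvariant();
    }
}
```

Calling UpperLazy(null) returns normally and only throws once someone iterates the result, which may be far from the call site. UpperEager(null) throws at the call itself.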
I have an architecture where we are passing our data nodes as IEnumerable<BaseNode>. It all works great, but in each subclass we want to store these as List<AnotherNode> as everything in that class creates and uses AnotherNode objects (we have about 15 different subclasses).
The one place where using the more strongly typed list doesn't work is the root class's method that returns an IEnumerable<BaseNode>; with the covariance limitations in .NET 3.5, the list can't be returned directly. (We have to stay on .NET 3.5 for now.)
But if I have List<AnotherNode> data; and return data.OfType<BaseNode>(); - that works fine. So here's my question.
As all of data is of type BaseNode, what's the performance hit of this call? The alternative is that I have to cast in places, which has a small performance hit, but it's also a situation where we give up everything knowing its type.
Two minor things:
There is a small, but measurable overhead associated with yielding each item in the enumerator. If you need to care about this because you're in a very tight inner loop, you're actually better off iterating with a for loop on the list directly. Most likely this doesn't matter.
Because the result is IEnumerable<BaseNode> and has already been filtered through a yielding enumeration function, subsequent calls to methods like Count() or ElementAt() will not take advantage of optimizations in the LINQ implementation for Lists. This is also unlikely to be a problem unless you make frequent use of these extension methods and have a very large number of elements.
Have you seen the Cast<T>() LINQ operator? It should be more performant than OfType<T>().
Basically there is a condition that is run with OfType<T>()
if (item is T) {
yield return (T)item;
}
Contrast that with what Cast<T>() does:
yield return (T)item;
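Besides the extra type check, the two operators differ in one observable way that follows from the snippets above: OfType<T>() silently drops anything that fails the is test, including nulls, while Cast<T>() passes nulls through and throws on an incompatible item. A small sketch using empty stand-in classes named after the ones in the question:

```csharp
using System.Collections.Generic;
using System.Linq;

// Empty stand-ins for the question's types.
public class BaseNode { }
public class AnotherNode : BaseNode { }

public static class CastDemo
{
    // Keeps only items that pass "item is BaseNode"; null entries are dropped.
    public static IEnumerable<BaseNode> ViaOfType(List<AnotherNode> data) =>
        data.OfType<BaseNode>();

    // Casts every item unconditionally; null entries are kept.
    public static IEnumerable<BaseNode> ViaCast(List<AnotherNode> data) =>
        data.Cast<BaseNode>();
}
```

If the list can never contain nulls, the two produce the same sequence, and the choice comes down to the per-item cost.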