I am maintaining some code at work and the original author is gone so thought I would ask here to see if I can satisfy my curiosity.
Below is a bit of code (anonymized) where yield is being used. As far as I can tell it does not add any benefit and just returning a list would be sufficient, maybe more readable as well (for me at least). Just wondering if I am missing something because this pattern is repeated in a couple of places in the code base.
public virtual IEnumerable<string> ValidProductTypes
{
get
{
yield return ProductTypes.Type1;
yield return ProductTypes.Type2;
yield return ProductTypes.Type3;
}
}
This property is used as a parameter for some class which just uses it to populate a collection:
var productManager = new ProductManager(ValidProductTypes);
public ProductManager(IEnumerable<string> validProductTypes)
{
var myFilteredList = GetFilteredTypes(validProductTypes);
}
public ObservableCollection<ValidType> GetFilteredTypes(IEnumerable<string> validProductTypes)
{
var filteredList = validProductTypes
.Where(type => TypeIsValid); //TypeIsValid returns a ValidType
return new ObservableCollection<ValidType>(filteredList);
}
I'd say that returning an IEnumerable<T> and implementing that using yield return is the simplest option.
If you see that a method returns an IEnumerable<T>, there really is only one thing you can do with it: iterate it. Any more complicated operations on it (like using LINQ) are just encapsulated specific ways of iterating it.
If a method returns an array or list, you also gain the ability to mutate it and you might start wondering if that's an acceptable use of the API. For example, what happens if you do ValidProductTypes.Add("a new product")?
If you're talking just about the implementation, then the difference becomes much smaller. But the caller would still be able to cast the returned array or list from IEnumerable<T> to its concrete type and mutate that. The chance that anyone would actually think this was the intended use of the API is small, but with yield return, the chance is zero, because it's not possible.
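To make the casting point concrete, here is a minimal sketch. The `Catalog` class and its member names are hypothetical and not taken from the question's code base. It only illustrates that an array typed as IEnumerable<string> can be cast back and mutated, while a compiler-generated iterator cannot.

```csharp
using System.Collections.Generic;

// Hypothetical class, for illustration only.
public static class Catalog
{
    private static readonly string[] types = { "Type1", "Type2", "Type3" };

    // Hands out the backing array itself, merely typed as IEnumerable<string>.
    public static IEnumerable<string> FromArray => types;

    // Compiler-generated iterator: there is no underlying collection to reach.
    public static IEnumerable<string> FromIterator
    {
        get
        {
            yield return "Type1";
            yield return "Type2";
            yield return "Type3";
        }
    }

    // Returns true when the array could be mutated through a cast
    // and the iterator could not be cast to an array at all.
    public static bool Demo()
    {
        var leaked = (string[])FromArray;   // the cast succeeds
        leaked[0] = "Tampered";             // shared state is now changed

        var notAnArray = FromIterator as string[]; // null: not an array
        return types[0] == "Tampered" && notAnArray == null;
    }
}
```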
Considering that the two syntaxes have roughly the same complexity and ease of understanding, I think yield return is a reasonable choice. Though with C# 6.0 expression-bodied properties, the syntax for arrays might get the upper hand:
public virtual IEnumerable<string> ValidProductTypes =>
new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 };
The above answer is assuming that this is not performance-critical code, so fairly small differences in performance won't matter. If this is performance-critical code, then you should measure which option is better. And you might also want to consider getting rid of allocations (probably by caching the array in a field, or something like that), which might be the dominant factor.
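One possible way to cache the values, sketched under assumptions: the `ProductCatalog` class name and the string literals below are stand-ins for the question's class and its `ProductTypes` constants. Wrapping the array with Array.AsReadOnly is an extra step beyond plain caching, so that callers cannot cast back to the array and change it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; the literals stand in for ProductTypes.Type1 etc.
public class ProductCatalog
{
    // Allocated once for the lifetime of the type. The read-only wrapper
    // hides the array, so a cast to string[] fails.
    private static readonly IReadOnlyList<string> cachedTypes =
        Array.AsReadOnly(new[] { "Type1", "Type2", "Type3" });

    // Every call returns the same cached instance: no per-call allocation.
    public virtual IEnumerable<string> ValidProductTypes => cachedTypes;
}
```

Whether this is measurably faster than the iterator depends on how often the property is read, so it is still worth measuring.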
I use the yield return keyword quite a bit, but I find it lacking when I want to add a range to the IEnumerable. Here's a quick example of what I would like to do:
IEnumerable<string> SomeRecursiveMethod()
{
// some code
// ...
yield return SomeRecursiveMethod();
}
Naturally this results in an error, which can be resolved by doing a simple loop. Is there a better way to do this? A loop feels a bit clunky.
No, there isn't I'm afraid. F# does support this with yield!, but there's no equivalent in C# - you have to use the loop, basically. Sorry... I feel your pain. I mentioned it in one of my Edulinq blog posts, where it would have made things simpler.
Note that using yield return recursively can be expensive - see Wes Dyer's post on iterators for more information (and mentioning a "yield foreach" which was under consideration four years ago...)
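For reference, the loop-based workaround looks roughly like this. The `Node` type and its members are hypothetical, chosen only to give the recursion something to walk.

```csharp
using System.Collections.Generic;

// Hypothetical tree node, for illustration only.
public class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();

    // Yields this node's name, then the names of all descendants, depth first.
    public IEnumerable<string> NamesDepthFirst()
    {
        yield return Name;
        foreach (var child in Children)
        {
            // C# cannot yield a whole sequence at once, so each item of the
            // nested sequence is re-yielded individually.
            foreach (var name in child.NamesDepthFirst())
            {
                yield return name;
            }
        }
    }
}
```

Each level of recursion adds another iterator that every item has to pass through, which is where the cost mentioned above comes from on deep structures.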
If you already have an IEnumerable to loop over, and the return type is IEnumerable (as is the case for functions that could use yield return), you can simply return that enumeration.
If you have cases where you need to combine results from multiple IEnumerables, you can use the Enumerable.Concat extension method.
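A small sketch of the Concat approach; the `Sequences` class and its two source methods are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class Sequences
{
    private static IEnumerable<string> First()
    {
        yield return "a";
        yield return "b";
    }

    private static IEnumerable<string> Second()
    {
        yield return "c";
    }

    // No iterator block needed here: the combined sequence is returned
    // directly and is still evaluated lazily.
    public static IEnumerable<string> Combined()
    {
        return First().Concat(Second());
    }
}
```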
In your recursive example, though, you need to terminate the enumeration/concatenation based on the contents of the enumeration. I don't think my method will support this.
The yield keyword is indeed very nice. But nesting it in a for loop will cause more glue code to be generated and executed.
If you can live with a less functional style of programming, you can pass a List around to which you append:
void GenerateList(List<string> result)
{
result.Add("something");
// more code.
GenerateList(result);
}
Does there exist a standard pattern for yield returning all the items within an Enumerable?
More often than I like I find some of my code reflecting the following pattern:
public IEnumerable<object> YieldReturningFunction()
{
...
[logic and various standard yield return]
...
foreach(object obj in methodReturningEnumerable(x,y,z))
{
yield return obj;
}
}
The explicit usage of a foreach loop solely to return the results of an Enumerable reeks of code smell to me.
Obviously I could abandon yield return, at the cost of more complex code: explicitly building an Enumerable, adding the result of each standard yield return to it, and then adding the range of results from methodReturningEnumerable. That would be unfortunate, so I was hoping there is a better way to manage the yield return pattern.
No, there is no way around that.
It's a feature that's been requested, and it's not a bad idea (a yield foreach or equivalent exists in other languages).
At this point Microsoft simply hasn't allocated the time and money to implement it. They may or may not implement it in the future; I would guess (with no factual basis) that it's somewhere on the to do list; it's simply a question of if/when it gets high enough on that list to actually get implemented.
The only possible change that I could see would be to refactor out all of the individual yield returns from the top of the method into their own enumerable returning method, and then add a new method that returns the concatenation of that method and methodReturningEnumerable(x,y,z). Would it be better; no, probably not. The Concat would add back in just as much as you would have saved, if not more.
Can't be done. It's not that bad though. You can shorten it to a single line:
foreach (var o in otherEnumerator) yield return o;
Unrelated note: be careful about what logic you include in your generators; none of the method body runs when the method is called, only once the caller starts iterating (the first MoveNext() call on the enumerator). I catch myself throwing ArgumentNullExceptions too late this way so often that I thought it was worth mentioning. :)
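A common way around this is to split the method in two: a normal method that validates eagerly, and a private iterator that does the yielding. The names `Generators`, `Upper` and `UpperIterator` below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class Generators
{
    // Not an iterator block, so the null check runs at call time.
    public static IEnumerable<string> Upper(IEnumerable<string> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return UpperIterator(source);
    }

    // Iterator block: nothing in here runs until the caller starts iterating.
    private static IEnumerable<string> UpperIterator(IEnumerable<string> source)
    {
        foreach (var s in source) yield return s.ToUpperInvariant();
    }
}
```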
I have a List/IEnumerable of objects and I'd like to perform a calculation on some of them.
e.g.
myList.Where(f=>f.Calculate==true).Calculate();
to update myList, based on the Where clause, so that the required calculation is performed and the entire list updated as appropriate.
The list contains "lines" where an amount is either in Month1, Month2, Month3...Month12, Year1, Year2, Year3-5 or "Long Term"
Most lines are fixed and always fall into one of these months, but some "lines" are calculated based upon their "Maturity Date".
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries. I could make it a concrete class if required though, but I'd prefer not to if I can avoid it.
So, I'd like to call a method that works on only the calculated lines, and puts the correct amount into the correct "month".
I'm not worried about the calculation logic, but rather how to get this into an easily readable method that updates the list without, ideally, returning a new list.
[Is it possible to write a lambda extension method to do both the calculation AND the where - or is this overkill anyway as Where() already exists?]
Personally, if you want to update the list in place, I would just use a simple loop. It will be much simpler to follow and maintain:
for (int i=0;i<list.Count;++i)
{
if (list[i].ShouldCalculate)
list[i] = list[i].Calculate();
}
This at least makes it obvious that the list is being updated in place. LINQ carries the expectation of performing a query, not mutating the data.
If you really want to use LINQ for this, you can - but it will still require a copy if you want to have a List<T> as your results:
myList = myList.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This would call your Calculate() method as needed, and copy the original when not needed. It does require a copy to create a new List<T>, though, as you mentioned that was a requirement (in comments).
However, my personal preference would still be to use a loop in this case. I find the intent much more clear - plus, you avoid the unnecessary copy operation.
Edit #2:
Given this comment:
Oh, and just to complicate things! the list (at the moment) is of an anonymous type from a couple of linq queries
If you really want to use LINQ style syntax, I would recommend just not calling ToList() on your original queries. If you leave them in their original, IEnumerable<T> form, you can easily do my second option above, but on the original query:
var myList = query.Select(f => f.ShouldCalculate ? f.Calculate() : f).ToList();
This has the advantage of only constructing the list one time, and preventing the copy, as the original sequence will not get evaluated until this operation.
LINQ is mostly geared around side-effect-free queries, and anonymous types themselves are immutable (although of course they can maintain references to mutable types).
Given that you want to mutate the list in place, LINQ isn't a great fit.
As per Reed's suggestion, I would use a straight for loop. However, if you want to perform different calculations at different points, you could encapsulate this:
public static void Recalculate<T>(IList<T> list,
Func<T, bool> shouldCalculate,
Func<T, T> calculation)
{
for (int i = 0; i < list.Count; i++)
{
if (shouldCalculate(list[i]))
{
list[i] = calculation(list[i]);
}
}
}
If you really want to use this in a fluent way, you could make it return the list - but I would personally be against that, as it would then look like it was side-effect-free like LINQ.
And like Reed, I'd also prefer to do this by creating a new sequence...
Select doesn't copy or clone the objects it passes to the passed delegate, any state changes to that object will be reflected through the reference in the container (unless it is a value type).
So updating reference types is not a problem.
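A minimal sketch of that reference behaviour, using a hypothetical `Line` class in place of the question's anonymous type (anonymous types are immutable, so this only applies to mutable reference types). Where is used here instead of Select, but both hand out the same object references that the list holds.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical mutable reference type, for illustration only.
public class Line
{
    public bool ShouldCalculate;
    public decimal Amount;
}

public static class InPlaceUpdate
{
    // Mutates the matching items; the list itself is never replaced.
    public static void Recalculate(List<Line> lines)
    {
        // The query yields references to the objects already in the list,
        // so changes made here are visible through 'lines' afterwards.
        foreach (var line in lines.Where(l => l.ShouldCalculate))
        {
            line.Amount *= 10;
        }
    }
}
```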
To replace the objects (or when working with value types [1]), things are more complex, and there is no built-in solution in LINQ. A for loop is clearest (as in the other answers).
[1] Remembering, of course, that mutable value types are evil.
Which are the advantages/drawbacks of both approaches?
return items.Select(item => DoSomething(item));
versus
foreach(var item in items)
{
yield return DoSomething(item);
}
EDIT: As they are roughly equivalent in MSIL, which one do you find more readable?
The yield return technique causes the C# compiler to generate an enumerator class "behind the scenes", while the Select call uses a standard enumerator class parameterized with a delegate. In practice, there shouldn't be a whole lot of difference between the two, other than possibly an extra call frame in the Select case, for the delegate.
For what it's worth, wrapping a lambda around DoSomething is sort of pointless as well; just pass a delegate for it directly.
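Passing the method directly looks like this. The `Transformer` class and this particular `DoSomething` are placeholders, since the question does not show the real method.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class; DoSomething stands in for the question's method.
public static class Transformer
{
    private static int DoSomething(int item)
    {
        return item * 2;
    }

    // Lambda that only forwards its argument.
    public static IEnumerable<int> WithLambda(IEnumerable<int> items)
    {
        return items.Select(item => DoSomething(item));
    }

    // Method group conversion: the compiler builds the delegate from the
    // method itself, with no wrapping lambda.
    public static IEnumerable<int> WithMethodGroup(IEnumerable<int> items)
    {
        return items.Select(DoSomething);
    }
}
```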
In the slow-moving corporate world where I currently spend more time than I'd wish, yield return has the enormous advantage that it doesn't need that brand new .NET 3.5 Framework that won't be installed for at least another 2 years.
Select only allows you to return one object for each item in your "items" collection.
Using an additional .Where(x => DoIReallyWantThis(x)) allows you to weed out unwanted items, but still only allows you to return one object per item.
If you want potentially more than one object per item, you can use .SelectMany but it is easy to wind up with a single long line that is less than easy to read.
"yield return" has the potential to make your code more readable if you are looking through a complex data structure and picking out bits of information here and there. The best example of this that I have seen was where there were around a dozen separate conditions which would result in a returned object, and in each case the returned object was constructed differently.
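A sketch of that kind of method, with an invented `Order` type and invented conditions. Each input item may produce zero, one or several results, which a single Select cannot express.

```csharp
using System.Collections.Generic;

// Hypothetical input type, for illustration only.
public class Order
{
    public decimal Total;
    public bool Rush;
}

public static class OrderFlags
{
    // Emits a differently constructed result for each condition that holds.
    public static IEnumerable<string> Describe(IEnumerable<Order> orders)
    {
        foreach (var order in orders)
        {
            if (order.Total > 1000m)
            {
                yield return "large";
            }
            if (order.Rush)
            {
                yield return "rush";
            }
            // Orders matching neither condition contribute nothing.
        }
    }
}
```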
I'm working with a code base where lists need to be frequently searched for a single element.
Is it faster to use a Predicate and Find() than to manually do an enumeration on the List?
for example:
string needle = "example";
FooObj result = _list.Find(delegate(FooObj foo) {
return foo.Name == needle;
});
vs.
string needle = "example";
foreach (FooObj foo in _list)
{
if (foo.Name == needle)
return foo;
}
While they are equivalent in functionality, are they equivalent in performance as well?
They are not equivalent in performance. The Find() method requires a method (in this case delegate) invocation for every item in the list. Method invocation is not free and is relatively expensive as compared to an inline comparison. The foreach version requires no extra method invocation per object.
That being said, I wouldn't pick one or the other based on performance until I actually profiled my code and found this to be a problem. I haven't yet found the overhead of this scenario to ever be a "hot path" problem for code I've written, and I use this pattern a lot with Find and other similar methods.
If searching your list is too slow as-is, you can probably do better than a linear search. If you can keep the list sorted, you can use a binary search to find the element in O(lg n) time.
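A rough sketch of the sorted-list approach using List<T>.BinarySearch. The `FooObj` definition is a stand-in for the question's type, and ordinal string comparison is an assumption; the list must have been sorted with the same comparer that is used for the search.

```csharp
using System.Collections.Generic;

// Stand-in for the question's FooObj; only Name is assumed.
public class FooObj
{
    public string Name;
}

public static class SortedLookup
{
    // One comparer shared by Sort and BinarySearch so both agree on order.
    private static readonly IComparer<FooObj> ByName =
        Comparer<FooObj>.Create((a, b) => string.CompareOrdinal(a.Name, b.Name));

    public static void SortByName(List<FooObj> list)
    {
        list.Sort(ByName);
    }

    // Returns the matching element, or null when no element has that name.
    public static FooObj Find(List<FooObj> sortedByName, string needle)
    {
        // BinarySearch needs an element to compare against, so a probe
        // object carrying only the name is used.
        int index = sortedByName.BinarySearch(new FooObj { Name = needle }, ByName);
        return index >= 0 ? sortedByName[index] : null;
    }
}
```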
If you're searching a whole lot, consider replacing that list with a Dictionary to index your objects by name.
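A minimal sketch of the dictionary index, assuming names are unique (ToDictionary throws on duplicate keys). `FooObj` is again a stand-in for the question's type, and `NameIndex` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's FooObj; only Name is assumed.
public class FooObj
{
    public string Name;
}

public static class NameIndex
{
    // Build once, then reuse for every lookup.
    public static Dictionary<string, FooObj> Build(IEnumerable<FooObj> items)
    {
        return items.ToDictionary(foo => foo.Name);
    }

    // Hash lookup instead of a linear scan; null when the name is absent.
    public static FooObj Find(Dictionary<string, FooObj> index, string needle)
    {
        FooObj result;
        return index.TryGetValue(needle, out result) ? result : null;
    }
}
```

The index has to be kept in sync with the list if items are added, removed or renamed, which is the price for the faster lookup.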
Technically, the runtime performance of the delegate version will be slightly worse than the other version - but in most cases you'd be hard pressed to perceive any difference.
Of more importance (IMHO) is the code-time performance: being able to write what you want, rather than how you want it done. This makes a big difference in maintainability.
This original code:
string needle = "example";
foreach (FooObj foo in _list)
{
if (foo.Name == needle)
return foo;
}
requires any maintainer to read the code and understand that you're looking for a particular item.
This code
string needle = "example";
return _list.Find(
delegate(FooObj foo)
{
return foo.Name == needle;
});
makes it clear that you're looking for a particular item - quicker to understand.
Finally, this code, using features from C# 3.0:
string needle = "example";
return _list.Find( foo => foo.Name == needle);
does exactly the same thing, but in one line that's even faster to read and understand (well, once you understand lambda expressions, anyway).
In summary, given that the performance of the alternatives is nearly equal, choose the one that makes the code easier to read and maintain.
"I'm working with a code base where lists need to be frequently searched for a single element"
It would be better to change your data structure from a List to a Dictionary to get better performance.
A similar question was asked about List.ForEach vs. foreach iteration (foreach vs someList.Foreach(){}).
In that case List.ForEach was a bit faster.
As Jared pointed out, there are differences.
But, as always, don't worry unless you know it's a bottleneck. And if it is a bottleneck, that's probably because the lists are big, in which case you should consider a faster lookup: a hash table or binary tree, or even just sorting the list and doing a binary search, will give you O(log n) performance or better, which will have far more impact than tweaking your linear case.