yield return multiple IEnumerables [duplicate] - c#

I use the yield return keyword quite a bit, but I find it lacking when I want to add a range to the IEnumerable. Here's a quick example of what I would like to do:
IEnumerable<string> SomeRecursiveMethod()
{
// some code
// ...
yield return SomeRecursiveMethod();
}
Naturally this results in an error, which can be resolved by doing a simple loop. Is there a better way to do this? A loop feels a bit clunky.

No, there isn't I'm afraid. F# does support this with yield!, but there's no equivalent in C# - you have to use the loop, basically. Sorry... I feel your pain. I mentioned it in one of my Edulinq blog posts, where it would have made things simpler.
Note that using yield return recursively can be expensive - see Wes Dyer's post on iterators for more information (it also mentions a "yield foreach" which was under consideration four years ago...)
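To make the loop workaround and its cost concrete, here is a minimal sketch. The Node type and the TreeWalker class are hypothetical, invented only for this illustration, and the code assumes C# 6 or later for the get-only auto-properties. The recursive version shows the nested foreach that C# requires; the second version is one common way to sidestep the nested iterators with an explicit stack.

```csharp
using System.Collections.Generic;

// Hypothetical tree node type, introduced only for this illustration.
public sealed class Node
{
    public string Name { get; }
    public List<Node> Children { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class TreeWalker
{
    // Recursive iterator: each level re-yields the items of the level below,
    // so an item at depth d passes through d nested iterators.
    public static IEnumerable<string> Names(Node node)
    {
        yield return node.Name;
        foreach (var child in node.Children)
        {
            // C# has no "yield!" or "yield foreach", so the inner sequence
            // has to be flattened item by item.
            foreach (var name in Names(child))
            {
                yield return name;
            }
        }
    }

    // Non-recursive alternative using an explicit stack: a single iterator,
    // so each item is yielded exactly once.
    public static IEnumerable<string> NamesIterative(Node root)
    {
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            yield return node.Name;
            // Push children in reverse so they are visited left to right.
            for (int i = node.Children.Count - 1; i >= 0; i--)
            {
                pending.Push(node.Children[i]);
            }
        }
    }
}
```

Both methods produce the same pre-order sequence; which one is preferable depends on how deep the tree is and how much the extra readability of the recursive form is worth.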

If you already have an IEnumerable to loop over, and the return type is IEnumerable (as is the case for functions that could use yield return), you can simply return that enumeration.
If you have cases where you need to combine results from multiple IEnumerables, you can use the Concat extension method (Enumerable.Concat, available on any IEnumerable<T>).
In your recursive example, though, you need to terminate the enumeration/concatenation based on the contents of the enumeration. I don't think my method will support this.
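As a rough sketch of the Concat approach for the non-recursive case: Header and Body below are hypothetical sources made up for this example. Note that the combining method uses a plain return, because a single method cannot mix return and yield return.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatExample
{
    // Hypothetical sources, named only for this illustration.
    private static IEnumerable<string> Header()
    {
        yield return "header";
    }

    private static IEnumerable<string> Body()
    {
        yield return "line 1";
        yield return "line 2";
    }

    // No yield return here: the method just returns the concatenated
    // sequence. Concat is lazy, so nothing is enumerated until the
    // caller iterates the result.
    public static IEnumerable<string> Document()
    {
        return Header().Concat(Body());
    }
}
```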

The yield keyword is indeed very nice. But nesting it in a for loop will cause more glue code to be generated and executed.
If you can live with a less functional style of programming, you can pass a List around to which you append:
void GenerateList(List<string> result)
{
result.Add("something");
// more code.
GenerateList(result);
}

Related

Any benefit of using yield in this case?

I am maintaining some code at work and the original author is gone so thought I would ask here to see if I can satisfy my curiosity.
Below is a bit of code (anonymized) where yield is being used. As far as I can tell it does not add any benefit and just returning a list would be sufficient, maybe more readable as well (for me at least). Just wondering if I am missing something because this pattern is repeated in a couple of places in the code base.
public virtual IEnumerable<string> ValidProductTypes
{
get
{
yield return ProductTypes.Type1;
yield return ProductTypes.Type2;
yield return ProductTypes.Type3;
}
}
This property is used as a parameter for some class which just uses it to populate a collection:
var productManager = new ProductManager(ValidProductTypes);
public ProductManager(IEnumerable<string> validProductTypes)
{
var myFilteredList = GetFilteredTypes(validProductTypes);
}
public ObservableCollection<ValidType> GetFilteredTypes(IEnumerable<string> validProductTypes)
{
var filteredList = validProductTypes
.Where(type => TypeIsValid); //TypeIsValid returns a ValidType
return new ObservableCollection<ValidType>(filteredList);
}
I'd say that returning an IEnumerable<T> and implementing that using yield return is the simplest option.
If you see that a method returns an IEnumerable<T>, there really is only one thing you can do with it: iterate it. Any more complicated operations on it (like using LINQ) are just encapsulated specific ways of iterating it.
If a method returns an array or list, you also gain the ability to mutate it and you might start wondering if that's an acceptable use of the API. For example, what happens if you do ValidProductTypes.Add("a new product")?
If you're talking just about the implementation, then the difference becomes much smaller. But the caller would still be able to cast the returned array or list from IEnumerable<T> to its concrete type and mutate that. The chance that anyone would actually think this was the intended use of the API is small, but with yield return, the chance is zero, because it's not possible.
Considering that I'd say the syntax has roughly the same complexity and ease of understanding, I think yield return is a reasonable choice. Though with C# 6.0 expression-bodied properties, the syntax for arrays might get the upper hand:
public virtual IEnumerable<string> ValidProductTypes =>
new[] { ProductTypes.Type1, ProductTypes.Type2, ProductTypes.Type3 };
The above answer is assuming that this is not performance-critical code, so fairly small differences in performance won't matter. If this is performance-critical code, then you should measure which option is better. And you might also want to consider getting rid of allocations (probably by caching the array in a field, or something like that), which might be the dominant factor.
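The mutation argument in this answer can be illustrated with a small sketch. The Catalog class and its member names are hypothetical, not taken from the code in the question. It exposes the same data twice: once as the backing list typed as IEnumerable<string>, and once through an iterator.

```csharp
using System.Collections.Generic;

// Hypothetical class, introduced only to contrast the two ways of
// exposing a sequence.
public class Catalog
{
    private readonly List<string> _types = new List<string> { "Type1", "Type2" };

    // Exposes the backing list itself, merely typed as IEnumerable<string>.
    // A caller can cast it back to List<string> and mutate it.
    public IEnumerable<string> AsList
    {
        get { return _types; }
    }

    // Exposes a compiler-generated iterator over the same data.
    // The returned object is not a list, so there is nothing to cast to.
    public IEnumerable<string> AsIterator
    {
        get
        {
            foreach (var t in _types)
            {
                yield return t;
            }
        }
    }
}
```

With this sketch, `catalog.AsList as List<string>` yields the private list and allows Add, whereas `catalog.AsIterator as List<string>` is null.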

What is the proper pattern for handling Enumerable objects with a yield return?

Does there exist a standard pattern for yield returning all the items within an Enumerable?
More often than I like I find some of my code reflecting the following pattern:
public IEnumerable<object> YieldReturningFunction()
{
...
[logic and various standard yield return]
...
foreach(object obj in methodReturningEnumerable(x,y,z))
{
yield return obj;
}
}
The explicit usage of a foreach loop solely to return the results of an Enumerable reeks of code smell to me.
Obviously I could abandon yield return and explicitly build an Enumerable, adding the result of each standard yield return to it as well as the range of results from methodReturningEnumerable, but that would increase the complexity of my code. That would be unfortunate, so I was hoping there is a better way to manage the yield return pattern.
No, there is no way around that.
It's a feature that's been requested, and it's not a bad idea (a yield foreach or equivalent exists in other languages).
At this point Microsoft simply hasn't allocated the time and money to implement it. They may or may not implement it in the future; I would guess (with no factual basis) that it's somewhere on the to do list; it's simply a question of if/when it gets high enough on that list to actually get implemented.
The only possible change that I could see would be to refactor all of the individual yield returns from the top of the method into their own enumerable-returning method, and then add a new method that returns the concatenation of that method and methodReturningEnumerable(x,y,z). Would it be better? No, probably not. The Concat would add back in just as much as you would have saved, if not more.
Can't be done. It's not that bad though. You can shorten it to a single line:
foreach (var o in otherEnumerator) yield return o;
Unrelated note: you should be careful of what logic you include in your generators; none of the method body runs when the method is called, and execution is deferred until the caller starts enumerating the returned IEnumerable (the first MoveNext() call). I catch myself throwing ArgumentNullExceptions incorrectly this way so often that I thought it was worth mentioning. :)
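A minimal sketch of that pitfall and the usual fix, using hypothetical names (Repeater, RepeatLazy, RepeatEager, RepeatCore). The common pattern is to split the method in two: a plain method that validates its arguments right away, and a private iterator that does the yielding.

```csharp
using System;
using System.Collections.Generic;

public static class Repeater
{
    // Lazy version: the null check lives inside the iterator body, so it
    // does not run until the caller starts enumerating.
    public static IEnumerable<string> RepeatLazy(string text, int count)
    {
        if (text == null) throw new ArgumentNullException("text");
        for (int i = 0; i < count; i++)
        {
            yield return text;
        }
    }

    // Eager version: an ordinary method validates immediately, then hands
    // off to a private iterator.
    public static IEnumerable<string> RepeatEager(string text, int count)
    {
        if (text == null) throw new ArgumentNullException("text");
        return RepeatCore(text, count);
    }

    private static IEnumerable<string> RepeatCore(string text, int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return text;
        }
    }
}
```

Calling `Repeater.RepeatLazy(null, 1)` returns without complaint and only throws once the result is iterated, possibly far from the faulty call site; `Repeater.RepeatEager(null, 1)` throws at the call itself.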

What's the performance hit of List.OfType<> where the entire list is that type?

I have an architecture where we are passing our data nodes as IEnumerable<BaseNode>. It all works great, but in each subclass we want to store these as List<AnotherNode> as everything in that class creates and uses AnotherNode objects (we have about 15 different subclasses).
The one place where using the more strongly typed list doesn't work is the root class's method that returns an IEnumerable<BaseNode>; with the covariance limitations in .NET 3.5, the list can't be returned directly. (We have to stay on .NET 3.5 for now.)
But if I have List<AnotherNode> data; and return data.OfType<BaseNode>(); - that works fine. So here's my question.
As all of data is of type BaseNode - what's the performance hit of this call? The alternative is that I have to cast in various places, which has a small performance hit - but it's also a situation where we give up everything we gain from knowing its type.
Two minor things:
There is a small, but measurable overhead associated with yielding each item in the enumerator. If you need to care about this because you're in a very tight inner loop, you're actually better off iterating with a for loop on the list directly. Most likely this doesn't matter.
Because the result is IEnumerable<BaseNode> and has already been filtered through a yielding enumeration function, subsequent calls to methods like Count() or ElementAt() will not take advantage of optimizations in the LINQ implementation for Lists. This is also unlikely to be a problem unless you make frequent use of these extension methods and have a very large number of elements.
Have you seen the Cast<T>() LINQ operator? It should be more performant than OfType<T>().
Basically, OfType<T>() runs a type check on every item:
if (item is T) {
yield return (T)item;
}
Contrast that with what Cast<T>() does:
yield return (T)item;
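A small sketch comparing the two operators on the types from the question. The empty BaseNode and AnotherNode class bodies and the NodeViews helper are placeholders invented for this example. One behavioral difference worth knowing beyond speed: because OfType<T>() filters with an "is" check, it silently drops null entries, while Cast<T>() passes them through.

```csharp
using System.Collections.Generic;
using System.Linq;

// Placeholder types standing in for the classes described in the question.
public class BaseNode { }
public class AnotherNode : BaseNode { }

public static class NodeViews
{
    // Filters with a type check per item; null entries are skipped.
    public static IEnumerable<BaseNode> ViaOfType(List<AnotherNode> data)
    {
        return data.OfType<BaseNode>();
    }

    // Converts every item to BaseNode; null entries are kept.
    public static IEnumerable<BaseNode> ViaCast(List<AnotherNode> data)
    {
        return data.Cast<BaseNode>();
    }
}
```

For a list containing two nodes and one null, ViaOfType yields two items and ViaCast yields three.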

yield return versus return select

What are the advantages and drawbacks of each approach?
return items.Select(item => DoSomething(item));
versus
foreach(var item in items)
{
yield return DoSomething(item);
}
EDIT: As they compile to roughly equivalent MSIL, which one do you find more readable?
The yield return technique causes the C# compiler to generate an enumerator class "behind the scenes", while the Select call uses a standard enumerator class parameterized with a delegate. In practice, there shouldn't be a whole lot of difference between the two, other than possibly an extra call frame in the Select case, for the delegate.
For what it's worth, wrapping a lambda around DoSomething is sort of pointless as well; just pass a delegate for it directly.
In the slow-moving corporate world where I currently spend more time than I'd wish, yield return has the enormous advantage that it doesn't need that brand new .NET 3.5 Framework that won't be installed for at least another 2 years.
Select only allows you to return one object for each item in your "items" collection.
Using an additional .Where(x => DoIReallyWantThis(x)) allows you to weed out unwanted items, but still only allows you to return one object per item.
If you want potentially more than one object per item, you can use .SelectMany but it is easy to wind up with a single long line that is less than easy to read.
"yield return" has the potential to make your code more readable if you are looking through a complex data structure and picking out bits of information here and there. The best example of this that I have seen was where there were around a dozen separate conditions which would result in a returned object, and in each case the returned object was constructed differently.
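To illustrate the "more than one object per item" point, here is a sketch with made-up rules (the Describe class, its method names, and the even/large conditions are all hypothetical). Both methods produce the same sequence; the iterator keeps one statement per condition, while the SelectMany version has to assemble a small sequence per item.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Describe
{
    // Iterator version: zero, one, or several results per input item,
    // each built differently.
    public static IEnumerable<string> WithYield(IEnumerable<int> items)
    {
        foreach (var n in items)
        {
            if (n < 0) continue; // negatives produce nothing
            yield return "value " + n;
            if (n % 2 == 0) yield return n + " is even";
            if (n > 100) yield return n + " is large";
        }
    }

    // The same output expressed with Where and SelectMany.
    public static IEnumerable<string> WithSelectMany(IEnumerable<int> items)
    {
        return items
            .Where(n => n >= 0)
            .SelectMany(n => new[] { "value " + n }
                .Concat(n % 2 == 0 ? new[] { n + " is even" } : new string[0])
                .Concat(n > 100 ? new[] { n + " is large" } : new string[0]));
    }
}
```

With only three conditions the LINQ form is still manageable; with a dozen, as in the case described above, the iterator tends to stay readable for longer.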

Generic list FindAll() vs. foreach

I'm looking through a generic list to find items based on a certain parameter.
In general, what would be the best and fastest implementation?
1. Looping through each item in the list and saving each match to a new list and returning that
foreach(string s in list)
{
if(s == "match")
{
newList.Add(s);
}
}
return newList;
Or
2. Using the FindAll method and passing it a delegate.
newList = list.FindAll(delegate(string s){return s == "match";});
Don't they both run in ~ O(N)? What would be the best practice here?
Regards,
Jonathan
You should definitely use the FindAll method, or the equivalent LINQ method. Also, consider using the more concise lambda instead of your delegate if you can (requires C# 3.0):
var list = new List<string>();
var newList = list.FindAll(s => s.Equals("match"));
I would use the FindAll method in this case, as it is more concise, and IMO, has easier readability.
You are right that they are pretty much going to both perform in O(N) time, although the foreach statement should be slightly faster given it doesn't have to perform a delegate invocation (delegates incur a slight overhead as opposed to directly calling methods).
I have to stress how insignificant this difference is; it's more than likely never going to matter unless you are doing a massive number of operations on a massive list.
As always, test to see where the bottlenecks are and act appropriately.
Jonathan,
You can find a good answer to this in chapter 5 (performance considerations) of LINQ in Action.
They measure a search that executes about 50 times and come up with foreach = 68ms per cycle / List.FindAll = 62ms per cycle. Really, it would probably be in your interest to just create a test and see for yourself.
List.FindAll is O(n) and will search the entire list.
If you want to run your own iterator with foreach, I'd recommend using the yield statement, and returning an IEnumerable if possible. This way, if you end up only needing one element of your collection, it will be quicker (since you can stop your caller without exhausting the entire collection).
Otherwise, stick to the BCL interface.
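A sketch of the early-exit benefit described here. The Matching class and its Inspected counter are hypothetical and exist only to make the amount of work visible; real code would not carry such a counter.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Matching
{
    // Counts how many elements the filter has looked at.
    // For demonstration only.
    public static int Inspected;

    // Lazy filter: elements are examined only as the caller pulls results.
    public static IEnumerable<string> Matches(IEnumerable<string> source, string wanted)
    {
        foreach (var s in source)
        {
            Inspected++;
            if (s == wanted)
            {
                yield return s;
            }
        }
    }
}
```

Given the list { "a", "match", "b", "match", "c" }, calling `Matching.Matches(list, "match").First()` inspects only the first two elements, whereas materializing all matches with ToList() (or using FindAll on the list) walks all five.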
Any perf difference is going to be extremely minor. I would suggest FindAll for clarity, or, if possible, Enumerable.Where. I prefer using the Enumerable methods because it allows for greater flexibility in refactoring the code (you don't take a dependency on List<T>).
Yes, both implementations are O(n). They need to look at every element in the list to find all matches. In terms of readability I would also prefer FindAll. For performance considerations have a look at LINQ in Action (Ch 5.3). If you are using C# 3.0 you could also apply a lambda expression. But that's just the icing on the cake:
var newList = aList.FindAll(s => s == "match");
I'm with the lambdas:
List<String> newList = list.FindAll(s => s.Equals("match"));
Unless the C# team has improved the performance for LINQ and FindAll, the following article seems to suggest that for and foreach would outperform LINQ and FindAll on object enumeration: LINQ on Objects Performance.
This article dates back to March 2009, just before this question was originally asked.
