How does using yield save time or memory? [closed] - c#

I'm new to C#, have not seen the equivalent of yield in previous languages I've tried to learn, and am not convinced that it is helpful except perhaps for readability. I survived all these years without it, so why do I need it?
As I understand it, you can use yield return to spit out values of type T one by one rather than collecting those values into an IEnumerable<T> and spitting that whole collection out at the end. What's the point? After all, I'm sure there is some overhead involved in interrupting the execution of the function to copy out a single value. Perhaps I'll run some performance tests to see if it's more efficient in terms of time. More than that, I'm wondering if you can show me a specific situation where I would need to iterate through a set of values collected by a function and can only do it with yield, or would be better off doing it with yield.

As a marquee example of iterator usage, consider a number series iterator:
IEnumerable<int> fibo() {
int cur = 0, next = 1;
while(true) {
yield return cur;
next += cur;
cur = next - cur;
}
}
Now we can choose what to do with the series, and only the required elements are calculated:
var fibs = fibo();
var sumOfFirst10Fibs = fibs.Take(10).Sum();
Another useful pattern is flattening a complex data structure, like a tree [1]:
public class Tree<T> {
    public Tree<T> Left, Right;
    public T value;
    public IEnumerable<T> InOrder() {
        if(Left != null) {
            foreach(T val in Left.InOrder())
                yield return val;
        }
        yield return value;
        if(Right != null) {
            foreach(T val in Right.InOrder())
                yield return val;
        }
    }
}
[1] As noted by Alexey in the comments, this in-order traversal is inefficient (particularly when tall trees are traversed), because each value is re-yielded once per level of nesting.

The idea is to generate the values on the fly. Your collection of values might be infinite, or the cost of generating each value might be high. When you foreach through an IEnumerable, you are actually calling methods on an IEnumerator, which can be implemented in any way you like. A function that uses yield is automatically rewritten by the compiler into an IEnumerator that generates values only when they are requested. If you want to generate values on the fly without yield, you have to hand-code an IEnumerator implementation just like the one the compiler substitutes for a yielding function.
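To make that concrete, here is a rough sketch of the same countdown sequence written twice: once with yield, once with a hand-written enumerator. The names Countdown, CountdownSequence and CountdownEnumerator are made up for this illustration, and the hand-written class is only an approximation of what the compiler generates, not its actual output.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class Countdown
{
    // Iterator version: the compiler builds the state machine for us.
    public static IEnumerable<int> WithYield(int start)
    {
        for (int i = start; i > 0; i--)
            yield return i;
    }

    // Hand-written version: same sequence, state kept explicitly.
    public static IEnumerable<int> ByHand(int start) => new CountdownSequence(start);

    private sealed class CountdownSequence : IEnumerable<int>
    {
        private readonly int _start;
        public CountdownSequence(int start) { _start = start; }
        public IEnumerator<int> GetEnumerator() => new CountdownEnumerator(_start);
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    private sealed class CountdownEnumerator : IEnumerator<int>
    {
        private readonly int _start;
        private int _next; // value the next MoveNext() call will produce

        public CountdownEnumerator(int start) { _start = start; _next = start; }

        public int Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (_next <= 0) return false;
            Current = _next--;
            return true;
        }

        public void Reset() { _next = _start; }
        public void Dispose() { }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", WithYield(3))); // 3,2,1
        Console.WriteLine(string.Join(",", ByHand(3)));    // 3,2,1
    }
}
```

Both methods produce values only when MoveNext() is called; the yield version just needs far less code.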
Some specific situations where using a generator might be preferable to creating and returning a collection:
searching a very large file line by line. You don't want to load several gigabytes of text into memory, so it makes sense to read one line and yield return it. You can write a loop, of course, but by extracting the logic into a generator you can easily replace the file with a database table or a file in a different format, for example.
walking a tree. You can use a visitor to walk a tree, or you can use a generator to generate a sequence of nodes in the right order; the two approaches are inversions of one another. NB: recursive generators are a bad idea in C#!
generating infinite data for testing purposes where each successive element uses previous elements to generate itself ("On the 1298456th day of Christmas my true love sent me..." is a trivial example: you don't need to store 1298455 days' worth of presents, just the list of previous presents and the current day).
Basically, in every case where you do not have to worry about handling IEnumerable as ICollection, i.e. you treat it as a stream of values, not as a finite bag of values with a Count, you might save time or memory by using a generator.
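The first situation above could look roughly like this. It is a minimal sketch: LineSearch, ReadLines and FirstLineContaining are names invented for the example, and a StringReader stands in for a large file so the snippet runs anywhere.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineSearch
{
    // Yields one line at a time; only the current line is held in memory.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    // The search does not care where the lines come from.
    public static string FirstLineContaining(IEnumerable<string> lines, string term)
    {
        return lines.FirstOrDefault(l => l.Contains(term));
    }

    public static void Main()
    {
        using (var reader = new StringReader("alpha\nbeta\ngamma"))
        {
            // Reading stops as soon as a match is found.
            Console.WriteLine(FirstLineContaining(ReadLines(reader), "et")); // beta
        }
    }
}
```

Because the search only depends on IEnumerable<string>, the source can later be swapped for a database query or another file format without touching FirstLineContaining.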

yield can be useful in a scenario where the collection you want to return is not yet ready, i.e. you are building up the list while iterating. By using yield return, you really only need to have the next item before returning.
Another case where yield-return is preferable is if the IEnumerable represents an infinite set. Consider the list of Prime Numbers, or an infinite list of random numbers. You can never return the full IEnumerable at once, so you use yield-return to return the list incrementally.
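As an illustration of the prime number case, here is one possible sketch. Primes.All is a hypothetical name, trial division is chosen only for brevity, and the sequence is "infinite" only until int runs out of range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Primes
{
    // Practically unbounded sequence of primes, computed on demand.
    public static IEnumerable<int> All()
    {
        var found = new List<int>(); // primes discovered so far
        for (int candidate = 2; ; candidate++)
        {
            bool isPrime = true;
            foreach (int p in found)
            {
                if ((long)p * p > candidate) break;               // no divisor up to sqrt(candidate)
                if (candidate % p == 0) { isPrime = false; break; }
            }
            if (isPrime)
            {
                found.Add(candidate);
                yield return candidate;
            }
        }
    }

    public static void Main()
    {
        // Only the first ten primes are ever computed.
        Console.WriteLine(string.Join(" ", All().Take(10))); // 2 3 5 7 11 13 17 19 23 29
    }
}
```

The caller decides how much of the sequence to consume; the generator never has to know.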

MSDN covers much of this:
When you use the yield keyword in a statement, you indicate that the
method, operator, or get accessor in which it appears is an iterator.
Using yield to define an iterator removes the need for an explicit
extra class (the class that holds the state for an enumeration, see
IEnumerator(Of T) for an example) when you implement the IEnumerable
and IEnumerator pattern for a custom collection type.
Technical Implementation
The following code returns an IEnumerable<string> from an iterator
method and then iterates through its elements.
IEnumerable<string> elements = MyIteratorMethod();
foreach (string element in elements)
{
…
}
The call to MyIteratorMethod doesn't execute the body of the method.
Instead the call returns an IEnumerable<string> into the elements
variable.
On an iteration of the foreach loop, the MoveNext method is
called for elements. This call executes the body of MyIteratorMethod
until the next yield return statement is reached. The expression
returned by the yield return statement determines not only the value
of the element variable for consumption by the loop body but also the
Current property of elements, which is an IEnumerable<string>.
On each
subsequent iteration of the foreach loop, the execution of the
iterator body continues from where it left off, again stopping when it
reaches a yield return statement. The foreach loop completes when the
end of the iterator method or a yield break statement is reached.
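The deferred execution described in that quote is easy to observe. The following sketch records what happens in a log; DeferredDemo and its Log list are made up for the example, and MyIteratorMethod is just a stand-in body for the method named in the quote.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<string> MyIteratorMethod()
    {
        Log.Add("body started");
        yield return "first";
        Log.Add("resumed after first");
        yield return "second";
        Log.Add("body finished");
    }

    public static void Main()
    {
        IEnumerable<string> elements = MyIteratorMethod();
        Console.WriteLine(Log.Count); // 0: the body has not run yet

        foreach (string element in elements)
            Log.Add("consumed " + element);

        // Prints: body started, consumed first, resumed after first,
        // consumed second, body finished (one entry per line).
        foreach (string entry in Log)
            Console.WriteLine(entry);
    }
}
```

The interleaving in the log shows the iterator body and the loop body taking turns, one MoveNext() at a time.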

Related

What's the difference (if any) in the following snippets in terms of functionality/performance/efficiency? [closed]

Assume numToGenerate, min, and max are the same for both snippets, and that GetNextRandom is a method that uses an instance of System.Random to generate a random integer by simply returning the value of instance.Next(min, max).
First snippet using yield:
var list = new List<int>();
while(list.Count < numToGenerate)
{
var next = GetNextRandom(min, max);
if (!list.Contains(next))
{
list.Add(next);
yield return next;
}
}
Second snippet using normal return:
var list = new List<int>();
while(list.Count < numToGenerate)
{
var next = GetNextRandom(min, max);
if (!list.Contains(next))
{
list.Add(next);
}
}
return list;
Let's pretend these snippets are part of a method that returns IEnumerable<int>. What are the major differences of the two? Which should I be using and why? I'm trying to understand the functional difference if any.
It depends. Are you going to consume all of the values requested? If not, the first one has some advantages. For example if you call .Take(1) on it you have only been through the loop once and have only stored one value in the list.
If GetNextRandom was a very slow process and you wanted to return values to the UI as they are generated then, again, the first one has an advantage there.
BUT if you are planning on consuming all of it, or if the caller is just going to call .ToList on it to avoid enumerating it twice, then the second one is probably better, and you can adjust your return type to IList so that callers know that they can go directly to any element and can Count the list without enumerating it again. (See also Optimize LINQ for IList)
As far as garbage collection goes, in the first one the list will be available for garbage collection after the method is complete. In the second case the caller gets the whole list and could hold it for longer.
PS Use HashSet<T> if n is large rather than inventing your own set on top of List<T>: list.Contains is a linear scan, so the List-based loop gets slower with every value added.
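A sketch of the yield version with that suggestion applied might look like this. UniqueRandom.Generate is a hypothetical name, and the Random instance is passed in rather than hidden behind GetNextRandom so the example is self-contained. It assumes the caller asks for no more distinct values than the range [min, max) contains; otherwise the loop never ends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueRandom
{
    // Lazily yields `count` distinct random integers in [min, max).
    // Assumption: count <= max - min, otherwise this loops forever.
    public static IEnumerable<int> Generate(Random rng, int count, int min, int max)
    {
        var seen = new HashSet<int>();
        while (seen.Count < count)
        {
            int next = rng.Next(min, max);
            if (seen.Add(next)) // Add returns false when the value is already present
                yield return next;
        }
    }

    public static void Main()
    {
        List<int> values = Generate(new Random(), 5, 0, 10).ToList();
        Console.WriteLine(values.Count);            // 5
        Console.WriteLine(values.Distinct().Count()); // 5
    }
}
```

HashSet<T>.Add does the membership test and the insertion in one step, so the separate Contains call disappears.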

Basic array Any() vs Length

I have a simple array of objects:
Contact[] contacts = _contactService.GetAllContacts();
I want to test if that method returns any contacts. I really like the LINQ syntax for Any() as it highlights what I am trying to achieve:
if(!contacts.Any()) return;
However, is this slower than just testing the length of the array?
if(contacts.Length == 0) return;
Is there any way I can find out what kind of operation Any() performs in this instance without having to come here to ask? Something like a profiler, but for in-memory collections?
There are two Any() methods:
1. An extension method for IEnumerable<T>
2. An extension method for IQueryable<T>
I'm guessing that you're using the extension method for IEnumerable<T>. That one looks like this:
public static bool Any<T>(this IEnumerable<T> enumerable)
{
foreach (var item in enumerable)
{
return true;
}
return false;
}
Basically, using Length == 0 is faster because it doesn't involve creating an iterator for the array.
If you want to check out code that isn't yours (that is, code that has already been compiled), like Any<T>, you can use some kind of disassembler. Jetbrains has one for free - http://www.jetbrains.com/decompiler/
I have to completely disagree with the other answers. It certainly does not iterate over the array. It will be marginally slower, as it needs to create an array iterator object and call MoveNext() once, but that cost should be negligible in most scenarios; if Any() makes the code more readable to you, feel free to use it.
Source: Decompiled Enumerable.Any<TSource> code.
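If you want to check the "one MoveNext() and done" behaviour yourself without a decompiler, a counting iterator is enough. This is only a sketch: AnyDemo, Numbers and the Produced counter are invented for the experiment, and it uses a lazy sequence rather than an array so that the number of elements actually produced can be observed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyDemo
{
    public static int Produced; // how many elements the iterator has generated

    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;
        bool any = Numbers().Any();
        // Any() stopped after the first element; the other 999 were never generated.
        Console.WriteLine(any + " " + Produced); // True 1
    }
}
```

For an array, reading Length still avoids even that single enumerator setup, which is the small difference the answers here are arguing about.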
If you have an array, the Length is a property of the array. When calling Any() you iterate the array to find the first element. Setting up the enumerator is probably more expensive than just reading the Length property.
In your particular case Length is slightly better:
// Just a private field test: is it zero or not?
if (contacts.Length == 0)
return;
// LINQ overhead is added on top, e.g. a for loop
// for (int i = 0; i < contacts.Length; ++i)
// return true;
// plus the inevitable private field test (i < contacts.Length)
if (!contacts.Any())
return;
But the difference seems to be negligible.
In the general case, however, Any is better, because it stops at the first item found:
// Iterates until the first matching item is found
if (contacts.Any(x => MyCondition(x)))
return;
// Iterates the entire collection
if (contacts.Where(x => MyCondition(x)).Count() > 0)
return;
Yes, it is slower because it iterates over the elements. Using the Length property is better. Still, I don't think there is a significant difference, because Any returns true as soon as it finds an item.

Graph. Adjacencies foreach iteration

I am designing a Graph class (both list and matrix implementations). I have to provide method such as GetAdjacencies(int vertex).
At first I thought of returning IEnumerable, so that I could iterate through the result using a foreach statement. But then I realized it is a horrible solution, because a new list must be created each time GetAdjacencies(int vertex) is called.
Next I thought of returning an enumerator, but that way I can't iterate through the result using a foreach statement. However, efficiency is much better (I will implement many graph algorithms and I am really interested in optimization).
Could you tell me what is the right way to do this in C#?
You don't have to create a new list each time the method is called. Use iterator blocks, for instance:
public IEnumerable<int> GetAdjacencies(int vertex)
{
foreach (int i in adj[vertex])
yield return i;
}
I don't know how exactly the graph is represented in your code, so the details of getting adjacencies list may vary.
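Since the question also mentions a matrix implementation, here is how the same idea might look there. This is a sketch under assumptions: MatrixGraph, AddEdge and the bool[,] representation are invented for the example and say nothing about how the asker's class is actually laid out.

```csharp
using System;
using System.Collections.Generic;

public sealed class MatrixGraph
{
    private readonly bool[,] _edges; // _edges[from, to] == true means an edge exists

    public MatrixGraph(int vertexCount)
    {
        _edges = new bool[vertexCount, vertexCount];
    }

    public void AddEdge(int from, int to)
    {
        _edges[from, to] = true;
    }

    // No intermediate list: neighbours are produced one at a time.
    public IEnumerable<int> GetAdjacencies(int vertex)
    {
        int n = _edges.GetLength(0);
        for (int j = 0; j < n; j++)
        {
            if (_edges[vertex, j])
                yield return j;
        }
    }
}

public static class GraphDemo
{
    public static void Main()
    {
        var g = new MatrixGraph(4);
        g.AddEdge(0, 1);
        g.AddEdge(0, 3);
        Console.WriteLine(string.Join(",", g.GetAdjacencies(0))); // 1,3
    }
}
```

Each call still allocates one small iterator object, but no list is built, and an algorithm that stops early never scans the rest of the row.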

What is the proper pattern for handling Enumerable objects with a yield return?

Does there exist a standard pattern for yield returning all the items within an Enumerable?
More often than I like I find some of my code reflecting the following pattern:
public IEnumerable<object> YieldReturningFunction()
{
...
[logic and various standard yield return]
...
foreach(object obj in methodReturningEnumerable(x,y,z))
{
yield return obj;
}
}
The explicit usage of a foreach loop solely to return the results of an Enumerable reeks of code smell to me.
Obviously I could abandon the use of yield return, increasing the complexity of my code by explicitly building an Enumerable and adding the result of each standard yield return to it, as well as adding the range of results from methodReturningEnumerable. This would be unfortunate, so I was hoping there exists a better way to manage the yield return pattern.
No, there is no way around that.
It's a feature that's been requested, and it's not a bad idea (a yield foreach or equivalent exists in other languages).
At this point Microsoft simply hasn't allocated the time and money to implement it. They may or may not implement it in the future; I would guess (with no factual basis) that it's somewhere on the to do list; it's simply a question of if/when it gets high enough on that list to actually get implemented.
The only possible change that I could see would be to refactor all of the individual yield returns from the top of the method into their own enumerable-returning method, and then add a new method that returns the concatenation of that method and methodReturningEnumerable(x,y,z). Would it be better? No, probably not. The Concat would add back in just as much as you would have saved, if not more.
Can't be done. It's not that bad though. You can shorten it to a single line:
foreach (var o in otherEnumerator) yield return o;
Unrelated note: you should be careful of what logic you include in your generators; all execution is deferred until MoveNext() is first called on the enumerator obtained from the returned IEnumerable. I catch myself throwing ArgumentNullExceptions incorrectly this way so often that I thought it was worth mentioning. :)
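One common way to avoid the late exception is to split the method in two: a plain wrapper that validates immediately and a private iterator that does the yielding. A minimal sketch, with Repeater, Repeat and RepeatLazyCheck as made-up names:

```csharp
using System;
using System.Collections.Generic;

public static class Repeater
{
    // Not an iterator: runs immediately, so the check fires at the call site.
    public static IEnumerable<string> Repeat(string text, int times)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return RepeatIterator(text, times);
    }

    private static IEnumerable<string> RepeatIterator(string text, int times)
    {
        for (int i = 0; i < times; i++)
            yield return text;
    }

    // For contrast: an iterator, so this check waits for the first MoveNext().
    public static IEnumerable<string> RepeatLazyCheck(string text, int times)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        for (int i = 0; i < times; i++)
            yield return text;
    }

    public static void Main()
    {
        try { Repeat(null, 3); }
        catch (ArgumentNullException) { Console.WriteLine("eager check threw at the call"); }

        IEnumerable<string> lazy = RepeatLazyCheck(null, 3); // no exception yet
        try { foreach (string s in lazy) { } }
        catch (ArgumentNullException) { Console.WriteLine("lazy check threw on first MoveNext"); }
    }
}
```

With the wrapper, the stack trace points at the caller that passed the bad argument instead of at whatever code happened to enumerate the sequence later.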

foreach vs someList.ForEach(){}

There are apparently many ways to iterate over a collection. Curious if there are any differences, or why you'd use one way over the other.
First type:
List<string> someList = <some way to init>
foreach(string s in someList) {
<process the string>
}
Other Way:
List<string> someList = <some way to init>
someList.ForEach(delegate(string s) {
<process the string>
});
I suppose off the top of my head, that instead of the anonymous delegate I use above, you'd have a reusable delegate you could specify...
There is one important, and useful, distinction between the two.
Because .ForEach uses a for loop to iterate the collection, this is valid (edit: prior to .NET 4.5; the implementation changed and now both throw):
someList.ForEach(x => { if(x.RemoveMe) someList.Remove(x); });
whereas foreach uses an enumerator, so this is not valid:
foreach(var item in someList)
if(item.RemoveMe) someList.Remove(item);
tl;dr: Do NOT copypaste this code into your application!
These examples aren't best practice, they are just to demonstrate the differences between ForEach() and foreach.
Removing items from a list within a for loop can have side effects. The most common one is described in the comments to this question.
Generally, if you are looking to remove multiple items from a list, you would want to separate the determination of which items to remove from the actual removal. It doesn't keep your code compact, but it guarantees that you do not miss any items.
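A small sketch of that two-step approach, using a list of integers instead of objects with a RemoveMe flag so that it stands alone (RemovalDemo and RemoveEvens are names invented here). The built-in List<T>.RemoveAll does the same job in a single call and is shown alongside.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemovalDemo
{
    public static void RemoveEvens(List<int> source)
    {
        // Step 1: decide what to remove. ToList() takes a snapshot, so
        // `source` is no longer being enumerated when we start removing.
        List<int> toRemove = source.Where(n => n % 2 == 0).ToList();

        // Step 2: perform the removal.
        foreach (int n in toRemove)
            source.Remove(n);
    }

    public static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveEvens(a);
        Console.WriteLine(string.Join(",", a)); // 1,3,5

        // Same result with the built-in method.
        var b = new List<int> { 1, 2, 3, 4, 5, 6 };
        b.RemoveAll(n => n % 2 == 0);
        Console.WriteLine(string.Join(",", b)); // 1,3,5
    }
}
```

Neither version modifies the list while it is being enumerated, so no items are skipped and no exception is thrown.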
We had some code here (in VS2005 and C#2.0) where the previous engineers went out of their way to use list.ForEach( delegate(item) { foo;}); instead of foreach(item in list) {foo; }; for all the code that they wrote. e.g. a block of code for reading rows from a dataReader.
I still don't know exactly why they did this.
The drawbacks of list.ForEach() are:
It is more verbose in C# 2.0. However, in C# 3 onwards, you can use the "=>" syntax to make some nicely terse expressions.
It is less familiar. People who have to maintain this code will wonder why you did it that way. It took me awhile to decide that there wasn't any reason, except maybe to make the writer seem clever (the quality of the rest of the code undermined that). It was also less readable, with the "})" at the end of the delegate code block.
See also Bill Wagner's book "Effective C#: 50 Specific Ways to Improve Your C#" where he talks about why foreach is preferred to other loops like for or while loops - the main point is that you are letting the compiler decide the best way to construct the loop. If a future version of the compiler manages to use a faster technique, then you will get this for free by using foreach and rebuilding, rather than changing your code.
a foreach(item in list) construct allows you to use break or continue if you need to exit the iteration or the loop. But you cannot alter the list inside a foreach loop.
I'm surprised to see that list.ForEach is slightly faster. But that's probably not a valid reason to use it throughout; that would be premature optimisation. If your application uses a database or web service, that, not loop control, is almost always going to be where the time goes. And have you benchmarked it against a for loop too? The list.ForEach could be faster due to using that internally, and a for loop without the wrapper would be even faster.
I disagree that the list.ForEach(delegate) version is "more functional" in any significant way. It does pass a function to a function, but there's no big difference in the outcome or program organisation.
I don't think that foreach(item in list) "says exactly how you want it done" - a for(int i = 0; i < count; i++) loop does that; a foreach loop leaves the choice of control up to the compiler.
My feeling is, on a new project, to use foreach(item in list) for most loops in order to adhere to the common usage and for readability, and use list.ForEach() only for short blocks, when you can do something more elegantly or compactly with the C# 3 "=>" operator. In cases like that, there may already be a LINQ extension method that is more specific than ForEach(). See if Where(), Select(), Any(), All(), Max() or one of the many other LINQ methods doesn't already do what you want from the loop.
As they say, the devil is in the details...
The biggest difference between the two methods of collection enumeration is that foreach carries state, whereas ForEach(x => { }) does not.
But let's dig a little deeper, because there are some things that can influence your decision, and some caveats you should be aware of when coding for either case.
Let's use List<T> in our little experiment to observe behavior. For this experiment, I am using .NET 4.7.2:
var names = new List<string>
{
"Henry",
"Shirley",
"Ann",
"Peter",
"Nancy"
};
Let's iterate over this with foreach first:
foreach (var name in names)
{
Console.WriteLine(name);
}
We could expand this into:
using (var enumerator = names.GetEnumerator())
{
    while (enumerator.MoveNext())
        Console.WriteLine(enumerator.Current);
}
With the enumerator in hand, looking under the covers we get:
public List<T>.Enumerator GetEnumerator()
{
return new List<T>.Enumerator(this);
}
internal Enumerator(List<T> list)
{
this.list = list;
this.index = 0;
this.version = list._version;
this.current = default (T);
}
public bool MoveNext()
{
List<T> list = this.list;
if (this.version != list._version || (uint) this.index >= (uint) list._size)
return this.MoveNextRare();
this.current = list._items[this.index];
++this.index;
return true;
}
object IEnumerator.Current
{
get
{
if (this.index == 0 || this.index == this.list._size + 1)
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
return (object) this.Current;
}
}
Two things become immediately evident:
We are returned a stateful object with intimate knowledge of the underlying collection.
The copy of the collection is a shallow copy.
This is of course in no way thread safe. As was pointed out above, changing the collection while iterating is just bad mojo.
But what about the problem of the collection becoming invalid during iteration through no fault of our own? Best practice suggests versioning the collection during operations and iteration, and checking versions to detect when the underlying collection changes.
Here's where things get really murky. According to the Microsoft documentation:
If changes are made to the collection, such as adding, modifying, or
deleting elements, the behavior of the enumerator is undefined.
Well, what does that mean? By way of example, just because List<T> implements exception handling does not mean that all collections that implement IList<T> will do the same. That seems to be a clear violation of the Liskov Substitution Principle:
Objects of a superclass shall be replaceable with objects of its
subclasses without breaking the application.
Another problem is that the enumerator must implement IDisposable -- that means another source of potential memory leaks, not only if the caller gets it wrong, but if the author does not implement the Dispose pattern correctly.
Lastly, we have a lifetime issue... what happens if the iterator is valid, but the underlying collection is gone? We now have a snapshot of what was... when you separate the lifetime of a collection and its iterators, you are asking for trouble.
Let's now examine ForEach(x => { }):
names.ForEach(name =>
{
});
This expands to:
public void ForEach(Action<T> action)
{
if (action == null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
int version = this._version;
for (int index = 0; index < this._size && (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5); ++index)
action(this._items[index]);
if (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)
return;
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
Of important note is the following:
for (int index = 0; index < this._size && ... ; ++index)
action(this._items[index]);
This code does not allocate any enumerators (nothing to Dispose), and does not pause while iterating.
Note that this also performs a shallow copy of the underlying collection, but the collection is now a snapshot in time. If the author does not correctly implement a check for the collection changing or going 'stale', the snapshot is still valid.
This doesn't in any way protect you from the problem of the lifetime issues... if the underlying collection disappears, you now have a shallow copy that points to what was... but at least you don't have a Dispose problem to deal with on orphaned iterators...
Yes, I said iterators... sometimes it's advantageous to have state. Suppose you want to maintain something akin to a database cursor... maybe multiple foreach-style Iterator<T>s are the way to go. I personally dislike this style of design as there are too many lifetime issues, and you rely on the good graces of the authors of the collections you are relying on (unless you literally write everything yourself from scratch).
There is always a third option...
for (var i = 0; i < names.Count; i++)
{
Console.WriteLine(names[i]);
}
It ain't sexy, but it's got teeth (apologies to Tom Cruise and the movie The Firm).
It's your choice, but now you know and it can be an informed one.
For fun, I popped List into reflector and this is the resulting C#:
public void ForEach(Action<T> action)
{
if (action == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
action(this._items[i]);
}
}
Similarly, the MoveNext in Enumerator which is what is used by foreach is this:
public bool MoveNext()
{
if (this.version != this.list._version)
{
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
if (this.index < this.list._size)
{
this.current = this.list._items[this.index];
this.index++;
return true;
}
this.index = this.list._size + 1;
this.current = default(T);
return false;
}
The List.ForEach is much more trimmed down than MoveNext: it does far less processing and will more likely JIT into something efficient.
In addition, foreach() will allocate a new Enumerator no matter what. The GC is your friend, but if you're doing the same foreach repeatedly, this will make more throwaway objects, as opposed to reusing the same delegate - BUT - this is really a fringe case. In typical usage you will see little or no difference.
I guess the someList.ForEach() call could be easily parallelized whereas the normal foreach is not that easy to run parallel.
You could easily run several different delegates on different cores, which is not that easy to do with a normal foreach.
Just my 2 cents
I know two obscure-ish things that make them different. Go me!
Firstly, there's the classic bug of making a delegate for each item in the list. If you use the foreach keyword, all your delegates can end up referring to the last item of the list (this applies to compilers before C# 5; from C# 5 onwards, foreach creates a fresh loop variable for each iteration):
// A list of actions to execute later
List<Action> actions = new List<Action>();
// Numbers 0 to 9
List<int> numbers = Enumerable.Range(0, 10).ToList();
// Store an action that prints each number (WRONG!)
foreach (int number in numbers)
actions.Add(() => Console.WriteLine(number));
// Run the actions: before C# 5 this prints 10 copies of "9"
foreach (Action action in actions)
action();
// So try again
actions.Clear();
// Store an action that prints each number (RIGHT!)
numbers.ForEach(number =>
actions.Add(() => Console.WriteLine(number)));
// Run the actions
foreach (Action action in actions)
action();
The List.ForEach method doesn't have this problem. The current item of the iteration is passed by value as an argument to the outer lambda, and then the inner lambda correctly captures that argument in its own closure. Problem solved.
(Sadly I believe ForEach is a member of List, rather than an extension method, though it's easy to define it yourself so you have this facility on any enumerable type.)
Secondly, the ForEach method approach has a limitation. If you are implementing IEnumerable by using yield return, you can't do a yield return inside the lambda. So looping through the items in a collection in order to yield return things is not possible by this method. You'll have to use the foreach keyword and work around the closure problem by manually making a copy of the current loop value inside the loop.
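The manual copy mentioned above is a one-liner. Here is a sketch (ClosureDemo and Capture are hypothetical names) that behaves the same on every compiler version, including those older than C# 5:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    public static List<Func<int>> Capture(IEnumerable<int> numbers)
    {
        var funcs = new List<Func<int>>();
        foreach (int number in numbers)
        {
            int copy = number;     // a fresh variable for every iteration
            funcs.Add(() => copy); // each lambda captures its own copy
        }
        return funcs;
    }

    public static void Main()
    {
        List<Func<int>> funcs = Capture(Enumerable.Range(0, 3));
        Console.WriteLine(string.Join(",", funcs.Select(f => f()))); // 0,1,2
    }
}
```

Because the copy is declared inside the loop body, it works equally well inside an iterator method where a yield return sits next to it.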
More here
You could name the anonymous delegate :-)
And you can write the second as:
someList.ForEach(s => s.ToUpper())
Which I prefer, and saves a lot of typing.
As Joachim says, parallelism is easier to apply to the second form.
List.ForEach() is considered to be more functional.
List.ForEach() says what you want done. foreach(item in list) also says exactly how you want it done. This leaves List.ForEach free to change the implementation of the how part in the future. For example, a hypothetical future version of .NET might always run List.ForEach in parallel, under the assumption that at this point everyone has a number of CPU cores that are generally sitting idle.
On the other hand, foreach (item in list) gives you a little more control over the loop. For example, you know that the items will be iterated in some kind of sequential order, and you could easily break in the middle if an item meets some condition.
Some more recent remarks on this issue are available here:
https://stackoverflow.com/a/529197/3043
The entire ForEach scope (delegate function) is treated as a single line of code (calling the function), and you cannot set breakpoints or step into the code. If an unhandled exception occurs the entire block is marked.
Behind the scenes, the anonymous delegate gets turned into an actual method so you could have some overhead with the second choice if the compiler didn't choose to inline the function. Additionally, any local variables referenced by the body of the anonymous delegate example would change in nature because of compiler tricks to hide the fact that it gets compiled to a new method. More info here on how C# does this magic:
http://blogs.msdn.com/oldnewthing/archive/2006/08/04/688527.aspx
The ForEach function is member of the generic class List.
I have created the following extension to reproduce the internal code:
public static class MyExtension
{
    public static void MyForEach<T>(this IEnumerable<T> collection, Action<T> action)
    {
        foreach (T item in collection)
            action.Invoke(item);
    }
}
So in the end we are using a normal foreach (or a for loop if you want).
On the other hand, using a delegate function is just another way to define a function, this code:
delegate(string s) {
<process the string>
}
is equivalent to:
private static void myFunction(string s, <other variables...>)
{
<process the string>
}
or using lambda expressions:
(s) => <process the string>
The second way you showed uses the List<T>.ForEach method to execute the delegate for each of the elements in the list.
This way, you have another delegate (=method) call.
Additionally, there is the possibility to iterate the list with a for loop.
One thing to be wary of is how to exit from the Generic .ForEach method - see this discussion. Although the link seems to say that this way is the fastest. Not sure why - you'd think they would be equivalent once compiled...
