I am learning basic C#
I have the following code snippet
while(p!=null)
{
foreach(var x in X)
yield return x;
//....
foreach(var y in Y)
yield return y;
p=GetP();
}
Is the code above the same as
IEnumerable<object> os;
while (p!=null)
{
foreach(var x in X)
os.Add(x);
//....
foreach(var y in Y)
os.Add(y);
p=GetP();
}
return os;
???
The two code snippets* are "the same" only in the sense that they would produce the same sequence of objects if iteration is carried out to completion. However, the actual sequence of what is going to happen during the iteration is very different.
Code with yield return may be stopped early, if the loop that iterates the resultant IEnumerable terminates early because of a break or an exception.
Code that adds to a collection prepares a new collection in memory. Code with yield return uses existing collections to make a sequence that can be iterated, without storing the result in memory.
Code with yield return can react to changes in what it iterates during the process of the iteration. For example, if the code that uses your yield return method adds to collection Y while iterating X, the newly added items would be returned when it's time to iterate Y. The second code example would not be able to do the same.
* Let's pretend that IEnumerable<T> has an Add method; in reality you would probably end up using a List<T> or some other collection.
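The early-termination difference described above can be seen in a small sketch. The class and method names here (LazyVersusEager, Lazy, Eager, Log) are made up for illustration and are not part of the question's code; the eager version uses a List<int> as the footnote suggests.

```csharp
using System;
using System.Collections.Generic;

public static class LazyVersusEager
{
    // Records which elements each producer actually touched.
    public static readonly List<string> Log = new List<string>();

    // Lazy: runs only as far as the consumer pulls.
    public static IEnumerable<int> Lazy(IEnumerable<int> source)
    {
        foreach (var x in source)
        {
            Log.Add("lazy " + x);
            yield return x;
        }
    }

    // Eager: walks the whole source before returning anything.
    public static IEnumerable<int> Eager(IEnumerable<int> source)
    {
        var result = new List<int>();
        foreach (var x in source)
        {
            Log.Add("eager " + x);
            result.Add(x);
        }
        return result;
    }

    public static void Main()
    {
        var source = new[] { 1, 2, 3, 4, 5 };

        foreach (var x in Lazy(source))
        {
            if (x == 2) break;   // stop early: 3, 4 and 5 are never visited
        }

        foreach (var x in Eager(source))
        {
            if (x == 2) break;   // too late: all five were already copied
        }

        // lazy 1, lazy 2, eager 1, eager 2, eager 3, eager 4, eager 5
        Console.WriteLine(string.Join(", ", Log));
    }
}
```

The lazy version logs two elements and the eager version logs all five, even though both loops break at the same point.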
I believe you are correct about the general way that yield works. Yield should be a little more performant than just adding to the collection because each element will only be accessed when needed.
From MSDN:
You use a yield return statement to return each element one at a time.
You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, expression is returned, and the current location in code is retained. Execution is restarted from that location the next time that the iterator function is called.
I have been playing around with various implementations of a PriorityQueue class lately, and I have come across some behavior I do not fully understand.
Here is a snippet from the unit test I am running:
PriorityQueue<Int32> priorityQueue = new PriorityQueue<Int32>();
Randomizer r = new Randomizer();
priorityQueue.AddRange(r.GetInts(Int32.MinValue, Int32.MaxValue, r.Next(300, 10000)));
priorityQueue.PopFront(); // Gets called, and works correctly
Int32 numberToPop = priorityQueue.Count / 3;
priorityQueue.PopFront(numberToPop); // Does not get called, an empty IEnumerable<T> (T is an Int32 here) is returned
As I noted in the comments, the PopFront() gets called and operates correctly, but when I try to call the PopFront(numberToPop), the method does not get called at all, as in, it does not even enter the method.
Here are the methods:
public T PopFront()
{
if (items.Count == 0)
{
throw new InvalidOperationException("No elements exist in the queue");
}
T item = items[0];
items.RemoveAt(0);
return item;
}
public IEnumerable<T> PopFront(Int32 numberToPop)
{
Debug.WriteLine("PriorityQueue<T>.PopFront({0})", numberToPop);
if (numberToPop > items.Count)
{
throw new ArgumentException(@"The numberToPop exceeds the number
of elements in the queue", "numberToPop");
}
while (numberToPop-- > 0)
{
yield return PopFront();
}
}
Now, previously, I had implemented the overloaded PopFront function like this:
public IEnumerable<T> PopFront(Int32 numberToPop)
{
Console.WriteLine("PriorityQueue<T>.PopFront({0})", numberToPop);
if (numberToPop > items.Count)
{
throw new ArgumentException(@"The numberToPop exceeds the number
of elements in the queue", "numberToPop");
}
var poppedItems = items.Take(numberToPop);
Clear(0, numberToPop);
return poppedItems;
}
The previous implementation (above) worked as expected. With all that being said, I am obviously aware that my use of the yield statement is incorrect (most likely because I am removing then returning elements in the PopFront() function), but what I am really interested in knowing is why the PopFront(Int32 numberToPop) is never even called and, if it is not called, why then is it returning an empty IEnumerable?
Any help/explanation to why this is occurring is greatly appreciated.
When you use yield return, the compiler creates a state machine for you. Your code won't start executing until you start to enumerate (foreach or ToList) the IEnumerable<T> returned by your method.
From the yield documentation
On an iteration of the foreach loop, the MoveNext method is called for elements. This call executes the body of MyIteratorMethod until the next yield return statement is reached. The expression returned by the yield return statement determines not only the value of the element variable for consumption by the loop body but also the Current property of elements, which is an IEnumerable.
On each subsequent iteration of the foreach loop, the execution of the iterator body continues from where it left off, again stopping when it reaches a yield return statement. The foreach loop completes when the end of the iterator method or a yield break statement is reached.
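One common way to get the argument check to run at call time while keeping the lazy popping is to split the method in two: a plain wrapper that validates, and a private iterator that yields. The following is only a sketch under that assumption; SimpleQueue<T> and PopFrontIterator are hypothetical names, not the asker's actual PriorityQueue<T>.

```csharp
using System;
using System.Collections.Generic;

public class SimpleQueue<T>
{
    private readonly List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    public void Add(T item) { items.Add(item); }

    public T PopFront()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("No elements exist in the queue");
        T item = items[0];
        items.RemoveAt(0);
        return item;
    }

    // Not an iterator block (no yield here), so the check runs at call time.
    public IEnumerable<T> PopFront(int numberToPop)
    {
        if (numberToPop > items.Count)
            throw new ArgumentException(
                "numberToPop exceeds the number of elements in the queue", "numberToPop");
        return PopFrontIterator(numberToPop);
    }

    // The iterator block: its body runs only while the result is enumerated.
    private IEnumerable<T> PopFrontIterator(int numberToPop)
    {
        while (numberToPop-- > 0)
            yield return PopFront();
    }
}

public static class DeferredExecutionDemo
{
    public static void Main()
    {
        var queue = new SimpleQueue<int>();
        for (int i = 1; i <= 5; i++) queue.Add(i);

        IEnumerable<int> pending = queue.PopFront(2);
        Console.WriteLine(queue.Count);   // 5: nothing has been popped yet

        var popped = new List<int>(pending);   // enumerating runs the iterator
        Console.WriteLine(queue.Count);   // 3
    }
}
```

Note that the popping itself is still deferred. If the items should be removed immediately, materialize the result inside the method (for example into a List<T>) instead of yielding.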
Rather than declaring a list at the start of the method, adding to it and then returning it, I'm sure there's some shorthand return statement that can be written in a loop, for example, to save the extra code (declaring etc.), but I've forgotten it. Does anybody know what I mean?
Use yield:
public IEnumerable<int> BuildList()
{
yield return 1;
yield return 2;
}
I think you are looking for yield return
you can just use it like so to return elements in a loop:
public IEnumerable<T> GetElements()
{
foreach(T t in listOfT)
{
// do some work
yield return t;
//code will continue here on next iteration
}
}
Be aware that you can often use LINQ or its extension methods to do some work on all the elements of a list without having to write a function with a loop, for example to filter the list for elements that satisfy some condition or to perform an operation on every element.
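As a rough illustration of that last point, the standard Where and Select extension methods can replace a hand-written filtering loop. The method name SquaresOfEvens and the sample data are invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqInsteadOfLoop
{
    // Filters and transforms without an explicit loop or yield return.
    // Like an iterator block, the result is evaluated lazily.
    public static IEnumerable<int> SquaresOfEvens(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).Select(n => n * n);
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(", ", SquaresOfEvens(numbers)));   // 4, 16, 36
    }
}
```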
I am reading this blog: Pipes and filters pattern
I am confused by this code snippet:
public class Pipeline<T>
{
private readonly List<IOperation<T>> operations = new List<IOperation<T>>();
public Pipeline<T> Register(IOperation<T> operation)
{
operations.Add(operation);
return this;
}
public void Execute()
{
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
}
}
What is the purpose of this statement: while (enumerator.MoveNext());? It seems this code is a no-op.
First consider this:
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
This code appears to be creating nested enumerables, each of which takes elements from the previous, applies some operation to them, and passes the result to the next. But it only constructs the enumerables. Nothing actually happens yet. It's just ready to go, stored in the variable current. There are lots of ways to implement IOperation.Execute but it could be something like this.
IEnumerable<T> Execute(IEnumerable<T> ts)
{
foreach (T t in ts)
yield return this.operation(t); // Perform some operation on t.
}
Another option suggested in the article is a sort:
IEnumerable<T> Execute(IEnumerable<T> ts)
{
// Thank-you LINQ!
// This was 10 lines of non-LINQ code in the original article.
return ts.OrderBy(t => t.Foo);
}
Now look at this:
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
This actually causes the chain of operations to be performed. When the elements are requested from the enumeration, it causes elements from the original enumerable to be passed through the chain of IOperations, each of which performs some operation on them. The end result is discarded so only the side-effect of the operation is interesting - such as writing to the console or logging to a file. This would have been a simpler way to write the last two lines:
foreach (T t in current) {}
Another thing to observe is that the initial list that starts the process is an empty list so for this to make sense some instances of T have to be created inside the first operation. In the article this is done by asking the user for input from the console.
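To make the "nothing happens until it is enumerated" point concrete, here is a small self-contained sketch. ProduceNumbers and RecordNumbers are hypothetical operations, and IOperation<T> is declared here with the shape the snippet implies; the article's actual operations read from the console instead.

```csharp
using System;
using System.Collections.Generic;

public interface IOperation<T>
{
    IEnumerable<T> Execute(IEnumerable<T> input);
}

// Hypothetical first stage: ignores its (empty) input and produces the data.
public class ProduceNumbers : IOperation<int>
{
    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        for (int i = 1; i <= 3; i++)
            yield return i;
    }
}

// Hypothetical second stage: records each element as a side effect.
public class RecordNumbers : IOperation<int>
{
    public readonly List<int> Seen = new List<int>();

    public IEnumerable<int> Execute(IEnumerable<int> input)
    {
        foreach (int i in input)
        {
            Seen.Add(i);
            yield return i;
        }
    }
}

public static class PipelineDemo
{
    // Returns how many elements were recorded before and after draining.
    public static int[] Run()
    {
        var recorder = new RecordNumbers();
        IEnumerable<int> current = new List<int>();
        current = new ProduceNumbers().Execute(current);
        current = recorder.Execute(current);

        int before = recorder.Seen.Count;   // the chain is built but idle

        using (IEnumerator<int> enumerator = current.GetEnumerator())
        {
            while (enumerator.MoveNext()) { }   // pull every element through
        }

        return new[] { before, recorder.Seen.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]);   // 0
        Console.WriteLine(counts[1]);   // 3
    }
}
```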
In this case, the while (enumerator.MoveNext()); is simply evaluating all the items that are returned by the final IOperation<T>. It looks a little confusing, but the empty List<T> is only created in order to supply a value to the first IOperation<T>.
In many collections this would do exactly nothing, as you suggest, but given that we are talking about the pipes and filters pattern, it is likely that the final value is some sort of iterator that will cause code to be executed. It could be something like this, for example (assuming that T is an int):
public class WriteToConsoleOperation : IOperation<int>
{
public IEnumerable<int> Execute(IEnumerable<int> ints)
{
foreach (var i in ints)
{
Console.WriteLine(i);
yield return i;
}
}
}
So calling MoveNext() for each item on the IEnumerator<int> returned by this iterator will return each of the values (which are ignored in the while loop) but also output each of the values to the console.
Does that make sense?
while (enumerator.MoveNext());
Inside the current block of code, there is no effect (it just moves through all the items in the enumeration). The displayed code doesn't act on the current element in the enumeration. What might be happening is that the MoveNext() method, while moving to the next element, does something to the objects in the collection (updating an internal value, pulling the next item from the database, etc.). Since the type is List<T> this is probably not the case, but in other instances it could be.
Working through a tutorial (Professional ASP.NET MVC - Nerd Dinner), I came across this snippet of code:
public IEnumerable<RuleViolation> GetRuleViolations() {
if (String.IsNullOrEmpty(Title))
yield return new RuleViolation("Title required", "Title");
if (String.IsNullOrEmpty(Description))
yield return new RuleViolation("Description required","Description");
if (String.IsNullOrEmpty(HostedBy))
yield return new RuleViolation("HostedBy required", "HostedBy");
if (String.IsNullOrEmpty(Address))
yield return new RuleViolation("Address required", "Address");
if (String.IsNullOrEmpty(Country))
yield return new RuleViolation("Country required", "Country");
if (String.IsNullOrEmpty(ContactPhone))
yield return new RuleViolation("Phone# required", "ContactPhone");
if (!PhoneValidator.IsValidNumber(ContactPhone, Country))
yield return new RuleViolation("Phone# does not match country", "ContactPhone");
yield break;
}
I've read up on yield, but I guess my understanding is still a little bit hazy. What it seems to do is create an object that allows cycling through the items in a collection without actually doing the cycling unless and until it's absolutely necessary.
This example is a little strange to me, though. What I think it's doing is delaying the creation of any RuleViolation instances until the programmer actually requests a specific item in the collection using either foreach or a LINQ extension method like .ElementAt(2).
Beyond this, though, I have some questions:
When do the conditional parts of the if statements get evaluated? When GetRuleViolations() is called or when the enumerable is actually iterated? In other words, if the value of Title changes from null to Really Geeky Dinner between the time that I call GetRuleViolations() and the time I attempt to actually iterate over it, will RuleViolation("Title required", "Title") be created or not?
Why is yield break; necessary? What is it really doing here?
Let's say Title is null or empty. If I call GetRuleViolations() then iterate over the resulting enumerable two times in a row, how many times will new RuleViolation("Title required", "Title") be called?
A function that contains yield statements is treated differently from a normal function. Behind the scenes, the compiler generates a hidden class that implements the function's IEnumerable type (and the matching IEnumerator); calling the function just creates an object of that class and returns it. The generated class contains logic that executes the body of the function up to the next yield statement each time IEnumerator.MoveNext is called. It is a bit misleading: the body of the function is not executed in one batch like a normal function, but in pieces, each of which runs when the enumerator moves one step forward.
With regards to your questions:
As I said, each if gets executed when you iterate to the next element.
yield break is indeed not necessary in the example above. What it does is it terminates the enumeration.
Each time you iterate over the enumerable, you force the execution of the code again. Put a breakpoint on the relevant line and test for yourself.
1) Take this simpler example:
public void Enumerate()
{
foreach (var item in EnumerateItems())
{
Console.WriteLine(item);
}
}
public IEnumerable<string> EnumerateItems()
{
yield return "item1";
yield return "item2";
yield break;
}
Each time you call MoveNext() on the IEnumerator, the code resumes from the last yield point and runs until it reaches the next yield statement.
2) yield break; will tell the IEnumerator that there is nothing more to enumerate.
3) once per enumeration.
Using yield break;
public IEnumerable<string> EnumerateUntilEmpty()
{
foreach (var name in nameList)
{
if (String.IsNullOrEmpty(name)) yield break;
yield return name;
}
}
Short version:
1: The yield is the magic "Stop and come back later" keyword, so the if statements in front of the "active" one have been evaluated.
2: yield break explicitly ends the enumeration (think "break" in a switch case)
3: Every time. You can cache the result, of course, by turning it into a List for example and iterating over that afterwards.
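The answers to all three questions can be checked with a cut-down sketch. This Dinner class is a hypothetical stand-in that keeps only the Title rule and yields strings instead of RuleViolation objects; ViolationsCreated is added purely to count how often the body runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Dinner
{
    public string Title { get; set; }

    // Counts how many times the "violation" branch has executed.
    public int ViolationsCreated { get; private set; }

    public IEnumerable<string> GetRuleViolations()
    {
        if (string.IsNullOrEmpty(Title))
        {
            ViolationsCreated++;
            yield return "Title required";
        }
        // No yield break needed: falling off the end finishes the sequence.
    }
}

public static class ReEnumerationDemo
{
    public static void Main()
    {
        var dinner = new Dinner();
        IEnumerable<string> violations = dinner.GetRuleViolations();

        // Title changes after the call but before iteration: no violation.
        dinner.Title = "Really Geeky Dinner";
        Console.WriteLine(violations.Count());        // 0

        // Each enumeration re-runs the body, so the check sees the new value.
        dinner.Title = null;
        Console.WriteLine(violations.Count());        // 1
        Console.WriteLine(violations.Count());        // 1
        Console.WriteLine(dinner.ViolationsCreated);  // 2

        // ToList() runs the body once more; reading the list does not.
        List<string> cached = violations.ToList();
        Console.WriteLine(cached.Count);              // 1
        Console.WriteLine(dinner.ViolationsCreated);  // 3
    }
}
```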
still trying to find where i would use the "yield" keyword in a real situation.
I see this thread on the subject
What is the yield keyword used for in C#?
but in the accepted answer, they have this as an example where someone is iterating around Integers()
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
but why not just use
List<int>
here instead. seems more straightforward..
If you build and return a List (say it has 1 million elements), that's a big chunk of memory, and also of work to create it.
Sometimes the caller may only want to know what the first element is. Or they might want to write them to a file as they get them, rather than building the whole list in memory and then writing it to a file.
That's why it makes more sense to use yield return. It doesn't look that different to building the whole list and returning it, but it's very different because the whole list doesn't have to be created in memory before the caller can look at the first item on it.
When the caller says:
foreach (int i in Integers())
{
// do something with i
}
Each time the loop requires a new i, it runs a bit more of the code in Integers(). The code in that function is "paused" when it hits a yield return statement.
Yield allows you to build methods that produce data without having to gather everything up before returning. Think of it as returning multiple values along the way.
Here's a couple of methods that illustrate the point
public IEnumerable<String> LinesFromFile(String fileName)
{
using (StreamReader reader = new StreamReader(fileName))
{
String line;
while ((line = reader.ReadLine()) != null)
yield return line;
}
}
public IEnumerable<String> LinesWithEmails(IEnumerable<String> lines)
{
foreach (String line in lines)
{
if (line.Contains("@"))
yield return line;
}
}
Neither of these two methods will read the whole contents of the file into memory, yet you can use them like this:
foreach (String lineWithEmail in LinesWithEmails(LinesFromFile("test.txt")))
Console.Out.WriteLine(lineWithEmail);
You can use yield to build any iterator. That could be a lazily evaluated series (reading lines from a file or database, for example, without reading everything at once, which could be too much to hold in memory), or could be iterating over existing data such as a List<T>.
C# in Depth has a free chapter (6) all about iterator blocks.
I also blogged very recently about using yield for smart brute-force algorithms.
For an example of the lazy file reader:
static IEnumerable<string> ReadLines(string path) {
using (StreamReader reader = File.OpenText(path)) {
string line;
while ((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
This is entirely "lazy"; nothing is read until you start enumerating, and only a single line is ever held in memory.
Note that LINQ-to-Objects makes extensive use of iterator blocks (yield). For example, the Where extension is essentially:
static IEnumerable<T> Where<T>(this IEnumerable<T> data, Func<T, bool> predicate) {
foreach (T item in data) {
if (predicate(item)) yield return item;
}
}
And again, fully lazy - allowing you to chain together multiple operations without forcing everything to be loaded into memory.
yield allows you to process collections that are potentially infinite in size because the entire collection is never loaded into memory in one go, unlike a List-based approach. For instance, an IEnumerable<> of all the prime numbers could be backed by an appropriate algorithm for finding the primes, whereas a List approach would always be finite in size and therefore incomplete. In this example, using yield also allows processing for the next element to be deferred until it is required.
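A minimal sketch of such a prime sequence, using plain trial division (one possible algorithm, chosen here only for brevity). The name PrimeSequence is made up, and because the sketch uses int it stops before int.MaxValue rather than being truly infinite.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrimeSequence
{
    // Produces primes for as long as the caller keeps asking
    // (bounded only by the range of int in this sketch).
    public static IEnumerable<int> Primes()
    {
        var found = new List<int>();
        for (int candidate = 2; candidate < int.MaxValue; candidate++)
        {
            bool isPrime = true;
            foreach (int p in found)
            {
                if ((long)p * p > candidate) break;   // no divisor up to sqrt(candidate)
                if (candidate % p == 0) { isPrime = false; break; }
            }
            if (isPrime)
            {
                found.Add(candidate);
                yield return candidate;
            }
        }
    }

    public static void Main()
    {
        // Take(10) pulls only ten elements; the loop above never runs further.
        Console.WriteLine(string.Join(", ", Primes().Take(10)));
        // 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
    }
}
```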
A real situation for me is when I want to process a collection that takes a while to populate, and I want to do so more smoothly.
Imagine something along these lines (pseudocode):
public IEnumerable<VerboseUserInfo> GetAllUsers()
{
foreach (var userId in userLookupList)
{
VerboseUserInfo info = new VerboseUserInfo();
info.Load(ActiveDirectory.GetLotsOfUserData(userId));
info.Load(WebService.GetSomeMoreInfo(userId));
yield return info;
}
}
Instead of having to wait a minute for the collection to populate before I can start processing items in it, I can start immediately and report back to the user interface as it happens.
You may not always want to use yield instead of returning a list, and in your example you use yield to actually return a list of integers. Depending on whether you want a mutable list or an immutable sequence, you could use a list or an iterator (or some other mutable/immutable collection).
But there are benefits to use yield.
Yield provides an easy way to build lazy evaluated iterators. (Meaning only the code to get next element in sequence is executed when the MoveNext() method is called then the iterator returns doing no more computations, until the method is called again)
Yield builds a state machine under the covers, and this saves you a lot of work by not having to code the states of your generic generator => more concise/simple code.
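Yield automatically builds optimized iterators, sparing you the details of how to build them. The generated enumerable hands out a separate enumerator when GetEnumerator() is called from another thread, but a single enumerator should still not be shared between threads.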
Yield is much more powerful than it seems at first sight and can be used for much more than just building simple iterators. Check out this video to see Jeffrey Richter and his AsyncEnumerator and how yield is used to make coding with the async pattern easy.
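To give a feel for the work the state machine saves, here is the same countdown sequence written twice: once with yield and once by hand. CountdownByHand is a simplified illustration of the bookkeeping involved, not what the compiler literally emits.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CountdownWithYield
{
    // Three lines: the compiler tracks "where was I?" for us.
    public static IEnumerable<int> From(int start)
    {
        for (int i = start; i > 0; i--)
            yield return i;
    }
}

// The same sequence with the state kept by hand.
public class CountdownByHand : IEnumerable<int>
{
    private readonly int start;

    public CountdownByHand(int start) { this.start = start; }

    public IEnumerator<int> GetEnumerator() { return new Enumerator(start); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private class Enumerator : IEnumerator<int>
    {
        private readonly int start;
        private int next;      // position in the sequence between MoveNext calls
        private int current;   // value exposed through Current

        public Enumerator(int start) { this.start = start; next = start; }

        public int Current { get { return current; } }

        object IEnumerator.Current { get { return current; } }

        public bool MoveNext()
        {
            if (next <= 0) return false;
            current = next--;
            return true;
        }

        public void Reset() { next = start; }

        public void Dispose() { }
    }
}

public static class StateMachineDemo
{
    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CountdownWithYield.From(3)));   // 3, 2, 1
        Console.WriteLine(string.Join(", ", new CountdownByHand(3)));       // 3, 2, 1
    }
}
```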
You might want to iterate through various collections:
public IEnumerable<ICustomer> Customers()
{
foreach( ICustomer customer in m_maleCustomers )
{
yield return customer;
}
foreach( ICustomer customer in m_femaleCustomers )
{
yield return customer;
}
// or add some constraints...
foreach( ICustomer customer in m_customers )
{
if( customer.Age < 16 )
{
yield return customer;
}
}
// Or....
if( DateTime.Today.Day == 1 )
{
yield return m_superCustomer;
}
}
I agree with everything everyone has said here about lazy evaluation and memory usage and wanted to add another scenario where I have found the iterators using the yield keyword useful. I have run into some cases where I have to do a sequence of potentially expensive processing on some data where it is extremely useful to use iterators. Rather than processing the entire file immediately, or rolling my own processing pipeline, I can simply use iterators something like this:
IEnumerable<double> GetListFromFile(int idxItem)
{
// read data from file
return dataReadFromFile;
}
IEnumerable<double> ConvertUnits(IEnumerable<double> items)
{
foreach(double item in items)
yield return convertUnits(item);
}
IEnumerable<double> DoExpensiveProcessing(IEnumerable<double> items)
{
foreach(double item in items)
yield return expensiveProcessing(item);
}
IEnumerable<double> GetNextList()
{
return DoExpensiveProcessing(ConvertUnits(GetListFromFile(curIdx++)));
}
The advantage here is that by keeping the input and output to all of the functions IEnumerable<double>, my processing pipeline is completely composable, easy to read, and lazy evaluated so I only have to do the processing I really need to do. This lets me put almost all of my processing in the GUI thread without impacting responsiveness so I don't have to worry about any threading issues.
I came up with this to overcome the .NET shortcoming of having to deep-copy a List manually.
I use this:
static public IEnumerable<SpotPlacement> CloneList(List<SpotPlacement> spotPlacements)
{
foreach (SpotPlacement sp in spotPlacements)
{
yield return (SpotPlacement)sp.Clone();
}
}
And at another place:
public object Clone()
{
OrderItem newOrderItem = new OrderItem();
...
newOrderItem._exactPlacements.AddRange(SpotPlacement.CloneList(_exactPlacements));
...
return newOrderItem;
}
I tried to come up with a one-liner that does this, but it's not possible, because yield does not work inside anonymous method blocks.
EDIT:
Better still, use generic List cloner:
class Utility<T> where T : ICloneable
{
static public IEnumerable<T> CloneList(List<T> tl)
{
foreach (T t in tl)
{
yield return (T)t.Clone();
}
}
}
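If LINQ is available, a one-liner may be possible after all without yield, because Select accepts a lambda. This is a sketch under that assumption; Spot is a hypothetical stand-in for SpotPlacement, and CloneWithLinq is an invented helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cloneable type standing in for SpotPlacement.
public class Spot : ICloneable
{
    public int Position { get; set; }

    public object Clone() { return new Spot { Position = Position }; }
}

public static class CloneWithLinq
{
    // Select clones each element lazily; ToList materializes the copies.
    public static List<T> CloneList<T>(IEnumerable<T> source) where T : ICloneable
    {
        return source.Select(item => (T)item.Clone()).ToList();
    }

    public static void Main()
    {
        var original = new List<Spot> { new Spot { Position = 1 }, new Spot { Position = 2 } };
        List<Spot> copy = CloneList(original);

        copy[0].Position = 99;
        Console.WriteLine(original[0].Position);   // 1: the copy is independent
    }
}
```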
The method used by yield of saving memory by processing items on-the-fly is nice, but really it's just syntactic sugar. It's been around for a long time. In any language that has function or interface pointers (even C and assembly) you can get the same effect using a callback function / interface.
This fancy stuff:
static IEnumerable<string> GetItems()
{
yield return "apple";
yield return "orange";
yield return "pear";
}
foreach(string item in GetItems())
{
Console.WriteLine(item);
}
is basically equivalent to old-fashioned:
interface ItemProcessor
{
void ProcessItem(string s);
};
class MyItemProcessor : ItemProcessor
{
public void ProcessItem(string s)
{
Console.WriteLine(s);
}
};
static void ProcessItems(ItemProcessor processor)
{
processor.ProcessItem("apple");
processor.ProcessItem("orange");
processor.ProcessItem("pear");
}
ProcessItems(new MyItemProcessor());