I have been playing around with various implementations of a PriorityQueue class lately, and I have come across some behavior I do not fully understand.
Here is a snippet from the unit test I am running:
PriorityQueue<Int32> priorityQueue = new PriorityQueue<Int32>();
Randomizer r = new Randomizer();
priorityQueue.AddRange(r.GetInts(Int32.MinValue, Int32.MaxValue, r.Next(300, 10000)));
priorityQueue.PopFront(); // Gets called, and works correctly
Int32 numberToPop = priorityQueue.Count / 3;
priorityQueue.PopFront(numberToPop); // Does not get called, an empty IEnumerable<T> (T is an Int32 here) is returned
As I noted in the comments, the PopFront() gets called and operates correctly, but when I try to call the PopFront(numberToPop), the method does not get called at all, as in, it does not even enter the method.
Here are the methods:
public T PopFront()
{
if (items.Count == 0)
{
throw new InvalidOperationException("No elements exist in the queue");
}
T item = items[0];
items.RemoveAt(0);
return item;
}
public IEnumerable<T> PopFront(Int32 numberToPop)
{
Debug.WriteLine("PriorityQueue<T>.PopFront({0})", numberToPop);
if (numberToPop > items.Count)
{
throw new ArgumentException("The numberToPop exceeds the number of elements in the queue", "numberToPop");
}
while (numberToPop-- > 0)
{
yield return PopFront();
}
}
Now, previously, I had implemented the overloaded PopFront function like this:
public IEnumerable<T> PopFront(Int32 numberToPop)
{
Console.WriteLine("PriorityQueue<T>.PopFront({0})", numberToPop);
if (numberToPop > items.Count)
{
throw new ArgumentException("The numberToPop exceeds the number of elements in the queue", "numberToPop");
}
var poppedItems = items.Take(numberToPop);
Clear(0, numberToPop);
return poppedItems;
}
The previous implementation (above) worked as expected. With all that being said, I am obviously aware that my use of the yield statement is incorrect (most likely because I am removing then returning elements in the PopFront() function), but what I am really interested in knowing is why the PopFront(Int32 numberToPop) is never even called and, if it is not called, why then is it returning an empty IEnumerable?
Any help/explanation to why this is occurring is greatly appreciated.
When you use yield return, the compiler creates a state machine for you. Your code won't start executing until you start to enumerate (foreach or ToList) the IEnumerable<T> returned by your method.
From the yield documentation:
On an iteration of the foreach loop, the MoveNext method is called for elements. This call executes the body of MyIteratorMethod until the next yield return statement is reached. The expression returned by the yield return statement determines not only the value of the element variable for consumption by the loop body but also the Current property of elements, which is an IEnumerable.
On each subsequent iteration of the foreach loop, the execution of the iterator body continues from where it left off, again stopping when it reaches a yield return statement. The foreach loop completes when the end of the iterator method or a yield break statement is reached.
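To make the deferred execution described above concrete, here is a minimal sketch. The names LazyDemo, Numbers, NumbersChecked and Log are hypothetical and are not part of the asker's PriorityQueue<T>. It assumes one common workaround: splitting the method into a non-iterator wrapper that validates immediately and an iterator that does the yielding.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Records what ran, so the order of execution is visible.
    public static readonly List<string> Log = new List<string>();

    // Iterator method: nothing in this body runs until MoveNext() is called.
    public static IEnumerable<int> Numbers(int count)
    {
        Log.Add("body started");
        if (count < 0)
        {
            throw new ArgumentException("count must not be negative", "count");
        }
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Non-iterator wrapper (an assumed workaround): the check runs at call time,
    // and only the yielding part is deferred.
    public static IEnumerable<int> NumbersChecked(int count)
    {
        if (count < 0)
        {
            throw new ArgumentException("count must not be negative", "count");
        }
        return Numbers(count);
    }

    public static void Main()
    {
        IEnumerable<int> lazy = Numbers(3);          // no code in Numbers has run yet
        Console.WriteLine(Log.Count);                // 0
        List<int> result = lazy.ToList();            // enumerating runs the body
        Console.WriteLine(Log.Count);                // 1
        Console.WriteLine(string.Join(",", result)); // 0,1,2
    }
}
```

With this split, Numbers(-1) would not throw until the result is enumerated, whereas NumbersChecked(-1) throws as soon as it is called. The same reasoning explains why a breakpoint or Debug.WriteLine at the top of an iterator method is not hit if the returned sequence is never enumerated.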
Related
I am learning basic C#
I have the following code snippet
while(p!=null)
{
foreach(var x in X)
yield return x;
//....
foreach(var y in Y)
yield return y;
p=GetP();
}
Is the code above the same as
IEnumerable<object> os;
while (p!=null)
{
foreach(var x in X)
os.Add(x);
//....
foreach(var y in Y)
os.Add(y);
p=GetP();
}
return os;
???
The two code snippets* are "the same" only in the sense that they would produce the same sequence of objects if iteration is carried out to completion. However, the actual sequence of what is going to happen during the iteration is very different.
Code with yield return may be stopped early, if the loop that iterates the resultant IEnumerable terminates early because of a break or an exception.
Code that adds to a collection prepares a new collection in memory. Code with yield return uses existing collections to make a sequence that can be iterated, without storing the result in memory.
Code with yield return can react to changes in what it iterates during the process of the iteration. For example, if the code that uses your yield return method adds to collection Y in the process of iterating X, the newly added items would be returned when it's time to iterate Y. The second code example would not be able to do the same.
* Let's pretend that IEnumerable<T> has an Add method; in reality you would probably end up using a List<T> or some other collection.
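The difference in early termination can be sketched as follows. EarlyExitDemo, Lazy, Eager and the Produced counter are hypothetical names introduced only for this illustration; the counter shows how many elements each approach generates before the caller gets the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Counts how many elements were actually generated.
    public static int Produced;

    // Lazy version: elements are generated one at a time, on demand.
    public static IEnumerable<int> Lazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Eager version: the whole list is built before anything is returned.
    public static List<int> Eager(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            Produced++;
            list.Add(i);
        }
        return list;
    }

    public static void Main()
    {
        Produced = 0;
        int firstLazy = Lazy(1000).First();   // stops after the first element
        Console.WriteLine(Produced);          // 1

        Produced = 0;
        int firstEager = Eager(1000).First(); // the list is fully built first
        Console.WriteLine(Produced);          // 1000

        Console.WriteLine(firstLazy == firstEager); // True
    }
}
```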
I believe you are correct about the general way that yield works. Yield can be a little more performant than adding everything to a collection, because each element is only produced when it is needed.
From MSDN:
You use a yield return statement to return each element one at a time.
You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, expression is returned, and the current location in code is retained. Execution is restarted from that location the next time that the iterator function is called.
Rather than declaring a list at the start of the method, adding to it and then returning it, I'm sure there's some shorthand return statement that can be written in a loop, for example, to save the extra code (declaring etc.), but I've forgotten it. Anybody know what I mean?
Use yield:
public IEnumerable<int> BuildList()
{
yield return 1;
yield return 2;
}
I think you are looking for yield return.
You can use it like this to return elements in a loop:
public IEnumerable<T> GetElements()
{
foreach(T t in listOfT)
{
// do some work
yield return t;
//code will continue here on next iteration
}
}
Be aware that you can often use LINQ or its extension methods to do some work on all the elements of a list without having to write a function with a loop, for example to filter the list for elements that satisfy some condition or to perform an operation on every element.
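As a rough illustration of the LINQ alternative mentioned above, the sketch below filters and transforms a list with Where and Select instead of a hand-written loop. LinqDemo and EvenSquares are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    // Keeps the even numbers and squares them; like yield return,
    // the result is evaluated lazily when it is enumerated.
    public static IEnumerable<int> EvenSquares(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).Select(n => n * n);
    }

    public static void Main()
    {
        var input = new List<int> { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", EvenSquares(input))); // 4,16
    }
}
```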
I have following code:
private void ProcessQueue()
{
foreach (MessageQueueItem item in GetNextQueuedItem())
PerformAction(item);
}
private IEnumerable<MessageQueueItem> GetNextQueuedItem()
{
if (_messageQueue.Count > 0)
yield return _messageQueue.Dequeue();
}
Initially there is one item in the queue as ProcessQueue is called.
During PerformAction, I would add more items to _messageQueue. However, the foreach loop quits after the initial item and does not see the subsequent items added.
I sense that somehow the initial state of the queue is being captured by yield.
Can someone explain what is happening and give a solution?
Your program does exactly what you instructed it to do: it yields one item if Count > 0, and yields zero items otherwise.
To return items until the queue becomes empty, try:
while (_messageQueue.Count > 0)
yield return _messageQueue.Dequeue();
yield return actually pauses execution and does a fake return (it yields a value) until the next one is requested. In this case, what happens is you check if the count is > 0 and then yield the next value. When the next one is requested, your if statement isn't checked again, it returns to the line after the yield return which is the end of the method and thus it's done.
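A self-contained sketch of the loop-based fix follows. It uses a Queue<int> in place of the asker's MessageQueueItem queue, and the names QueueDemo and GetQueuedItems are hypothetical. The loop condition is re-checked every time iteration resumes, so items enqueued while processing are picked up.

```csharp
using System;
using System.Collections.Generic;

public static class QueueDemo
{
    private static readonly Queue<int> _messageQueue = new Queue<int>();

    // The while condition is evaluated again each time the caller
    // asks for the next item, so later additions are seen.
    private static IEnumerable<int> GetQueuedItems()
    {
        while (_messageQueue.Count > 0)
        {
            yield return _messageQueue.Dequeue();
        }
    }

    public static List<int> ProcessQueue()
    {
        var processed = new List<int>();
        _messageQueue.Clear();
        _messageQueue.Enqueue(1); // one item in the queue to start with

        foreach (int item in GetQueuedItems())
        {
            processed.Add(item);
            if (item < 3)
            {
                _messageQueue.Enqueue(item + 1); // add more work while iterating
            }
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", ProcessQueue())); // 1,2,3
    }
}
```

Enqueuing during the foreach is safe here because the loop enumerates the iterator, not the queue itself.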
The definition of "YIELD"
Used in an iterator block to provide a value to the enumerator object or to signal the end of iteration.
I have an excellent record of reading syntax statements wrong, but I think this means the yield has to sit inside an iterator block that keeps looping, and the one you wrote checks the count only once.
Perhaps change your code to:
private IEnumerable<MessageQueueItem> GetNextQueuedItem()
{
while (true)
{
if (_messageQueue.Count > 0)
{
yield return _messageQueue.Dequeue();
}
else
{
yield break;
}
}
}
Working through a tutorial (Professional ASP.NET MVC - Nerd Dinner), I came across this snippet of code:
public IEnumerable<RuleViolation> GetRuleViolations() {
if (String.IsNullOrEmpty(Title))
yield return new RuleViolation("Title required", "Title");
if (String.IsNullOrEmpty(Description))
yield return new RuleViolation("Description required","Description");
if (String.IsNullOrEmpty(HostedBy))
yield return new RuleViolation("HostedBy required", "HostedBy");
if (String.IsNullOrEmpty(Address))
yield return new RuleViolation("Address required", "Address");
if (String.IsNullOrEmpty(Country))
yield return new RuleViolation("Country required", "Country");
if (String.IsNullOrEmpty(ContactPhone))
yield return new RuleViolation("Phone# required", "ContactPhone");
if (!PhoneValidator.IsValidNumber(ContactPhone, Country))
yield return new RuleViolation("Phone# does not match country", "ContactPhone");
yield break;
}
I've read up on yield, but I guess my understanding is still a little bit hazy. What it seems to do is create an object that allows cycling through the items in a collection without actually doing the cycling unless and until it's absolutely necessary.
This example is a little strange to me, though. What I think it's doing is delaying the creation of any RuleViolation instances until the programmer actually requests a specific item in the collection using either foreach or a LINQ extension method like .ElementAt(2).
Beyond this, though, I have some questions:
1. When do the conditional parts of the if statements get evaluated? When GetRuleViolations() is called or when the enumerable is actually iterated? In other words, if the value of Title changes from null to Really Geeky Dinner between the time that I call GetRuleViolations() and the time I attempt to actually iterate over it, will RuleViolation("Title required", "Title") be created or not?
2. Why is yield break; necessary? What is it really doing here?
3. Let's say Title is null or empty. If I call GetRuleViolations() then iterate over the resulting enumerable two times in a row, how many times will new RuleViolation("Title required", "Title") be called?
A function that contains yield statements is treated differently from a normal function. Behind the scenes, the compiler generates a hidden class that implements the function's IEnumerable type; calling the function just creates an object of that class and returns it. The generated class contains logic that executes the body of the function up to the next yield statement each time IEnumerator.MoveNext is called. This can be misleading: the body of the function is not executed in one batch like a normal function, but in pieces, and each piece executes when the enumerator moves one step forward.
With regards to your questions:
1. As I said, each if gets evaluated when you iterate to the next element, not when GetRuleViolations() is called.
2. yield break is indeed not necessary in the example above. It terminates the enumeration, which would end anyway when the method body finishes.
3. Each time you iterate over the enumerable, you force the execution of the code again. Put a breakpoint on the relevant line and test for yourself.
1) Take this simpler example:
public void Enumerate()
{
foreach (var item in EnumerateItems())
{
Console.WriteLine(item);
}
}
public IEnumerable<string> EnumerateItems()
{
yield return "item1";
yield return "item2";
yield break;
}
Each time MoveNext() is called on the IEnumerator, the code resumes from the last yield point and runs until it reaches the next yield statement.
2) yield break; will tell the IEnumerator that there is nothing more to enumerate.
3) once per enumeration.
Using yield break;
public IEnumerable<string> EnumerateUntilEmpty()
{
foreach (var name in nameList)
{
if (String.IsNullOrEmpty(name)) yield break;
yield return name;
}
}
Short version:
1: The yield is the magic "Stop and come back later" keyword, so only the if statements up to and including the "active" one have been evaluated; the later ones are evaluated when iteration resumes.
2: yield break explicitly ends the enumeration (think "break" in a switch case)
3: Every time. You can cache the result, of course, by turning it into a List for example and iterating over that afterwards.
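The three answers can be checked with a small experiment. The sketch below is a stripped-down stand-in for the NerdDinner class, not the tutorial's code: the Dinner class, the Created counter and the use of plain strings instead of RuleViolation are assumptions made to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Dinner
{
    public string Title;
    public int Created; // how many violation messages were created

    public IEnumerable<string> GetRuleViolations()
    {
        // Evaluated during iteration, not when the method is called.
        if (string.IsNullOrEmpty(Title))
        {
            Created++;
            yield return "Title required";
        }
    }

    public static void Main()
    {
        var dinner = new Dinner();
        IEnumerable<string> violations = dinner.GetRuleViolations();

        Console.WriteLine(violations.Count()); // 1
        Console.WriteLine(violations.Count()); // 1 (the body ran a second time)
        Console.WriteLine(dinner.Created);     // 2

        dinner.Title = "Really Geeky Dinner";  // changed after the method call
        Console.WriteLine(violations.Count()); // 0

        // Caching: ToList() runs the body once and stores the results.
        dinner.Title = null;
        List<string> cached = dinner.GetRuleViolations().ToList();
        dinner.Title = "Really Geeky Dinner";
        Console.WriteLine(cached.Count);       // 1 (the list does not re-run the checks)
    }
}
```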
I created the ThreadSafeCachedEnumerable<T> class intending to increase performance where long-running queries were being reused. The idea was to get an enumerator from an IEnumerable<T> and add items to a cache on each call to MoveNext(). The following is my current implementation:
/// <summary>
/// Wraps an <see cref="IEnumerable{T}"/> and provides a thread-safe means of caching the values.
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
// An enumerator from the original IEnumerable<T>
private IEnumerator<T> enumerator;
// The items we have already cached (from this.enumerator)
private IList<T> cachedItems = new List<T>();
public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
{
this.enumerator = enumerable.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
// The index into the sequence
int currentIndex = 0;
// We will break with yield break
while (true)
{
// The currentIndex will never be decremented,
// so we can check without locking first
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// If !(currentIndex < this.cachedItems.Count),
// we need to synchronize access to this.enumerator
lock (enumerator)
{
// See if we have more cached items ...
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
if (this.enumerator.MoveNext())
{
// capture the current item and cache it, then increment the currentIndex
var current = this.enumerator.Current;
this.cachedItems.Add(current);
currentIndex += 1;
yield return current;
}
else
{
// We reached the end of the enumerator - we're done
yield break;
}
}
}
}
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
I simply lock (this.enumerator) when no more items appear to be in the cache, just in case another thread is just about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).
The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?
Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll() extension method.
Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).
Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):
// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(1, currentNum => (currentNum <= 10), currentNum => currentNum, previousNum => { Thread.Sleep(250); return previousNum + 1; });
// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();
// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
Console.WriteLine(num);
}
A couple of recommendations:
1. It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller.
2. Your caching depends on the enumerator always returning items in order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed in between calls, invalidating the cache.
3. It is generally not recommended to establish locks on publicly accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?
Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while invoking this.cachedItems.Add(current) on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).
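One way to act on the two points above (lock on a private object, and protect every access to the cache) is sketched below. This is not the asker's class nor the MemoizeAll() implementation: LockedCachedEnumerable<T>, gate and CacheDemo are hypothetical names, and the sketch trades some throughput for simplicity by taking the lock on every step. No yield statement sits inside the lock, so the lock is never held while the caller's code runs.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public sealed class LockedCachedEnumerable<T> : IEnumerable<T>
{
    private readonly object gate = new object();      // private lock object
    private readonly IEnumerator<T> source;           // shared source enumerator
    private readonly List<T> cache = new List<T>();   // only touched under the lock

    public LockedCachedEnumerable(IEnumerable<T> enumerable)
    {
        source = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            bool hasItem = false;
            T current = default(T);
            lock (gate)
            {
                if (index < cache.Count)
                {
                    current = cache[index];      // already cached
                    hasItem = true;
                }
                else if (source.MoveNext())
                {
                    current = source.Current;    // pull one new item and cache it
                    cache.Add(current);
                    hasItem = true;
                }
            }
            if (!hasItem)
            {
                yield break;                     // source is exhausted
            }
            index++;
            yield return current;                // yielded outside the lock
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CacheDemo
{
    public static int Pulls; // how many items the source actually produced

    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 3; i++)
        {
            Pulls++;
            yield return i;
        }
    }

    public static void Main()
    {
        var cached = new LockedCachedEnumerable<int>(Source());
        Console.WriteLine(string.Join(",", cached)); // 1,2,3
        Console.WriteLine(string.Join(",", cached)); // 1,2,3 (served from the cache)
        Console.WriteLine(Pulls);                    // 3
    }
}
```

Whether the per-item lock is an acceptable cost would still need to be confirmed by profiling, as the answer above suggests.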