Help me understand the code snippet in c# - c#

I am reading this blog: Pipes and filters pattern
I am confused by this code snippet:
public class Pipeline<T>
{
private readonly List<IOperation<T>> operations = new List<IOperation<T>>();
public Pipeline<T> Register(IOperation<T> operation)
{
operations.Add(operation);
return this;
}
public void Execute()
{
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
}
}
what is the purpose of this statement: while (enumerator.MoveNext());? seems this code is a noop.

First consider this:
IEnumerable<T> current = new List<T>();
foreach (IOperation<T> operation in operations)
{
current = operation.Execute(current);
}
This code appears to be creating nested enumerables, each of which takes elements from the previous, applies some operation to them, and passes the result to the next. But it only constructs the enumerables. Nothing actually happens yet. It's just ready to go, stored in the variable current. There are lots of ways to implement IOperation.Execute but it could be something like this.
IEnumerable<T> Execute(IEnumerable<T> ts)
{
foreach (T t in ts)
yield return this.operation(t); // Perform some operation on t.
}
Another option suggested in the article is a sort:
IEnumerable<T> Execute(IEnumerable<T> ts)
{
// Thank-you LINQ!
// This was 10 lines of non-LINQ code in the original article.
return ts.OrderBy(t => t.Foo);
}
Now look at this:
IEnumerator<T> enumerator = current.GetEnumerator();
while (enumerator.MoveNext());
This actually causes the chain of operations to be performed. When the elements are requested from the enumeration, it causes elements from the original enumerable to be passed through the chain of IOperations, each of which performs some operation on them. The end result is discarded so only the side-effect of the operation is interesting - such as writing to the console or logging to a file. This would have been a simpler way to write the last two lines:
foreach (T t in current) {}
Another thing to observe is that the initial list that starts the process is an empty list so for this to make sense some instances of T have to be created inside the first operation. In the article this is done by asking the user for input from the console.

In this case, the while (enumerator.MoveNext()); is simply evaluating all the items that are returned by the final IOperation<T>. It looks a little confusing, but the empty List<T> is only created in order to supply a value to the first IOperation<T>.
In many collections this would do exaclty nothing as you suggest, but given that we are talking about the pipes and filters pattern it is likely that the final value is some sort of iterator that will cause code to be executed. It could be something like this, for example (assuming that is an integer):
public class WriteToConsoleOperation : IOperation<int>
{
public IEnumerable<int> Execute(IEnumerable<int> ints)
{
foreach (var i in ints)
{
Console.WriteLine(i);
yield return i;
}
}
}
So calling MoveNext() for each item on the IEnumerator<int> returned by this iterator will return each of the values (which are ignored in the while loop) but also output each of the values to the console.
Does that make sense?

while (enumerator.MoveNext());
Inside the current block of code, there is no affect (it moves through all the items in the enumeration). The displayed code doesn't act on the current element in the enumeration. What might be happening is that the MoveNext() method is moving to the next element, and it is doing something to the objects in the collection (updating an internal value, pull the next from the database etc.). Since the type is List<T> this is probably not the case, but in other instances it could be.

Related

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

This question already has answers here:
Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?
(4 answers)
Closed 9 months ago.
I have a very large enumeration and am preparing an expensive deferred operation on it (e.g. sorting it). I'm then passing this into a function which may or may not consume the IEnumerable, depending on some logic of its own.
Here's an illustration:
IEnumerable<Order> expensiveEnumerable = fullCatalog.OrderBy(c => Prioritize(c));
MaybeFullFillSomeOrders(expensiveEnumerable);
// Elsewhere... (example use-case for multiple enumerations, not real code)
void MaybeFullFillSomeOrders(IEnumerable<Order> nextUpOrders){
if(notAGoodTime())
return;
foreach(var order in nextUpOrders)
collectSomeInfo(order);
processInfo();
foreach(var order in nextUpOrders) {
maybeFulfill(order);
if(atCapacity())
break;
}
}
I'm would like to prepare my input to the other function such that:
If they do not consume the enumerable, the performance price of sorting is not paid.
This already precludes calling e.g. ToList() or ToArray() on it
If they choose to enumerate multiple times (perhaps not realizing how expensive it would be in this case) I want some defence in place to prevent the multiple enumeration.
Ideally, the result is still an IEnumerable<T>
The best solution I've come up with is to use Lazy<>
var expensive = new Lazy<List<Order>>>(
() => fullCatalog.OrderBy(c => Prioritize(c)).ToList());
This appears to satisfy criteria 1 and 2, but has a couple of drawbacks:
I have to change the interface to all downstream usages to expect a Lazy.
The full list (which in this case was built up from a SelectMany() on serveral smaller partitions) would need to be allocated as a new single contiguous list in memory. I'm not sure there's an easy way around this if I want to "cache" the sort result, but if you know of one I'm all ears.
One idea I had to solve the first problem was to wrap Lazy<> in some custom class that either implements or can implicitly be converted to an IEnumerable<T>, but I'm hoping someone knows of a more elegant approach.
You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.
Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):
If you've already read the next element within another iterator, you can yield it from your buffer
If you've already discovered that there is no next element, you can return that immediately
Otherwise, you need to ask the original iterator for the next element, and remember if for all the other wrapped iterators.
The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.
As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:
public class LazyEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerator<T> iterator;
private List<T> buffer;
private bool completed = false;
public LazyEnumerable(IEnumerable<T> original)
{
// TODO: You could be even lazier, only calling
// GetEnumerator when you first need an element
iterator = original.GetEnumerator();
}
IEnumerator GetEnumerator() => GetEnumerator();
public IEnumerator<T> GetEnumerator()
{
int index = 0;
while (true)
{
// If we already have the element, yield it
if (index < buffer.Count)
{
yield return buffer[index];
}
// If we've yielded everything in the buffer and some
// other iterator has come to the end of the original,
// we're done.
else if (completed)
{
yield break;
}
// Otherwise, see if there's anything left in the original
// iterator.
else
{
bool hasNext = iterator.MoveNext();
if (hasNext)
{
var current = iterator.Current;
buffer.Add(current);
yield return current;
}
else
{
completed = true;
yield break;
}
}
index++;
}
}
}

Replace items in a list -- in place

I'm facing an issue with some simple C# code which I would easily fix in C/C++.
I guess I'm missing something.
I want to do the following (modifying items in a list -- in place):
//pseudocode
void modify<T>(List<T> a) {
foreach(var item in a) {
if(condition(item)) {
item = somethingElse;
}
}
}
I understand that foreach loops on a collection viewed as immutable, so the code above can't work.
I therefore tried the following :
void modify<T>(List<T> a) {
using (var sequenceEnum = a.GetEnumerator())
{
while (sequenceEnum.MoveNext())
{
var m = sequenceEnum.Current;
if(condition(m)) {
sequenceEnum.Current = somethingElse;
}
}
}
}
Naively thinking that Enumerator was some kind of pointer to my Element. Apparently enumerators are also immutable.
In C++ I would write something like that:
template<typename T>
struct Node {
T* value;
Node* next;
}
being then able to modify *value without touching anything in Node and therefore in the parent collection:
Node<T>* current = a->head;
while(current != nullptr) {
if(condition(current->value))
current->value = ...
}
current = current->next;
}
Do I really have to to unsafe code?
Or am i stuck the awfulness of calling subscript inside the loop?
You could also use a simple for loop.
void modify<T>(List<T> a)
{
for (int i = 0; i < a.Count; i++)
{
if (condition(a[i]))
{
a[i] = default(T);
}
}
}
In short - do not modify lists. You can achieve desired effect with
a = a.Select(x => <some logic> ? x : default(T)).ToList()
In general lists in C# are immutable during iteration. You can hovewer use .RemoveAll or similar methods.
As described in documentation here, System.Collections.Generic.List<T> is a generic implementation of the System.Collections.ArrayList which has O(1) complexity in indexed accessor. Very much like C++ std::vector<>, complexity of insertion/addition of elements is unpredictible, but access is time constant (complexity-wise, with respect to caching, etc).
The equivalent of your C++ code snippet would be LinkedList
As for the immutability of your collection during iteration, it is clearly stated in the documentation of GetEnumerator method here. Indeed, during enumeration (within a foreach, or using IEnumerator.MoveNext directly):
Enumerators can be used to read the data in the collection, but they
cannot be used to modify the underlying collection.
Moreover, modifying the list will invalidate the enumerator and usually throw an exception:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
I believe this interface contract consistency between various types of collections leads to your misunderstanding: it would possible to implement a mutable list, but it is not required by the interfaces contract.
Imagine you want to implement a list that is mutable during enumeration. Would the enumerator hold a reference to (or a way to retrieve) the entry or the entry itself ? The entry itself would make it unmutable, a reference would be invalid when inserting elements in a linked list for example.
The simple for loop proposed by #Igor seems to be the best way to go if you want to use the standard Collections library. Otherwise, you may need to reimplement it yourself.
Use something like this:
List<T> GetModified<T>(List<T> list, Func<T, bool> condition, Func<T> replacement)
{
return list.Select(m => if (condition(m))
{ return m; }
else
{ return replacement(); }).ToList();
}
Usage:
originalList = GetModified(originalList, i => i.IsAwesome(), null);
But this can also get you into trouble with cross-thread operations. Try to use immutable instances where possible, especially with IEnumerable.
If you really really want to modify the instance of list:
//if you ever want to also remove items, this is magic (why I iterate backwards)
for (int i = list.Count - 1; i >= 0; i--)
{
if (condition)
{
list[i] = whateverYouWant;
}
}

Getting head and tail from IEnumerable that can only be iterated once

I have a sequence of elements. The sequence can only be iterated once and can be "infinite".
What is the best way get the head and the tail of such a sequence?
Update: A few clarifications that would have been nice if I included in the original question :)
Head is the first element of the sequence and tail is "the rest". That means the the tail is also "infinite".
When I say infinite, I mean "very large" and "I wouldn't want to store it all in memory at once". It could also have been actually infinite, like sensor data for example (but it wasn't in my case).
When I say that it can only be iterated once, I mean that generating the sequence is resource heavy, so I woundn't want to do it again. It could also have been volatile data, again like sensor data, that won't be the same on next read (but it wasn't in my case).
Decomposing IEnumerable<T> into head & tail isn't particularly good for recursive processing (unlike functional lists) because when you use the tail operation recursively, you'll create a number of indirections. However, you can write something like this:
I'm ignoring things like argument checking and exception handling, but it shows the idea...
Tuple<T, IEnumerable<T>> HeadAndTail<T>(IEnumerable<T> source) {
// Get first element of the 'source' (assuming it is there)
var en = source.GetEnumerator();
en.MoveNext();
// Return first element and Enumerable that iterates over the rest
return Tuple.Create(en.Current, EnumerateTail(en));
}
// Turn remaining (unconsumed) elements of enumerator into enumerable
IEnumerable<T> EnumerateTail<T>(IEnumerator en) {
while(en.MoveNext()) yield return en.Current;
}
The HeadAndTail method gets the first element and returns it as the first element of a tuple. The second element of a tuple is IEnumerable<T> that's generated from the remaining elements (by iterating over the rest of the enumerator that we already created).
Obviously, each call to HeadAndTail should enumerate the sequence again (unless there is some sort of caching used). For example, consider the following:
var a = HeadAndTail(sequence);
Console.WriteLine(HeadAndTail(a.Tail).Tail);
//Element #2; enumerator is at least at #2 now.
var b = HeadAndTail(sequence);
Console.WriteLine(b.Tail);
//Element #1; there is no way to get #1 unless we enumerate the sequence again.
For the same reason, HeadAndTail could not be implemented as separate Head and Tail methods (unless you want even the first call to Tail to enumerate the sequence again even if it was already enumerated by a call to Head).
Additionally, HeadAndTail should not return an instance of IEnumerable (as it could be enumerated multiple times).
This leaves us with the only option: HeadAndTail should return IEnumerator, and, to make things more obvious, it should accept IEnumerator as well (we're just moving an invocation of GetEnumerator from inside the HeadAndTail to the outside, to emphasize it is of one-time use only).
Now that we have worked out the requirements, the implementation is pretty straightforward:
class HeadAndTail<T> {
public readonly T Head;
public readonly IEnumerator<T> Tail;
public HeadAndTail(T head, IEnumerator<T> tail) {
Head = head;
Tail = tail;
}
}
static class IEnumeratorExtensions {
public static HeadAndTail<T> HeadAndTail<T>(this IEnumerator<T> enumerator) {
if (!enumerator.MoveNext()) return null;
return new HeadAndTail<T>(enumerator.Current, enumerator);
}
}
And now it can be used like this:
Console.WriteLine(sequence.GetEnumerator().HeadAndTail().Tail.HeadAndTail().Head);
//Element #2
Or in recursive functions like this:
TResult FoldR<TSource, TResult>(
IEnumerator<TSource> sequence,
TResult seed,
Func<TSource, TResult, TResult> f
) {
var headAndTail = sequence.HeadAndTail();
if (headAndTail == null) return seed;
return f(headAndTail.Head, FoldR(headAndTail.Tail, seed, f));
}
int Sum(IEnumerator<int> sequence) {
return FoldR(sequence, 0, (x, y) => x+y);
}
var array = Enumerable.Range(1, 5);
Console.WriteLine(Sum(array.GetEnumerator())); //1+(2+(3+(4+(5+0)))))
While other approaches here suggest using yield return for the tail enumerable, such an approach adds unnecessary nesting overhead. A better approach would be to convert the Enumerator<T> back into something that can be used with foreach:
public struct WrappedEnumerator<T>
{
T myEnumerator;
public T GetEnumerator() { return myEnumerator; }
public WrappedEnumerator(T theEnumerator) { myEnumerator = theEnumerator; }
}
public static class AsForEachHelper
{
static public WrappedEnumerator<IEnumerator<T>> AsForEach<T>(this IEnumerator<T> theEnumerator) {return new WrappedEnumerator<IEnumerator<T>>(theEnumerator);}
static public WrappedEnumerator<System.Collections.IEnumerator> AsForEach(this System.Collections.IEnumerator theEnumerator)
{ return new WrappedEnumerator<System.Collections.IEnumerator>(theEnumerator); }
}
If one used separate WrappedEnumerator structs for the generic IEnumerable<T> and non-generic IEnumerable, one could have them implement IEnumerable<T> and IEnumerable respectively; they wouldn't really obey the IEnumerable<T> contract, though, which specifies that it should be possible to possible to call GetEnumerator() multiple times, with each call returning an independent enumerator.
Another important caveat is that if one uses AsForEach on an IEnumerator<T>, the resulting WrappedEnumerator should be enumerated exactly once. If it is never enumerated, the underlying IEnumerator<T> will never have its Dispose method called.
Applying the above-supplied methods to the problem at hand, it would be easy to call GetEnumerator() on an IEnumerable<T>, read out the first few items, and then use AsForEach() to convert the remainder so it can be used with a ForEach loop (or perhaps, as noted above, to convert it into an implementation of IEnumerable<T>). It's important to note, however, that calling GetEnumerator() creates an obligation to Dispose the resulting IEnumerator<T>, and the class that performs the head/tail split would have no way to do that if nothing ever calls GetEnumerator() on the tail.
probably not the best way to do it but if you use the .ToList() method you can then get the elements in position [0] and [Count-1], if Count > 0.
But you should specify what do you mean by "can be iterated only once"
What exactly is wrong with .First() and .Last()? Though yeah, I have to agree with the people who asked "what does the tail of an infinite list mean"... the notion doesn't make sense, IMO.

Yield keyword value added?

still trying to find where i would use the "yield" keyword in a real situation.
I see this thread on the subject
What is the yield keyword used for in C#?
but in the accepted answer, they have this as an example where someone is iterating around Integers()
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
but why not just use
list<int>
here instead. seems more straightforward..
If you build and return a List (say it has 1 million elements), that's a big chunk of memory, and also of work to create it.
Sometimes the caller may only want to know what the first element is. Or they might want to write them to a file as they get them, rather than building the whole list in memory and then writing it to a file.
That's why it makes more sense to use yield return. It doesn't look that different to building the whole list and returning it, but it's very different because the whole list doesn't have to be created in memory before the caller can look at the first item on it.
When the caller says:
foreach (int i in Integers())
{
// do something with i
}
Each time the loop requires a new i, it runs a bit more of the code in Integers(). The code in that function is "paused" when it hits a yield return statement.
Yield allows you to build methods that produce data without having to gather everything up before returning. Think of it as returning multiple values along the way.
Here's a couple of methods that illustrate the point
public IEnumerable<String> LinesFromFile(String fileName)
{
using (StreamReader reader = new StreamReader(fileName))
{
String line;
while ((line = reader.ReadLine()) != null)
yield return line;
}
}
public IEnumerable<String> LinesWithEmails(IEnumerable<String> lines)
{
foreach (String line in lines)
{
if (line.Contains("#"))
yield return line;
}
}
Neither of these two methods will read the whole contents of the file into memory, yet you can use them like this:
foreach (String lineWithEmail in LinesWithEmails(LinesFromFile("test.txt")))
Console.Out.WriteLine(lineWithEmail);
You can use yield to build any iterator. That could be a lazily evaluated series (reading lines from a file or database, for example, without reading everything at once, which could be too much to hold in memory), or could be iterating over existing data such as a List<T>.
C# in Depth has a free chapter (6) all about iterator blocks.
I also blogged very recently about using yield for smart brute-force algorithms.
For an example of the lazy file reader:
static IEnumerable<string> ReadLines(string path) {
using (StreamReader reader = File.OpenText(path)) {
string line;
while ((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
This is entirely "lazy"; nothing is read until you start enumerating, and only a single line is ever held in memory.
Note that LINQ-to-Objects makes extensive use of iterator blocks (yield). For example, the Where extension is essentially:
static IEnumerable<T> Where<T>(this IEnumerable<T> data, Func<T, bool> predicate) {
foreach (T item in data) {
if (predicate(item)) yield return item;
}
}
And again, fully lazy - allowing you to chain together multiple operations without forcing everything to be loaded into memory.
yield allows you to process collections that are potentially infinite in size because the entire collection is never loaded into memory in one go, unlike a List based approach. For instance an IEnumerable<> of all the prime numbers could be backed off by the appropriate algo for finding the primes, whereas a List approach would always be finite in size and therefore incomplete. In this example, using yield also allows processing for the next element to be deferred until it is required.
A real situation for me, is when i want to process a collection that takes a while to populate more smoothly.
Imagine something along the lines (psuedo code):
public IEnumberable<VerboseUserInfo> GetAllUsers()
{
foreach(UserId in userLookupList)
{
VerboseUserInfo info = new VerboseUserInfo();
info.Load(ActiveDirectory.GetLotsOfUserData(UserId));
info.Load(WebSerice.GetSomeMoreInfo(UserId));
yield return info;
}
}
Instead of having to wait a minute for the collection to populate before i can start processing items in it. I will be able to start immediately, and then report back to the user-interface as it happens.
You may not always want to use yield instead of returning a list, and in your example you use yield to actually return a list of integers. Depending on whether you want a mutable list, or a immutable sequence, you could use a list, or an iterator (or some other collection muttable/immutable).
But there are benefits to use yield.
Yield provides an easy way to build lazy evaluated iterators. (Meaning only the code to get next element in sequence is executed when the MoveNext() method is called then the iterator returns doing no more computations, until the method is called again)
Yield builds a state machine under the covers, and this saves you allot of work by not having to code the states of your generic generator => more concise/simple code.
Yield automatically builds optimized and thread safe iterators, sparing you the details on how to build them.
Yield is much more powerful than it seems at first sight and can be used for much more than just building simple iterators, check out this video to see Jeffrey Richter and his AsyncEnumerator and how yield is used make coding using the async pattern easy.
You might want to iterate through various collections:
public IEnumerable<ICustomer> Customers()
{
foreach( ICustomer customer in m_maleCustomers )
{
yield return customer;
}
foreach( ICustomer customer in m_femaleCustomers )
{
yield return customer;
}
// or add some constraints...
foreach( ICustomer customer in m_customers )
{
if( customer.Age < 16 )
{
yield return customer;
}
}
// Or....
if( Date.Today == 1 )
{
yield return m_superCustomer;
}
}
I agree with everything everyone has said here about lazy evaluation and memory usage and wanted to add another scenario where I have found the iterators using the yield keyword useful. I have run into some cases where I have to do a sequence of potentially expensive processing on some data where it is extremely useful to use iterators. Rather than processing the entire file immediately, or rolling my own processing pipeline, I can simply use iterators something like this:
IEnumerable<double> GetListFromFile(int idxItem)
{
// read data from file
return dataReadFromFile;
}
IEnumerable<double> ConvertUnits(IEnumerable<double> items)
{
foreach(double item in items)
yield return convertUnits(item);
}
IEnumerable<double> DoExpensiveProcessing(IEnumerable<double> items)
{
foreach(double item in items)
yield return expensiveProcessing(item);
}
IEnumerable<double> GetNextList()
{
return DoExpensiveProcessing(ConvertUnits(GetListFromFile(curIdx++)));
}
The advantage here is that by keeping the input and output to all of the functions IEnumerable<double>, my processing pipeline is completely composable, easy to read, and lazy evaluated so I only have to do the processing I really need to do. This lets me put almost all of my processing in the GUI thread without impacting responsiveness so I don't have to worry about any threading issues.
I came up with this to overcome .net shortcoming having to manually deep copy List.
I use this:
static public IEnumerable<SpotPlacement> CloneList(List<SpotPlacement> spotPlacements)
{
foreach (SpotPlacement sp in spotPlacements)
{
yield return (SpotPlacement)sp.Clone();
}
}
And at another place:
public object Clone()
{
OrderItem newOrderItem = new OrderItem();
...
newOrderItem._exactPlacements.AddRange(SpotPlacement.CloneList(_exactPlacements));
...
return newOrderItem;
}
I tried to come up with oneliner that does this, but it's not possible, due to yield not working inside anonymous method blocks.
EDIT:
Better still, use generic List cloner:
class Utility<T> where T : ICloneable
{
static public IEnumerable<T> CloneList(List<T> tl)
{
foreach (T t in tl)
{
yield return (T)t.Clone();
}
}
}
The method used by yield of saving memory by processing items on-the-fly is nice, but really it's just syntactic sugar. It's been around for a long time. In any language that has function or interface pointers (even C and assembly) you can get the same effect using a callback function / interface.
This fancy stuff:
static IEnumerable<string> GetItems()
{
yield return "apple";
yield return "orange";
yield return "pear";
}
foreach(string item in GetItems())
{
Console.WriteLine(item);
}
is basically equivalent to old-fashioned:
interface ItemProcessor
{
void ProcessItem(string s);
};
class MyItemProcessor : ItemProcessor
{
public void ProcessItem(string s)
{
Console.WriteLine(s);
}
};
static void ProcessItems(ItemProcessor processor)
{
processor.ProcessItem("apple");
processor.ProcessItem("orange");
processor.ProcessItem("pear");
}
ProcessItems(new MyItemProcessor());

Is yield useful outside of LINQ?

When ever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yeilding because I feel the overhead of maintaining the state of the yeilding method doesn't buy me much. In almost all cases where I am returning a collection I feel that 90% of the time, the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Has anyone written anything like or not like linq where yield was useful?
Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.
Take, for example, a filter iterator:
IEnumerator<T> Filter(this IEnumerator<T> coll, Func<T, bool> func)
{
foreach(T t in coll)
if (func(t)) yield return t;
}
Now, you can chain this:
MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)
You method would be creating (and tossing) three lists. My method iterates over it just once.
Also, when you return a collection, you are forcing a particular implementation on you users. An iterator is more generic.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Yield was useful as soon as it got implemented in .NET 2.0, which was long before anyone ever thought of LINQ.
Why would I write this function:
IList<string> LoadStuff() {
var ret = new List<string>();
foreach(var x in SomeExternalResource)
ret.Add(x);
return ret;
}
When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:
IEnumerable<string> LoadStuff() {
foreach(var x in SomeExternalResource)
yield return x;
}
It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.
I could go on and on....
I recently had to make a representation of mathematical expressions in the form of an Expression class. When evaluating the expression I have to traverse the tree structure with a post-order treewalk. To achieve this I implemented IEnumerable<T> like this:
public IEnumerator<Expression<T>> GetEnumerator()
{
if (IsLeaf)
{
yield return this;
}
else
{
foreach (Expression<T> expr in LeftExpression)
{
yield return expr;
}
foreach (Expression<T> expr in RightExpression)
{
yield return expr;
}
yield return this;
}
}
Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.
At a previous company, I found myself writing loops like this:
for (DateTime date = schedule.StartDate; date <= schedule.EndDate;
date = date.AddDays(1))
With a very simple iterator block, I was able to change this to:
foreach (DateTime date in schedule.DateRange)
It made the code a lot easier to read, IMO.
yield was developed for C#2 (before Linq in C#3).
We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.
Collections are great any time you have a few elements that you're going to hit multiple times.
However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.
This is essentially what the SqlDataReader does - it's a forward only custom enumerator.
What yield lets you do is quickly and with minimal code write your own custom enumerators.
Everything yield does could be done in C#1 - it just took reams of code to do it.
Linq really maximises the value of the yield behaviour, but it certainly isn't the only application.
Whenever your function returns IEnumerable you should use "yielding". Not in .Net > 3.0 only.
.Net 2.0 example:
public static class FuncUtils
{
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
...
public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
{
foreach (T el in e)
if (filterFunc(el))
yield return el;
}
public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
{
foreach (T el in e)
yield return mapFunc(el);
}
...
I'm not sure about C#'s implementation of yield(), but on dynamic languages, it's far more efficient than creating the whole collection. on many cases, it makes it easy to work with datasets much bigger than RAM.
I am a huge Yield fan in C#. This is especially true in large homegrown frameworks where often methods or properties return List that is a sub-set of another IEnumerable. The benefits that I see are:
the return value of a method that uses yield is immutable
you are only iterating over the list once
it a late or lazy execution variable, meaning the code to return the values are not executed until needed (though this can bite you if you dont know what your doing)
of the source list changes, you dont have to call to get another IEnumerable, you just iterate over IEnumeable again
many more
One other HUGE benefit of yield is when your method potentially will return millions of values. So many that there is the potential of running out of memory just building the List before the method can even return it. With yield, the method can just create and return millions of values, and as long the caller also doesnt store every value. So its good for large scale data processing / aggregating operations
Personnally, I haven't found I'm using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime) where you have async and concurrency issues.
Anyway, still trying to get my head around it as well.
Yield is useful because it saves you space. Most optimizations in programming makes a trade off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.
consider this example:
static IEnumerable<Person> GetAllPeople()
{
return new List<Person>()
{
new Person() { Name = "George", Surname = "Bush", City = "Washington" },
new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
new Person() { Name = "Joe", Surname = "Average", City = "New York" }
};
}
static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people, string where)
{
foreach (var person in people)
{
if (person.City == where) yield return person;
}
yield break;
}
static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
foreach (var person in people)
{
if (person.Name.StartsWith(initial)) yield return person;
}
yield break;
}
static void Main(string[] args)
{
var people = GetAllPeople();
foreach (var p in people.GetPeopleFrom("Washington"))
{
// do something with washingtonites
}
foreach (var p in people.GetPeopleWithInitial("G"))
{
// do something with people with initial G
}
foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
{
// etc
}
}
(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)
As you can see, if you have a lot of these "filter" methods (but it can be any kind of method that does some work on a list of people) you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.
The first side-effect of yield is that it delays execution of the filtering logic until you actually require it. If you therefore create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space which is a powerful and free optimization.
The other side-effect is that yield operates on the lowest common collection interface (IEnumerable<>) which enables the creation of library-like code with wide applicability.
Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumberable is not done until the element is actually requested. This allows you the power to do a couple of different things. One is that you could yield an infinitely long list without the need to actually make infinite calculations. Second, you could return an enumeration of function applications. The functions would only be applied when you iterate through the list.
I've used yeild in non-linq code things like this (assuming functions do not live in same class):
public IEnumerable<string> GetData()
{
foreach(String name in _someInternalDataCollection)
{
yield return name;
}
}
...
public void DoSomething()
{
foreach(String value in GetData())
{
//... Do something with value that doesn't modify _someInternalDataCollection
}
}
You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over though, or it will throw an exception.
Yield is very useful in general. It's in ruby among other languages that support functional style programming, so its like it's tied to linq. It's more the other way around, that linq is functional in style, so it uses yield.
I had a problem where my program was using a lot of cpu in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event based argument). And still be able to break the functions up if they took too much cpu. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)
The System.Linq IEnumerable extensions are great, but sometime you want more. For example, consider the following extension:
public static class CollectionSampling
{
public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
{
var rand = new Random();
using (var enumerator = coll.GetEnumerator());
{
while (enumerator.MoveNext())
{
yield return enumerator.Current;
int currentSample = rand.Next(max);
for (int i = 1; i <= currentSample; i++)
enumerator.MoveNext();
}
}
}
}
Another interesting advantage of yielding is that the caller cannot cast the return value to the original collection type and modify your internal collection

Categories

Resources