Approach 1:
class myClass
{
List<SomeType> _list;
IENumerator<SomeType> GetEnumerator()
{
foreach(SomeType t in _list)
yield return t;
}
}
myClass m = new myClass();
List<SomeType> list;
...
foreach(SomeType t in m)
list.Add(t);
Approach 2:
class myClass
{
public List<SomeType> _list {get; private set;}
}
myClass m = new myClass();
...
List<SomeType> list = m.list;
Which approach is better? If second then could you please show me real-proof usage of yield return?
A class with a collection-property is usually best done as a list, so that it can be iterated multiple times, and mutated. That is not possible for an enumerator, which represents just a sequence. Typical uses of enumerators might be:
filtering data on the fly (such as Enumerable.Where)
reading individual items sequentially from a file that contains multiple records, without loading them all at once
or network socket
or database server
providing a forwards-only, read-only wrapper over a sequence of data
etc
Neither.
If you want an enumerator, and already have an underlying IEnumerable type (List implements IList which extends IEnumerable), then just return it like that:
public IEnumerator<SomeType> GetEnumerator ()
{
return _list.GetEnumerator();
}
Otherwise, if you actually need a list, i.e. random access using indexes, then return an IList; and if you actually want to return its internal implementation type, i.e. List, then you can just make it an accessible property. Note though, that a private setter does not prevent modifying the list (adding or removing items etc.). If you want that, return a read-only list instead:
public IList<SomeType> List
{
get { return _list.AsReadOnly(); }
}
Regarding yield
could you please show me real-proof usage of yield return?
yield return is useful, when you actually have a generator, when you actually need to generate the next item. A simple example would be a random number generator which provides you with another random number, as long as you keep asking it. There is not necessarily an end to it, but you might not know the amount of numbers before starting.
Another common usage would be anything that retrieves the data from some external source. For example a list of items from a webservice. Before you don’t know how many items there are, and you don’t necessarily know how many items you actually want (as you might want to display it in an endless display, showing one at a time). In that case, you could do it like that:
IEnumerable<Item> GetItems()
{
while (Service.HasMorePages())
{
foreach (Item item in Service.GetNextPage())
{
yield return item;
}
}
yield break;
}
GetNextPage would always return a list of N items at a time, and you would get the next one whenever you want more items than you have already received.
The options are without bound,
public IEnumerator<int> FibonnaciSeries()
{
int a = 1;
int b = 1;
yield return 1;
yield return 1;
while (true)
{
var c = a + b;
a = b;
b = c;
yield return c;
}
}
Is one trivial example that comes to mind, is the Fibonnaci Series real world?
This is not the most efficient implementation possible.
Related
This question was posted here (https://stackoverflow.com/questions/15881110/java-to-c-sharp-conversion) by a team member but was closed due to the community not having enough information.
Here's my attempt to revive such a question being, How would I go about converting this java extract into C#?
Java Extract:
PriorityQueue<PuzzleNode> openList = new PriorityQueue<PuzzleNode>
(1,
new Comparator<PuzzleNode>(){
public int compare(PuzzleNode a, PuzzleNode b){
if (a.getPathCost() > b.getPathCost())
return 1;
else if (a.getPathCost() < b.getPathCost())
return -1;
else
return 0;
}
}
);
A sortedList has been thought about but to no avail as I'm unsure how to code it.
I've also tried creating a standard list with a method:
List<PuzzleNode> openList = new List<PuzzleNode>();
//Method to sort the list
public int CompareFCost(PuzzleNode a, PuzzleNode b)
{
if (a.getPathCost() > b.getPathCost())
{
return 1;
}
else if (a.getPathCost() > b.getPathCost())
{
return -1;
}
else
return 0;
}//end CompareFCost
and then calling: openList.Sort(CompareFCost); at appropriate locations, however this doesn't work.
What the code is used for?
It orders the objects 'PuzzleNode' depending on a score (pathCost) I have set else where in the program. A while loop then operates and pulls the first object from the list. The list needs to be ordered otherwise an object with a higher pathCost could be chosen and the while loop will run for longer. The objective is to pull the lower pathCost from the list.
I ask for a conversion because it works in Java & the rest of the code has pretty much originated from Java.
Any takers? If you need further info I'm happy to discuss it further.
I suppose you could misappropriate a SortedList something like this:
var openList=new SortedList<PuzzleNode,PuzzleNode>(
//assumes .Net4.5 for Comparer.Create
Comparer<PuzzleNode>.Create((a,b)=>{
if (a.getPathCost() > b.getPathCost())
return 1;
else if (a.getPathCost() < b.getPathCost())
return -1;
else
return 0;
}));
openList.Add(new PuzzleNode());
foreach(var x in openList.Keys)
{
//ordered enumeration
}
var firstItem = openList.Dequeue();
by creating some extension methods to make things a little more queue-like
static class SortedListExtensions
{
public static void Add<T>(this SortedList<T,T> list,T item)
{
list.Add(item,item);
}
public static T Dequeue<T>(this SortedList<T,T> list)
{
var item=list.Keys.First();
list.Remove(item);
return item;
}
//and so on...
}
TBH, I'd probably go for #valverij's answer in the comment to your original question, but if the cost of repeated sorting is prohibitive, this may be what you need.
What the code is used for? It orders the objects 'PuzzleNode'
depending on a score (pathCost) I have set else where in the program.
A while loop then operates and pulls the first object from the list.
The list needs to be ordered otherwise an object with a higher
pathCost could be chosen and the while loop will run for longer. The
objective is to pull the lower pathCost from the list.
1: There's LinQ for that. You don't usually do any of these things in C#, because LinQ does it for you.
It orders the objects 'PuzzleNode' depending on a score (pathCost)
That's Achieved with LinQ's Enumerable.OrderBy() Extension:
//Assuming PathCost is a property of a primitive type (int, double, string, etc)
var orderedlist = list.OrderBy(x => x.PathCost);
The objective is to pull the lower pathCost from the list.
That's achieved using LinQ's Enumerable.Min() or Enumerable.Max() extensions.
//Same assumption as above.
var puzzlewithlowestpath = list.Min(x => x.PathCost);
here goes my rant about java being incomplete compared to C# because it lacks something like LinQ, but I will not do any more ranting in StackOverflow by now.
Another thing I wanted to mention is that if you are coding in C#, you'd better use C# Naming Conventions, where Properties are ProperCased:
public int PathCost {get;set;}
//or double or whatever
instead of:
public int getPathCost()
public int setPathCost()
Given this code:
IEnumerable<object> FilteredList()
{
foreach( object item in FullList )
{
if( IsItemInPartialList( item ) )
yield return item;
}
}
Why should I not just code it this way?:
IEnumerable<object> FilteredList()
{
var list = new List<object>();
foreach( object item in FullList )
{
if( IsItemInPartialList( item ) )
list.Add(item);
}
return list;
}
I sort of understand what the yield keyword does. It tells the compiler to build a certain kind of thing (an iterator). But why use it? Apart from it being slightly less code, what's it do for me?
Using yield makes the collection lazy.
Let's say you just need the first five items. Your way, I have to loop through the entire list to get the first five items. With yield, I only loop through the first five items.
The benefit of iterator blocks is that they work lazily. So you can write a filtering method like this:
public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
foreach (var item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
That will allow you to filter a stream as long as you like, never buffering more than a single item at a time. If you only need the first value from the returned sequence, for example, why would you want to copy everything into a new list?
As another example, you can easily create an infinite stream using iterator blocks. For example, here's a sequence of random numbers:
public static IEnumerable<int> RandomSequence(int minInclusive, int maxExclusive)
{
Random rng = new Random();
while (true)
{
yield return rng.Next(minInclusive, maxExclusive);
}
}
How would you store an infinite sequence in a list?
My Edulinq blog series gives a sample implementation of LINQ to Objects which makes heavy use of iterator blocks. LINQ is fundamentally lazy where it can be - and putting things in a list simply doesn't work that way.
With the "list" code, you have to process the full list before you can pass it on to the next step. The "yield" version passes the processed item immediately to the next step. If that "next step" contains a ".Take(10)" then the "yield" version will only process the first 10 items and forget about the rest. The "list" code would have processed everything.
This means that you see the most difference when you need to do a lot of processing and/or have long lists of items to process.
You can use yield to return items that aren't in a list. Here's a little sample that could iterate infinitely through a list until canceled.
public IEnumerable<int> GetNextNumber()
{
while (true)
{
for (int i = 0; i < 10; i++)
{
yield return i;
}
}
}
public bool Canceled { get; set; }
public void StartCounting()
{
foreach (var number in GetNextNumber())
{
if (this.Canceled) break;
Console.WriteLine(number);
}
}
This writes
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
...etc. to the console until canceled.
object jamesItem = null;
foreach(var item in FilteredList())
{
if (item.Name == "James")
{
jamesItem = item;
break;
}
}
return jamesItem;
When the above code is used to loop through FilteredList() and assuming item.Name == "James" will be satisfied on 2nd item in the list, the method using yield will yield twice. This is a lazy behavior.
Where as the method using list will add all the n objects to the list and pass the complete list to the calling method.
This is exactly a use case where difference between IEnumerable and IList can be highlighted.
The best real world example I've seen for the use of yield would be to calculate a Fibonacci sequence.
Consider the following code:
class Program
{
static void Main(string[] args)
{
Console.WriteLine(string.Join(", ", Fibonacci().Take(10)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(15).Take(1)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(10).Take(5)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(100).Take(1)));
Console.ReadKey();
}
private static IEnumerable<long> Fibonacci()
{
long a = 0;
long b = 1;
while (true)
{
long temp = a;
a = b;
yield return a;
b = temp + b;
}
}
}
This will return:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55
987
89, 144, 233, 377, 610
1298777728820984005
This is nice because it allows you to calculate out an infinite series quickly and easily, giving you the ability to use the Linq extensions and query only what you need.
why use [yield]? Apart from it being slightly less code, what's it do for me?
Sometimes it is useful, sometimes not. If the entire set of data must be examined and returned then there is not going to be any benefit in using yield because all it did was introduce overhead.
When yield really shines is when only a partial set is returned. I think the best example is sorting. Assume you have a list of objects containing a date and a dollar amount from this year and you would like to see the first handful (5) records of the year.
In order to accomplish this, the list must be sorted ascending by date, and then have the first 5 taken. If this was done without yield, the entire list would have to be sorted, right up to making sure the last two dates were in order.
However, with yield, once the first 5 items have been established the sorting stops and the results are available. This can save a large amount of time.
The yield return statement allows you to return only one item at a time. You are collecting all the items in a list and again returning that list, which is a memory overhead.
A trivial example of an "infinite" IEnumerable would be
IEnumerable<int> Numbers() {
int i=0;
while(true) {
yield return unchecked(i++);
}
}
I know, that
foreach(int i in Numbers().Take(10)) {
Console.WriteLine(i);
}
and
var q = Numbers();
foreach(int i in q.Take(10)) {
Console.WriteLine(i);
}
both work fine (and print out the number 0-9).
But are there any pitfalls when copying or handling expressions like q? Can I rely on the fact, that they are always evaluated "lazy"? Is there any danger to produce an infinite loop?
As long as you only call lazy, un-buffered methods you should be fine. So Skip, Take, Select, etc are fine. However, Min, Count, OrderBy etc would go crazy.
It can work, but you need to be cautious. Or inject a Take(somethingFinite) as a safety measure (or some other custom extension method that throws an exception after too much data).
For example:
public static IEnumerable<T> SanityCheck<T>(this IEnumerable<T> data, int max) {
int i = 0;
foreach(T item in data) {
if(++i >= max) throw new InvalidOperationException();
yield return item;
}
}
Yes, you are guaranteed that the code above will be executed lazily. While it looks (in your code) like you'd loop forever, your code actually produces something like this:
IEnumerable<int> Numbers()
{
return new PrivateNumbersEnumerable();
}
private class PrivateNumbersEnumerable : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
return new PrivateNumbersEnumerator();
}
}
private class PrivateNumbersEnumerator : IEnumerator<int>
{
private int i;
public bool MoveNext() { i++; return true; }
public int Current
{
get { return i; }
}
}
(This obviously isn't exactly what will be generated, since this is pretty specific to your code, but it's nonetheless similar and should show you why it's going to be lazily evaluated).
You would have to avoid any greedy functions that attempt to read to end. This would include Enumerable extensions like: Count, ToArray/ToList, and aggregates Avg/Min/Max, etc.
There's nothing wrong with infinite lazy lists, but you must make conscious decisions about how to handle them.
Use Take to limit the impact of an endless loop by setting an upper bound even if you don't need them all.
Yes, your code will always work without infinite looping. Someone might come along though later and mess things up. Suppose they want to do:
var q = Numbers().ToList();
Then, you're hosed! Many "aggregate" functions will kill you, like Max().
If it wasn't lazy evaluation, your first example won't work as expected in the first place.
still trying to find where i would use the "yield" keyword in a real situation.
I see this thread on the subject
What is the yield keyword used for in C#?
but in the accepted answer, they have this as an example where someone is iterating around Integers()
public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}
but why not just use
list<int>
here instead. seems more straightforward..
If you build and return a List (say it has 1 million elements), that's a big chunk of memory, and also of work to create it.
Sometimes the caller may only want to know what the first element is. Or they might want to write them to a file as they get them, rather than building the whole list in memory and then writing it to a file.
That's why it makes more sense to use yield return. It doesn't look that different to building the whole list and returning it, but it's very different because the whole list doesn't have to be created in memory before the caller can look at the first item on it.
When the caller says:
foreach (int i in Integers())
{
// do something with i
}
Each time the loop requires a new i, it runs a bit more of the code in Integers(). The code in that function is "paused" when it hits a yield return statement.
Yield allows you to build methods that produce data without having to gather everything up before returning. Think of it as returning multiple values along the way.
Here's a couple of methods that illustrate the point
public IEnumerable<String> LinesFromFile(String fileName)
{
using (StreamReader reader = new StreamReader(fileName))
{
String line;
while ((line = reader.ReadLine()) != null)
yield return line;
}
}
public IEnumerable<String> LinesWithEmails(IEnumerable<String> lines)
{
foreach (String line in lines)
{
if (line.Contains("#"))
yield return line;
}
}
Neither of these two methods will read the whole contents of the file into memory, yet you can use them like this:
foreach (String lineWithEmail in LinesWithEmails(LinesFromFile("test.txt")))
Console.Out.WriteLine(lineWithEmail);
You can use yield to build any iterator. That could be a lazily evaluated series (reading lines from a file or database, for example, without reading everything at once, which could be too much to hold in memory), or could be iterating over existing data such as a List<T>.
C# in Depth has a free chapter (6) all about iterator blocks.
I also blogged very recently about using yield for smart brute-force algorithms.
For an example of the lazy file reader:
static IEnumerable<string> ReadLines(string path) {
using (StreamReader reader = File.OpenText(path)) {
string line;
while ((line = reader.ReadLine()) != null) {
yield return line;
}
}
}
This is entirely "lazy"; nothing is read until you start enumerating, and only a single line is ever held in memory.
Note that LINQ-to-Objects makes extensive use of iterator blocks (yield). For example, the Where extension is essentially:
static IEnumerable<T> Where<T>(this IEnumerable<T> data, Func<T, bool> predicate) {
foreach (T item in data) {
if (predicate(item)) yield return item;
}
}
And again, fully lazy - allowing you to chain together multiple operations without forcing everything to be loaded into memory.
yield allows you to process collections that are potentially infinite in size because the entire collection is never loaded into memory in one go, unlike a List based approach. For instance an IEnumerable<> of all the prime numbers could be backed off by the appropriate algo for finding the primes, whereas a List approach would always be finite in size and therefore incomplete. In this example, using yield also allows processing for the next element to be deferred until it is required.
A real situation for me, is when i want to process a collection that takes a while to populate more smoothly.
Imagine something along the lines (psuedo code):
public IEnumberable<VerboseUserInfo> GetAllUsers()
{
foreach(UserId in userLookupList)
{
VerboseUserInfo info = new VerboseUserInfo();
info.Load(ActiveDirectory.GetLotsOfUserData(UserId));
info.Load(WebSerice.GetSomeMoreInfo(UserId));
yield return info;
}
}
Instead of having to wait a minute for the collection to populate before i can start processing items in it. I will be able to start immediately, and then report back to the user-interface as it happens.
You may not always want to use yield instead of returning a list, and in your example you use yield to actually return a list of integers. Depending on whether you want a mutable list, or a immutable sequence, you could use a list, or an iterator (or some other collection muttable/immutable).
But there are benefits to use yield.
Yield provides an easy way to build lazy evaluated iterators. (Meaning only the code to get next element in sequence is executed when the MoveNext() method is called then the iterator returns doing no more computations, until the method is called again)
Yield builds a state machine under the covers, and this saves you allot of work by not having to code the states of your generic generator => more concise/simple code.
Yield automatically builds optimized and thread safe iterators, sparing you the details on how to build them.
Yield is much more powerful than it seems at first sight and can be used for much more than just building simple iterators, check out this video to see Jeffrey Richter and his AsyncEnumerator and how yield is used make coding using the async pattern easy.
You might want to iterate through various collections:
public IEnumerable<ICustomer> Customers()
{
foreach( ICustomer customer in m_maleCustomers )
{
yield return customer;
}
foreach( ICustomer customer in m_femaleCustomers )
{
yield return customer;
}
// or add some constraints...
foreach( ICustomer customer in m_customers )
{
if( customer.Age < 16 )
{
yield return customer;
}
}
// Or....
if( Date.Today == 1 )
{
yield return m_superCustomer;
}
}
I agree with everything everyone has said here about lazy evaluation and memory usage and wanted to add another scenario where I have found the iterators using the yield keyword useful. I have run into some cases where I have to do a sequence of potentially expensive processing on some data where it is extremely useful to use iterators. Rather than processing the entire file immediately, or rolling my own processing pipeline, I can simply use iterators something like this:
IEnumerable<double> GetListFromFile(int idxItem)
{
// read data from file
return dataReadFromFile;
}
IEnumerable<double> ConvertUnits(IEnumerable<double> items)
{
foreach(double item in items)
yield return convertUnits(item);
}
IEnumerable<double> DoExpensiveProcessing(IEnumerable<double> items)
{
foreach(double item in items)
yield return expensiveProcessing(item);
}
IEnumerable<double> GetNextList()
{
return DoExpensiveProcessing(ConvertUnits(GetListFromFile(curIdx++)));
}
The advantage here is that by keeping the input and output to all of the functions IEnumerable<double>, my processing pipeline is completely composable, easy to read, and lazy evaluated so I only have to do the processing I really need to do. This lets me put almost all of my processing in the GUI thread without impacting responsiveness so I don't have to worry about any threading issues.
I came up with this to overcome .net shortcoming having to manually deep copy List.
I use this:
static public IEnumerable<SpotPlacement> CloneList(List<SpotPlacement> spotPlacements)
{
foreach (SpotPlacement sp in spotPlacements)
{
yield return (SpotPlacement)sp.Clone();
}
}
And at another place:
public object Clone()
{
OrderItem newOrderItem = new OrderItem();
...
newOrderItem._exactPlacements.AddRange(SpotPlacement.CloneList(_exactPlacements));
...
return newOrderItem;
}
I tried to come up with oneliner that does this, but it's not possible, due to yield not working inside anonymous method blocks.
EDIT:
Better still, use generic List cloner:
class Utility<T> where T : ICloneable
{
static public IEnumerable<T> CloneList(List<T> tl)
{
foreach (T t in tl)
{
yield return (T)t.Clone();
}
}
}
The method used by yield of saving memory by processing items on-the-fly is nice, but really it's just syntactic sugar. It's been around for a long time. In any language that has function or interface pointers (even C and assembly) you can get the same effect using a callback function / interface.
This fancy stuff:
static IEnumerable<string> GetItems()
{
yield return "apple";
yield return "orange";
yield return "pear";
}
foreach(string item in GetItems())
{
Console.WriteLine(item);
}
is basically equivalent to old-fashioned:
interface ItemProcessor
{
void ProcessItem(string s);
};
class MyItemProcessor : ItemProcessor
{
public void ProcessItem(string s)
{
Console.WriteLine(s);
}
};
static void ProcessItems(ItemProcessor processor)
{
processor.ProcessItem("apple");
processor.ProcessItem("orange");
processor.ProcessItem("pear");
}
ProcessItems(new MyItemProcessor());
When ever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yeilding because I feel the overhead of maintaining the state of the yeilding method doesn't buy me much. In almost all cases where I am returning a collection I feel that 90% of the time, the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Has anyone written anything like or not like linq where yield was useful?
Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.
Take, for example, a filter iterator:
IEnumerator<T> Filter(this IEnumerator<T> coll, Func<T, bool> func)
{
foreach(T t in coll)
if (func(t)) yield return t;
}
Now, you can chain this:
MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)
You method would be creating (and tossing) three lists. My method iterates over it just once.
Also, when you return a collection, you are forcing a particular implementation on you users. An iterator is more generic.
I do understand its usefulness in linq, but I feel that only the linq team is writing such complex queriable objects that yield is useful.
Yield was useful as soon as it got implemented in .NET 2.0, which was long before anyone ever thought of LINQ.
Why would I write this function:
IList<string> LoadStuff() {
var ret = new List<string>();
foreach(var x in SomeExternalResource)
ret.Add(x);
return ret;
}
When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:
IEnumerable<string> LoadStuff() {
foreach(var x in SomeExternalResource)
yield return x;
}
It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.
I could go on and on....
I recently had to make a representation of mathematical expressions in the form of an Expression class. When evaluating the expression I have to traverse the tree structure with a post-order treewalk. To achieve this I implemented IEnumerable<T> like this:
public IEnumerator<Expression<T>> GetEnumerator()
{
if (IsLeaf)
{
yield return this;
}
else
{
foreach (Expression<T> expr in LeftExpression)
{
yield return expr;
}
foreach (Expression<T> expr in RightExpression)
{
yield return expr;
}
yield return this;
}
}
Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.
At a previous company, I found myself writing loops like this:
for (DateTime date = schedule.StartDate; date <= schedule.EndDate;
date = date.AddDays(1))
With a very simple iterator block, I was able to change this to:
foreach (DateTime date in schedule.DateRange)
It made the code a lot easier to read, IMO.
yield was developed for C#2 (before Linq in C#3).
We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.
Collections are great any time you have a few elements that you're going to hit multiple times.
However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.
This is essentially what the SqlDataReader does - it's a forward only custom enumerator.
What yield lets you do is quickly and with minimal code write your own custom enumerators.
Everything yield does could be done in C#1 - it just took reams of code to do it.
Linq really maximises the value of the yield behaviour, but it certainly isn't the only application.
Whenever your function returns IEnumerable you should use "yielding". Not in .Net > 3.0 only.
.Net 2.0 example:
public static class FuncUtils
{
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
...
public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
{
foreach (T el in e)
if (filterFunc(el))
yield return el;
}
public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
{
foreach (T el in e)
yield return mapFunc(el);
}
...
I'm not sure about C#'s implementation of yield(), but on dynamic languages, it's far more efficient than creating the whole collection. on many cases, it makes it easy to work with datasets much bigger than RAM.
I am a huge Yield fan in C#. This is especially true in large homegrown frameworks where often methods or properties return List that is a sub-set of another IEnumerable. The benefits that I see are:
the return value of a method that uses yield is immutable
you are only iterating over the list once
it a late or lazy execution variable, meaning the code to return the values are not executed until needed (though this can bite you if you dont know what your doing)
of the source list changes, you dont have to call to get another IEnumerable, you just iterate over IEnumeable again
many more
One other HUGE benefit of yield is when your method potentially will return millions of values. So many that there is the potential of running out of memory just building the List before the method can even return it. With yield, the method can just create and return millions of values, and as long the caller also doesnt store every value. So its good for large scale data processing / aggregating operations
Personnally, I haven't found I'm using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime) where you have async and concurrency issues.
Anyway, still trying to get my head around it as well.
Yield is useful because it saves you space. Most optimizations in programming makes a trade off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.
consider this example:
static IEnumerable<Person> GetAllPeople()
{
return new List<Person>()
{
new Person() { Name = "George", Surname = "Bush", City = "Washington" },
new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
new Person() { Name = "Joe", Surname = "Average", City = "New York" }
};
}
static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people, string where)
{
foreach (var person in people)
{
if (person.City == where) yield return person;
}
yield break;
}
static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
foreach (var person in people)
{
if (person.Name.StartsWith(initial)) yield return person;
}
yield break;
}
static void Main(string[] args)
{
var people = GetAllPeople();
foreach (var p in people.GetPeopleFrom("Washington"))
{
// do something with washingtonites
}
foreach (var p in people.GetPeopleWithInitial("G"))
{
// do something with people with initial G
}
foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
{
// etc
}
}
(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)
As you can see, if you have a lot of these "filter" methods (but it can be any kind of method that does some work on a list of people) you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.
The first side-effect of yield is that it delays execution of the filtering logic until you actually require it. If you therefore create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space which is a powerful and free optimization.
The other side-effect is that yield operates on the lowest common collection interface (IEnumerable<>) which enables the creation of library-like code with wide applicability.
Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumberable is not done until the element is actually requested. This allows you the power to do a couple of different things. One is that you could yield an infinitely long list without the need to actually make infinite calculations. Second, you could return an enumeration of function applications. The functions would only be applied when you iterate through the list.
I've used yeild in non-linq code things like this (assuming functions do not live in same class):
public IEnumerable<string> GetData()
{
foreach(String name in _someInternalDataCollection)
{
yield return name;
}
}
...
public void DoSomething()
{
foreach(String value in GetData())
{
//... Do something with value that doesn't modify _someInternalDataCollection
}
}
You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over though, or it will throw an exception.
Yield is very useful in general. It's in ruby among other languages that support functional style programming, so its like it's tied to linq. It's more the other way around, that linq is functional in style, so it uses yield.
I had a problem where my program was using a lot of cpu in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event based argument). And still be able to break the functions up if they took too much cpu. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)
The System.Linq IEnumerable extensions are great, but sometime you want more. For example, consider the following extension:
public static class CollectionSampling
{
public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
{
var rand = new Random();
using (var enumerator = coll.GetEnumerator());
{
while (enumerator.MoveNext())
{
yield return enumerator.Current;
int currentSample = rand.Next(max);
for (int i = 1; i <= currentSample; i++)
enumerator.MoveNext();
}
}
}
}
Another interesting advantage of yielding is that the caller cannot cast the return value to the original collection type and modify your internal collection