What is the best way to loop through a list? Is a for loop better than one of the List class's find methods? Also, if I use the FindLast method as shown below, with an anonymous delegate as an instance of a Predicate delegate, is that better than using a lambda expression? Which one executes faster?
var result = Books.FindLast(
    delegate(Book bk)
    {
        DateTime year2001 = new DateTime(2001, 01, 01);
        return bk.Publish_date < year2001;
    });
It's a complex question because it touches a lot of different topics.
In general, calling through a delegate is many times slower than a simple function call, and enumerating a list (via foreach) is slow as well.
If you really care about performance (but don't assume you do a priori: profile!) you should avoid delegates and enumeration. A first big step (whenever possible) could be to use a Hashtable-style index instead of a simple list, as sketched below.
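A minimal sketch of that indexing idea, assuming a Dictionary keyed by publication year (the booksByYear name and bucket layout are my own illustration, not from the answer):
// Hypothetical index: publication year -> books published that year.
// Book, Books and Publish_date come from the question; the rest is assumed.
var booksByYear = new Dictionary<int, List<Book>>();
foreach (Book book in Books)
{
    List<Book> bucket;
    if (!booksByYear.TryGetValue(book.Publish_date.Year, out bucket))
        booksByYear[book.Publish_date.Year] = bucket = new List<Book>();
    bucket.Add(book);
}
// A query like "published before 2001" now only touches the buckets
// for years before 2001 instead of scanning the whole list.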
Examples
Now some examples. I'll write the same function in different ways, from the most readable (but slowest) to the least readable (but fastest). I omit all error checking, but a real-world function shouldn't (at least some asserts are required).
This function uses LINQ; it's the easiest to understand but the slowest.
Note that books can be any enumeration (it's not required to be a List<T>):
public static Book FindLastBookPublishedBefore(IEnumerable<Book> books,
    DateTime date)
{
    // LastOrDefault is the LINQ counterpart of List<T>.FindLast
    return books.LastOrDefault(x => x.Publish_date < date);
}
Same as before but without LINQ. Note that this function handles a special case: if the list doesn't contain any eligible book, it returns null.
public static Book FindLastBookPublishedBefore(IEnumerable<Book> books,
    DateTime date)
{
    Book candidate = null;
    foreach (Book book in books)
    {
        // keep the last book (in list order) that matches the predicate
        if (book.Publish_date < date)
            candidate = book;
    }
    return candidate;
}
Same as before but with plain indexing instead of enumeration; again, it handles the special case of a list with no eligible book.
public static Book FindLastBookPublishedBefore(List<Book> books,
    DateTime date)
{
    Book candidate = null;
    for (int i = 0; i < books.Count; ++i)
    {
        if (books[i].Publish_date < date)
            candidate = books[i];
    }
    return candidate;
}
Same as before, but with the list kept sorted by publish date, as suggested by @MaratKhasanov. Please note that with this approach searches are fast, but inserting a new element can be slower than with a normal unsorted list (because the list itself must be kept sorted). If the number of elements is very high you may think about writing your own sorted structure on top of a Hashtable (using, for example, the year as the key for the first level).
public static Book FindLastBookPublishedBefore(List<Book> books,
    DateTime date)
{
    // Assumes books is sorted by Publish_date in ascending order
    Book candidate = null;
    for (int i = 0; i < books.Count; ++i)
    {
        if (books[i].Publish_date >= date)
            break; // everything from here on is too recent
        candidate = books[i];
    }
    return candidate;
}
Now an example that is a little more complex but has the best search performance. The algorithm is an ordinary binary search over the sorted list (note that if you want to match the first element that satisfies the predicate you may use the List.BinarySearch method directly). Please consider it just an example:
public static Book FindLastBookPublishedBefore(List<Book> books,
    DateTime date)
{
    // Assumes books is sorted by Publish_date in ascending order
    int min = 0, max = books.Count - 1;
    Book candidate = null;
    while (min <= max)
    {
        int mid = min + (max - min) / 2;
        Book book = books[mid];
        if (book.Publish_date >= date)
            max = mid - 1;      // search the lower half
        else
        {
            candidate = book;   // this book qualifies; look for a later one
            min = mid + 1;
        }
    }
    return candidate;
}
Before moving to a more complex container you may consider keeping the list unsorted until the first search. The first search will then be really slow (because it has to sort the list first), but inserts will be as fast as with a normal list (do try this with real-world data). The last algorithm can be optimized a lot, too.
If you have so many items in your collection that you can't manage them with a normal in-memory collection, you should probably move everything to a database anyway.
Use whatever makes your code more readable. You can simplify the code above by using lambda expressions; they are just a simplified syntax for anonymous delegates. Wherever you can use a delegate you can use a lambda expression or a normal method, and you pass a normal method as an argument without parentheses.
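For example, the predicate from the question could be a named method passed as a method group (a small sketch; the method name is mine):
// A named method with the Predicate<Book> signature (name assumed for illustration)
static bool PublishedBefore2001(Book bk)
{
    return bk.Publish_date < new DateTime(2001, 1, 1);
}

// Passed by name, without parentheses:
var result = Books.FindLast(PublishedBefore2001);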
The C# compiler really just makes a hidden method for anonymous delegates and lambda expressions. You will probably not experience any difference in speed.
var result = Books.FindLast(bk => bk.Publish_date < new DateTime(2001,01,01));
This question was posted here (https://stackoverflow.com/questions/15881110/java-to-c-sharp-conversion) by a team member but was closed because the community didn't have enough information.
Here's my attempt to revive it: how would I go about converting this Java extract into C#?
Java Extract:
PriorityQueue<PuzzleNode> openList = new PriorityQueue<PuzzleNode>(1,
    new Comparator<PuzzleNode>() {
        public int compare(PuzzleNode a, PuzzleNode b) {
            if (a.getPathCost() > b.getPathCost())
                return 1;
            else if (a.getPathCost() < b.getPathCost())
                return -1;
            else
                return 0;
        }
    });
A SortedList has been considered, but to no avail, as I'm unsure how to write it.
I've also tried creating a standard list with a method:
List<PuzzleNode> openList = new List<PuzzleNode>();
//Method to sort the list
public int CompareFCost(PuzzleNode a, PuzzleNode b)
{
if (a.getPathCost() > b.getPathCost())
{
return 1;
}
else if (a.getPathCost() < b.getPathCost())
{
return -1;
}
else
return 0;
}//end CompareFCost
and then calling openList.Sort(CompareFCost); at appropriate locations; however, this doesn't work.
What is the code used for?
It orders the 'PuzzleNode' objects by a score (pathCost) I set elsewhere in the program. A while loop then pulls the first object from the list. The list needs to be ordered, otherwise an object with a higher pathCost could be chosen and the while loop would run for longer. The objective is to pull the lowest pathCost from the list.
I ask for a conversion because it works in Java and the rest of the code has pretty much originated from Java.
Any takers? If you need further info I'm happy to discuss it further.
I suppose you could misappropriate a SortedList something like this:
var openList = new SortedList<PuzzleNode, PuzzleNode>(
    // assumes .NET 4.5 for Comparer.Create
    // note: SortedList rejects duplicate keys, so two nodes with equal
    // path cost will need a tie-breaker in practice
    Comparer<PuzzleNode>.Create((a, b) => {
        if (a.getPathCost() > b.getPathCost())
            return 1;
        else if (a.getPathCost() < b.getPathCost())
            return -1;
        else
            return 0;
    }));
openList.Add(new PuzzleNode());
foreach(var x in openList.Keys)
{
//ordered enumeration
}
var firstItem = openList.Dequeue();
by creating some extension methods to make things a little more queue-like
static class SortedListExtensions
{
public static void Add<T>(this SortedList<T,T> list,T item)
{
list.Add(item,item);
}
public static T Dequeue<T>(this SortedList<T,T> list)
{
var item=list.Keys.First();
list.Remove(item);
return item;
}
//and so on...
}
TBH, I'd probably go for @valverij's answer in the comments on your original question, but if the cost of repeated sorting is prohibitive, this may be what you need.
What is the code used for? It orders the 'PuzzleNode' objects by a score (pathCost) I set elsewhere in the program. A while loop then pulls the first object from the list. The list needs to be ordered, otherwise an object with a higher pathCost could be chosen and the while loop would run for longer. The objective is to pull the lowest pathCost from the list.
There's LINQ for that. You don't usually do any of these things in C#, because LINQ does it for you.
It orders the objects 'PuzzleNode' depending on a score (pathCost)
That's achieved with LINQ's Enumerable.OrderBy() extension:
//Assuming PathCost is a property of a primitive type (int, double, string, etc)
var orderedlist = list.OrderBy(x => x.PathCost);
The objective is to pull the lower pathCost from the list.
That's achieved using LINQ's Enumerable.Min() or Enumerable.Max() extensions.
//Same assumption as above. Note that Min() returns the lowest PathCost value
//itself; to get the node with the lowest cost, order by cost and take the first.
var lowestcost = list.Min(x => x.PathCost);
var puzzlewithlowestpath = list.OrderBy(x => x.PathCost).First();
Here would go my rant about Java being incomplete compared to C# because it lacks something like LINQ, but I will refrain from further ranting on StackOverflow for now.
Another thing I wanted to mention: if you are coding in C#, you'd better use the C# naming conventions, where properties are PascalCased:
public int PathCost {get;set;}
//or double or whatever
instead of:
public int getPathCost()
public int setPathCost()
Let's assume you have a function that returns a lazily-enumerated object:
struct AnimalCount
{
    public int Chickens;
    public int Goats;

    public AnimalCount(int chickens, int goats)
    {
        Chickens = chickens;
        Goats = goats;
    }
}
IEnumerable<AnimalCount> FarmsInEachPen()
{
    ....
    yield return new AnimalCount(x, y);
    ....
}
You also have two functions that consume two separate IEnumerables, for example:
ConsumeChicken(IEnumerable<int>);
ConsumeGoat(IEnumerable<int>);
How can you call ConsumeChicken and ConsumeGoat without (a) converting FarmsInEachPen() ToList() beforehand, because it might have two zillion records, and (b) using multi-threading?
Basically:
ConsumeChicken(FarmsInEachPen().Select(x => x.Chickens));
ConsumeGoats(FarmsInEachPen().Select(x => x.Goats));
But without forcing the double enumeration.
I can solve it with multithreading, but it gets unnecessarily complicated with a buffer queue for each list.
So I'm looking for a way to split the AnimalCount enumerator into two int enumerators without fully evaluating AnimalCount. There is no problem running ConsumeGoat and ConsumeChicken together in lock-step.
I can feel the solution just out of my grasp but I'm not quite there. I'm thinking along the lines of a helper function that returns an IEnumerable being fed into ConsumeChicken and, each time the iterator is used, internally calls ConsumeGoat, thus executing the two functions in lock-step. Except, of course, I don't want to call ConsumeGoat more than once.
I don't think there is a way to do what you want. Since ConsumeChickens(IEnumerable<int>) and ConsumeGoats(IEnumerable<int>) are called sequentially, each of them enumerates the list separately; how do you expect that to work without two separate enumerations of the list?
Depending on the situation, a better solution is to have ConsumeChicken(int) and ConsumeGoat(int) methods (which each consume a single item), and call them in alternation. Like this:
foreach(var animal in animals)
{
ConsumeChicken(animal.Chickens);
ConsumeGoat(animal.Goats);
}
This will enumerate the animals collection only once.
Also, a note: depending on your LINQ provider and what exactly you're trying to do, there may be better options. For example, if you're trying to get the total sum of both chickens and goats from a database using LINQ-to-SQL or LINQ-to-Entities, the following query...
from a in animals
group a by 0 into g
select new
{
TotalChickens = g.Sum(x => x.Chickens),
TotalGoats = g.Sum(x => x.Goats)
}
will result in a single query, and do the summation on the database-end, which is greatly preferable to pulling the entire table over and doing the summation on the client end.
The way you have posed your problem, there is no way to do this. IEnumerable<T> is a pull enumerable - that is, you can GetEnumerator to the front of the sequence and then repeatedly ask "Give me the next item" (MoveNext/Current). You can't, on one thread, have two different things pulling from the animals.Select(a => a.Chickens) and animals.Select(a => a.Goats) at the same time. You would have to do one then the other (which would require materializing the second).
The suggestion BlueRaja made is one way to change the problem slightly. I would suggest going that route.
The other alternative is to utilize IObservable<T> from Microsoft's reactive extensions (Rx), a push enumerable. I won't go into the details of how you would do that, but it's something you could look into.
Edit:
The above is assuming that ConsumeChickens and ConsumeGoats are both returning void or are at least not returning IEnumerable<T> themselves - which seems like an obvious assumption. I'd appreciate it if the lame downvoter would actually comment.
Actually the simplest way to achieve what you want is to convert the return value of FarmsInEachPen to a push collection (an IObservable) and use Reactive Extensions to work with it:
var subject = new Subject<AnimalCount>();
// Subscribe attaches one handler per consumer; both see each item as it is pushed
subject.Subscribe(x => DoSomethingWithChicken(x.Chickens));
subject.Subscribe(x => DoSomethingWithGoat(x.Goats));
foreach (var item in FarmsInEachPen())
{
    subject.OnNext(item);
}
subject.OnCompleted();
I figured it out, thanks in large part to the path that @Lee put me on.
You need to share a single enumerator between the two zipped sequences, and use an adapter function to project the correct element into each sequence.
private static IEnumerable<object> ConsumeChickens(IEnumerable<int> xList)
{
foreach (var x in xList)
{
Console.WriteLine("X: " + x);
yield return null;
}
}
private static IEnumerable<object> ConsumeGoats(IEnumerable<int> yList)
{
foreach (var y in yList)
{
Console.WriteLine("Y: " + y);
yield return null;
}
}
private static IEnumerable<int> SelectHelper(IEnumerator<AnimalCount> enumerator, int i)
{
    // Only the chicken side (i == 0) advances the shared enumerator;
    // the goat side (i == 1) just reads the current element.
    bool c = i != 0 || enumerator.MoveNext();
while (c)
{
if (i == 0)
{
yield return enumerator.Current.Chickens;
c = enumerator.MoveNext();
}
else
{
yield return enumerator.Current.Goats;
}
}
}
private static void Main(string[] args)
{
    var enumerator = FarmsInEachPen().GetEnumerator();
    var chickensList = ConsumeChickens(SelectHelper(enumerator, 0));
    var goatsList = ConsumeGoats(SelectHelper(enumerator, 1));
    // Zip drives both consumers in lock-step over the single shared enumerator
    var temp = chickensList.Zip(goatsList, (i, i1) => (object)null);
    temp.ToList(); // force the evaluation
    Console.WriteLine("Total iterations: " + iterations); // 'iterations' is a counter kept elsewhere in the test program
}
What is the most efficient way to find a sequence within an IEnumerable<T> using LINQ?
I want to be able to create an extension method which allows the following call:
int startIndex = largeSequence.FindSequence(subSequence);
The match must be adjacent and in order.
Here's an implementation of an algorithm that finds a subsequence in a sequence. I called the method IndexOfSequence, because it makes the intent more explicit and is similar to the existing IndexOf method:
public static class ExtensionMethods
{
public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
{
return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
}
public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
{
var seq = sequence.ToArray();
int p = 0; // current position in source sequence
int i = 0; // current position in searched sequence
var prospects = new List<int>(); // list of prospective matches
foreach (var item in source)
{
// Remove bad prospective matches
prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));
// Is it the start of a prospective match?
if (comparer.Equals(item, seq[0]))
{
prospects.Add(p);
}
// Does the current item continue the partial match?
if (comparer.Equals(item, seq[i]))
{
i++;
// Do we have a complete match?
if (i == seq.Length)
{
// Bingo!
return p - seq.Length + 1;
}
}
else // Mismatch
{
// Do we have prospective matches to fall back to ?
if (prospects.Count > 0)
{
// Yes, use the first one
int k = prospects[0];
i = p - k + 1;
}
else
{
// No, start from beginning of searched sequence
i = 0;
}
}
p++;
}
// No match
return -1;
}
}
I didn't fully test it, so it might still contain bugs. I just ran a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. It seems to work fine so far.
I think the complexity is close to O(n), but I'm not an expert in Big O notation so I could be wrong... at least it only enumerates the source sequence once, without ever going back, so it should be reasonably efficient.
The code you say you want to be able to use isn't LINQ, so I don't see why it needs to be implemented with LINQ.
This is essentially the same problem as substring searching (indeed, an enumeration where order is significant is a generalisation of "string").
Computer science has considered this problem frequently for a long time, so you get to stand on the shoulders of giants.
Some reasonable starting points are:
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
http://en.wikipedia.org/wiki/Rabin-karp
Even just the pseudocode in the wikipedia articles is enough to port to C# quite easily. Look at the descriptions of performance in different cases and decide which cases are most likely to be encountered by your code.
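For instance, here is a rough, untested sketch of a KMP-style port along the lines of the Wikipedia pseudocode (the method name and the IReadOnlyList parameters are my choices, not anything from the articles):
static int KmpIndexOf<T>(IReadOnlyList<T> text, IReadOnlyList<T> pattern)
    where T : IEquatable<T>
{
    if (pattern.Count == 0) return 0;

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it
    var failure = new int[pattern.Count];
    for (int i = 1, k = 0; i < pattern.Count; i++)
    {
        while (k > 0 && !pattern[k].Equals(pattern[i]))
            k = failure[k - 1];
        if (pattern[k].Equals(pattern[i]))
            k++;
        failure[i] = k;
    }

    // scan the text without ever moving backwards in it
    for (int i = 0, k = 0; i < text.Count; i++)
    {
        while (k > 0 && !pattern[k].Equals(text[i]))
            k = failure[k - 1];
        if (pattern[k].Equals(text[i]))
            k++;
        if (k == pattern.Count)
            return i - pattern.Count + 1; // match starts here
    }
    return -1;
}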
I understand this is an old question, but I needed this exact method and I wrote it up like so:
public static int ContainsSubsequence<T>(this IEnumerable<T> elements, IEnumerable<T> subSequence) where T : IEquatable<T>
{
    return ContainsSubsequence(elements, 0, subSequence);
}
private static int ContainsSubsequence<T>(IEnumerable<T> elements, int index, IEnumerable<T> subSequence) where T : IEquatable<T>
{
    // Does the sub-sequence match at the current position?
    if (StartsWith(elements, subSequence))
        return index; // Matched!
    // Are there any elements left to try?
    if (!elements.Any())
        return -1; // Nope, didn't match
    // Try again from the next position, without consuming
    // anything from the sub-sequence
    return ContainsSubsequence(elements.Skip(1), index + 1, subSequence);
}
private static bool StartsWith<T>(IEnumerable<T> elements, IEnumerable<T> prefix) where T : IEquatable<T>
{
    // Check the prefix item by item against the front of the elements
    using (var e = elements.GetEnumerator())
    {
        foreach (T p in prefix)
            if (!e.MoveNext() || !p.Equals(e.Current))
                return false;
    }
    return true;
}
Updated to not just return whether the sub-sequence matched, but where it starts in the original sequence.
This is an extension method on IEnumerable, fully lazy, terminates early and is far more LINQ-ified than the currently up-voted answer. Be warned, however (as @wai-ha-lee points out): it is recursive and creates a lot of enumerators. Use it where applicable (performance/memory). This was fine for my needs, but YMMV.
You can use the library Sequences to do that (disclaimer: I'm the author).
It has an IndexOfSlice method that does exactly what you need; it's an implementation of the Knuth-Morris-Pratt algorithm.
int startIndex = largeSequence.AsSequence().IndexOfSlice(subSequence);
UPDATE:
Given the clarification of the question my response below isn't as applicable. Leaving it for historical purposes.
You probably want to use mySequence.Where(). Then the key is to optimize the predicate to work well in your environment. This can vary quite a bit depending on your requirements and typical usage patterns.
It is quite possible that what works well for small collections doesn't scale well for much larger collections depending on what type T is.
Of course, if the 90% use is for small collections then optimizing for the outlier large collection seems a bit YAGNI.
I have a large collection of strings (up to 1M), alphabetically sorted. I have experimented with LINQ queries against this collection using HashSet, SortedDictionary, and Dictionary. I am statically caching the collection; it's up to 50MB in size, and I'm always running the LINQ query against the cached collection. My problem is as follows:
Regardless of collection type, performance is much poorer than SQL (up to 200ms). When running a similar query against the underlying SQL tables, performance is much quicker (5-10ms). I have implemented my LINQ queries as follows:
public static string ReturnSomething(string query, int limit)
{
StringBuilder sb = new StringBuilder();
foreach (var stringitem in MyCollection.Where(
x => x.StartsWith(query) && x.Length > query.Length).Take(limit))
{
sb.Append(stringitem);
}
return sb.ToString();
}
It is my understanding that the HashSet, Dictionary, etc. implement lookups using binary tree search instead of the standard enumeration. What are my options for high performance LINQ queries into the advanced collection types?
In your current code you don't make use of any of the special features of the Dictionary / SortedDictionary / HashSet collections; you are using them the same way you would use a List. That is why you don't see any difference in performance.
If you use a dictionary as an index, where the first few characters of the string are the key and a list of strings is the value, you can use the search string to pick out the small part of the entire collection that has possible matches.
I wrote the class below to test this. If I populate it with a million strings and search with an eight character string it rips through all possible matches in about 3 ms. Searching with a one character string is the worst case, but it finds the first 1000 matches in about 4 ms. Finding all matches for a one character strings takes about 25 ms.
The class creates indexes for 1, 2, 4 and 8 character keys. If you look at your specific data and what you search for, you should be able to select what indexes to create to optimise it for your conditions.
public class IndexedList {
private class Index : Dictionary<string, List<string>> {
private int _indexLength;
public Index(int indexLength) {
_indexLength = indexLength;
}
public void Add(string value) {
if (value.Length >= _indexLength) {
string key = value.Substring(0, _indexLength);
List<string> list;
if (!this.TryGetValue(key, out list)) {
Add(key, list = new List<string>());
}
list.Add(value);
}
}
public IEnumerable<string> Find(string query, int limit) {
    List<string> list;
    // guard against prefixes that are not in the index at all
    if (!this.TryGetValue(query.Substring(0, _indexLength), out list))
        return Enumerable.Empty<string>();
    return list
        .Where(s => s.Length > query.Length && s.StartsWith(query))
        .Take(limit);
}
}
private Index _index1;
private Index _index2;
private Index _index4;
private Index _index8;
public IndexedList(IEnumerable<string> values) {
_index1 = new Index(1);
_index2 = new Index(2);
_index4 = new Index(4);
_index8 = new Index(8);
foreach (string value in values) {
_index1.Add(value);
_index2.Add(value);
_index4.Add(value);
_index8.Add(value);
}
}
public IEnumerable<string> Find(string query, int limit) {
if (query.Length >= 8) return _index8.Find(query, limit);
if (query.Length >= 4) return _index4.Find(query,limit);
if (query.Length >= 2) return _index2.Find(query,limit);
return _index1.Find(query, limit);
}
}
I bet you have an index on the column, so SQL Server can do the comparison in O(log(n)) operations rather than O(n). To imitate the SQL Server behavior, use a sorted collection and find all strings s such that s >= query, then look at values until you find one that does not start with query, applying the additional length filter as you go. This is what is called a range scan (Oracle) or an index seek (SQL Server).
Here is some example code; I haven't tested it exhaustively, so treat it as a sketch:
// Note: list must be sorted (ordinal) before being passed to this function
IEnumerable<string> FindStringsThatStartWith(List<string> list, string query) {
    // binary search for the first element >= query
    int low = 0, high = list.Count;
    while (low < high) {
        int mid = (low + high) / 2;
        if (string.CompareOrdinal(list[mid], query) < 0)
            low = mid + 1;
        else
            high = mid;
    }
    // linear scan through the matching range
    while (low < list.Count && list[low].StartsWith(query, StringComparison.Ordinal)) {
        if (list[low].Length > query.Length)
            yield return list[low];
        low++;
    }
}
If you're doing a "starts with" search, only care about ordinal comparisons, and can keep the collection sorted (again in ordinal order), then I would suggest holding the values in a list. You can then binary search to find the first value which starts with the right prefix, then walk down the list linearly, yielding results until the first value which doesn't start with the right prefix.
In fact, you could probably do another binary search for the first value which doesn't start with the prefix, so you'd have a start and an end point. Then you just need to apply the length criterion to that matching portion. (I'd hope that if it's sensible data, the prefix matching is going to get rid of most candidate values.) The way to find the first value which doesn't start with the prefix is to search for the lexicographically-first value which doesn't - e.g. with a prefix of "ABC", search for "ABD".
None of this uses LINQ, and it's all very specific to your particular case, but it should work. Let me know if any of this doesn't make sense.
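A tiny sketch of that "ABD" trick (the helper name is made up; it assumes a non-empty prefix whose last character isn't char.MaxValue):
// Lexicographically-first string that doesn't start with the given prefix,
// e.g. "ABC" -> "ABD"
static string UpperBoundFor(string prefix)
{
    char[] chars = prefix.ToCharArray();
    chars[chars.Length - 1]++; // bump the final character
    return new string(chars);
}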
If you are trying to optimize looking up a list of strings with a given prefix, you might want to look at implementing a trie (not to be confused with a regular tree) data structure in C#.
Tries offer very fast prefix lookups and have a very small memory overhead compared to other data structures for this sort of operation.
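There's no trie in the base class library, so here's a minimal sketch of the idea (all the names are mine, and a production trie would need more care):
class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsWord;
}

class Trie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            TrieNode child;
            if (!node.Children.TryGetValue(c, out child))
                node.Children[c] = child = new TrieNode();
            node = child;
        }
        node.IsWord = true;
    }

    // Enumerate all stored words that start with the given prefix
    public IEnumerable<string> StartingWith(string prefix)
    {
        var node = _root;
        foreach (char c in prefix)
            if (!node.Children.TryGetValue(c, out node))
                yield break; // no word has this prefix
        foreach (var word in Collect(node, prefix))
            yield return word;
    }

    private static IEnumerable<string> Collect(TrieNode node, string current)
    {
        if (node.IsWord) yield return current;
        foreach (var pair in node.Children)
            foreach (var word in Collect(pair.Value, current + pair.Key))
                yield return word;
    }
}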
About LINQ to Objects in general. It's not unusual to have a speed reduction compared to SQL. The net is littered with articles analyzing its performance.
Just looking at your code, I would say that you should reorder the comparison to take advantage of short-circuiting when using boolean operators:
foreach (var stringitem in MyCollection.Where(
    x => x.Length > query.Length && x.StartsWith(query)).Take(limit))
The comparison of length is always going to be an O(1) operation (the length is stored as part of the string, so it doesn't count each character every time), whereas the call to StartsWith is going to be an O(N) operation, where N is the length of the query (or the length of the string, whichever is smaller).
By placing the comparison of length before the call to StartsWith, if that comparison fails, you save yourself some extra cycles which could add up when processing large numbers of items.
I don't think that a lookup table is going to help you here, as lookup tables are good when you are comparing the entire key, not parts of the key, like you are doing with the call to StartsWith.
Rather, you might be better off using a tree structure which is split based on the letters in the words in the list.
However, at that point, you are really just recreating what SQL Server is doing (in the case of indexes) and that would just be a duplication of effort on your part.
I think the problem is that LINQ has no way to use the fact that your sequence is already sorted. In particular, it cannot know that applying the StartsWith function preserves the order.
I would suggest using the List.BinarySearch method together with an IComparer<string> that compares only the first query.Length characters (this might be tricky, since it's not clear whether the query string will always be the first or the second parameter to Compare()).
You could even use the standard string comparison, since BinarySearch returns a negative number which you can complement (using ~) in order to get the index of the first element that is larger than your query.
You then have to start from the returned index (in both directions!) to find all elements matching your query string.
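A small sketch of that complement trick (sortedList and query are assumed names; with plain ordinal ordering every extension of query sorts at or after query itself, so a forward walk suffices, though a truncated comparer may need to scan both ways as noted above):
// the list must be sorted with ordinal comparison for this to work
int pos = sortedList.BinarySearch(query, StringComparer.Ordinal);
if (pos < 0)
    pos = ~pos; // complement gives the index of the first element >= query

// walk forward through the block of strings sharing the prefix
for (int i = pos; i < sortedList.Count
     && sortedList[i].StartsWith(query, StringComparison.Ordinal); i++)
{
    if (sortedList[i].Length > query.Length)
        Console.WriteLine(sortedList[i]);
}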
Whenever I think I can use the yield keyword, I take a step back and look at how it will impact my project. I always end up returning a collection instead of yielding, because I feel the overhead of maintaining the state of the yielding method doesn't buy me much. In almost all cases where I am returning a collection, I feel that 90% of the time the calling method will be iterating over all elements in the collection, or will be seeking a series of elements throughout the entire collection.
I do understand its usefulness in LINQ, but I feel that only the LINQ team is writing such complex queryable objects that yield is useful.
Has anyone written anything like (or unlike) LINQ where yield was useful?
Note that with yield, you are iterating over the collection once, but when you build a list, you'll be iterating over it twice.
Take, for example, a filter iterator:
static IEnumerable<T> Filter<T>(this IEnumerable<T> coll, Func<T, bool> func)
{
    foreach (T t in coll)
        if (func(t)) yield return t;
}
Now, you can chain this:
MyColl.Filter(x=> x.id > 100).Filter(x => x.val < 200).Filter (etc)
Your method would be creating (and discarding) three lists. My method iterates over the collection just once.
Also, when you return a collection, you are forcing a particular implementation on your users. An iterator is more generic.
I do understand its usefulness in LINQ, but I feel that only the LINQ team is writing such complex queryable objects that yield is useful.
Yield was useful as soon as it was implemented in .NET 2.0, which was long before anyone ever thought of LINQ.
Why would I write this function:
IList<string> LoadStuff() {
var ret = new List<string>();
foreach(var x in SomeExternalResource)
ret.Add(x);
return ret;
}
When I can use yield, and save the effort and complexity of creating a temporary list for no good reason:
IEnumerable<string> LoadStuff() {
foreach(var x in SomeExternalResource)
yield return x;
}
It can also have huge performance advantages. If your code only happens to use the first 5 elements of the collection, then using yield will often avoid the effort of loading anything past that point. If you build a collection then return it, you waste a ton of time and space loading things you'll never need.
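For instance (a made-up usage of the LoadStuff iterator above):
// Only the first five items are ever pulled from SomeExternalResource;
// the iterator never runs further than the caller consumes.
foreach (var item in LoadStuff().Take(5))
    Console.WriteLine(item);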
I could go on and on....
I recently had to build a representation of mathematical expressions in the form of an Expression class. When evaluating an expression I have to traverse the tree structure with a post-order tree walk. To achieve this I implemented IEnumerable<T> like this:
public IEnumerator<Expression<T>> GetEnumerator()
{
if (IsLeaf)
{
yield return this;
}
else
{
foreach (Expression<T> expr in LeftExpression)
{
yield return expr;
}
foreach (Expression<T> expr in RightExpression)
{
yield return expr;
}
yield return this;
}
}
Then I can simply use a foreach to traverse the expression. You can also add a Property to change the traversal algorithm as needed.
At a previous company, I found myself writing loops like this:
for (DateTime date = schedule.StartDate; date <= schedule.EndDate;
date = date.AddDays(1))
With a very simple iterator block, I was able to change this to:
foreach (DateTime date in schedule.DateRange)
It made the code a lot easier to read, IMO.
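The iterator block behind such a DateRange property might plausibly have looked like this (the property and date names are assumed from the snippet above):
public IEnumerable<DateTime> DateRange
{
    get
    {
        // walk from StartDate to EndDate inclusive, one day at a time
        for (DateTime date = StartDate; date <= EndDate; date = date.AddDays(1))
            yield return date;
    }
}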
yield was introduced in C# 2 (before LINQ arrived in C# 3).
We used it heavily in a large enterprise C#2 web application when dealing with data access and heavily repeated calculations.
Collections are great any time you have a few elements that you're going to hit multiple times.
However in lots of data access scenarios you have large numbers of elements that you don't necessarily need to pass round in a great big collection.
This is essentially what the SqlDataReader does - it's a forward only custom enumerator.
What yield lets you do is quickly and with minimal code write your own custom enumerators.
Everything yield does could be done in C#1 - it just took reams of code to do it.
LINQ really maximises the value of the yield behaviour, but it certainly isn't the only application.
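As a sketch of that forward-only, minimal-code style (the table, column and connection string are placeholders, not anything from the answer):
static IEnumerable<string> ReadNames(string connectionString)
{
    // requires System.Data.SqlClient
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Name FROM Customers", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            // each row is handed to the caller as it is read;
            // nothing is buffered into an intermediate collection
            while (reader.Read())
                yield return reader.GetString(0);
        }
    }
}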
Whenever your function returns IEnumerable you should use yielding. It isn't limited to .NET 3.0 and later.
.NET 2.0 example:
public static class FuncUtils
{
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
public delegate T Func<A0, A1, T>(A0 arg0, A1 arg1);
...
public static IEnumerable<T> Filter<T>(IEnumerable<T> e, Func<T, bool> filterFunc)
{
foreach (T el in e)
if (filterFunc(el))
yield return el;
}
public static IEnumerable<R> Map<T, R>(IEnumerable<T> e, Func<T, R> mapFunc)
{
foreach (T el in e)
yield return mapFunc(el);
}
...
I'm not sure about C#'s implementation of yield, but in dynamic languages it's far more efficient than creating the whole collection. In many cases, it makes it easy to work with datasets much bigger than RAM.
I am a huge yield fan in C#. This is especially true in large homegrown frameworks where methods or properties often return a List that is a subset of another IEnumerable. The benefits that I see are:
- the return value of a method that uses yield is effectively immutable
- you are only iterating over the list once
- it is lazily executed, so the code producing the values doesn't run until needed (though this can bite you if you don't know what you're doing)
- if the source list changes, you don't have to ask for another IEnumerable; you just iterate over the IEnumerable again
- many more
One other HUGE benefit of yield is when your method potentially returns millions of values: so many that there is the potential of running out of memory just building the List before the method can even return it. With yield, the method can just create and return millions of values, as long as the caller doesn't store every value. So it's good for large-scale data processing / aggregating operations.
Personally, I haven't found myself using yield in my normal day-to-day programming. However, I've recently started playing with the Robotics Studio samples and found that yield is used extensively there, so I also see it being used in conjunction with the CCR (Concurrency and Coordination Runtime) where you have async and concurrency issues.
Anyway, still trying to get my head around it as well.
Yield is useful because it saves you space. Most optimizations in programming make a trade-off between space (disk, memory, networking) and processing. Yield as a programming construct allows you to iterate over a collection many times in sequence without needing a separate copy of the collection for each iteration.
Consider this example:
static IEnumerable<Person> GetAllPeople()
{
return new List<Person>()
{
new Person() { Name = "George", Surname = "Bush", City = "Washington" },
new Person() { Name = "Abraham", Surname = "Lincoln", City = "Washington" },
new Person() { Name = "Joe", Surname = "Average", City = "New York" }
};
}
// (extension methods like these must live in a static class to compile)
static IEnumerable<Person> GetPeopleFrom(this IEnumerable<Person> people, string where)
{
foreach (var person in people)
{
if (person.City == where) yield return person;
}
yield break;
}
static IEnumerable<Person> GetPeopleWithInitial(this IEnumerable<Person> people, string initial)
{
foreach (var person in people)
{
if (person.Name.StartsWith(initial)) yield return person;
}
yield break;
}
static void Main(string[] args)
{
var people = GetAllPeople();
foreach (var p in people.GetPeopleFrom("Washington"))
{
// do something with washingtonites
}
foreach (var p in people.GetPeopleWithInitial("G"))
{
// do something with people with initial G
}
foreach (var p in people.GetPeopleWithInitial("P").GetPeopleFrom("New York"))
{
// etc
}
}
(Obviously you are not required to use yield with extension methods, it just creates a powerful paradigm to think about data.)
As you can see, if you have a lot of these "filter" methods (they can be any kind of method that does some work on a list of people), you can chain many of them together without requiring extra storage space for each step. This is one way of raising the programming language (C#) up to express your solutions better.
The first side effect of yield is that it delays execution of the filtering logic until you actually require it. If you create a variable of type IEnumerable<> (with yields) but never iterate through it, you never execute the logic or consume the space, which is a powerful and free optimization.
The other side effect is that yield operates on the lowest common collection interface (IEnumerable<>), which enables the creation of library-like code with wide applicability.
Note that yield allows you to do things in a "lazy" way. By lazy, I mean that the evaluation of the next element in the IEnumerable is not done until the element is actually requested. This gives you the power to do a couple of different things. One is that you can yield an infinitely long list without actually doing infinite calculations. Second, you can return an enumeration of function applications; the functions will only be applied as you iterate through the list.
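For example, an infinite sequence takes only a few lines with yield (Naturals is a made-up name):
static IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)  // no upper bound; the caller decides when to stop
        yield return i;
}

// Only ten elements are ever computed:
var firstTen = Naturals().Take(10).ToList();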
I've used yield in non-LINQ code for things like this (assuming the functions do not live in the same class):
public IEnumerable<string> GetData()
{
foreach(String name in _someInternalDataCollection)
{
yield return name;
}
}
...
public void DoSomething()
{
foreach(String value in GetData())
{
//... Do something with value that doesn't modify _someInternalDataCollection
}
}
You have to be careful not to inadvertently modify the collection that your GetData() function is iterating over, though, or it will throw an exception.
Yield is very useful in general. It's in Ruby and other languages that support functional-style programming, so it's not as though it's tied to LINQ. It's more the other way around: LINQ is functional in style, so LINQ uses yield.
I had a problem where my program was using a lot of CPU in some background tasks. What I really wanted was to still be able to write functions like normal, so that I could easily read them (i.e. the whole threading vs. event-based argument), and still be able to break the functions up if they took too much CPU. Yield is perfect for this. I wrote a blog post about this and the source is available for all to grok :)
The System.Linq IEnumerable extensions are great, but sometimes you want more. For example, consider the following extension:
public static class CollectionSampling
{
    public static IEnumerable<T> Sample<T>(this IEnumerable<T> coll, int max)
    {
        var rand = new Random();
        using (var enumerator = coll.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return enumerator.Current;
                // skip a random number of elements before yielding the next one
                int currentSample = rand.Next(max);
                for (int i = 1; i <= currentSample; i++)
                    enumerator.MoveNext();
            }
        }
    }
}
Another interesting advantage of yielding is that the caller cannot cast the return value back to the original collection type and modify your internal collection.