I'm curious when execution occurs, especially when updating data and calling a second time. Is it whenever the query variable is being used, such as in the foreach statement? Or, is it when I update the list, such as nums[1] = 99?
int[] nums = { 1, -2, 3, 0, -4, 5 };
var posNums = from n in nums
where n > 0
select n;
foreach (int i in posNums)
Console.Write("" + i + " ");
Console.WriteLine();
nums[1] = 99;
foreach (int i in posNums)
Console.Write("" + i + " ");
Console.WriteLine();
LINQ defers execution until the sequence is iterated, either by a foreach statement or by getting the enumerator directly. Note that, under the hood, .ToArray() and .ToList() calls perform such an iteration. You can see this by using the method-call syntax and pressing F9 to set a breakpoint on the passed-in lambda:
var posNums = nums
.Where(n => n > 0);
Note that because LINQ operators create enumerators, they will also re-evaluate your query functions each time you iterate the sequence, so it is often advantageous to copy the results into memory using .ToArray() if you want to perform multiple (or nested!) iterations over the results of the query. If, on the other hand, you want multiple iterations to observe changes in the data source, then you want to reuse the same LINQ query.
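For illustration, here is a minimal sketch of both options, reusing the nums array from the question:
int[] nums = { 1, -2, 3, 0, -4, 5 };

// Live query: re-evaluated on every iteration, so it sees later changes to nums.
var liveQuery = nums.Where(n => n > 0);

// Snapshot: Where() runs once here; later changes to nums are not reflected.
int[] snapshot = nums.Where(n => n > 0).ToArray();

nums[1] = 99;

Console.WriteLine(string.Join(" ", liveQuery)); // 1 99 3 5
Console.WriteLine(string.Join(" ", snapshot));  // 1 3 5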
If you're curious, you can also view the source code the .NET Framework uses for the various LINQ operators at the Reference Source.
The posNums query will be executed each time you iterate over the result in the foreach loops.
A simple way to see this in action is to introduce a side effect in the query. The compiler converts your query expression to:
var posNums = nums.Where(n => n > 0);
We can modify your code with a bit more console output and see exactly where things are getting executed:
int[] nums = { 1, -2, 3, 0, -4, 5 };
Console.WriteLine("Before query creation");
var posNums = nums.Where(n => { Console.WriteLine(" Evaluating " + n); return n > 0; });
Console.WriteLine("Before foreach 1");
foreach (int i in posNums)
Console.WriteLine(" Writing " + i);
Console.WriteLine("Before array modification");
nums[1] = 99;
Console.WriteLine("Before foreach 2");
foreach (int i in posNums)
Console.WriteLine(" Writing " + i);
Output is:
Before query creation
Before foreach 1
Evaluating 1
Writing 1
Evaluating -2
Evaluating 3
Writing 3
Evaluating 0
Evaluating -4
Evaluating 5
Writing 5
Before array modification
Before foreach 2
Evaluating 1
Writing 1
Evaluating 99
Writing 99
Evaluating 3
Writing 3
Evaluating 0
Evaluating -4
Evaluating 5
Writing 5
The easiest way of seeing exactly what's going on is to actually build something equivalent to what Where returns and step through it. Here is an implementation that is functionally equivalent to Where (at least with respect to when the source sequence is iterated and what the result is).
I've omitted some of the performance optimizations to keep attention on what's important, and written out some operations "the long way" for clarity:
public static IEnumerable<T> Where<T>(
this IEnumerable<T> source,
Func<T, bool> predicate)
{
return new WhereEnumerable<T>(source, predicate);
}
public class WhereEnumerable<T> : IEnumerable<T>
{
private IEnumerable<T> source;
private Func<T, bool> predicate;
public WhereEnumerable(IEnumerable<T> source, Func<T, bool> predicate)
{
this.source = source;
this.predicate = predicate;
}
public IEnumerator<T> GetEnumerator()
{
return new WhereEnumerator<T>(source.GetEnumerator(), predicate);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public class WhereEnumerator<T> : IEnumerator<T>
{
private IEnumerator<T> source;
private Func<T, bool> predicate;
public WhereEnumerator(IEnumerator<T> source, Func<T, bool> predicate)
{
this.source = source;
this.predicate = predicate;
}
public T Current { get; private set; }
public void Dispose()
{
source.Dispose();
}
object IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
while (source.MoveNext())
if (predicate(source.Current))
{
Current = source.Current;
return true;
}
return false;
}
public void Reset()
{
throw new NotImplementedException();
}
}
It's also worth having, for reference, what a foreach loop is equivalent to:
foreach (int i in posNums)
Console.Write("" + i + " ");
is equivalent to:
using(IEnumerator<int> iterator = posNums.GetEnumerator())
while(iterator.MoveNext())
{
int i = iterator.Current;
Console.Write("" + i + " ");
}
So now you can walk through and see when the sequence's values are actually pulled. (I'd encourage you to walk through this code with a debugger, using this Where in place of LINQ's Where in your own code, to see what's going on here.)
Calling Where on a sequences doesn't affect the sequence at all, and the sequence changing doesn't affect the result of Where at all. It is when MoveNext is actually called that the enumerable begins to pull the values from the underlying enumerable, and MoveNext is called when you have a foreach loop (among other possibilities).
Something else that we can see here is that each time we call foreach we call GetEnumerator again, which gets a brand new enumerator from Where, which gets a brand new enumerator from the underlying source sequence. This means that each time you call foreach you're iterating the underlying sequence again, from the start.
This is my first time asking a question on the site so I might not be doing this right. I need to find and return the highest values within my IEnumerable list but cannot show any duplicates. I also cannot use any copies or temporary collections, or use Distinct. It must be done by first taking the list and doing OrderByDescending.
This is what I have so far:
public static IEnumerable<T> PullMax<T> (IEnumerable<T> sourcelist) where T : IComparable
{
IEnumerable<T> list = sourcelist.OrderByDescending(item => item);
foreach (T item in list)
{
yield return item;
}
}
This is the result I get from this code:
foreach (int item in Util.PullMax(Util.GenRange(iMin, iMax).Concat(Util.GenRange(iMin + 5, iMax + 5))))
{
Console.Write($"\nMaximum Value: {item}");
}
Maximum Value: 15
Maximum Value: 14
Maximum Value: 13
Maximum Value: 12
Maximum Value: 11
Maximum Value: 10
Maximum Value: 10
Maximum Value: 9
Maximum Value: 9
Maximum Value: 8
Maximum Value: 8
Maximum Value: 7
Maximum Value: 6
Maximum Value: 5
Maximum Value: 4
Maximum Value: 3
This should be the expected results:
Maximum Value: 15
Maximum Value: 14
Maximum Value: 13
Maximum Value: 12
Maximum Value: 11
Maximum Value: 10
Maximum Value: 9
Maximum Value: 8
Maximum Value: 7
Maximum Value: 6
Maximum Value: 5
Maximum Value: 4
Maximum Value: 3
I was asked to show Util.GenRange()
public static IEnumerable<int> GenRange(int iMin, int iMax)
{
while (true)
{
for (int i = iMin; i < iMax; i++)
{
yield return i;
}
yield break;
}
}
The result I am getting has duplicates and I cannot have them. How can I get rid of them without any copies or temporary collections or even using distinct?
An approach that avoids the multiple-enumeration problem of the other solution (i.e. the other solution iterates over the enumerable more than once):
public static IEnumerable<T> PullMax<T> (IEnumerable<T> sourcelist) where T : IComparable
{
IEnumerable<T> list = sourcelist.OrderByDescending(item => item);
T previousValue = default(T);
bool firstIteration = true;
foreach (T item in list)
{
if (firstIteration)
firstIteration = false;
else if (item.CompareTo(previousValue) == 0)
continue;
previousValue = item;
yield return item;
}
}
Basically it always returns the first element; then, for elements other than the first, it returns them only if they don't match the previous element. This will be dramatically faster for some classes of data (e.g. where sourcelist is an IQueryable) since it avoids unnecessary counting, etc.
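For illustration, a small usage sketch (the input values are mine):
int[] values = { 3, 1, 3, 5, 2, 5 };

// OrderByDescending yields 5 5 3 3 2 1; duplicates of the previously returned
// value are skipped, so this prints: 5 3 2 1
foreach (int item in PullMax(values))
    Console.Write(item + " ");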
Lots and lots of different options for writing this algorithm yourself.
However, it is always better to reuse code where possible; the only problem is that you haven't found a LINQ method that does a distinct sort in only one pass.
But there is SortedSet<T>, which is basically a sorted, distinct collection; building it from an IEnumerable is O(n log n), and it does not throw on duplicates, it simply ignores them:
static IEnumerable<T> PullMax<T>(this IEnumerable<T> source, IComparer<T> comparer)
{
return new SortedSet<T>(source, comparer);
}
To get it to go in descending order, pass a comparer that flips the comparison (Comparer<T>.Create turns a lambda into an IComparer<T>):
list.PullMax(Comparer<int>.Create((a, b) => b.CompareTo(a)))
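For illustration, a small usage sketch (sample input values are mine), assuming the PullMax extension above is in scope:
int[] numbers = { 10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15 };

// SortedSet<T> sorts and de-duplicates in one construction pass; the reversed
// comparer makes the set enumerate from largest to smallest.
foreach (int n in numbers.PullMax(Comparer<int>.Create((a, b) => b.CompareTo(a))))
    Console.Write($"\nMaximum Value: {n}");
// Prints each of 15, 14, 13, 12, 11, 10 exactly once.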
You wrote:
I need to find and return the highest values within my IEnumerable list but cannot show any duplicates.
From your code, I gather that you don't want only the highest value of your input sequence, but that you want the values ordered in descending order, without any duplicates.
To make usage LINQ-like, I'll create it as an extension method. If you are not familiar with extension methods, read extension methods demystified
An extension method is merely syntactic sugar: it makes it look to the reader as if the method you call is an instance method of the class.
Instead of calling a method in a static class:
var customers = FetchCustomers();
var maxCustomers = ExtraCustomerFunctions.PullMax(customers);
You can write:
var customers = FetchCustomers();
var maxCustomers = customers.PullMax();
The only thing you have to do for this is create a static method in a static class and use the this keyword before the first parameter.
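A minimal sketch of that shape, reusing the names from the example above (the Customer type and the method body are placeholders):
public static class ExtraCustomerFunctions
{
    // The "this" modifier on the first parameter is what lets callers write customers.PullMax().
    public static IEnumerable<Customer> PullMax(this IEnumerable<Customer> customers)
    {
        // ... implementation goes here ...
        throw new NotImplementedException();
    }
}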
Back to your question
Requirement
From the input sequence of objects of class T, which implements IComparable<T>, return a sequence whose first element is the largest item, then the next largest, and so on. Discard duplicate values.
public static class MyExtensionMethods
{
public static IEnumerable<T> PullMax<T> (this IEnumerable<T> source)
where T : IComparable<T>
{
IEnumerable<T> orderedSource = source.OrderByDescending(item => item);
using (IEnumerator<T> enumerator = orderedSource.GetEnumerator())
{
if (enumerator.MoveNext())
{
// The sequence is not empty.
// return the first item, and remember that you returned this value
T lastReturnedItem = enumerator.Current;
yield return lastReturnedItem;
// enumerate the rest, skip the items with a value that you just returned
while (enumerator.MoveNext())
{
// Current is either as large as the last returned one, or it is smaller
if (enumerator.Current.CompareTo(lastReturnedItem) < 0)
{
// Current is smaller; return it, and remember that you returned it
lastReturnedItem = enumerator.Current;
yield return lastReturnedItem;
}
}
}
}
}
}
This seems a lot of code, but most of it is comment.
Usage:
int[] numbers = FetchNumbers();
var pulledMaxNumbers = numbers.PullMax();
There is room for improvement!
If you intend to reuse this method, consider to make it more generic, more LINQ-like. This way it will be easy to use the method for items that are not comparable.
Customers are not comparable, but you can compare them on BirthDay, or PostCode. With only a little change it is possible to PullMax customers:
// PullMax customers, oldest Customers first:
IEnumerable<Customer> customers = this.FetchCustomers();
var result = customers.PullMax(customer => customer.BirthDay);
To do this, make a method with a property selector and a generic IComparer.
Something like this:
static IEnumerable<T> PullMax<T>(this IEnumerable<T> source)
{
return source.PullMax(item => item);
}
static IEnumerable<T> PullMax<T, TKey>(this IEnumerable<T> source,
Func<T, TKey> propertySelector)
{
return source.PullMax(propertySelector, null);
}
static IEnumerable<T> PullMax<T, TKey>(this IEnumerable<T> source,
Func<T, TKey> keySelector,
IComparer<TKey> comparer)
{
// TODO: check source and keySelector not null
if (comparer == null) comparer = Comparer<TKey>.Default;
// rest of code is similar to code above, difference: check on keySelector
IEnumerable<T> orderedSource = source.OrderByDescending(keySelector);
using (IEnumerator<T> enumerator = orderedSource.GetEnumerator())
{
if (enumerator.MoveNext())
{
T currentValue = enumerator.Current;
yield return currentValue;
TKey lastKeyValue = keySelector(currentValue);
while (enumerator.MoveNext())
{
currentValue = enumerator.Current;
TKey keyValue = keySelector(currentValue);
if (comparer.Compare(keyValue, lastKeyValue) < 0)
{
yield return currentValue;
lastKeyValue = keyValue;
}
etc.
You can even put your PullMax inside a LINQ concatenation:
var result = Customers.Where(customer => customer.City == "New York")
.PullMax(customer => customer.PostCode)
.Select(customer => new
{
Id = customer.Id,
Name = customer.Name,
});
Well, completely without any temporary variable or Distinct there is no chance. BUT the cheapest way to solve it is to have ONE temporary variable holding the LAST returned value and check whether the next value is the same.
Maybe this method will do the trick:
public static IEnumerable<T> PullMax<T>(IEnumerable<T> sourcelist) where T : IComparable
{
IEnumerable<T> list = sourcelist.OrderByDescending(item => item);
if (!list.Any()) yield break;
T tmp = list.ElementAt(0);
yield return tmp;
foreach (T item in list)
{
if (item.CompareTo(tmp) == 0) continue;
tmp = item;
yield return item;
}
}
Is it possible to write a higher-order function that causes an IEnumerable to be consumed multiple times but in only one pass and without reading all the data into memory? [See Edit below for a clarification of what I'm looking for.]
For example, in the code below the enumerable is mynums (onto which I've tagged a .Trace() in order to see how many times we enumerate it). The goal is to figure out whether it has any numbers greater than 5, as well as the sum of all of the numbers. A function which performs both computations is Both_TwoPass, but it enumerates the sequence twice. In contrast, Both_NonStream only enumerates it once, but at the expense of reading it into memory. In principle it is possible to carry out both of these tasks in a single pass and in a streaming fashion, as shown by Any5Sum, but that is a specific solution. Is it possible to write a function with the same signature as Both_* but that is the best of both worlds?
(It seems to me that this should be possible using threads. Is there a better solution using, say, async?)
Edit
Below is a clarification regarding what I'm looking for. What I've done is included a very down-to-earth description of each property in square brackets.
I'm looking for a function Both with the following characteristics:
It has signature (S1, S2) Both<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1>, Func<IEnumerable<T>, S2>) (and produces the "right" output!)
It only iterates the first argument, tt, once. [What I mean by this is that when passed mynums (as defined below) it only outputs mynums: 0 1 2 ... once. This precludes function Both_TwoPass.]
It processes the data from the first argument, tt, in a streaming fashion. [What I mean by this is that, for example, there is insufficient memory to store all the items from tt in memory simultaneously, thus precluding function Both_NonStream.]
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp
{
static class Extensions
{
public static IEnumerable<T> Trace<T>(this IEnumerable<T> tt, string msg = "")
{
Console.Write(msg);
try
{
foreach (T t in tt)
{
Console.Write(" {0}", t);
yield return t;
}
}
finally
{
Console.WriteLine('.');
}
}
public static (S1, S2) Both_TwoPass<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
{
return (f1(tt), f2(tt));
}
public static (S1, S2) Both_NonStream<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> f1, Func<IEnumerable<T>, S2> f2)
{
var tt2 = tt.ToList();
return (f1(tt2), f2(tt2));
}
public static (bool, int) Any5Sum(this IEnumerable<int> ii)
{
int sum = 0;
bool any5 = false;
foreach (int i in ii)
{
sum += i;
any5 |= i > 5; // or: if (!any5) any5 = i > 5;
}
return (any5, sum);
}
}
class Program
{
static void Main()
{
var mynums = Enumerable.Range(0, 10).Trace("mynums:");
Console.WriteLine("TwoPass: (any > 5, sum) = {0}", mynums.Both_TwoPass(tt => tt.Any(k => k > 5), tt => tt.Sum()));
Console.WriteLine("NonStream: (any > 5, sum) = {0}", mynums.Both_NonStream(tt => tt.Any(k => k > 5), tt => tt.Sum()));
Console.WriteLine("Manual: (any > 5, sum) = {0}", mynums.Any5Sum());
}
}
}
With the way you've written your computation model (i.e. return (f1(tt), f2(tt))), there is no way to avoid multiple iterations of your enumerable. You're basically saying: compute Item1, then compute Item2.
You have to either change the model from (Func<IEnumerable<T>, S1>, Func<IEnumerable<T>, S2>) to (Func<T, S1>, Func<T, S2>) or to Func<IEnumerable<T>, (S1, S2)> to be able to run the computations in parallel.
Your implementation of Any5Sum is basically the second approach (Func<IEnumerable<T>, (S1, S2)>), but there's already a built-in method for that.
Try this:
Console.WriteLine("Aggregate: (any > 5, sum) = {0}",
mynums
.Aggregate<int, (bool any5, int sum)>(
(false, 0),
(a, x) => (a.any5 | x > 5, a.sum + x)));
I think you and I are describing the same thing in the comments. There is no need to create such a "special-purpose IEnumerable", though, because the BlockingCollection<> class already exists for such producer-consumer scenarios. You'd use it as follows...
Create a BlockingCollection<> for each consuming function (i.e. tt1 and tt2).
By default, a BlockingCollection<> wraps a ConcurrentQueue<>, so the elements will arrive in FIFO order.
To satisfy your requirement that only one element be held in memory at a time, 1 will be specified for the bounded capacity. Note that this capacity is per collection, so with two collections there will be up to two queued elements at any given moment.
Each collection will hold the input elements for that consumer.
Create a thread/task for each consuming function.
The thread/task will simply call GetConsumingEnumerable() for its input collection, pass the resulting IEnumerable<> to its consuming function, and return that result.
GetConsumingEnumerable() does just as its name implies: it creates an IEnumerable<> that consumes (removes) elements from the collection. If the collection is empty, enumeration will block until an element is added. CompleteAdding() is called once the producer is finished, which allows the consuming enumerator to exit when the collection empties.
The producer enumerates the IEnumerable<>, tt, and adds each element to both collections. This is the only time that tt is enumerated.
BlockingCollection<>.Add() will block if the collection has reached its capacity, preventing the entirety of tt from being buffered in-memory.
Once tt has been fully enumerated, CompleteAdding() is called on each collection.
Once each consumer thread/task has completed, their results are returned.
Here's what that looks like in code...
public static (S1, S2) Both<T, S1, S2>(this IEnumerable<T> tt, Func<IEnumerable<T>, S1> tt1, Func<IEnumerable<T>, S2> tt2)
{
const int MaxQueuedElementsPerCollection = 1;
using (BlockingCollection<T> collection1 = new BlockingCollection<T>(MaxQueuedElementsPerCollection))
using (Task<S1> task1 = StartConsumerTask(collection1, tt1))
using (BlockingCollection<T> collection2 = new BlockingCollection<T>(MaxQueuedElementsPerCollection))
using (Task<S2> task2 = StartConsumerTask(collection2, tt2))
{
foreach (T element in tt)
{
collection1.Add(element);
collection2.Add(element);
}
// Inform any enumerators created by .GetConsumingEnumerable()
// that there will be no more elements added.
collection1.CompleteAdding();
collection2.CompleteAdding();
// Accessing the Result property blocks until the Task<> is complete.
return (task1.Result, task2.Result);
}
Task<S> StartConsumerTask<S>(BlockingCollection<T> collection, Func<IEnumerable<T>, S> func)
{
return Task.Run(() => func(collection.GetConsumingEnumerable()));
}
}
Note that, for efficiency's sake, you could increase MaxQueuedElementsPerCollection to, say, 10 or 100 so that the consumers don't have to run in lock-step with each other.
There is one problem with this code, though. When a collection is empty the consumer has to wait for the producer to produce an element, and when a collection is full the producer has to wait for the consumer to consume an element. Consider what happens mid-way through the execution of your tt => tt.Any(k => k > 5) lambda...
The producer waits for the collection to be non-full and adds 5.
The consumer waits for the collection to be non-empty and removes 5.
5 > 5 returns false and enumeration continues.
The producer waits for the collection to be non-full and adds 6.
The consumer waits for the collection to be non-empty and removes 6.
6 > 5 returns true and enumeration stops. Any(), the lambda, and the consumer task all return.
The producer waits for the collection to be non-full and adds 7.
The producer waits for the collection to be non-full and...that never happens!
The consumer has already abandoned the enumeration, so it won't consume any elements to make room for the new one. Add() will never return.
The cleanest way I could come up with to prevent this deadlock is to ensure the entire collection gets enumerated even if func doesn't do so. This just requires a simple change to the StartConsumerTask<>() local method...
Task<S> StartConsumerTask<S>(BlockingCollection<T> collection, Func<IEnumerable<T>, S> func)
{
return Task.Run(
() => {
try
{
return func(collection.GetConsumingEnumerable());
}
finally
{
// Prevent BlockingCollection<>.Add() calls from
// deadlocking by ensuring the entire collection gets
// consumed even if func abandoned its enumeration early.
foreach (T element in collection.GetConsumingEnumerable())
{
// Do nothing...
}
}
}
);
}
The downside of this is that tt will always be enumerated to completion, even if both tt1 and tt2 abandon their enumerators early.
With that addressed, this...
static void Main()
{
IEnumerable<int> mynums = Enumerable.Range(0, 10).Trace("mynums:");
Console.WriteLine("Both: (any > 5, sum) = {0}", mynums.Both(tt => tt.Any(k => k > 5), tt => tt.Sum()));
}
...outputs this...
mynums: 0 1 2 3 4 5 6 7 8 9.
Both: (any > 5, sum) = (True, 45)
The core problem here is who is responsible for calling IEnumerator.MoveNext() (e.g. by using a foreach loop). Synchronizing multiple foreach loops across threads would be slow and fiddly to get right.
Implementing IAsyncEnumerable<T>, so that multiple await foreach loops can take turns processing items, would be easier. But still silly.
So the simpler solution would be to change the question. Instead of trying to call multiple methods that both try to enumerate the items, change the interface to simply visit each item.
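A minimal sketch of that idea (the method name and signature are mine, not the question's Both): instead of handing each consumer its own IEnumerable<T>, pass per-item step functions and fold them over a single loop.
public static (S1, S2) BothByVisiting<T, S1, S2>(
    this IEnumerable<T> source,
    S1 seed1, Func<S1, T, S1> step1,
    S2 seed2, Func<S2, T, S2> step2)
{
    foreach (T item in source)        // the source is enumerated exactly once
    {
        seed1 = step1(seed1, item);
        seed2 = step2(seed2, item);
    }
    return (seed1, seed2);
}
With the question's example this would be called as mynums.BothByVisiting(false, (a, x) => a | x > 5, 0, (s, x) => s + x).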
I believe it is possible to satisfy all the requirements of the question, and one more (very natural) requirement, namely that the original enumerable be only enumerated partially if each of the two Func<IEnumerable<T>, S> consume it partially.
(This was discussed by #BACON.) The approach is discussed in more detail in my GitHub repo 'CoEnumerable'. The idea is that the Barrier class provides a fairly straightforward approach to implementing a proxy IEnumerable which can be consumed by each of the Func<IEnumerable<T>, S>s while the proxy consumes the real IEnumerable just once. In particular, the implementation consumes only as much of the original enumerable as is absolutely necessary (i.e., it satisfies the extra requirement mentioned above).
The proxy is:
class BarrierEnumerable<T> : IEnumerable<T>
{
private Barrier barrier;
private bool moveNext;
private readonly Func<T> src;
public BarrierEnumerable(IEnumerator<T> enumerator)
{
src = () => enumerator.Current;
}
public Barrier Barrier
{
set => barrier = value;
}
public bool MoveNext
{
set => moveNext = value;
}
public IEnumerator<T> GetEnumerator()
{
try
{
while (true)
{
barrier.SignalAndWait();
if (moveNext)
{
yield return src();
}
else
{
yield break;
}
}
}
finally
{
barrier.RemoveParticipant();
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
in terms of which we can combine the two consumers
public static T Combine<S, T1, T2, T>(this IEnumerable<S> source,
Func<IEnumerable<S>, T1> coenumerable1,
Func<IEnumerable<S>, T2> coenumerable2,
Func<T1, T2, T> resultSelector)
{
using var ss = source.GetEnumerator();
var enumerable1 = new BarrierEnumerable<S>(ss);
var enumerable2 = new BarrierEnumerable<S>(ss);
using var barrier = new Barrier(2, _ => enumerable1.MoveNext = enumerable2.MoveNext = ss.MoveNext());
enumerable2.Barrier = enumerable1.Barrier = barrier;
using var t1 = Task.Run(() => coenumerable1(enumerable1));
using var t2 = Task.Run(() => coenumerable2(enumerable2));
return resultSelector(t1.Result, t2.Result);
}
The GitHub repo has a couple of examples of using the above code, and some brief design discussion (including limitations).
Given this code:
IEnumerable<object> FilteredList()
{
foreach( object item in FullList )
{
if( IsItemInPartialList( item ) )
yield return item;
}
}
Why should I not just code it this way?:
IEnumerable<object> FilteredList()
{
var list = new List<object>();
foreach( object item in FullList )
{
if( IsItemInPartialList( item ) )
list.Add(item);
}
return list;
}
I sort of understand what the yield keyword does. It tells the compiler to build a certain kind of thing (an iterator). But why use it? Apart from it being slightly less code, what's it do for me?
Using yield makes the collection lazy.
Let's say you just need the first five items. Your way, I have to loop through the entire list to get the first five items. With yield, I only loop through the first five items.
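A sketch of that difference, using the method names from your question:
// With the iterator version, this stops walking FullList as soon as
// five matching items have been produced:
var firstFive = FilteredList().Take(5).ToList();

// With the list-building version, the same line would first filter every
// item of FullList into an intermediate list, and only then take five from it.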
The benefit of iterator blocks is that they work lazily. So you can write a filtering method like this:
public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
foreach (var item in source)
{
if (predicate(item))
{
yield return item;
}
}
}
That will allow you to filter a stream as long as you like, never buffering more than a single item at a time. If you only need the first value from the returned sequence, for example, why would you want to copy everything into a new list?
As another example, you can easily create an infinite stream using iterator blocks. For example, here's a sequence of random numbers:
public static IEnumerable<int> RandomSequence(int minInclusive, int maxExclusive)
{
Random rng = new Random();
while (true)
{
yield return rng.Next(minInclusive, maxExclusive);
}
}
How would you store an infinite sequence in a list?
My Edulinq blog series gives a sample implementation of LINQ to Objects which makes heavy use of iterator blocks. LINQ is fundamentally lazy where it can be - and putting things in a list simply doesn't work that way.
With the "list" code, you have to process the full list before you can pass it on to the next step. The "yield" version passes the processed item immediately to the next step. If that "next step" contains a ".Take(10)" then the "yield" version will only process the first 10 items and forget about the rest. The "list" code would have processed everything.
This means that you see the most difference when you need to do a lot of processing and/or have long lists of items to process.
You can use yield to return items that aren't in a list. Here's a little sample that could iterate infinitely through a list until canceled.
public IEnumerable<int> GetNextNumber()
{
while (true)
{
for (int i = 0; i < 10; i++)
{
yield return i;
}
}
}
public bool Canceled { get; set; }
public void StartCounting()
{
foreach (var number in GetNextNumber())
{
if (this.Canceled) break;
Console.WriteLine(number);
}
}
This writes
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
...etc. to the console until canceled.
object jamesItem = null;
foreach(var item in FilteredList())
{
if (item.Name == "James")
{
jamesItem = item;
break;
}
}
return jamesItem;
When the above code is used to loop through FilteredList(), and assuming item.Name == "James" is satisfied by the 2nd item in the list, the method using yield will yield only twice. This is lazy behavior.
Whereas the method using a list will add all n objects to the list and pass the complete list to the calling method.
This is exactly a use case where the difference between IEnumerable and IList can be highlighted.
The best real world example I've seen for the use of yield would be to calculate a Fibonacci sequence.
Consider the following code:
class Program
{
static void Main(string[] args)
{
Console.WriteLine(string.Join(", ", Fibonacci().Take(10)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(15).Take(1)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(10).Take(5)));
Console.WriteLine(string.Join(", ", Fibonacci().Skip(100).Take(1)));
Console.ReadKey();
}
private static IEnumerable<long> Fibonacci()
{
long a = 0;
long b = 1;
while (true)
{
long temp = a;
a = b;
yield return a;
b = temp + b;
}
}
}
This will return:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55
987
89, 144, 233, 377, 610
1298777728820984005
This is nice because it allows you to calculate out an infinite series quickly and easily, giving you the ability to use the Linq extensions and query only what you need.
why use [yield]? Apart from it being slightly less code, what's it do for me?
Sometimes it is useful, sometimes not. If the entire set of data must be examined and returned, then there is not going to be any benefit in using yield, because all it does is introduce overhead.
When yield really shines is when only a partial set is returned. I think the best example is sorting. Assume you have a list of objects containing a date and a dollar amount from this year and you would like to see the first handful (5) records of the year.
In order to accomplish this, the list must be sorted ascending by date, and then have the first 5 taken. If this was done without yield, the entire list would have to be sorted, right up to making sure the last two dates were in order.
However, with yield, once the first 5 items have been established the sorting stops and the results are available. This can save a large amount of time.
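A sketch of that scenario (the type and property names here are mine, and how much of the sorting work can actually be skipped depends on the LINQ implementation):
class Entry
{
    public DateTime Date { get; set; }
    public decimal Amount { get; set; }
}

static IEnumerable<Entry> FirstFive(IEnumerable<Entry> entries)
{
    // Both OrderBy and Take are deferred; no sorting work happens until the
    // caller starts enumerating the returned sequence.
    return entries.OrderBy(e => e.Date).Take(5);
}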
The yield return statement allows you to return only one item at a time. You are collecting all the items in a list and again returning that list, which is a memory overhead.
I have the following code
int someCount = 0;
for ( int i =0 ; i < intarr.Length;i++ )
{
if ( intarr[i] % 2 == 0 )
{
someCount++;
continue;
}
// Some other logic for those not satisfying the condition
}
Is it possible to use any of the Array.Where or Array.SkipWhile methods to achieve the same?
foreach(int i in intarr.Where(<<condition>> + increment for failures) )
{
// Some other logic for those not satisfying the condition
}
Use LINQ:
int someCount = intarr.Count(val => val % 2 == 0);
I definitely prefer #nneonneo's way for short statements (and it uses an explicit lambda), but if you want to build a more elaborate query, you can use the LINQ query syntax:
var count = ( from val in intarr
where val % 2 == 0
select val ).Count();
Obviously this is probably a poor choice when the query can be expressed with a single lambda expression, but I find it useful when composing larger queries.
More examples: http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
Nothing (much) prevents you from rolling your own Where that counts the failures. "Nothing much" because neither lambdas nor methods with yield return statements are allowed to reference out/ref parameters, so the desired extension with the following signature won't work:
// dead-end/bad signature, do not attempt
IEnumerable<T> Where(
this IEnumerable<T> self,
Func<T,bool> predicate,
out int failures)
However, we can declare a local variable for the failure-count and return a Func<int> that can get the failure-count, and a local variable is completely valid to reference from lambdas. Thus, here's a possible (tested) implementation:
public static class EnumerableExtensions
{
public static IEnumerable<T> Where<T>(
this IEnumerable<T> self,
Func<T,bool> predicate,
out Func<int> getFailureCount)
{
if (self == null) throw new ArgumentNullException("self");
if (predicate == null) throw new ArgumentNullException("predicate");
int failures = 0;
getFailureCount = () => failures;
return self.Where(i =>
{
bool res = predicate(i);
if (!res)
{
++failures;
}
return res;
});
}
}
...and here's some test code that exercises it:
Func<int> getFailureCount;
int[] items = { 0, 1, 2, 3, 4 };
foreach(int i in items.Where(i => i % 2 == 0, out getFailureCount))
{
Console.WriteLine(i);
}
Console.WriteLine("Failures = " + getFailureCount());
The above test, when run outputs:
0
2
4
Failures = 2
There are a couple caveats I feel obligated to warn about. Since you could break out of the loop prematurely without having walked the entire IEnumerable<>, the failure-count would only reflect encountered-failures, not the total number of failures as in #nneonneo's solution (which I prefer.) Also, if the implementation of LINQ's Where extension were to change in a way that called the predicate more than once per item, then the failure count would be incorrect. One more point of interest is that, from within your loop body you should be able to make calls to the getFailureCount Func to get the current running failure count so-far.
I presented this solution to show that we are not locked-into the existing prepackaged solutions. The language and framework provides us with lots of opportunities to extend it to suit our needs.
Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating numbers (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those into 4 groups of 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain of "premature optimization" all you want; but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
Let's see if you even need the complexity of Slice. If your random number generator is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.
You could use the Skip and Take methods with any Enumerable object.
For your edit:
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slices with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chunk of an array as a new array, without causing side effects. In this scenario you would actually move the enumerable forward by 10.
A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code
Note: this wasn't really correct; I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is always 1. I also tried running it through a foreach loop since I know that the LINQ extensions are generally lazily evaluated and it only looped once. I eventually did the code below instead of the Take() and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
If you just want the chunks returned, and not to do anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have t => t as the default action, but I haven't found out how to do that yet.
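For what it's worth, a hedged sketch of one way to get that: C# won't accept a generic t => t as a default parameter value, but a forwarding overload gives the same effect.
// Overload that bakes in the identity "action" by forwarding to the three-argument Chunk above.
IEnumerable<IEnumerable<R>> Chunk<R>(IEnumerable<R> src, int n)
{
    return Chunk<IEnumerable<R>, R>(src, n, t => t);
}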