Rx extension to keep generating events for N seconds? - c#

What is the right Rx extension method (in .NET) to keep generating events for N seconds?
By 'keep generating events for N seconds' I mean that it will keep generating events in a loop since DateTime.Now till DateTime.Now + TimeSpan.FromSeconds(N)
I'm working on genetic algorithm that will generate number of hypotheses and propagate the most successful ones to the next generation. Need to constrain this guy in some elegant way.
Added later:
I've actually realized that I need to do pull instead of push and came-up with something like this:
public static class IEnumerableExtensions
{
public static IEnumerable<T> Pull<T>(this IEnumerable<T> enumerable, int? times = null)
{
if (times == null)
return enumerable.ToArray();
else
return enumerable.Take(times.Value).ToArray();
}
public static IEnumerable<T> Pull<T>(this IEnumerable<T> enumerable, TimeSpan timeout, int? times = null)
{
var start = DateTime.Now;
if (times != null) enumerable = enumerable.Take(times.Value);
using (var iterator = enumerable.GetEnumerator())
{
while (DateTime.Now < start + timeout && iterator.MoveNext())
yield return iterator.Current;
}
}
}
And usage would be:
var results = lazySource.SelectMany(item =>
{
//processing goes here
}).Pull(timeout: TimeSpan.FromSeconds(5), times: numberOfIterations);

There may well be a cleaner way of doing this, but you could use:
// This will generate events repeatedly
var interval = Observable.Interval(...);
// This will generate one event in N seconds
var timer = Observable.Timer(TimeSpan.FromSeconds(N));
// This will combine the two, so that the interval stops when the timer
// fires
var joined = interval.TakeUntil(timer);
It's been a long time since I've done any Rx, so I apologise if this is incorrect - but it's worth a try...

Jon's post is pretty much spot on, however I noticed your edit where you suggested you would create your own extension methods to do this. I think it would be better* if you just used the built in operators.
//LinqPad sample
void Main()
{
var interval = Observable.Interval(TimeSpan.FromMilliseconds(250));
var maxTime = Observable.Timer(TimeSpan.FromSeconds(10));
IEnumerable<int> lazySource = Enumerable.Range(0, 100);
lazySource.ToObservable()
.Zip(interval, (val, tick)=>val)
.TakeUntil(maxTime)
.Dump();
}
*ie. easy for other devs to maintain and understand

Related

Functionally pure dice rolls in C#

I am writing a dice-based game in C#. I want all of my game-logic to be pure, so I have devised a dice-roll generator like this:
public static IEnumerable<int> CreateDiceStream(int seed)
{
var random = new Random(seed);
while (true)
{
yield return 1 + random.Next(5);
}
}
Now I can use this in my game logic:
var playerRolls = players.Zip(diceRolls, (player, roll) => Tuple.Create(player, roll));
The problem is that the next time I take from diceRolls I want to skip the rolls that I have already taken:
var secondPlayerRolls = players.Zip(
diceRolls.Skip(playerRolls.Count()),
(player, roll) => Tuple.Create(player, roll));
This is already quite ugly and error prone. It doesn't scale well as the code becomes more complex.
It also means that I have to be careful when using a dice roll sequence between functions:
var x = DoSomeGameLogic(diceRolls);
var nextRoll = diceRolls.Skip(x.NumberOfDiceRollsUsed).First();
Is there a good design pattern that I should be using here?
Note that it is important that my functions remain pure due to syncronisation and play-back requirements.
This question is not about correctly initializing System.Random. Please read what I have written, and leave a comment if it is unclear.
That's a very nice puzzle.
Since manipulating diceRolls's state is out of the question (otherwise, we'd have those sync and replaying issues you mentioned), we need an operation which returns both (a) the values to be consumed and (b) a new diceRolls enumerable which starts after the consumed items.
My suggestion would be to use the return value for (a) and an out parameter for (b):
static IEnumerable<int> Consume(this IEnumerable<int> rolls, int count, out IEnumerable<int> remainder)
{
remainder = rolls.Skip(count);
return rolls.Take(count);
}
Usage:
var firstRolls = diceRolls.Consume(players.Count(), out diceRolls);
var secondRolls = diceRolls.Consume(players.Count(), out diceRolls);
DoSomeGameLogic would use Consume internally and return the remaining rolls. Thus, it would need to be called as follows:
var x = DoSomeGameLogic(diceRolls, out diceRolls);
// or
var x = DoSomeGameLogic(ref diceRolls);
// or
x = DoSomeGameLogic(diceRolls);
diceRolls = x.RemainingDiceRolls;
The "classic" way to implement pure random generators is to use a specialized form of a state monad (more explanation here), which wraps the carrying around of the current state of the generator. So, instead of implementing (note that my C# is quite rusty, so please consider this as pseudocode):
Int Next() {
nextState, nextValue = NextRandom(globalState);
globalState = nextState;
return nextValue;
}
you define something like this:
class Random<T> {
private Func<Int, Tuple<Int, T>> transition;
private Tuple<Int, Int> NextRandom(Int state) { ... whatever, see below ... }
public static Random<A> Unit<A>(A a) {
return new Random<A>(s => Tuple(s, a));
}
public static Random<Int> GetRandom() {
return new Random<Int>(s => nextRandom(s));
}
public Random<U> SelectMany(Func<T, Random<U>> f) {
return new Random(s => {
nextS, a = this.transition(s);
return f(a).transition(nextS);
}
}
public T Run(Int seed) {
return this.transition(seed);
}
}
Which should be usable with LINQ, if I did everything right:
// player1 = bla, player2 = blub, ...
Random<Tuple<Player, Int>> playerOneRoll = from roll in GetRandom()
select Tuple(player1, roll);
Random<Tuple<Player, Int>> playerTwoRoll = from roll in GetRandom()
select Tuple(player2, roll);
Random<List<Tuple<Player, Int>>> randomRolls = from t1 in playerOneRoll
from t2 in playerTwoRoll
select List(t1, t2);
var actualRolls = randomRolls.Run(234324);
etc., possibly using some combinators. The trick here is to represent the whole "random action" parametrized by the current input state; but this is also the problem, since you'd need a good implementation of NextRandom.
It would be nice if you could just reuse the internals of the .NET Random implementation, but as it seems, you cannot access its internal state. However, I'm sure there are enough sufficiently good PRNG state functions around on the internet (this one looks good; you might have to change the state type).
Another disadvantage of monads is that once you start working in them (ie, construct things in Random), you need to "carry that though" the whole control flow, up to the top level, at which you should call Run once and for all. This is something one needs to get use to, and is more tedious in C# than functional languages optimized for such things.

How AsParallel extension actually works

So topic is the questions.
I get that method AsParallel returns wrapper ParallelQuery<TSource> that uses the same LINQ keywords, but from System.Linq.ParallelEnumerable instead of System.Linq.Enumerable
It's clear enough, but when i'm looking into decompiled sources, i don't understand how does it works.
Let's begin from an easiest extension : Sum() method. Code:
[__DynamicallyInvokable]
public static int Sum(this ParallelQuery<int> source)
{
if (source == null)
throw new ArgumentNullException("source");
else
return new IntSumAggregationOperator((IEnumerable<int>) source).Aggregate();
}
it's clear, let's go to Aggregate() method. It's a wrapper on InternalAggregate method that traps some exceptions. Now let's take a look on it.
protected override int InternalAggregate(ref Exception singularExceptionToThrow)
{
using (IEnumerator<int> enumerator = this.GetEnumerator(new ParallelMergeOptions?(ParallelMergeOptions.FullyBuffered), true))
{
int num = 0;
while (enumerator.MoveNext())
checked { num += enumerator.Current; }
return num;
}
}
and here is the question: how it works? I see no concurrence safety for a variable, modified by many threads, we see only iterator and summing. Is it magic enumerator? Or how does it works? GetEnumerator() returns QueryOpeningEnumerator<TOutput>, but it's code is too complicated.
Finally in my second PLINQ assault I found an answer. And it's pretty clear.
Problem is that enumerator is not simple. It's a special multithreading one. So how it works? Answer is that enumerator doesn't return a next value of source, it returns a whole sum of next partition. So this code is only executed 2,4,6,8... times (based on Environment.ProcessorCount), when actual summation work is performed inside enumerator.MoveNext in enumerator.OpenQuery method.
So TPL obviosly partition the source enumerable, then sum independently each partition and then pefrorm this summation, see IntSumAggregationOperatorEnumerator<TKey>. No magic here, just could plunge deeper.
The Sum operator aggregates all values in a single thread. There is no multi-threading here. The trick is that multi-threading is happening somewhere else.
The PLINQ Sum method can handle PLINQ enumerables. Those enumerables could be built up using other constructs (such as where) that allows a collection to be processed over multiple threads.
The Sum operator is always the last operator in a chain. Although it is possible to process this sum over multiple threads, the TPL team probably found out that this had a negative impact on performance, which is reasonable, since the only thing this method has to do is a simple integer addition.
So this method processes all results that come available from other threads and processes them on a single thread and returns that value. The real trick is in other PLINQ extension methods.
protected override int InternalAggregate(ref Exception singularExceptionToThrow)
{
using (IEnumerator<int> enumerator = this.GetEnumerator(new ParallelMergeOptions? (ParallelMergeOptions.FullyBuffered), true))
{
int num = 0;
while (enumerator.MoveNext())
checked { num += enumerator.Current; }
return num;
}
}
This code won't be executed parallel, the while will be sequentially execute it's innerscope.
Try this instead
List<int> list = new List<int>();
int num = 0;
Parallel.ForEach(list, (item) =>
{
checked { num += item; }
});
The inner action will be spread on the ThreadPool and the ForEach statement will be complete when all items are handled.
Here you need threadsafety:
List<int> list = new List<int>();
int num = 0;
Parallel.ForEach(list, (item) =>
{
Interlocked.Add(ref num, item);
});

intersect and any or contains and any. Which is more efficient to find at least one common element?

If I have two list and I want to know if there are at least one common element, I have this two options:
lst1.Intersect(lst2).Any();
Lst1.Any(x => lst2.Contains(x));
The two options give me the result that I expect, however I don't know what is the best option. Which is more efficient? And why?
Thanks.
EDIT: when I created this post, apart of the solution, I was looking the reason. I know that I can run tests, but I wouldn't know the reason of the result. One is faster than the other? Is always one solution best than the other?
So for this reason, I hace accepted the answer of Matthew, not only for the test code, but also he explain when one is better than other and why. I appreciate a lot the contributions of Nicholas and Oren too.
Thanks.
Oren's answer has an error in the way the stopwatch is being used. It isn't being reset at the end of the loop after the time taken by Any() has been measured.
Note how it goes back to the start of the loop with the stopwatch never being Reset() so that the time that is added to intersect includes the time taken by Any().
Following is a corrected version.
A release build run outside any debugger gives this result on my PC:
Intersect: 1ms
Any: 6743ms
Note how I'm making two non-overlapping string lists for this test. Also note that this is a worst-case test.
Where there are many intersections (or intersections that happen to occur near the start of the data) then Oren is quite correct to say that Any() should be faster.
If the real data usually contains intersections then it's likely that it is better to use Any(). Otherwise, use Intersect(). It's very data dependent.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
void run()
{
double intersect = 0;
double any = 0;
Stopwatch stopWatch = new Stopwatch();
List<string> L1 = Enumerable.Range(0, 10000).Select(x => x.ToString()).ToList();
List<string> L2 = Enumerable.Range(10000, 10000).Select(x => x.ToString()).ToList();
for (int i = 0; i < 10; i++)
{
stopWatch.Restart();
Intersect(L1, L2);
stopWatch.Stop();
intersect += stopWatch.ElapsedMilliseconds;
stopWatch.Restart();
Any(L1, L2);
stopWatch.Stop();
any += stopWatch.ElapsedMilliseconds;
}
Console.WriteLine("Intersect: " + intersect + "ms");
Console.WriteLine("Any: " + any + "ms");
}
private static bool Any(List<string> lst1, List<string> lst2)
{
return lst1.Any(lst2.Contains);
}
private static bool Intersect(List<string> lst1, List<string> lst2)
{
return lst1.Intersect(lst2).Any();
}
static void Main()
{
new Program().run();
}
}
}
For comparative purposes, I wrote my own test comparing int sequences:
intersect took 00:00:00.0065928
any took 00:00:08.6706195
The code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
void run()
{
var lst1 = Enumerable.Range(0, 10000);
var lst2 = Enumerable.Range(10000, 10000);
int count = 10;
DemoUtil.Time(() => lst1.Intersect(lst2).Any(), "intersect", count);
DemoUtil.Time(() => lst1.Any(lst2.Contains), "any", count);
}
static void Main()
{
new Program().run();
}
}
static class DemoUtil
{
public static void Print(this object self)
{
Console.WriteLine(self);
}
public static void Print(this string self)
{
Console.WriteLine(self);
}
public static void Print<T>(this IEnumerable<T> self)
{
foreach (var item in self)
Console.WriteLine(item);
}
public static void Time(Action action, string title, int count)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < count; ++i)
action();
(title + " took " + sw.Elapsed).Print();
}
}
}
If I also time this for overlapping ranges by changing the lists to this and upping the count to 10000:
var lst1 = Enumerable.Range(10000, 10000);
var lst2 = Enumerable.Range(10000, 10000);
I get these results:
intersect took 00:00:03.2607476
any took 00:00:00.0019170
In this case Any() is clearly much faster.
Conclusion
The worst-case performance is very bad for Any() but acceptible for Intersect().
The best-case performance is extremely good for Any() and bad for Intersect().
(and best-case for Any() is probably worst-case for Intersect()!)
The Any() approach is O(N^2) in the worst case and O(1) in the best case.
The Intersect() approach is always O(N) (since it uses hashing, not sorting, otherwise it would be O(N(Log(N))).
You must also consider the memory usage: the Intersect() method needs to take a copy of one of the inputs, whereas Any() doesn't.
Therefore to make the best decision you really need to know the characteristics of the real data, and actually perform tests.
If you really don't want the Any() to turn into an O(N^2) in the worst case, then you should use Intersect(). However, the chances are that you will be best off using Any().
And of course, most of the time none of this matters!
Unless you've discovered this part of the code to be a bottleneck, this is of merely academic interest. You shouldn't waste your time with this kind of analysis if there's no problem. :)
It depends on the implementation of your IEnumerables.
Your first try (Intersect/Any), finds all the matches and then determines if the set is empty or not. From the documentation, this looks to be something like O(n) operation:
When the object returned by this method is enumerated, Intersect enumerates first,
collecting all distinct elements of that sequence. It then enumerates [the]
second, marking those elements that occur in both sequences. Finally, the marked
elements are yielded in the order in which they were collected.
Your second try ( Any/Contains ) enumerates over the first collection, an O(n) operation, and for each item in the first collection, enumerates over the second, another O(n) operation, to see if a matching element is found. This makes it something like an O(n2) operation, does it not? Which do you think might be faster?
One thing to consider, though, is that the Contains() lookup for certain collection or set types (e.g., dictionaries, binary trees or ordered collections that allow a binary search or hashtable lookup) might be a cheap operation if the Contains() implementation is smart enough to take advantage of the semantics of the collection upon which it is operating.
But you'll need to experiment with your collection types to find out which works better.
See Matthew's answer for a complete and accurate breakdown.
Relatively easy to mock up and try yourself:
bool found;
double intersect = 0;
double any = 0;
for (int i = 0; i < 100; i++)
{
List<string> L1 = GenerateNumberStrings(200000);
List<string> L2 = GenerateNumberStrings(60000);
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
found = Intersect(L1, L2);
stopWatch.Stop();
intersect += stopWatch.ElapsedMilliseconds;
stopWatch.Reset();
stopWatch.Start();
found = Any(L1, L2);
stopWatch.Stop();
any += stopWatch.ElapsedMilliseconds;
}
Console.WriteLine("Intersect: " + intersect + "ms");
Console.WriteLine("Any: " + any + "ms");
}
private static bool Any(List<string> lst1, List<string> lst2)
{
return lst1.Any(x => lst2.Contains(x));
}
private static bool Intersect(List<string> lst1, List<string> lst2)
{
return lst1.Intersect(lst2).Any();
}
You'll find that the Any method is significantly faster in the long run, likely because it does not require the memory allocations and setup that intersect requires (Any stops and returns true as soon as it finds a match whereas Intersect actually needs to store the matches in a new List<T>).

Using LINQ to create an IEnumerable<> of delta values

I've got a list of timestamps (in ticks), and from this list I'd like to create another one that represents the delta time between entries.
Let's just say, for example, that my master timetable looks like this:
10
20
30
50
60
70
What I want back is this:
10
10
20
10
10
What I'm trying to accomplish here is detect that #3 in the output table is an outlier by calculating the standard deviation. I've not taken statistics before, but I think if I look for the prevalent value in the output list and throw out anything outside of 1 sigma that this will work adequately for me.
I'd love to be able to create the output list with a single LINQ query, but I haven't figured it out yet. Currently I'm just brute forcing it with a loop.
If you are running .NET 4.0, this should work fine:
var deltas = list.Zip(list.Skip(1), (current, next) => next - current);
Apart from the multiple enumerators, this is quite efficient; it should work well on any kind of sequence.
Here's an alternative for .NET 3.5:
var deltas = list.Skip(1)
.Select((next, index) => next - list[index]);
Obviously, this idea will only be efficient when the list's indexer is employed. Modifying it to use ElementAt may not be a good idea: quadratic run-time will occur for non IList<T> sequences. In this case, writing a custom iterator is a good solution.
EDIT: If you don't like the Zip + Skip(1) idea, writing an extension such as this (untested) maybe useful in these sorts of circumstances:
public class CurrentNext<T>
{
public T Current { get; private set; }
public T Next { get; private set; }
public CurrentNext(T current, T next)
{
Current = current;
Next = next;
}
}
...
public static IEnumerable<CurrentNext<T>> ToCurrentNextEnumerable<T>(this IEnumerable<T> source)
{
if (source == null)
throw new ArgumentException("source");
using (var source = enumerable.GetEnumerator())
{
if (!enumerator.MoveNext())
yield break;
T current = enumerator.Current;
while (enumerator.MoveNext())
{
yield return new CurrentNext<T>(current, enumerator.Current);
current = enumerator.Current;
}
}
}
Which you could then use as:
var deltas = list.ToCurrentNextEnumerable()
.Select(c=> c.Next - c.Current);
You can use Ani's answer:-
var deltas = list.Zip(list.Skip(1), (current, next) => next - current);
With a super-simple implementation of the Zip extension method:-
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> func)
{
var ie1 = first.GetEnumerator();
var ie2 = second.GetEnumerator();
while (ie1.MoveNext() && ie2.MoveNext())
yield return func(ie1.Current, ie2.Current);
}
That'll work with 3.5.
This should do the trick:
static IEnumerable<int> GetDeltas(IEnumerable<int> collection)
{
int? previous = null;
foreach (int value in collection)
{
if (previous != null)
{
yield return value - (int)previous;
}
previous = value;
}
}
Now you can call your collection like this:
var masterTimetable = GetMasterTimeTable();
var deltas = GetDeltas(masterTimetable);
It's not really LINQ, but will effectively do the trick.
It looks like there are sufficient answers to get you going already, but I asked a similar question back in the spring:
How to zip one ienumerable with itself
In the responses to my question, I learned about "Pairwise" and "Pairwise"
As I recall, explicitly implementing your own "Pairwise" enumerator does mean that you iterate through you list exactly once whereas implementing "Pairwise" in terms of .Zip + .Skip(1) means that you will ultimately iterate over your list twice.
In my post I also include several examples of geometry (operating on lists of points) processing code such as Length/Distance, Area, Centroid.
Not that I recommend this, but totally abusing LINQ the following would work:
var vals = new[] {10, 20, 30, 50, 60, 70};
int previous = 0;
var newvals = vals.Select(i =>
{
int dif = i - previous;
previous = i;
return dif;
});
foreach (var newval in newvals)
{
Console.WriteLine(newval);
}
One liner for you:
int[] i = new int[] { 10, 20, 30, 50, 60, 70 };
IEnumerable<int> x = Enumerable.Range(1, i.Count()-1).Select(W => i[W] - i[W - 1]);
LINQ is not really designed for what you're trying to do here, because it usually evaluates value by value, much like an extremely efficient combination of for-loops.
You'd have to know your current index, something you don't, without some kind of workaround.

Get next N elements from enumerable

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):
private static IEnumerable<int> RandomNumberGenerator() {
while (true) yield return GenerateRandomNumber(0, 100);
}
I need to group those numbers in groups of 10, so I would like something like:
foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
Assert.That(group.Count() == 10);
}
I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:
private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
var result = new List<T>(size);
foreach (var item in enumerable) {
result.Add(item);
if (result.Count == size) {
yield return result.ToArray();
result.Clear();
}
}
}
Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?
Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.
EDIT: Why Skip+Take is no good.
Effectively what I want is:
var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);
without the overhead of regenerating number (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those in 4 groups by 10.
Are Skip and Take of any use to you?
Use a combination of the two in a loop to get what you want.
So,
list.Skip(10).Take(10);
Skips the first 10 records and then takes the next 10.
I have done something similar. But I would like it to be simpler:
//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new List<T>(size);
foreach (var x in xs)
{
curr.Add(x);
if (curr.Count == size)
{
yield return curr;
curr = new List<T>(size);
}
}
}
I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.
Addition: Array version:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
var curr = new T[size];
int i = 0;
foreach (var x in xs)
{
curr[i % size] = x;
if (++i % size == 0)
{
yield return curr;
curr = new T[size];
}
}
}
Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:
public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
return xs.Select((x, i) => new { x, i })
.GroupBy(xi => xi.i / size, xi => xi.x)
.Select(g => g.ToArray());
}
Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.
Complain of "premature optimization" all you want; but that is just ridiculous.
I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).
(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)
Let's see if you even need the complexity of Slice. If your random number generates is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:
var group1 = RandomNumberGenerator().Take(10);
var group2 = RandomNumberGenerator().Take(10);
var group3 = RandomNumberGenerator().Take(10);
var group4 = RandomNumberGenerator().Take(10);
Each call to Take returns a new group of 10 numbers.
Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:
var generator = RandomNumberGenerator();
var group1 = generator.Take(10);
var group2 = generator.Take(10);
var group3 = generator.Take(10);
var group4 = generator.Take(10);
This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.
You could use the Skip and Take methods with any Enumerable object.
For your edit :
How about a function that takes a slice number and a slice size as a parameter?
private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do
var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);
and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:
public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
private IEnumerator<T> innerEnumerator;
public StickyEnumerable( IEnumerable<T> items )
{
innerEnumerator = items.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return innerEnumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return innerEnumerator;
}
public void Dispose()
{
if (innerEnumerator != null)
{
innerEnumerator.Dispose();
}
}
}
Given that class, we could implement Slice with
public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
{
IEnumerable<T> slice;
do
{
slice = sticky.Take(size).ToList();
yield return slice;
} while (slice.Count() == size);
}
yield break;
}
That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,
using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
var first = sticky.Take(2);
var second = sticky.Take(2);
foreach (int i in second)
{
Console.WriteLine(i);
}
foreach (int i in first)
{
Console.WriteLine(i);
}
}
prints
1
2
3
4
rather than
3
4
1
2
Take a look at Take(), TakeWhile() and Skip()
I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chuck of an array into a new array and not causing side effects. In this scenario you would actually move the enumerable forward 10.
A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.
Edit: Dang, I have been trying to test this behavior with the following code
Note: this is wasn't really correct, I leave it here so others don't fall into the same mistake.
var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);
public static IEnumerable<int> RandomNumberGenerator()
{
yield return random.Next();
}
but the Count() for slice is alway 1. I also tried running it through a foreach loop since I know that the Linq extensions are generally lazily evaluated and it only looped once. I eventually did the code below instead of the Take() and it works:
public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
var list = new List<int>();
foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
return list;
}
If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.
So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.
Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:
var numbers = RandomNumberGenerator();
var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);
The two slices were different.
I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.
public static IEnumerable<int> RandomNumberGenerator()
{
while(true) yield return random.Next();
}
public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
{
var slices = new List<List<int>>();
foreach (var iteration in Enumerable.Range(0, count)){
var list = new List<int>();
list.AddRange(enumerable.Take(size));
slices.Add(list);
}
return slices;
}
I got this solution for the same problem:
int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();
IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
IEnumerable<R> head;
IEnumerable<R> tail = src;
while (tail.Any())
{
head = tail.Take(n);
tail = tail.Skip(n);
yield return action(head);
}
}
if you just want the chunks returned, not do anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have to have t=>t as default action, but I haven't found out how to do that yet.

Categories

Resources