Pair-wise iteration in C#, or sliding window enumerator - c#

If I have an IEnumerable like:
string[] items = new string[] { "a", "b", "c", "d" };
I would like to loop thru all the pairs of consecutive items (sliding window of size 2). Which would be
("a", "b"), ("b", "c"), ("c", "d")
My solution was is this
public static IEnumerable<Pair<T, T>> Pairs(IEnumerable<T> enumerable) {
IEnumerator<T> e = enumerable.GetEnumerator(); e.MoveNext();
T current = e.Current;
while ( e.MoveNext() ) {
T next = e.Current;
yield return new Pair<T, T>(current, next);
current = next;
}
}
// used like this :
foreach (Pair<String,String> pair in IterTools<String>.Pairs(items)) {
System.Out.PrintLine("{0}, {1}", pair.First, pair.Second)
}
When I wrote this code, I wondered if there are already functions in the .NET framework that do the same thing and do it not just for pairs but for any size tuples.
IMHO there should be a nice way to do this kind of sliding window operations.
I use C# 2.0 and I can imagine that with C# 3.0 (using LINQ) there are more and nicer ways to do this, but I'm primarily interested in C# 2.0 solutions. Though, I will also appreciate C# 3.0 solutions.

In .NET 4 this becomes even easier:-
var input = new[] { "a", "b", "c", "d", "e", "f" };
var result = input.Zip(input.Skip(1), (a, b) => Tuple.Create(a, b));

Rather than require a tuple (pair) type, why not just accept a selector:
public static IEnumerable<TResult> Pairwise<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TSource, TResult> resultSelector)
{
TSource previous = default(TSource);
using (var it = source.GetEnumerator())
{
if (it.MoveNext())
previous = it.Current;
while (it.MoveNext())
yield return resultSelector(previous, previous = it.Current);
}
}
Which allows you to skip the intermediate object if you want:
string[] items = new string[] { "a", "b", "c", "d" };
var pairs = items.Pairwise((x, y) => string.Format("{0},{1}", x, y));
foreach(var pair in pairs)
Console.WriteLine(pair);
Or you can use an anonymous type:
var pairs = items.Pairwise((x, y) => new { First = x, Second = y });
Update: I just implemented this on a real project and used C# 7.0 ValueTuple instead:
public static IEnumerable<(T, T)> Pairwise<T>(this IEnumerable<T> source)
{
var previous = default(T);
using (var it = source.GetEnumerator())
{
if (it.MoveNext())
previous = it.Current;
while (it.MoveNext())
yield return (previous, previous = it.Current);
}
}

The easiest way is to use ReactiveExtensions
using System.Reactive;
using System.Reactive.Linq;
and make yourself an extension method to kit bash this together
public static IEnumerable<IList<T>> Buffer<T>(this IEnumerable<T> seq, int bufferSize, int stepSize)
{
return seq.ToObservable().Buffer(bufferSize, stepSize).ToEnumerable();
}

Just for convenience, here is a selector-less version of #dahlbyk's answer.
public static IEnumerable<Tuple<T, T>> Pairwise<T>(this IEnumerable<T> enumerable)
{
var previous = default(T);
using (var e = enumerable.GetEnumerator())
{
if (e.MoveNext())
previous = e.Current;
while (e.MoveNext())
yield return Tuple.Create(previous, previous = e.Current);
}
}

A little late to the party, but as an alternative to all these extension methods, one might use an actual "sliding" Collection to hold (and discard) the data.
Here is one I ended up making today:
public class SlidingWindowCollection<T> : ICollection<T>
{
private int _windowSize;
private Queue<T> _source;
public SlidingWindowCollection(int windowSize)
{
_windowSize = windowSize;
_source = new Queue<T>(windowSize);
}
public void Add(T item)
{
if (_source.Count == _windowSize)
{
_source.Dequeue();
}
_source.Enqueue(item);
}
public void Clear()
{
_source.Clear();
}
...and just keep forwarding all other ICollection<T> methods to _source.
}
Usage:
int pairSize = 2;
var slider = new SlidingWindowCollection<string>(pairSize);
foreach(var item in items)
{
slider.Add(item);
Console.WriteLine(string.Join(", ", slider));
}

Here is my solution using a Stack. It is short and concise.
string[] items = new string[] { "a", "b", "c", "d" };
Stack<string> stack = new Stack<string>(items.Reverse());
while(stack.Count > 1)
{
Console.WriteLine("{0},{1}", stack.Pop(), stack.Peek());
}
You can take the same concept and use a queue which avoids the need for reversing the items and is even simpler:
var queue = new Queue<string>(items);
while (queue.Count > 1)
{
Console.WriteLine("{0},{1}", queue.Dequeue(), queue.Peek());
}
A short word about performance:
I believe it's important to realize that unless you know that a task is causing a bottleneck in your real application, it's probably not worth working out what the truly fastest way of doing is. Instead, write the code which does the job for you. Also, use code you can remember, so it easily flows out of your hand the next time you need it.
Nevertheless, in case you care for some performance data for 10.000.000 random strings:
Run #1
InputZip 00:00:00.7355567
PairwiseExtension 00:00:00.5290042
Stack 00:00:00.6451204
Queue 00:00:00.3245580
ForLoop 00:00:00.7808004
TupleExtension 00:00:03.9661995
Run #2
InputZip 00:00:00.7386347
PairwiseExtension 00:00:00.5369850
Stack 00:00:00.6910079
Queue 00:00:00.3246276
ForLoop 00:00:00.8272945
TupleExtension 00:00:03.9415258
Tested using Jon Skeet's micro benchmarking tool.
If you want to take a look at the source for the test go here: gist here

Something like this:
public static IEnumerable<TResult> Pairwise<T, TResult>(this IEnumerable<T> enumerable, Func<T, T, TResult> selector)
{
var previous = enumerable.First();
foreach (var item in enumerable.Skip(1))
{
yield return selector(previous, item);
previous = item;
}
}

Expanding on the previous answer to avoid of O(n2) approach by explicitly using the passed iterator:
public static IEnumerable<IEnumerable<T>> Tuples<T>(this IEnumerable<T> input, int groupCount) {
if (null == input) throw new ArgumentException("input");
if (groupCount < 1) throw new ArgumentException("groupCount");
var e = input.GetEnumerator();
bool done = false;
while (!done) {
var l = new List<T>();
for (var n = 0; n < groupCount; ++n) {
if (!e.MoveNext()) {
if (n != 0) {
yield return l;
}
yield break;
}
l.Add(e.Current);
}
yield return l;
}
}
For C# 2, before extension methods, drop the "this" from the input parameter and call as a static method.

Forgive me if I'm overlooking something, but why not something simple, like a for loop?:
public static List <int []> ListOfPairs (int [] items)
{
List <int> output = new List <int>();
for (int i=0; i < items.Length-1; i++)
{
Int [] pair = new int [2];
pair [0]=items [i];
pair [1]=items [i+1];
output.Add (pair);
}
return output;
}

C# 3.0 solution (sorry:)
public static IEnumerable<IEnumerable<T>> Tuples<T>(this IEnumerable<T> sequence, int nTuple)
{
if(nTuple <= 0) throw new ArgumentOutOfRangeException("nTuple");
for(int i = 0; i <= sequence.Count() - nTuple; i++)
yield return sequence.Skip(i).Take(nTuple);
}
This isn't the most performant in the world, but it's sure pleasant to look at.
Really, the only thing making this a C# 3.0 solution is the .Skip.Take construct, so if you just change that to adding the elements in that range to a list instead, it should be golden for 2.0. That said, it's still not performant.

Alternate Pairs implementation, using last pair to store previous value:
static IEnumerable<Pair<T, T>> Pairs( IEnumerable<T> collection ) {
Pair<T, T> pair = null;
foreach( T item in collection ) {
if( pair == null )
pair = Pair.Create( default( T ), item );
else
yield return pair = Pair.Create( pair.Second, item );
}
}
Simple Window implementation (only safe for private use, if caller does not save returned arrays; see note):
static IEnumerable<T[]> Window( IEnumerable<T> collection, int windowSize ) {
if( windowSize < 1 )
yield break;
int index = 0;
T[] window = new T[windowSize];
foreach( var item in collection ) {
bool initializing = index < windowSize;
// Shift initialized window to accomodate new item.
if( !initializing )
Array.Copy( window, 1, window, 0, windowSize - 1 );
// Add current item to window.
int itemIndex = initializing ? index : windowSize - 1;
window[itemIndex] = item;
index++;
bool initialized = index >= windowSize;
if( initialized )
//NOTE: For public API, should return array copy to prevent
// modifcation by user, or use a different type for the window.
yield return window;
}
}
Example use:
for( int i = 0; i <= items.Length; ++i ) {
Console.WriteLine( "Window size {0}:", i );
foreach( string[] window in IterTools<string>.Window( items, i ) )
Console.WriteLine( string.Join( ", ", window ) );
Console.WriteLine( );
}

The F# Seq module defines the pairwise function over IEnumerable<T>, but this function is not in the .NET framework.
If it were already in the .NET framework, instead of returning pairs it would probably accept a selector function due to the lack of support for tuples in languages like C# and VB.
var pairs = ns.Pairwise( (a, b) => new { First = a, Second = b };
I don't think any of the answers here really improve on your simple iterator implementation, which seemed the most natural to me (and the poster dahlbyk by the looks of things!) too.

I created a slightly modified version of the late-2020-updated code in #dahlbyk's answer. It is better suited for projects with nullable reference types enabled (<Nullable>enable</Nullable>). I also added basic docs.
/// <summary>
/// Enumerates over tuples of pairs of the elements from the original sequence. I.e. { 1, 2, 3 } becomes { (1, 2), (2, 3) }. Note that { 1 } becomes { }.
/// </summary>
public static IEnumerable<(T, T)> Pairwise<T>(this IEnumerable<T> source)
{
using var it = source.GetEnumerator();
if (!it.MoveNext())
yield break;
var previous = it.Current;
while (it.MoveNext())
yield return (previous, previous = it.Current);
}

This will make a single pass through the IEnumerable.
items.Aggregate(new List<string[]>(), (list, next) => { if (list.Count > 0) list[^1][1] = next; list.Add(new[] { next, next }); return list; }, (list) => { list.RemoveAt(list.Count -1); return list; });

New C# language allows to do something like this:
var pairlist = new (string, string)[] { ("a", "b"), ("b", "c"), ("c", "d") };
foreach (var pair in pairlist)
{
//do something with pair.Item1 & pair.Item2

Related

How to compare adjacent deque elements in C#?

I am trying to port some code from Python to C#.
The goal is to compare adjacent elements and see if four consecutive values are bigger than one another.
Following this post (Compare two adjacent elements in same list) I was able to use pairwise on a deque to succesfully compare adjacent elements.
//In python
from more_itertools import pairwise
for x, y in pairwise(a_deque):
if (x < y):
i += 1
if (i == 4)
print("Hello world!")
The problem is that C# does not contain the more_itertools library so I am currently looking for a similar technique to achieve the same solution.
FYI. The deque implementation is from https://www.nuget.org/packages/Nito.Collections.Deque/
Any suggestions?
You can implement the python pairwise method yourself like this:
public static IEnumerable<(T, T)> Pairwise<T>(IEnumerable<T> collection)
{
using (var enumerator = collection.GetEnumerator())
{
enumerator.MoveNext();
var previous = enumerator.Current;
while (enumerator.MoveNext())
{
yield return (previous, enumerator.Current);
previous = enumerator.Current;
}
}
}
Then the algorithm in c# is structural very similar to the python version:
static void Main(string[] args)
{
var values = new[] { 5, 7, 8, 9, 10, 10, 8, 2, 1 };
var i = 0;
foreach (var (x, y) in Pairwise(values))
{
if (x < y)
{
i++;
if (i == 4)
Console.WriteLine("Hello world!");
}
}
Console.ReadLine();
}
There is a project called MoreLINQ with clever LINQ extensions. Most of the time though the code is really simple thanks to LINQ's simplicity. You can add it as a NuGet package or as individual source packages that add just the operators you want.
Pairwise.cs implements an operator that can apply a function to pairs of elements :
int[] numbers = { 123, 456, 789 };
var result = numbers.Pairwise((a, b) => a + b);
The source is really simple - retrieve an item and if we haven't reached the end, retrieve another one and apply the function :
public static IEnumerable<TResult> Pairwise<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TSource, TResult> resultSelector)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return _();
IEnumerable<TResult> _()
{
using (var e = source.GetEnumerator())
{
if (!e.MoveNext())
yield break;
var previous = e.Current;
while (e.MoveNext())
{
yield return resultSelector(previous, e.Current);
previous = e.Current;
}
}
}
}
The only "trick" is the use of the local iterator function named ... _
You can thing of the Pairwise operator as an optimized Window operator for just 2 items.
There's another Window operator that can return sequences of N items.
This expression :
var x=Enumerable.Range(1,5).Window(3);
Produces the following arrays :
{1,2,3}
{2,3,4}
{3,4,5}
Just create the function:
static IEnumerable<(T, T)> PairWise<T>(IEnumerable<T> collection)
{
var queue = new Queue<T>();
foreach (T item in collection)
{
queue.Enqueue(item);
if (queue.Count == 2)
{
T x = queue.Dequeue();
T y = queue.Peek();
yield return (x, y);
}
}
}
and use it:
foreach ((int x, int y) in PairWise(new[] {1, 2, 3, 4, 5}))
{
Console.WriteLine($"{x} {y}");
}
You could try like this:
for (int i = 0; i < arr.Length - 1; ++i)
{
int a = arr[i];
int b = arr[i + 1];
Console.WriteLine($"{a} {b}");
}

Lambda for List within a List [duplicate]

Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?
Let me exemplify:
I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists will contain a group of 3 items of the original list (sequentially).
eg.:
Original List: [a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists: [a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]
I'd also need the resulting lists size to be a parameter of this function.
Try the following code.
public static List<List<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
The idea is to first group the elements by indexes. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list and the IEnumerable of List to a List of Lists
I just wrote this, and I think it's a little more elegant than the other proposed solutions:
/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}
In general the approach suggested by CaseyB works fine, in fact if you are passing in a List<T> it is hard to fault it, perhaps I would change it to:
public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
var pos = 0;
while (source.Skip(pos).Any())
{
yield return source.Skip(pos).Take(chunksize);
pos += chunksize;
}
}
Which will avoid massive call chains. Nonetheless, this approach has a general flaw. It materializes two enumerations per chunk, to highlight the issue try running:
foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
Console.WriteLine(item);
}
// wait forever
To overcome this we can try Cameron's approach, which passes the above test in flying colors as it only walks the enumeration once.
Trouble is that it has a different flaw, it materializes every item in each chunk, the trouble with that approach is that you run high on memory.
To illustrate that try running:
foreach (var item in Enumerable.Range(1, int.MaxValue)
.Select(x => x + new string('x', 100000))
.Clump(10000).Skip(100).First())
{
Console.Write('.');
}
// OutOfMemoryException
Finally, any implementation should be able to handle out of order iteration of chunks, for example:
Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
// should return [3],[1,2]
Many highly optimal solutions like my first revision of this answer failed there. The same issue can be seen in casperOne's optimized answer.
To address all these issues you can use the following:
namespace ChunkedEnumerator
{
public static class Extensions
{
class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;
public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}
public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;
}
}
public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
position++;
if (position + 1 > parent.chunkSize)
{
done = true;
}
if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}
return !done;
}
public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}
EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;
public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}
public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class EnumeratorWrapper<T>
{
public EnumeratorWrapper (IEnumerable<T> source)
{
SourceEumerable = source;
}
IEnumerable<T> SourceEumerable {get; set;}
Enumeration currentEnumeration;
class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}
public bool Get(int pos, out T item)
{
if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}
if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
}
item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}
while(currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;
if (currentEnumeration.AtEnd)
{
return false;
}
}
item = currentEnumeration.Source.Current;
return true;
}
int refs = 0;
// needed for dispose semantics
public void AddRef()
{
refs++;
}
public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();
var wrapper = new EnumeratorWrapper<T>(source);
int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
}
class Program
{
static void Main(string[] args)
{
int i = 10;
foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
{
foreach (var n in group)
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
if (i-- == 0) break;
}
var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();
foreach (var idx in new [] {3,2,1})
{
Console.Write("idx " + idx + " ");
foreach (var n in stuffs[idx])
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
}
/*
10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
*/
Console.ReadKey();
}
}
}
There is also a round of optimisations you could introduce for out-of-order iteration of chunks, which is out of scope here.
As to which method you should choose? It totally depends on the problem you are trying to solve. If you are not concerned with the first flaw the simple answer is incredibly appealing.
Note as with most methods, this is not safe for multi threading, stuff can get weird if you wish to make it thread safe you would need to amend EnumeratorWrapper.
You could use a number of queries that use Take and Skip, but that would add too many iterations on the original list, I believe.
Rather, I think you should create an iterator of your own, like so:
public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);
// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);
// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;
// Set the list to a new list.
list = new List<T>(groupSize);
}
}
// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}
You can then call this and it is LINQ enabled so you can perform other operations on the resulting sequences.
In light of Sam's answer, I felt there was an easier way to do this without:
Iterating through the list again (which I didn't do originally)
Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
All of the code that Sam posted
That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException(nameof(source));
if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize),
"The chunkSize parameter must be a positive value.");
// Call the internal implementation.
return source.ChunkInternal(chunkSize);
}
Nothing surprising up there, just basic error checking.
Moving on to ChunkInternal:
private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
this IEnumerable<T> source, int chunkSize)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(chunkSize > 0);
// Get the enumerator. Dispose of when done.
using (IEnumerator<T> enumerator = source.GetEnumerator())
do
{
// Move to the next element. If there's nothing left
// then get out.
if (!enumerator.MoveNext()) yield break;
// Return the chunked sequence.
yield return ChunkSequence(enumerator, chunkSize);
} while (true);
}
Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks to see if there any items currently to be enumerated. After each chunk is enumerated through, if there aren't any items left, it breaks out.
Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence:
private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator,
int chunkSize)
{
// Validate parameters.
Debug.Assert(enumerator != null);
Debug.Assert(chunkSize > 0);
// The count.
int count = 0;
// There is at least one item. Yield and then continue.
do
{
// Yield the item.
yield return enumerator.Current;
} while (++count < chunkSize && enumerator.MoveNext());
}
Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items and moving to the next item in the sequence after every iteration (but short-circuited if the number of items yielded exceeds the chunk size).
If there are no items left, then the InternalChunk method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
At this point, the loop will break, and the sequence of sequences will terminate.
This is a simple test:
static void Main()
{
string s = "agewpsqfxyimc";
int count = 0;
// Group by three.
foreach (IEnumerable<char> g in s.Chunk(3))
{
// Print out the group.
Console.Write("Group: {0} - ", ++count);
// Print the items.
foreach (char c in g)
{
// Print the item.
Console.Write(c + ", ");
}
// Finish the line.
Console.WriteLine();
}
}
Output:
Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,
An important note, this will not work if you don't drain the entire child sequence or break at any point in the parent sequence. This is an important caveat, but if your use case is that you will consume every element of the sequence of sequences, then this will work for you.
Additionally, it will do strange things if you play with the order, just as Sam's did at one point.
Ok, here's my take on it:
completely lazy: works on infinite enumerables
no intermediate copying/buffering
O(n) execution time
works also when inner sequences are only partially consumed
public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> enumerable,
int chunkSize)
{
if (chunkSize < 1) throw new ArgumentException("chunkSize must be positive");
using (var e = enumerable.GetEnumerator())
while (e.MoveNext())
{
var remaining = chunkSize; // elements remaining in the current chunk
var innerMoveNext = new Func<bool>(() => --remaining > 0 && e.MoveNext());
yield return e.GetChunk(innerMoveNext);
while (innerMoveNext()) {/* discard elements skipped by inner iterator */}
}
}
private static IEnumerable<T> GetChunk<T>(this IEnumerator<T> e,
Func<bool> innerMoveNext)
{
do yield return e.Current;
while (innerMoveNext());
}
Example Usage
var src = new [] {1, 2, 3, 4, 5, 6};
var c3 = src.Chunks(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.Chunks(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Take(2)); // {{1, 2}, {4, 5}}
Explanations
The code works by nesting two yield based iterators.
The outer iterator must keep track of how many elements have been effectively consumed by the inner (chunk) iterator. This is done by closing over remaining with innerMoveNext(). Unconsumed elements of a chunk are discarded before the next chunk is yielded by the outer iterator.
This is necessary because otherwise you get inconsistent results, when the inner enumerables are not (completely) consumed (e.g. c3.Count() would return 6).
Note: The answer has been updated to address the shortcomings pointed out by #aolszowka.
Update .NET 6.0
.NET 6.0 added a new native Chunk method to the System.Linq namespace:
public static System.Collections.Generic.IEnumerable<TSource[]> Chunk<TSource> (
this System.Collections.Generic.IEnumerable<TSource> source, int size);
Using this new method every chunk except the last will be of size size. The last chunk will contain the remaining elements and may be of a smaller size.
Here is an example:
var list = Enumerable.Range(1, 100);
var chunkSize = 10;
foreach(var chunk in list.Chunk(chunkSize)) //Returns a chunk with the correct size.
{
Parallel.ForEach(chunk, (item) =>
{
//Do something Parallel here.
Console.WriteLine(item);
});
}
You’re probably thinking, well why not use Skip and Take? Which is true, I think this is just a bit more concise and makes things just that little bit more readable.
completely lazy, no counting or copying:
public static class EnumerableExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int len)
{
if (len == 0)
throw new ArgumentNullException();
var enumer = source.GetEnumerator();
while (enumer.MoveNext())
{
yield return Take(enumer.Current, enumer, len);
}
}
private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
{
while (true)
{
yield return head;
if (--len == 0)
break;
if (tail.MoveNext())
head = tail.Current;
else
break;
}
}
}
I think the following suggestion would be the fastest. I am sacrificing the lazyness of the source Enumerable for the ability to use Array.Copy and knowing ahead of the time the length of each of my sublists.
public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> items, int size)
{
T[] array = items as T[] ?? items.ToArray();
for (int i = 0; i < array.Length; i+=size)
{
T[] chunk = new T[Math.Min(size, array.Length - i)];
Array.Copy(array, i, chunk, 0, chunk.Length);
yield return chunk;
}
}
For anyone interested in a packaged/maintained solution, the MoreLINQ library provides the Batch extension method which matches your requested behavior:
IEnumerable<char> source = "Example string";
IEnumerable<IEnumerable<char>> chunksOfThreeChars = source.Batch(3);
The Batch implementation is similar to Cameron MacFarland's answer, with the addition of an overload for transforming the chunk/batch before returning, and performs quite well.
I wrote a Clump extension method several years ago. Works great, and is the fastest implementation here. :P
/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException("source");
if (size < 1)
throw new ArgumentOutOfRangeException("size", "size must be greater than 0");
return ClumpIterator<T>(source, size);
}
private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
Debug.Assert(source != null, "source is null.");
T[] items = new T[size];
int count = 0;
foreach (var item in source)
{
items[count] = item;
count++;
if (count == size)
{
yield return items;
items = new T[size];
count = 0;
}
}
if (count > 0)
{
if (count == size)
yield return items;
else
{
T[] tempItems = new T[count];
Array.Copy(items, tempItems, count);
yield return tempItems;
}
}
}
We can improve #JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:
sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))
Because the groups are yielded one-by-one, this solution works efficiently with long or infinite sequences.
System.Interactive provides Buffer() for this purpose. Some quick testing shows performance is similar to Sam's solution.
I find this little snippet does the job quite nicely.
public static IEnumerable<List<T>> Chunked<T>(this List<T> source, int chunkSize)
{
var offset = 0;
while (offset < source.Count)
{
yield return source.GetRange(offset, Math.Min(source.Count - offset, chunkSize));
offset += chunkSize;
}
}
Here's a list splitting routine I wrote a couple months ago:
public static List<List<T>> Chunk<T>(
List<T> theList,
int chunkSize
)
{
List<List<T>> result = theList
.Select((x, i) => new {
data = x,
indexgroup = i / chunkSize
})
.GroupBy(x => x.indexgroup, x => x.data)
.Select(g => new List<T>(g))
.ToList();
return result;
}
We found David B's solution worked the best. But we adapted it to a more general solution:
list.GroupBy(item => item.SomeProperty)
.Select(group => new List<T>(group))
.ToArray();
What about this one?
var input = new List<string> { "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
var k = 3
var res = Enumerable.Range(0, (input.Count - 1) / k + 1)
.Select(i => input.GetRange(i * k, Math.Min(k, input.Count - i * k)))
.ToList();
As far as I know, GetRange() is linear in terms of number of items taken. So this should perform well.
This is an old question but this is what I ended up with; it enumerates the enumerable only once, but does create lists for each of the partitions. It doesn't suffer from unexpected behavior when ToArray() is called as some of the implementations do:
public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (chunkSize < 1)
{
throw new ArgumentException("Invalid chunkSize: " + chunkSize);
}
using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
{
IList<T> currentChunk = new List<T>();
while (sourceEnumerator.MoveNext())
{
currentChunk.Add(sourceEnumerator.Current);
if (currentChunk.Count == chunkSize)
{
yield return currentChunk;
currentChunk = new List<T>();
}
}
if (currentChunk.Any())
{
yield return currentChunk;
}
}
}
Old code, but this is what I've been using:
public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
var toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
This following solution is the most compact I could come up with that is O(n).
public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int chunksize)
{
var list = source as IList<T> ?? source.ToList();
for (int start = 0; start < list.Count; start += chunksize)
{
T[] chunk = new T[Math.Min(chunksize, list.Count - start)];
for (int i = 0; i < chunk.Length; i++)
chunk[i] = list[start + i];
yield return chunk;
}
}
If the list is of type system.collections.generic you can use the "CopyTo" method available to copy elements of your array to other sub arrays. You specify the start element and number of elements to copy.
You could also make 3 clones of your original list and use the "RemoveRange" on each list to shrink the list to the size you want.
Or just create a helper method to do it for you.
It's an old solution but I had a different approach. I use Skip to move to desired offset and Take to extract desired number of elements:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
if (chunkSize <= 0)
throw new ArgumentOutOfRangeException($"{nameof(chunkSize)} should be > 0");
var nbChunks = (int)Math.Ceiling((double)source.Count()/chunkSize);
return Enumerable.Range(0, nbChunks)
.Select(chunkNb => source.Skip(chunkNb*chunkSize)
.Take(chunkSize));
}
Another way is using Rx Buffer operator
//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anAnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
The question was how to "Split List into Sublists with LINQ", but sometimes you may want those sub-lists to be references to the original list, not copies. This allows you to modify the original list from the sub-lists. In that case, this may work for you.
public static IEnumerable<Memory<T>> RefChunkBy<T>(this T[] array, int size)
{
if (size < 1 || array is null)
{
throw new ArgumentException("chunkSize must be positive");
}
var index = 0;
var counter = 0;
for (int i = 0; i < array.Length; i++)
{
if (counter == size)
{
yield return new Memory<T>(array, index, size);
index = i;
counter = 0;
}
counter++;
if (i + 1 == array.Length)
{
yield return new Memory<T>(array, index, array.Length - index);
}
}
}
Usage:
var src = new[] { 1, 2, 3, 4, 5, 6 };
var c3 = RefChunkBy(src, 3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = RefChunkBy(src, 4); // {{1, 2, 3, 4}, {5, 6}};
// as extension method
var c3 = src.RefChunkBy(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.RefChunkBy(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Span.ToArray().Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Span.ToArray().Take(2)); // {{1, 2}, {4, 5}}
Feel free to make this code better.
Using modular partitioning:
public IEnumerable<IEnumerable<string>> Split(IEnumerable<string> input, int chunkSize)
{
var chunks = (int)Math.Ceiling((double)input.Count() / (double)chunkSize);
return Enumerable.Range(0, chunks).Select(id => input.Where(s => s.GetHashCode() % chunks == id));
}
Just putting in my two cents. If you wanted to "bucket" the list (visualize left to right), you could do the following:
public static List<List<T>> Buckets<T>(this List<T> source, int numberOfBuckets)
{
List<List<T>> result = new List<List<T>>();
for (int i = 0; i < numberOfBuckets; i++)
{
result.Add(new List<T>());
}
int count = 0;
while (count < source.Count())
{
var mod = count % numberOfBuckets;
result[mod].Add(source[count]);
count++;
}
return result;
}
public static List<List<T>> GetSplitItemsList<T>(List<T> originalItemsList, short number)
{
var listGroup = new List<List<T>>();
int j = number;
for (int i = 0; i < originalItemsList.Count; i += number)
{
var cList = originalItemsList.Take(j).Skip(i).ToList();
j += number;
listGroup.Add(cList);
}
return listGroup;
}
To insert my two cents...
By using the list type for the source to be chunked, I found another very compact solution:
public static IEnumerable<IEnumerable<TSource>> Chunk<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
// copy the source into a list
var chunkList = source.ToList();
// return chunks of 'chunkSize' items
while (chunkList.Count > chunkSize)
{
yield return chunkList.GetRange(0, chunkSize);
chunkList.RemoveRange(0, chunkSize);
}
// return the rest
yield return chunkList;
}
I took the primary answer and made it to be an IOC container to determine where to split. (For who is really looking to only split on 3 items, in reading this post while searching for an answer?)
This method allows one to split on any type of item as needed.
public static List<List<T>> SplitOn<T>(List<T> main, Func<T, bool> splitOn)
{
int groupIndex = 0;
return main.Select( item => new
{
Group = (splitOn.Invoke(item) ? ++groupIndex : groupIndex),
Value = item
})
.GroupBy( it2 => it2.Group)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
So for the OP the code would be
var it = new List<string>()
{ "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
int index = 0;
var result = SplitOn(it, (itm) => (index++ % 3) == 0 );
So performatic as the Sam Saffron's approach.
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Size must be greater than zero.");
return BatchImpl(source, size).TakeWhile(x => x.Any());
}
static IEnumerable<IEnumerable<T>> BatchImpl<T>(this IEnumerable<T> source, int size)
{
var values = new List<T>();
var group = 1;
var disposed = false;
var e = source.GetEnumerator();
try
{
while (!disposed)
{
yield return GetBatch(e, values, group, size, () => { e.Dispose(); disposed = true; });
group++;
}
}
finally
{
if (!disposed)
e.Dispose();
}
}
static IEnumerable<T> GetBatch<T>(IEnumerator<T> e, List<T> values, int group, int size, Action dispose)
{
var min = (group - 1) * size + 1;
var max = group * size;
var hasValue = false;
while (values.Count < min && e.MoveNext())
{
values.Add(e.Current);
}
for (var i = min; i <= max; i++)
{
if (i <= values.Count)
{
hasValue = true;
}
else if (hasValue = e.MoveNext())
{
values.Add(e.Current);
}
else
{
dispose();
}
if (hasValue)
yield return values[i - 1];
else
yield break;
}
}
}
Can work with infinite generators:
a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1)))
.Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1)))
.Where((x, i) => i % 3 == 0)
Demo code: https://ideone.com/GKmL7M
using System;
using System.Collections.Generic;
using System.Linq;
public class Test
{
private static void DoIt(IEnumerable<int> a)
{
Console.WriteLine(String.Join(" ", a));
foreach (var x in a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1))).Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1))).Where((x, i) => i % 3 == 0))
Console.WriteLine(String.Join(" ", x));
Console.WriteLine();
}
public static void Main()
{
DoIt(new int[] {1});
DoIt(new int[] {1, 2});
DoIt(new int[] {1, 2, 3});
DoIt(new int[] {1, 2, 3, 4});
DoIt(new int[] {1, 2, 3, 4, 5});
DoIt(new int[] {1, 2, 3, 4, 5, 6});
}
}
1
1 2
1 2 3
1 2 3
1 2 3 4
1 2 3
1 2 3 4 5
1 2 3
1 2 3 4 5 6
1 2 3
4 5 6
But actually I would prefer to write corresponding method without linq.

Just in time computation of combinations [duplicate]

This question already has answers here:
System.OutOfMemoryException when generating permutations
(4 answers)
Closed 6 years ago.
I want to do something with every combination of ternaries for N variables:
example with 1:
T
F
U
example with 2:
TT
FT
UT
TF
FF
UF
UU
Is there a way to compute this but only as needed: For example:
var combinator = new Combinator<string>(2, {"T","F","U"});
List<String> tt = combinator.Next();
//tt contains {"T","T"}
You can implement it in an iterator method:
private IEnumerable<List<T>> Combinations<T>(int n, T[] values)
{
if (n == 0) yield return new List<T>();
else
{
foreach (var list in Combinations(n - 1, values))
foreach (var item in values)
{
var items = new List<T>(list);
items.Add(item);
yield return items;
}
}
}
This creates all combinations, but does it in a lazy way.
If you want you can create Combinator class like this:
class Combinator<T>
{
IEnumerator<List<T>> enumerator;
public Combinator(int n, T[] values)
{
enumerator = Combinations(n, values).GetEnumerator();
}
public List<T> Next()
{
return enumerator.MoveNext() ? enumerator.Current : null;
}
private IEnumerable<List<T>> Combinations<T>(int n, T[] values) { ... }
}
Probably not the most computationally efficient, but it's combinatorics, so the complexity is probably stuck being awful:
public static IEnumerable<List<T>> Combinations<T>( int count, IEnumerable<T> items )
{
if( count <= 0 ) yield break;
if( count == 1 )
{
foreach( var item in items ) yield return new List<T> { item };
yield break;
}
foreach( var item in items )
{
foreach( var combo in Combinations<T>( count - 1, items ) )
{
var result = new List<T> { item };
result.AddRange( combo );
yield return result;
}
}
}
It is not clear how you are getting your combinations of TFU.
You list only the following:
TT
FT
UT
TF
FF
UF
UU
However that is missing two combinations, and it should be like this (as far as I can work out):
TT
FT
UT
TF
FF
UF
TU
FU
UU
Assuming that the latter is actually the correct list, then you can compute it "on demand" like so:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
public static void Main()
{
foreach (var combination in Combinator(new [] { "T", "F", "U" }, 2))
Console.WriteLine(string.Concat(combination));
}
public static IEnumerable<IEnumerable<T>> Combinator<T>(IEnumerable<T> sequence, int count)
{
if (count == 0)
{
yield return Enumerable.Empty<T>();
yield break;
}
foreach (T startingElement in sequence)
{
IEnumerable<T> remainingItems = sequence;
foreach (IEnumerable<T> permutationOfRemainder in Combinator(remainingItems, count - 1))
yield return permutationOfRemainder.Concat(new [] { startingElement});
}
}
}
}
You could achieve this by giving the cominations some ordering, so you can basically assign a mapping between them and the numbers from 1 to n^m (where n is the length of permutations and m is the number of strings). Then saving the state.
However what this does is bassically reimplementing IEnumerable. https://msdn.microsoft.com/de-de/library/system.collections.ienumerable(v=vs.110).aspx
Even simpler, if you need this in some sort of foreach loop, would be to use just a method that returns IEnumerable. IEnumerable will be lazily evaluated if you use the yield syntax.
http://www.dotnetperls.com/yield

Split List into Sublists with LINQ

Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?
Let me exemplify:
I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists will contain a group of 3 items of the original list (sequentially).
eg.:
Original List: [a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists: [a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]
I'd also need the resulting lists size to be a parameter of this function.
Try the following code.
public static List<List<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
The idea is to first group the elements by indexes. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list and the IEnumerable of List to a List of Lists
I just wrote this, and I think it's a little more elegant than the other proposed solutions:
/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}
In general the approach suggested by CaseyB works fine, in fact if you are passing in a List<T> it is hard to fault it, perhaps I would change it to:
public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
var pos = 0;
while (source.Skip(pos).Any())
{
yield return source.Skip(pos).Take(chunksize);
pos += chunksize;
}
}
Which will avoid massive call chains. Nonetheless, this approach has a general flaw. It materializes two enumerations per chunk, to highlight the issue try running:
foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
Console.WriteLine(item);
}
// wait forever
To overcome this we can try Cameron's approach, which passes the above test in flying colors as it only walks the enumeration once.
Trouble is that it has a different flaw, it materializes every item in each chunk, the trouble with that approach is that you run high on memory.
To illustrate that try running:
foreach (var item in Enumerable.Range(1, int.MaxValue)
.Select(x => x + new string('x', 100000))
.Clump(10000).Skip(100).First())
{
Console.Write('.');
}
// OutOfMemoryException
Finally, any implementation should be able to handle out of order iteration of chunks, for example:
Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
// should return [3],[1,2]
Many highly optimal solutions like my first revision of this answer failed there. The same issue can be seen in casperOne's optimized answer.
To address all these issues you can use the following:
namespace ChunkedEnumerator
{
public static class Extensions
{
class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;
public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}
public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;
}
}
public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
position++;
if (position + 1 > parent.chunkSize)
{
done = true;
}
if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}
return !done;
}
public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}
EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;
public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}
public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class EnumeratorWrapper<T>
{
public EnumeratorWrapper (IEnumerable<T> source)
{
SourceEumerable = source;
}
IEnumerable<T> SourceEumerable {get; set;}
Enumeration currentEnumeration;
class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}
public bool Get(int pos, out T item)
{
if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}
if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
}
item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}
while(currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;
if (currentEnumeration.AtEnd)
{
return false;
}
}
item = currentEnumeration.Source.Current;
return true;
}
int refs = 0;
// needed for dispose semantics
public void AddRef()
{
refs++;
}
public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();
var wrapper = new EnumeratorWrapper<T>(source);
int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
}
class Program
{
static void Main(string[] args)
{
int i = 10;
foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
{
foreach (var n in group)
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
if (i-- == 0) break;
}
var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();
foreach (var idx in new [] {3,2,1})
{
Console.Write("idx " + idx + " ");
foreach (var n in stuffs[idx])
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
}
/*
10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
*/
Console.ReadKey();
}
}
}
There is also a round of optimisations you could introduce for out-of-order iteration of chunks, which is out of scope here.
As to which method you should choose? It totally depends on the problem you are trying to solve. If you are not concerned with the first flaw the simple answer is incredibly appealing.
Note as with most methods, this is not safe for multi threading, stuff can get weird if you wish to make it thread safe you would need to amend EnumeratorWrapper.
You could use a number of queries that use Take and Skip, but that would add too many iterations on the original list, I believe.
Rather, I think you should create an iterator of your own, like so:
public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);
// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);
// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;
// Set the list to a new list.
list = new List<T>(groupSize);
}
}
// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}
You can then call this and it is LINQ enabled so you can perform other operations on the resulting sequences.
In light of Sam's answer, I felt there was an easier way to do this without:
Iterating through the list again (which I didn't do originally)
Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
All of the code that Sam posted
That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException(nameof(source));
if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize),
"The chunkSize parameter must be a positive value.");
// Call the internal implementation.
return source.ChunkInternal(chunkSize);
}
Nothing surprising up there, just basic error checking.
Moving on to ChunkInternal:
private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
this IEnumerable<T> source, int chunkSize)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(chunkSize > 0);
// Get the enumerator. Dispose of when done.
using (IEnumerator<T> enumerator = source.GetEnumerator())
do
{
// Move to the next element. If there's nothing left
// then get out.
if (!enumerator.MoveNext()) yield break;
// Return the chunked sequence.
yield return ChunkSequence(enumerator, chunkSize);
} while (true);
}
Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks to see if there any items currently to be enumerated. After each chunk is enumerated through, if there aren't any items left, it breaks out.
Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence:
private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator,
int chunkSize)
{
// Validate parameters.
Debug.Assert(enumerator != null);
Debug.Assert(chunkSize > 0);
// The count.
int count = 0;
// There is at least one item. Yield and then continue.
do
{
// Yield the item.
yield return enumerator.Current;
} while (++count < chunkSize && enumerator.MoveNext());
}
Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items and moving to the next item in the sequence after every iteration (but short-circuited if the number of items yielded exceeds the chunk size).
If there are no items left, then the InternalChunk method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
At this point, the loop will break, and the sequence of sequences will terminate.
This is a simple test:
static void Main()
{
string s = "agewpsqfxyimc";
int count = 0;
// Group by three.
foreach (IEnumerable<char> g in s.Chunk(3))
{
// Print out the group.
Console.Write("Group: {0} - ", ++count);
// Print the items.
foreach (char c in g)
{
// Print the item.
Console.Write(c + ", ");
}
// Finish the line.
Console.WriteLine();
}
}
Output:
Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,
An important note, this will not work if you don't drain the entire child sequence or break at any point in the parent sequence. This is an important caveat, but if your use case is that you will consume every element of the sequence of sequences, then this will work for you.
Additionally, it will do strange things if you play with the order, just as Sam's did at one point.
Ok, here's my take on it:
completely lazy: works on infinite enumerables
no intermediate copying/buffering
O(n) execution time
works also when inner sequences are only partially consumed
public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> enumerable,
int chunkSize)
{
if (chunkSize < 1) throw new ArgumentException("chunkSize must be positive");
using (var e = enumerable.GetEnumerator())
while (e.MoveNext())
{
var remaining = chunkSize; // elements remaining in the current chunk
var innerMoveNext = new Func<bool>(() => --remaining > 0 && e.MoveNext());
yield return e.GetChunk(innerMoveNext);
while (innerMoveNext()) {/* discard elements skipped by inner iterator */}
}
}
private static IEnumerable<T> GetChunk<T>(this IEnumerator<T> e,
Func<bool> innerMoveNext)
{
do yield return e.Current;
while (innerMoveNext());
}
Example Usage
var src = new [] {1, 2, 3, 4, 5, 6};
var c3 = src.Chunks(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.Chunks(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Take(2)); // {{1, 2}, {4, 5}}
Explanations
The code works by nesting two yield based iterators.
The outer iterator must keep track of how many elements have been effectively consumed by the inner (chunk) iterator. This is done by closing over remaining with innerMoveNext(). Unconsumed elements of a chunk are discarded before the next chunk is yielded by the outer iterator.
This is necessary because otherwise you get inconsistent results, when the inner enumerables are not (completely) consumed (e.g. c3.Count() would return 6).
Note: The answer has been updated to address the shortcomings pointed out by #aolszowka.
Update .NET 6.0
.NET 6.0 added a new native Chunk method to the System.Linq namespace:
public static System.Collections.Generic.IEnumerable<TSource[]> Chunk<TSource> (
this System.Collections.Generic.IEnumerable<TSource> source, int size);
Using this new method every chunk except the last will be of size size. The last chunk will contain the remaining elements and may be of a smaller size.
Here is an example:
var list = Enumerable.Range(1, 100);
var chunkSize = 10;
foreach(var chunk in list.Chunk(chunkSize)) //Returns a chunk with the correct size.
{
Parallel.ForEach(chunk, (item) =>
{
//Do something Parallel here.
Console.WriteLine(item);
});
}
You’re probably thinking, well why not use Skip and Take? Which is true, I think this is just a bit more concise and makes things just that little bit more readable.
completely lazy, no counting or copying:
public static class EnumerableExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int len)
{
if (len == 0)
throw new ArgumentNullException();
var enumer = source.GetEnumerator();
while (enumer.MoveNext())
{
yield return Take(enumer.Current, enumer, len);
}
}
private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
{
while (true)
{
yield return head;
if (--len == 0)
break;
if (tail.MoveNext())
head = tail.Current;
else
break;
}
}
}
I think the following suggestion would be the fastest. I am sacrificing the lazyness of the source Enumerable for the ability to use Array.Copy and knowing ahead of the time the length of each of my sublists.
public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> items, int size)
{
T[] array = items as T[] ?? items.ToArray();
for (int i = 0; i < array.Length; i+=size)
{
T[] chunk = new T[Math.Min(size, array.Length - i)];
Array.Copy(array, i, chunk, 0, chunk.Length);
yield return chunk;
}
}
For anyone interested in a packaged/maintained solution, the MoreLINQ library provides the Batch extension method which matches your requested behavior:
IEnumerable<char> source = "Example string";
IEnumerable<IEnumerable<char>> chunksOfThreeChars = source.Batch(3);
The Batch implementation is similar to Cameron MacFarland's answer, with the addition of an overload for transforming the chunk/batch before returning, and performs quite well.
I wrote a Clump extension method several years ago. Works great, and is the fastest implementation here. :P
/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException("source");
if (size < 1)
throw new ArgumentOutOfRangeException("size", "size must be greater than 0");
return ClumpIterator<T>(source, size);
}
private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
Debug.Assert(source != null, "source is null.");
T[] items = new T[size];
int count = 0;
foreach (var item in source)
{
items[count] = item;
count++;
if (count == size)
{
yield return items;
items = new T[size];
count = 0;
}
}
if (count > 0)
{
if (count == size)
yield return items;
else
{
T[] tempItems = new T[count];
Array.Copy(items, tempItems, count);
yield return tempItems;
}
}
}
We can improve #JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:
sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))
Because the groups are yielded one-by-one, this solution works efficiently with long or infinite sequences.
System.Interactive provides Buffer() for this purpose. Some quick testing shows performance is similar to Sam's solution.
I find this little snippet does the job quite nicely.
public static IEnumerable<List<T>> Chunked<T>(this List<T> source, int chunkSize)
{
var offset = 0;
while (offset < source.Count)
{
yield return source.GetRange(offset, Math.Min(source.Count - offset, chunkSize));
offset += chunkSize;
}
}
Here's a list splitting routine I wrote a couple months ago:
public static List<List<T>> Chunk<T>(
List<T> theList,
int chunkSize
)
{
List<List<T>> result = theList
.Select((x, i) => new {
data = x,
indexgroup = i / chunkSize
})
.GroupBy(x => x.indexgroup, x => x.data)
.Select(g => new List<T>(g))
.ToList();
return result;
}
We found David B's solution worked the best. But we adapted it to a more general solution:
list.GroupBy(item => item.SomeProperty)
.Select(group => new List<T>(group))
.ToArray();
What about this one?
var input = new List<string> { "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
var k = 3
var res = Enumerable.Range(0, (input.Count - 1) / k + 1)
.Select(i => input.GetRange(i * k, Math.Min(k, input.Count - i * k)))
.ToList();
As far as I know, GetRange() is linear in terms of number of items taken. So this should perform well.
This is an old question but this is what I ended up with; it enumerates the enumerable only once, but does create lists for each of the partitions. It doesn't suffer from unexpected behavior when ToArray() is called as some of the implementations do:
public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (chunkSize < 1)
{
throw new ArgumentException("Invalid chunkSize: " + chunkSize);
}
using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
{
IList<T> currentChunk = new List<T>();
while (sourceEnumerator.MoveNext())
{
currentChunk.Add(sourceEnumerator.Current);
if (currentChunk.Count == chunkSize)
{
yield return currentChunk;
currentChunk = new List<T>();
}
}
if (currentChunk.Any())
{
yield return currentChunk;
}
}
}
Old code, but this is what I've been using:
public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
var toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
This following solution is the most compact I could come up with that is O(n).
public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int chunksize)
{
var list = source as IList<T> ?? source.ToList();
for (int start = 0; start < list.Count; start += chunksize)
{
T[] chunk = new T[Math.Min(chunksize, list.Count - start)];
for (int i = 0; i < chunk.Length; i++)
chunk[i] = list[start + i];
yield return chunk;
}
}
If the list is of type system.collections.generic you can use the "CopyTo" method available to copy elements of your array to other sub arrays. You specify the start element and number of elements to copy.
You could also make 3 clones of your original list and use the "RemoveRange" on each list to shrink the list to the size you want.
Or just create a helper method to do it for you.
It's an old solution but I had a different approach. I use Skip to move to desired offset and Take to extract desired number of elements:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
if (chunkSize <= 0)
throw new ArgumentOutOfRangeException($"{nameof(chunkSize)} should be > 0");
var nbChunks = (int)Math.Ceiling((double)source.Count()/chunkSize);
return Enumerable.Range(0, nbChunks)
.Select(chunkNb => source.Skip(chunkNb*chunkSize)
.Take(chunkSize));
}
Another way is using Rx Buffer operator
//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anAnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
The question was how to "Split List into Sublists with LINQ", but sometimes you may want those sub-lists to be references to the original list, not copies. This allows you to modify the original list from the sub-lists. In that case, this may work for you.
public static IEnumerable<Memory<T>> RefChunkBy<T>(this T[] array, int size)
{
if (size < 1 || array is null)
{
throw new ArgumentException("chunkSize must be positive");
}
var index = 0;
var counter = 0;
for (int i = 0; i < array.Length; i++)
{
if (counter == size)
{
yield return new Memory<T>(array, index, size);
index = i;
counter = 0;
}
counter++;
if (i + 1 == array.Length)
{
yield return new Memory<T>(array, index, array.Length - index);
}
}
}
Usage:
var src = new[] { 1, 2, 3, 4, 5, 6 };
var c3 = RefChunkBy(src, 3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = RefChunkBy(src, 4); // {{1, 2, 3, 4}, {5, 6}};
// as extension method
var c3 = src.RefChunkBy(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.RefChunkBy(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Span.ToArray().Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Span.ToArray().Take(2)); // {{1, 2}, {4, 5}}
Feel free to make this code better.
Using modular partitioning:
public IEnumerable<IEnumerable<string>> Split(IEnumerable<string> input, int chunkSize)
{
var chunks = (int)Math.Ceiling((double)input.Count() / (double)chunkSize);
return Enumerable.Range(0, chunks).Select(id => input.Where(s => s.GetHashCode() % chunks == id));
}
Just putting in my two cents. If you wanted to "bucket" the list (visualize left to right), you could do the following:
public static List<List<T>> Buckets<T>(this List<T> source, int numberOfBuckets)
{
List<List<T>> result = new List<List<T>>();
for (int i = 0; i < numberOfBuckets; i++)
{
result.Add(new List<T>());
}
int count = 0;
while (count < source.Count())
{
var mod = count % numberOfBuckets;
result[mod].Add(source[count]);
count++;
}
return result;
}
public static List<List<T>> GetSplitItemsList<T>(List<T> originalItemsList, short number)
{
var listGroup = new List<List<T>>();
int j = number;
for (int i = 0; i < originalItemsList.Count; i += number)
{
var cList = originalItemsList.Take(j).Skip(i).ToList();
j += number;
listGroup.Add(cList);
}
return listGroup;
}
To insert my two cents...
By using the list type for the source to be chunked, I found another very compact solution:
public static IEnumerable<IEnumerable<TSource>> Chunk<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
// copy the source into a list
var chunkList = source.ToList();
// return chunks of 'chunkSize' items
while (chunkList.Count > chunkSize)
{
yield return chunkList.GetRange(0, chunkSize);
chunkList.RemoveRange(0, chunkSize);
}
// return the rest
yield return chunkList;
}
I took the primary answer and made it to be an IOC container to determine where to split. (For who is really looking to only split on 3 items, in reading this post while searching for an answer?)
This method allows one to split on any type of item as needed.
public static List<List<T>> SplitOn<T>(List<T> main, Func<T, bool> splitOn)
{
int groupIndex = 0;
return main.Select( item => new
{
Group = (splitOn.Invoke(item) ? ++groupIndex : groupIndex),
Value = item
})
.GroupBy( it2 => it2.Group)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
So for the OP the code would be
var it = new List<string>()
{ "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
int index = 0;
var result = SplitOn(it, (itm) => (index++ % 3) == 0 );
So performatic as the Sam Saffron's approach.
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Size must be greater than zero.");
return BatchImpl(source, size).TakeWhile(x => x.Any());
}
static IEnumerable<IEnumerable<T>> BatchImpl<T>(this IEnumerable<T> source, int size)
{
var values = new List<T>();
var group = 1;
var disposed = false;
var e = source.GetEnumerator();
try
{
while (!disposed)
{
yield return GetBatch(e, values, group, size, () => { e.Dispose(); disposed = true; });
group++;
}
}
finally
{
if (!disposed)
e.Dispose();
}
}
static IEnumerable<T> GetBatch<T>(IEnumerator<T> e, List<T> values, int group, int size, Action dispose)
{
var min = (group - 1) * size + 1;
var max = group * size;
var hasValue = false;
while (values.Count < min && e.MoveNext())
{
values.Add(e.Current);
}
for (var i = min; i <= max; i++)
{
if (i <= values.Count)
{
hasValue = true;
}
else if (hasValue = e.MoveNext())
{
values.Add(e.Current);
}
else
{
dispose();
}
if (hasValue)
yield return values[i - 1];
else
yield break;
}
}
}
Can work with infinite generators:
a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1)))
.Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1)))
.Where((x, i) => i % 3 == 0)
Demo code: https://ideone.com/GKmL7M
using System;
using System.Collections.Generic;
using System.Linq;
public class Test
{
private static void DoIt(IEnumerable<int> a)
{
Console.WriteLine(String.Join(" ", a));
foreach (var x in a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1))).Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1))).Where((x, i) => i % 3 == 0))
Console.WriteLine(String.Join(" ", x));
Console.WriteLine();
}
public static void Main()
{
DoIt(new int[] {1});
DoIt(new int[] {1, 2});
DoIt(new int[] {1, 2, 3});
DoIt(new int[] {1, 2, 3, 4});
DoIt(new int[] {1, 2, 3, 4, 5});
DoIt(new int[] {1, 2, 3, 4, 5, 6});
}
}
1
1 2
1 2 3
1 2 3
1 2 3 4
1 2 3
1 2 3 4 5
1 2 3
1 2 3 4 5 6
1 2 3
4 5 6
But actually I would prefer to write corresponding method without linq.

How to take all but the last element in a sequence using LINQ?

Let's say I have a sequence.
IEnumerable<int> sequence = GetSequenceFromExpensiveSource();
// sequence now contains: 0,1,2,3,...,999999,1000000
Getting the sequence is not cheap and is dynamically generated, and I want to iterate through it once only.
I want to get 0 - 999999 (i.e. everything but the last element)
I recognize that I could do something like:
sequence.Take(sequence.Count() - 1);
but that results in two enumerations over the big sequence.
Is there a LINQ construct that lets me do:
sequence.TakeAllButTheLastElement();
The Enumerable.SkipLast(IEnumerable<TSource>, Int32) method was added in .NET Standard 2.1. It does exactly what you want.
IEnumerable<int> sequence = GetSequenceFromExpensiveSource();
var allExceptLast = sequence.SkipLast(1);
From https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.skiplast
Returns a new enumerable collection that contains the elements from source with the last count elements of the source collection omitted.
I don't know a Linq solution - But you can easily code the algorithm by yourself using generators (yield return).
public static IEnumerable<T> TakeAllButLast<T>(this IEnumerable<T> source) {
var it = source.GetEnumerator();
bool hasRemainingItems = false;
bool isFirst = true;
T item = default(T);
do {
hasRemainingItems = it.MoveNext();
if (hasRemainingItems) {
if (!isFirst) yield return item;
item = it.Current;
isFirst = false;
}
} while (hasRemainingItems);
}
static void Main(string[] args) {
var Seq = Enumerable.Range(1, 10);
Console.WriteLine(string.Join(", ", Seq.Select(x => x.ToString()).ToArray()));
Console.WriteLine(string.Join(", ", Seq.TakeAllButLast().Select(x => x.ToString()).ToArray()));
}
Or as a generalized solution discarding the last n items (using a queue like suggested in the comments):
public static IEnumerable<T> SkipLastN<T>(this IEnumerable<T> source, int n) {
var it = source.GetEnumerator();
bool hasRemainingItems = false;
var cache = new Queue<T>(n + 1);
do {
if (hasRemainingItems = it.MoveNext()) {
cache.Enqueue(it.Current);
if (cache.Count > n)
yield return cache.Dequeue();
}
} while (hasRemainingItems);
}
static void Main(string[] args) {
var Seq = Enumerable.Range(1, 4);
Console.WriteLine(string.Join(", ", Seq.Select(x => x.ToString()).ToArray()));
Console.WriteLine(string.Join(", ", Seq.SkipLastN(3).Select(x => x.ToString()).ToArray()));
}
As an alternative to creating your own method and in a case the elements order is not important, the next will work:
var result = sequence.Reverse().Skip(1);
Because I'm not a fan of explicitly using an Enumerator, here's an alternative. Note that the wrapper methods are needed to let invalid arguments throw early, rather than deferring the checks until the sequence is actually enumerated.
public static IEnumerable<T> DropLast<T>(this IEnumerable<T> source)
{
if (source == null)
throw new ArgumentNullException("source");
return InternalDropLast(source);
}
private static IEnumerable<T> InternalDropLast<T>(IEnumerable<T> source)
{
T buffer = default(T);
bool buffered = false;
foreach (T x in source)
{
if (buffered)
yield return buffer;
buffer = x;
buffered = true;
}
}
As per Eric Lippert's suggestion, it easily generalizes to n items:
public static IEnumerable<T> DropLast<T>(this IEnumerable<T> source, int n)
{
if (source == null)
throw new ArgumentNullException("source");
if (n < 0)
throw new ArgumentOutOfRangeException("n",
"Argument n should be non-negative.");
return InternalDropLast(source, n);
}
private static IEnumerable<T> InternalDropLast<T>(IEnumerable<T> source, int n)
{
Queue<T> buffer = new Queue<T>(n + 1);
foreach (T x in source)
{
buffer.Enqueue(x);
if (buffer.Count == n + 1)
yield return buffer.Dequeue();
}
}
Where I now buffer before yielding instead of after yielding, so that the n == 0 case does not need special handling.
With C# 8.0 you can use Ranges and indices for that.
var allButLast = sequence[..^1];
By default C# 8.0 requires .NET Core 3.0 or .NET Standard 2.1 (or above). Check this thread to use with older implementations.
Nothing in the BCL (or MoreLinq I believe), but you could create your own extension method.
public static IEnumerable<T> TakeAllButLast<T>(this IEnumerable<T> source)
{
using (var enumerator = source.GetEnumerator())
bool first = true;
T prev;
while(enumerator.MoveNext())
{
if (!first)
yield return prev;
first = false;
prev = enumerator.Current;
}
}
}
It would be helpful if .NET Framework was shipped with extension method like this.
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int count)
{
var enumerator = source.GetEnumerator();
var queue = new Queue<T>(count + 1);
while (true)
{
if (!enumerator.MoveNext())
break;
queue.Enqueue(enumerator.Current);
if (queue.Count > count)
yield return queue.Dequeue();
}
}
if you don't have time to roll out your own extension, here's a quicker way:
var next = sequence.First();
sequence.Skip(1)
.Select(s =>
{
var selected = next;
next = s;
return selected;
});
A slight expansion on Joren's elegant solution:
public static IEnumerable<T> Shrink<T>(this IEnumerable<T> source, int left, int right)
{
int i = 0;
var buffer = new Queue<T>(right + 1);
foreach (T x in source)
{
if (i >= left) // Read past left many elements at the start
{
buffer.Enqueue(x);
if (buffer.Count > right) // Build a buffer to drop right many elements at the end
yield return buffer.Dequeue();
}
else i++;
}
}
public static IEnumerable<T> WithoutLast<T>(this IEnumerable<T> source, int n = 1)
{
return source.Shrink(0, n);
}
public static IEnumerable<T> WithoutFirst<T>(this IEnumerable<T> source, int n = 1)
{
return source.Shrink(n, 0);
}
Where shrink implements a simple count forward to drop the first left many elements and the same discarded buffer to drop the last right many elements.
If you can get the Count or Length of an enumerable, which in most cases you can, then just Take(n - 1)
Example with arrays
int[] arr = new int[] { 1, 2, 3, 4, 5 };
int[] sub = arr.Take(arr.Length - 1).ToArray();
Example with IEnumerable<T>
IEnumerable<int> enu = Enumerable.Range(1, 100);
IEnumerable<int> sub = enu.Take(enu.Count() - 1);
A slight variation on the accepted answer, which (for my tastes) is a bit simpler:
public static IEnumerable<T> AllButLast<T>(this IEnumerable<T> enumerable, int n = 1)
{
// for efficiency, handle degenerate n == 0 case separately
if (n == 0)
{
foreach (var item in enumerable)
yield return item;
yield break;
}
var queue = new Queue<T>(n);
foreach (var item in enumerable)
{
if (queue.Count == n)
yield return queue.Dequeue();
queue.Enqueue(item);
}
}
Why not just .ToList<type>() on the sequence, then call count and take like you did originally..but since it's been pulled into a list, it shouldnt do an expensive enumeration twice. Right?
The solution that I use for this problem is slightly more elaborate.
My util static class contains an extension method MarkEnd which converts the T-items in EndMarkedItem<T>-items. Each element is marked with an extra int, which is either 0; or (in case one is particularly interested in the last 3 items) -3, -2, or -1 for the last 3 items.
This could be useful on its own, e.g. when you want to create a list in a simple foreach-loop with commas after each element except the last 2, with the second-to-last item followed by a conjunction word (such as “and” or “or”), and the last element followed by a point.
For generating the entire list without the last n items, the extension method ButLast simply iterates over the EndMarkedItem<T>s while EndMark == 0.
If you don’t specify tailLength, only the last item is marked (in MarkEnd()) or dropped (in ButLast()).
Like the other solutions, this works by buffering.
using System;
using System.Collections.Generic;
using System.Linq;
namespace Adhemar.Util.Linq {
public struct EndMarkedItem<T> {
public T Item { get; private set; }
public int EndMark { get; private set; }
public EndMarkedItem(T item, int endMark) : this() {
Item = item;
EndMark = endMark;
}
}
public static class TailEnumerables {
public static IEnumerable<T> ButLast<T>(this IEnumerable<T> ts) {
return ts.ButLast(1);
}
public static IEnumerable<T> ButLast<T>(this IEnumerable<T> ts, int tailLength) {
return ts.MarkEnd(tailLength).TakeWhile(te => te.EndMark == 0).Select(te => te.Item);
}
public static IEnumerable<EndMarkedItem<T>> MarkEnd<T>(this IEnumerable<T> ts) {
return ts.MarkEnd(1);
}
public static IEnumerable<EndMarkedItem<T>> MarkEnd<T>(this IEnumerable<T> ts, int tailLength) {
if (tailLength < 0) {
throw new ArgumentOutOfRangeException("tailLength");
}
else if (tailLength == 0) {
foreach (var t in ts) {
yield return new EndMarkedItem<T>(t, 0);
}
}
else {
var buffer = new T[tailLength];
var index = -buffer.Length;
foreach (var t in ts) {
if (index < 0) {
buffer[buffer.Length + index] = t;
index++;
}
else {
yield return new EndMarkedItem<T>(buffer[index], 0);
buffer[index] = t;
index++;
if (index == buffer.Length) {
index = 0;
}
}
}
if (index >= 0) {
for (var i = index; i < buffer.Length; i++) {
yield return new EndMarkedItem<T>(buffer[i], i - buffer.Length - index);
}
for (var j = 0; j < index; j++) {
yield return new EndMarkedItem<T>(buffer[j], j - index);
}
}
else {
for (var k = 0; k < buffer.Length + index; k++) {
yield return new EndMarkedItem<T>(buffer[k], k - buffer.Length - index);
}
}
}
}
}
}
public static IEnumerable<T> NoLast<T> (this IEnumerable<T> items) {
if (items != null) {
var e = items.GetEnumerator();
if (e.MoveNext ()) {
T head = e.Current;
while (e.MoveNext ()) {
yield return head; ;
head = e.Current;
}
}
}
}
I don't think it can get more succinct than this - also ensuring to Dispose the IEnumerator<T>:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source)
{
using (var it = source.GetEnumerator())
{
if (it.MoveNext())
{
var item = it.Current;
while (it.MoveNext())
{
yield return item;
item = it.Current;
}
}
}
}
Edit: technically identical to this answer.
This is a general and IMHO elegant solution that will handle all cases correctly:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
public static void Main()
{
IEnumerable<int> r = Enumerable.Range(1, 20);
foreach (int i in r.AllButLast(3))
Console.WriteLine(i);
Console.ReadKey();
}
}
public static class LinqExt
{
public static IEnumerable<T> AllButLast<T>(this IEnumerable<T> enumerable, int n = 1)
{
using (IEnumerator<T> enumerator = enumerable.GetEnumerator())
{
Queue<T> queue = new Queue<T>(n);
for (int i = 0; i < n && enumerator.MoveNext(); i++)
queue.Enqueue(enumerator.Current);
while (enumerator.MoveNext())
{
queue.Enqueue(enumerator.Current);
yield return queue.Dequeue();
}
}
}
}
You could write:
var list = xyz.Select(x=>x.Id).ToList();
list.RemoveAt(list.Count - 1);
My traditional IEnumerable approach:
/// <summary>
/// Skips first element of an IEnumerable
/// </summary>
/// <typeparam name="U">Enumerable type</typeparam>
/// <param name="models">The enumerable</param>
/// <returns>IEnumerable of type skipping first element</returns>
private IEnumerable<U> SkipFirstEnumerable<U>(IEnumerable<U> models)
{
using (var e = models.GetEnumerator())
{
if (!e.MoveNext()) return;
for (;e.MoveNext();) yield return e.Current;
yield return e.Current;
}
}
/// <summary>
/// Skips last element of an IEnumerable
/// </summary>
/// <typeparam name="U">Enumerable type</typeparam>
/// <param name="models">The enumerable</param>
/// <returns>IEnumerable of type skipping last element</returns>
private IEnumerable<U> SkipLastEnumerable<U>(IEnumerable<U> models)
{
using (var e = models.GetEnumerator())
{
if (!e.MoveNext()) return;
yield return e.Current;
for (;e.MoveNext();) yield return e.Current;
}
}
Could be:
var allBuLast = sequence.TakeWhile(e => e != sequence.Last());
I guess it should be like de "Where" but preserving the order(?).
If speed is a requirement, this old school way should be the fastest, even though the code doesn't look as smooth as linq could make it.
int[] newSequence = int[sequence.Length - 1];
for (int x = 0; x < sequence.Length - 1; x++)
{
newSequence[x] = sequence[x];
}
This requires that the sequence is an array since it has a fixed length and indexed items.
A simple way would be to just convert to a queue and dequeue until only the number of items you want to skip is left.
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
var queue = new Queue<T>(source);
while (queue.Count() > n)
{
yield return queue.Dequeue();
}
}
I would probably do something like this:
sequence.Where(x => x != sequence.LastOrDefault())
This is one iteration with a check that it isn't the last one for each time though.

Categories

Resources