Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each split?
Let me exemplify:
I have a List<SomeObject> and I need a List<List<SomeObject>> or List<SomeObject>[], so that each of these resulting lists will contain a group of 3 items of the original list (sequentially).
eg.:
Original List: [a, g, e, w, p, s, q, f, x, y, i, m, c]
Resulting lists: [a, g, e], [w, p, s], [q, f, x], [y, i, m], [c]
I'd also need the resulting lists size to be a parameter of this function.
Try the following code.
public static List<List<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
The idea is to first group the elements by indexes. Dividing by three has the effect of grouping them into groups of 3. Then convert each group to a list and the IEnumerable of List to a List of Lists
I just wrote this, and I think it's a little more elegant than the other proposed solutions:
/// <summary>
/// Break a list of items into chunks of a specific size
/// </summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
while (source.Any())
{
yield return source.Take(chunksize);
source = source.Skip(chunksize);
}
}
In general the approach suggested by CaseyB works fine, in fact if you are passing in a List<T> it is hard to fault it, perhaps I would change it to:
public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
{
var pos = 0;
while (source.Skip(pos).Any())
{
yield return source.Skip(pos).Take(chunksize);
pos += chunksize;
}
}
Which will avoid massive call chains. Nonetheless, this approach has a general flaw. It materializes two enumerations per chunk, to highlight the issue try running:
foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
{
Console.WriteLine(item);
}
// wait forever
To overcome this we can try Cameron's approach, which passes the above test in flying colors as it only walks the enumeration once.
Trouble is that it has a different flaw, it materializes every item in each chunk, the trouble with that approach is that you run high on memory.
To illustrate that try running:
foreach (var item in Enumerable.Range(1, int.MaxValue)
.Select(x => x + new string('x', 100000))
.Clump(10000).Skip(100).First())
{
Console.Write('.');
}
// OutOfMemoryException
Finally, any implementation should be able to handle out of order iteration of chunks, for example:
Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
// should return [3],[1,2]
Many highly optimal solutions like my first revision of this answer failed there. The same issue can be seen in casperOne's optimized answer.
To address all these issues you can use the following:
namespace ChunkedEnumerator
{
public static class Extensions
{
class ChunkedEnumerable<T> : IEnumerable<T>
{
class ChildEnumerator : IEnumerator<T>
{
ChunkedEnumerable<T> parent;
int position;
bool done = false;
T current;
public ChildEnumerator(ChunkedEnumerable<T> parent)
{
this.parent = parent;
position = -1;
parent.wrapper.AddRef();
}
public T Current
{
get
{
if (position == -1 || done)
{
throw new InvalidOperationException();
}
return current;
}
}
public void Dispose()
{
if (!done)
{
done = true;
parent.wrapper.RemoveRef();
}
}
object System.Collections.IEnumerator.Current
{
get { return Current; }
}
public bool MoveNext()
{
position++;
if (position + 1 > parent.chunkSize)
{
done = true;
}
if (!done)
{
done = !parent.wrapper.Get(position + parent.start, out current);
}
return !done;
}
public void Reset()
{
// per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
throw new NotSupportedException();
}
}
EnumeratorWrapper<T> wrapper;
int chunkSize;
int start;
public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
{
this.wrapper = wrapper;
this.chunkSize = chunkSize;
this.start = start;
}
public IEnumerator<T> GetEnumerator()
{
return new ChildEnumerator(this);
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
class EnumeratorWrapper<T>
{
public EnumeratorWrapper (IEnumerable<T> source)
{
SourceEumerable = source;
}
IEnumerable<T> SourceEumerable {get; set;}
Enumeration currentEnumeration;
class Enumeration
{
public IEnumerator<T> Source { get; set; }
public int Position { get; set; }
public bool AtEnd { get; set; }
}
public bool Get(int pos, out T item)
{
if (currentEnumeration != null && currentEnumeration.Position > pos)
{
currentEnumeration.Source.Dispose();
currentEnumeration = null;
}
if (currentEnumeration == null)
{
currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
}
item = default(T);
if (currentEnumeration.AtEnd)
{
return false;
}
while(currentEnumeration.Position < pos)
{
currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
currentEnumeration.Position++;
if (currentEnumeration.AtEnd)
{
return false;
}
}
item = currentEnumeration.Source.Current;
return true;
}
int refs = 0;
// needed for dispose semantics
public void AddRef()
{
refs++;
}
public void RemoveRef()
{
refs--;
if (refs == 0 && currentEnumeration != null)
{
var copy = currentEnumeration;
currentEnumeration = null;
copy.Source.Dispose();
}
}
}
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
if (chunksize < 1) throw new InvalidOperationException();
var wrapper = new EnumeratorWrapper<T>(source);
int currentPos = 0;
T ignore;
try
{
wrapper.AddRef();
while (wrapper.Get(currentPos, out ignore))
{
yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
currentPos += chunksize;
}
}
finally
{
wrapper.RemoveRef();
}
}
}
class Program
{
static void Main(string[] args)
{
int i = 10;
foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
{
foreach (var n in group)
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
if (i-- == 0) break;
}
var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();
foreach (var idx in new [] {3,2,1})
{
Console.Write("idx " + idx + " ");
foreach (var n in stuffs[idx])
{
Console.Write(n);
Console.Write(" ");
}
Console.WriteLine();
}
/*
10000001 10000002 10000003
10000004 10000005 10000006
10000007 10000008 10000009
10000010 10000011 10000012
10000013 10000014 10000015
10000016 10000017 10000018
10000019 10000020 10000021
10000022 10000023 10000024
10000025 10000026 10000027
10000028 10000029 10000030
10000031 10000032 10000033
idx 3 7 8
idx 2 5 6
idx 1 3 4
*/
Console.ReadKey();
}
}
}
There is also a round of optimisations you could introduce for out-of-order iteration of chunks, which is out of scope here.
As to which method you should choose? It totally depends on the problem you are trying to solve. If you are not concerned with the first flaw the simple answer is incredibly appealing.
Note as with most methods, this is not safe for multi threading, stuff can get weird if you wish to make it thread safe you would need to amend EnumeratorWrapper.
You could use a number of queries that use Take and Skip, but that would add too many iterations on the original list, I believe.
Rather, I think you should create an iterator of your own, like so:
public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
IEnumerable<T> enumerable, int groupSize)
{
// The list to return.
List<T> list = new List<T>(groupSize);
// Cycle through all of the items.
foreach (T item in enumerable)
{
// Add the item.
list.Add(item);
// If the list has the number of elements, return that.
if (list.Count == groupSize)
{
// Return the list.
yield return list;
// Set the list to a new list.
list = new List<T>(groupSize);
}
}
// Return the remainder if there is any,
if (list.Count != 0)
{
// Return the list.
yield return list;
}
}
You can then call this and it is LINQ enabled so you can perform other operations on the resulting sequences.
In light of Sam's answer, I felt there was an easier way to do this without:
Iterating through the list again (which I didn't do originally)
Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
All of the code that Sam posted
That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException(nameof(source));
if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize),
"The chunkSize parameter must be a positive value.");
// Call the internal implementation.
return source.ChunkInternal(chunkSize);
}
Nothing surprising up there, just basic error checking.
Moving on to ChunkInternal:
private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
this IEnumerable<T> source, int chunkSize)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(chunkSize > 0);
// Get the enumerator. Dispose of when done.
using (IEnumerator<T> enumerator = source.GetEnumerator())
do
{
// Move to the next element. If there's nothing left
// then get out.
if (!enumerator.MoveNext()) yield break;
// Return the chunked sequence.
yield return ChunkSequence(enumerator, chunkSize);
} while (true);
}
Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks to see if there any items currently to be enumerated. After each chunk is enumerated through, if there aren't any items left, it breaks out.
Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence:
private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator,
int chunkSize)
{
// Validate parameters.
Debug.Assert(enumerator != null);
Debug.Assert(chunkSize > 0);
// The count.
int count = 0;
// There is at least one item. Yield and then continue.
do
{
// Yield the item.
yield return enumerator.Current;
} while (++count < chunkSize && enumerator.MoveNext());
}
Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count, making sure never to return more than chunkSize items and moving to the next item in the sequence after every iteration (but short-circuited if the number of items yielded exceeds the chunk size).
If there are no items left, then the InternalChunk method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
At this point, the loop will break, and the sequence of sequences will terminate.
This is a simple test:
static void Main()
{
string s = "agewpsqfxyimc";
int count = 0;
// Group by three.
foreach (IEnumerable<char> g in s.Chunk(3))
{
// Print out the group.
Console.Write("Group: {0} - ", ++count);
// Print the items.
foreach (char c in g)
{
// Print the item.
Console.Write(c + ", ");
}
// Finish the line.
Console.WriteLine();
}
}
Output:
Group: 1 - a, g, e,
Group: 2 - w, p, s,
Group: 3 - q, f, x,
Group: 4 - y, i, m,
Group: 5 - c,
An important note, this will not work if you don't drain the entire child sequence or break at any point in the parent sequence. This is an important caveat, but if your use case is that you will consume every element of the sequence of sequences, then this will work for you.
Additionally, it will do strange things if you play with the order, just as Sam's did at one point.
Ok, here's my take on it:
completely lazy: works on infinite enumerables
no intermediate copying/buffering
O(n) execution time
works also when inner sequences are only partially consumed
public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> enumerable,
int chunkSize)
{
if (chunkSize < 1) throw new ArgumentException("chunkSize must be positive");
using (var e = enumerable.GetEnumerator())
while (e.MoveNext())
{
var remaining = chunkSize; // elements remaining in the current chunk
var innerMoveNext = new Func<bool>(() => --remaining > 0 && e.MoveNext());
yield return e.GetChunk(innerMoveNext);
while (innerMoveNext()) {/* discard elements skipped by inner iterator */}
}
}
private static IEnumerable<T> GetChunk<T>(this IEnumerator<T> e,
Func<bool> innerMoveNext)
{
do yield return e.Current;
while (innerMoveNext());
}
Example Usage
var src = new [] {1, 2, 3, 4, 5, 6};
var c3 = src.Chunks(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.Chunks(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Take(2)); // {{1, 2}, {4, 5}}
Explanations
The code works by nesting two yield based iterators.
The outer iterator must keep track of how many elements have been effectively consumed by the inner (chunk) iterator. This is done by closing over remaining with innerMoveNext(). Unconsumed elements of a chunk are discarded before the next chunk is yielded by the outer iterator.
This is necessary because otherwise you get inconsistent results, when the inner enumerables are not (completely) consumed (e.g. c3.Count() would return 6).
Note: The answer has been updated to address the shortcomings pointed out by #aolszowka.
Update .NET 6.0
.NET 6.0 added a new native Chunk method to the System.Linq namespace:
public static System.Collections.Generic.IEnumerable<TSource[]> Chunk<TSource> (
this System.Collections.Generic.IEnumerable<TSource> source, int size);
Using this new method every chunk except the last will be of size size. The last chunk will contain the remaining elements and may be of a smaller size.
Here is an example:
var list = Enumerable.Range(1, 100);
var chunkSize = 10;
foreach(var chunk in list.Chunk(chunkSize)) //Returns a chunk with the correct size.
{
Parallel.ForEach(chunk, (item) =>
{
//Do something Parallel here.
Console.WriteLine(item);
});
}
You’re probably thinking, well why not use Skip and Take? Which is true, I think this is just a bit more concise and makes things just that little bit more readable.
completely lazy, no counting or copying:
public static class EnumerableExtensions
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int len)
{
if (len == 0)
throw new ArgumentNullException();
var enumer = source.GetEnumerator();
while (enumer.MoveNext())
{
yield return Take(enumer.Current, enumer, len);
}
}
private static IEnumerable<T> Take<T>(T head, IEnumerator<T> tail, int len)
{
while (true)
{
yield return head;
if (--len == 0)
break;
if (tail.MoveNext())
head = tail.Current;
else
break;
}
}
}
I think the following suggestion would be the fastest. I am sacrificing the lazyness of the source Enumerable for the ability to use Array.Copy and knowing ahead of the time the length of each of my sublists.
public static IEnumerable<T[]> Chunk<T>(this IEnumerable<T> items, int size)
{
T[] array = items as T[] ?? items.ToArray();
for (int i = 0; i < array.Length; i+=size)
{
T[] chunk = new T[Math.Min(size, array.Length - i)];
Array.Copy(array, i, chunk, 0, chunk.Length);
yield return chunk;
}
}
For anyone interested in a packaged/maintained solution, the MoreLINQ library provides the Batch extension method which matches your requested behavior:
IEnumerable<char> source = "Example string";
IEnumerable<IEnumerable<char>> chunksOfThreeChars = source.Batch(3);
The Batch implementation is similar to Cameron MacFarland's answer, with the addition of an overload for transforming the chunk/batch before returning, and performs quite well.
I wrote a Clump extension method several years ago. Works great, and is the fastest implementation here. :P
/// <summary>
/// Clumps items into same size lots.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source list of items.</param>
/// <param name="size">The maximum size of the clumps to make.</param>
/// <returns>A list of list of items, where each list of items is no bigger than the size given.</returns>
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> source, int size)
{
if (source == null)
throw new ArgumentNullException("source");
if (size < 1)
throw new ArgumentOutOfRangeException("size", "size must be greater than 0");
return ClumpIterator<T>(source, size);
}
private static IEnumerable<IEnumerable<T>> ClumpIterator<T>(IEnumerable<T> source, int size)
{
Debug.Assert(source != null, "source is null.");
T[] items = new T[size];
int count = 0;
foreach (var item in source)
{
items[count] = item;
count++;
if (count == size)
{
yield return items;
items = new T[size];
count = 0;
}
}
if (count > 0)
{
if (count == size)
yield return items;
else
{
T[] tempItems = new T[count];
Array.Copy(items, tempItems, count);
yield return tempItems;
}
}
}
We can improve #JaredPar's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:
sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))
Because the groups are yielded one-by-one, this solution works efficiently with long or infinite sequences.
System.Interactive provides Buffer() for this purpose. Some quick testing shows performance is similar to Sam's solution.
I find this little snippet does the job quite nicely.
public static IEnumerable<List<T>> Chunked<T>(this List<T> source, int chunkSize)
{
var offset = 0;
while (offset < source.Count)
{
yield return source.GetRange(offset, Math.Min(source.Count - offset, chunkSize));
offset += chunkSize;
}
}
Here's a list splitting routine I wrote a couple months ago:
public static List<List<T>> Chunk<T>(
List<T> theList,
int chunkSize
)
{
List<List<T>> result = theList
.Select((x, i) => new {
data = x,
indexgroup = i / chunkSize
})
.GroupBy(x => x.indexgroup, x => x.data)
.Select(g => new List<T>(g))
.ToList();
return result;
}
We found David B's solution worked the best. But we adapted it to a more general solution:
list.GroupBy(item => item.SomeProperty)
.Select(group => new List<T>(group))
.ToArray();
What about this one?
var input = new List<string> { "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
var k = 3
var res = Enumerable.Range(0, (input.Count - 1) / k + 1)
.Select(i => input.GetRange(i * k, Math.Min(k, input.Count - i * k)))
.ToList();
As far as I know, GetRange() is linear in terms of number of items taken. So this should perform well.
This is an old question but this is what I ended up with; it enumerates the enumerable only once, but does create lists for each of the partitions. It doesn't suffer from unexpected behavior when ToArray() is called as some of the implementations do:
public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (chunkSize < 1)
{
throw new ArgumentException("Invalid chunkSize: " + chunkSize);
}
using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
{
IList<T> currentChunk = new List<T>();
while (sourceEnumerator.MoveNext())
{
currentChunk.Add(sourceEnumerator.Current);
if (currentChunk.Count == chunkSize)
{
yield return currentChunk;
currentChunk = new List<T>();
}
}
if (currentChunk.Any())
{
yield return currentChunk;
}
}
}
Old code, but this is what I've been using:
public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
var toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
This following solution is the most compact I could come up with that is O(n).
public static IEnumerable<T[]> Chunk<T>(IEnumerable<T> source, int chunksize)
{
var list = source as IList<T> ?? source.ToList();
for (int start = 0; start < list.Count; start += chunksize)
{
T[] chunk = new T[Math.Min(chunksize, list.Count - start)];
for (int i = 0; i < chunk.Length; i++)
chunk[i] = list[start + i];
yield return chunk;
}
}
If the list is of type system.collections.generic you can use the "CopyTo" method available to copy elements of your array to other sub arrays. You specify the start element and number of elements to copy.
You could also make 3 clones of your original list and use the "RemoveRange" on each list to shrink the list to the size you want.
Or just create a helper method to do it for you.
It's an old solution but I had a different approach. I use Skip to move to desired offset and Take to extract desired number of elements:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source,
int chunkSize)
{
if (chunkSize <= 0)
throw new ArgumentOutOfRangeException($"{nameof(chunkSize)} should be > 0");
var nbChunks = (int)Math.Ceiling((double)source.Count()/chunkSize);
return Enumerable.Range(0, nbChunks)
.Select(chunkNb => source.Skip(chunkNb*chunkSize)
.Take(chunkSize));
}
Another way is using Rx Buffer operator
//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anAnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
The question was how to "Split List into Sublists with LINQ", but sometimes you may want those sub-lists to be references to the original list, not copies. This allows you to modify the original list from the sub-lists. In that case, this may work for you.
public static IEnumerable<Memory<T>> RefChunkBy<T>(this T[] array, int size)
{
if (size < 1 || array is null)
{
throw new ArgumentException("chunkSize must be positive");
}
var index = 0;
var counter = 0;
for (int i = 0; i < array.Length; i++)
{
if (counter == size)
{
yield return new Memory<T>(array, index, size);
index = i;
counter = 0;
}
counter++;
if (i + 1 == array.Length)
{
yield return new Memory<T>(array, index, array.Length - index);
}
}
}
Usage:
var src = new[] { 1, 2, 3, 4, 5, 6 };
var c3 = RefChunkBy(src, 3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = RefChunkBy(src, 4); // {{1, 2, 3, 4}, {5, 6}};
// as extension method
var c3 = src.RefChunkBy(3); // {{1, 2, 3}, {4, 5, 6}};
var c4 = src.RefChunkBy(4); // {{1, 2, 3, 4}, {5, 6}};
var sum = c3.Select(c => c.Span.ToArray().Sum()); // {6, 15}
var count = c3.Count(); // 2
var take2 = c3.Select(c => c.Span.ToArray().Take(2)); // {{1, 2}, {4, 5}}
Feel free to make this code better.
Using modular partitioning:
public IEnumerable<IEnumerable<string>> Split(IEnumerable<string> input, int chunkSize)
{
var chunks = (int)Math.Ceiling((double)input.Count() / (double)chunkSize);
return Enumerable.Range(0, chunks).Select(id => input.Where(s => s.GetHashCode() % chunks == id));
}
Just putting in my two cents. If you wanted to "bucket" the list (visualize left to right), you could do the following:
public static List<List<T>> Buckets<T>(this List<T> source, int numberOfBuckets)
{
List<List<T>> result = new List<List<T>>();
for (int i = 0; i < numberOfBuckets; i++)
{
result.Add(new List<T>());
}
int count = 0;
while (count < source.Count())
{
var mod = count % numberOfBuckets;
result[mod].Add(source[count]);
count++;
}
return result;
}
public static List<List<T>> GetSplitItemsList<T>(List<T> originalItemsList, short number)
{
var listGroup = new List<List<T>>();
int j = number;
for (int i = 0; i < originalItemsList.Count; i += number)
{
var cList = originalItemsList.Take(j).Skip(i).ToList();
j += number;
listGroup.Add(cList);
}
return listGroup;
}
To insert my two cents...
By using the list type for the source to be chunked, I found another very compact solution:
public static IEnumerable<IEnumerable<TSource>> Chunk<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
// copy the source into a list
var chunkList = source.ToList();
// return chunks of 'chunkSize' items
while (chunkList.Count > chunkSize)
{
yield return chunkList.GetRange(0, chunkSize);
chunkList.RemoveRange(0, chunkSize);
}
// return the rest
yield return chunkList;
}
I took the primary answer and made it to be an IOC container to determine where to split. (For who is really looking to only split on 3 items, in reading this post while searching for an answer?)
This method allows one to split on any type of item as needed.
public static List<List<T>> SplitOn<T>(List<T> main, Func<T, bool> splitOn)
{
int groupIndex = 0;
return main.Select( item => new
{
Group = (splitOn.Invoke(item) ? ++groupIndex : groupIndex),
Value = item
})
.GroupBy( it2 => it2.Group)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
So for the OP the code would be
var it = new List<string>()
{ "a", "g", "e", "w", "p", "s", "q", "f", "x", "y", "i", "m", "c" };
int index = 0;
var result = SplitOn(it, (itm) => (index++ % 3) == 0 );
So performatic as the Sam Saffron's approach.
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Size must be greater than zero.");
return BatchImpl(source, size).TakeWhile(x => x.Any());
}
static IEnumerable<IEnumerable<T>> BatchImpl<T>(this IEnumerable<T> source, int size)
{
var values = new List<T>();
var group = 1;
var disposed = false;
var e = source.GetEnumerator();
try
{
while (!disposed)
{
yield return GetBatch(e, values, group, size, () => { e.Dispose(); disposed = true; });
group++;
}
}
finally
{
if (!disposed)
e.Dispose();
}
}
static IEnumerable<T> GetBatch<T>(IEnumerator<T> e, List<T> values, int group, int size, Action dispose)
{
var min = (group - 1) * size + 1;
var max = group * size;
var hasValue = false;
while (values.Count < min && e.MoveNext())
{
values.Add(e.Current);
}
for (var i = min; i <= max; i++)
{
if (i <= values.Count)
{
hasValue = true;
}
else if (hasValue = e.MoveNext())
{
values.Add(e.Current);
}
else
{
dispose();
}
if (hasValue)
yield return values[i - 1];
else
yield break;
}
}
}
Can work with infinite generators:
a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1)))
.Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1)))
.Where((x, i) => i % 3 == 0)
Demo code: https://ideone.com/GKmL7M
using System;
using System.Collections.Generic;
using System.Linq;
public class Test
{
private static void DoIt(IEnumerable<int> a)
{
Console.WriteLine(String.Join(" ", a));
foreach (var x in a.Zip(a.Skip(1), (x, y) => Enumerable.Repeat(x, 1).Concat(Enumerable.Repeat(y, 1))).Zip(a.Skip(2), (xy, z) => xy.Concat(Enumerable.Repeat(z, 1))).Where((x, i) => i % 3 == 0))
Console.WriteLine(String.Join(" ", x));
Console.WriteLine();
}
public static void Main()
{
DoIt(new int[] {1});
DoIt(new int[] {1, 2});
DoIt(new int[] {1, 2, 3});
DoIt(new int[] {1, 2, 3, 4});
DoIt(new int[] {1, 2, 3, 4, 5});
DoIt(new int[] {1, 2, 3, 4, 5, 6});
}
}
1
1 2
1 2 3
1 2 3
1 2 3 4
1 2 3
1 2 3 4 5
1 2 3
1 2 3 4 5 6
1 2 3
4 5 6
But actually I would prefer to write corresponding method without linq.
I am trying to develop an optimization function that will determine which elements in list of doubles when added together will be less than a specified threshold values. The elements can be used multiple times.
For example if my list of elements is
{1,3,7,10}
and my threshold is 20 I would expect my result to be
1
3
7
10
10, 10
10, 7
10, 7, 3
10,7,1
10,7,1,1
10,7,1,1,1
7,7
7,7,3
7,7,1
7,7,1,1
7,7,1,1,1
...
I expect that the answer to this question will probably be a recursive call and probably could be found in a textbook, but I don't know how to properly phrase the question to find the answer. Help from this group of experts would be appreciated.
This program works, and seems to be the simplest solution. All results are sorted ascending.
private static final HashSet<ArrayList<Double>> lists =
new HashSet<ArrayList<Double>>(); // all of the combinations generated
private static final double[] elements = {10, 7, 3, 1};
public static void main(String[] args) {
combine(20, new ArrayList<Double>());
for (ArrayList<Double> set : lists) {
System.out.println(set);
}
}
private static void combine(final double limit, ArrayList<Double> stack) {
// iterates through the elements that fit in the threshold
for (double item : elements) {
if (item < limit) {
final ArrayList<Double> nextStack = new ArrayList<Double>(stack);
nextStack.add(item);
// a sort is necessary to let the HashSet de-dup properly
Collections.sort(nextStack);
lists.add(nextStack);
combine(limit - item, nextStack);
}
}
}
This type of combinatoric problem, though, generates many results. If you are more concerned with performance than code readability and simplicity, I can optimize further.
c#:
static void Main(string[] args)
{
Run();
}
static public void Run()
{
Combine(20, new List<Double>());
foreach (List<Double> set in lists)
{
Debug.Print(set.ToString());
}
}
private static HashSet<List<Double>> lists =
new HashSet<List<Double>>(); // all of the combinations generated
private static double[] elements = { 10, 7, 3, 1 };
private static void Combine(double limit, List<Double> stack)
{
// iterates through the elements that fit in the threshold
foreach (double item in elements)
{
if (item < limit)
{
List<Double> nextStack = new List<Double>(stack);
nextStack.Add(item);
// a sort is necessary to let the HashSet de-dup properly
nextStack.Sort();
lists.Add(nextStack);
Combine(limit - item, nextStack);
}
}
}
I'm not sure if the Sort() is needed for detecting correctly duplicate entries but this code should work:
private List<int[]> CombinedElementsInArrayLessThanValue(int[] foo, int value)
{
List<int[]> list = new List<int[]>();
for (int i = 0; i < foo.Length; i++)
{
List<int> start = new List<int>();
start.Add(foo[i]);
start.Sort();
int[] clone = start.ToArray();
if (start.Sum() < value && !list.Contains(clone))
{
list.Add(clone);
CombinedElementsInArrayLessThanValue(foo, value, start, list);
}
}
return list;
}
private void CombinedElementsInArrayLessThanValue(int[] foo, int value, List<int> partial, List<int[]> accumulate_result)
{
for (int i = 0; i < foo.Length; i++)
{
List<int> clone = new List<int>(partial);
clone.Add(foo[i]);
clone.Sort();
int[] array = clone.ToArray();
if (clone.Sum() < value && !accumulate_result.Contains(array))
{
accumulate_result.Add(array);
CombinedElementsInArrayLessThanValue(foo, value, clone, accumulate_result);
}
}
}
Process one item in the list at a time, and let the recursion handle one item completely in order to shorten the "depth" of the recursion.
public static List<int[]> Combine(int[] elements, int maxValue)
{
LinkedList<int[]> result = new LinkedList<int[]>();
List<int> listElements = new List<int>(elements);
listElements.Sort();
Combine(listElements.ToArray(), maxValue, new int[0], result);
return result.ToList();
}
private static void Combine(int[] elements, int maxValue, int[] stack, LinkedList<int[]> result)
{
if(elements.Length > 0 && maxValue >= elements[0])
{
var newElements = elements.Skip(1).ToArray();
for (int i = maxValue / elements[0]; i > 0; i--)
{
result.AddLast(stack.Concat(Enumerable.Repeat(elements[0], i)).ToArray());
Combine(newElements, maxValue - i*elements[0], result.Last(), result);
}
Combine(newElements, maxValue, stack, result);
}
}
What is the fastest way to union 2 sets of sorted values? Speed (big-O) is important here; not clarity - assume this is being done millions of times.
Assume you do not know the type or range of the values, but have an efficent IComparer<T> and/or IEqualityComparer<T>.
Given the following set of numbers:
var la = new int[] { 1, 2, 4, 5, 9 };
var ra = new int[] { 3, 4, 5, 6, 6, 7, 8 };
I am expecting 1, 2, 3, 4, 5, 6, 7, 8, 9. The following stub may be used to test the code:
static void Main(string[] args)
{
var la = new int[] { 1, 2, 4, 5, 9 };
var ra = new int[] { 3, 4, 5, 6, 6, 7, 8 };
foreach (var item in UnionSorted(la, ra, Int32Comparer.Default))
{
Console.Write("{0}, ", item);
}
Console.ReadLine();
}
class Int32Comparer : IComparer<Int32>
{
public static readonly Int32Comparer Default = new Int32Comparer();
public int Compare(int x, int y)
{
if (x < y)
return -1;
else if (x > y)
return 1;
else
return 0;
}
}
static IEnumerable<T> UnionSorted<T>(IEnumerable<T> sortedLeft, IEnumerable<T> sortedRight, IComparer<T> comparer)
{
}
The following method returns the correct results:
static IEnumerable<T> UnionSorted<T>(IEnumerable<T> sortedLeft, IEnumerable<T> sortedRight, IComparer<T> comparer)
{
var first = true;
var continueLeft = true;
var continueRight = true;
T left = default(T);
T right = default(T);
using (var el = sortedLeft.GetEnumerator())
using (var er = sortedRight.GetEnumerator())
{
// Loop until both enumeration are done.
while (continueLeft | continueRight)
{
// Only if both enumerations have values.
if (continueLeft & continueRight)
{
// Seed the enumeration.
if (first)
{
continueLeft = el.MoveNext();
if (continueLeft)
{
left = el.Current;
}
else
{
// left is empty, just dump the right enumerable
while (er.MoveNext())
yield return er.Current;
yield break;
}
continueRight = er.MoveNext();
if (continueRight)
{
right = er.Current;
}
else
{
// right is empty, just dump the left enumerable
if (continueLeft)
{
// there was a value when it was read earlier, let's return it before continuing
do
{
yield return el.Current;
}
while (el.MoveNext());
} // if continueLeft is false, then both enumerable are empty here.
yield break;
}
first = false;
}
// Compare them and decide which to return.
var comp = comparer.Compare(left, right);
if (comp < 0)
{
yield return left;
// We only advance left until they match.
continueLeft = el.MoveNext();
if (continueLeft)
left = el.Current;
}
else if (comp > 0)
{
yield return right;
continueRight = er.MoveNext();
if (continueRight)
right = er.Current;
}
else
{
// The both match, so advance both.
yield return left;
continueLeft = el.MoveNext();
if (continueLeft)
left = el.Current;
continueRight = er.MoveNext();
if (continueRight)
right = er.Current;
}
}
// One of the lists is done, don't advance it.
else if (continueLeft)
{
yield return left;
continueLeft = el.MoveNext();
if (continueLeft)
left = el.Current;
}
else if (continueRight)
{
yield return right;
continueRight = er.MoveNext();
if (continueRight)
right = er.Current;
}
}
}
}
The space is ~O(6) and time ~O(max(n,m)) (where m is the second set).
This will make your UnionSorted function a little less versatile, but you can make a small improvement by making an assumption about types. If you do the comparison inside the loop itself (rather than calling the Int32Comparer) then that'll save on some function call overhead.
So your UnionSorted declaration becomes this...
static IEnumerable<int> UnionSorted(IEnumerable<int> sortedLeft, IEnumerable<int> sortedRight)
And then you do this inside the loop, getting rid of the call to comparer.Compare()...
//var comp = comparer.Compare(left, right); // too slow
int comp = 0;
if (left < right)
comp = -1;
else if (left > right)
comp = 1;
In my testing this was about 15% faster.
I'm going to give LINQ the benefit of the doubt and say this is probably as fast as you are going to get without writing excessive code:
var result = la.Union(ra);
EDITED:
Thanks, I missed the sorted part.
You could do:
var result = la.Union(ra).OrderBy(i => i);
I would solve the problem this way. (I am making an assumption which lightens the difficulty of this problem significantly, only to illustrate the idea.)
Assumption: All numbers contained in sets are non-negative.
Create a word of at least n bits, where n is the largest value you expect. (If the largest value you expect is 12, then you must create a word of 16 bits.).
Iterate through both sets. For each value, val, or the val-th bit with 1.
Once done, count the amount of bits set to 1. Create an array of that size.
Go through each bit one by one, adding n to the new array if the n-th bit is set.
I have been stumped on this one for a while. I want to take a List and order the list such that the Products with the largest Price end up in the middle of the list. And I also want to do the opposite, i.e. make sure that the items with the largest price end up on the outer boundaries of the list.
Imagine a data structure like this.. 1,2,3,4,5,6,7,8,9,10
In the first scenario I need to get back 1,3,5,7,9,10,8,6,4,2
In the second scenario I need to get back 10,8,6,4,2,1,3,5,7,9
The list may have upwards of 250 items, the numbers will not be evenly distributed, and they will not be sequential, and I wanted to minimize copying. The numbers will be contained in Product objects, and not simple primitive integers.
Is there a simple solution that I am not seeing?
Any thoughts.
So for those of you wondering what I am up to, I am ordering items based on calculated font size. Here is the code that I went with...
The Implementation...
private void Reorder()
{
var tempList = new LinkedList<DisplayTag>();
bool even = true;
foreach (var tag in this) {
if (even)
tempList.AddLast(tag);
else
tempList.AddFirst(tag);
even = !even;
}
this.Clear();
this.AddRange(tempList);
}
The Test...
[TestCase(DisplayTagOrder.SmallestToLargest, Result=new[]{10,14,18,22,26,30})]
[TestCase(DisplayTagOrder.LargestToSmallest, Result=new[]{30,26,22,18,14,10})]
[TestCase(DisplayTagOrder.LargestInTheMiddle, Result = new[] { 10, 18, 26, 30, 22, 14 })]
[TestCase(DisplayTagOrder.LargestOnTheEnds, Result = new[] { 30, 22, 14, 10, 18, 26 })]
public int[] CalculateFontSize_Orders_Tags_Appropriately(DisplayTagOrder sortOrder)
{
list.CloudOrder = sortOrder;
list.CalculateFontSize();
var result = (from displayTag in list select displayTag.FontSize).ToArray();
return result;
}
The Usage...
public void CalculateFontSize()
{
GetMaximumRange();
GetMinimunRange();
CalculateDelta();
this.ForEach((displayTag) => CalculateFontSize(displayTag));
OrderByFontSize();
}
private void OrderByFontSize()
{
switch (CloudOrder) {
case DisplayTagOrder.SmallestToLargest:
this.Sort((arg1, arg2) => arg1.FontSize.CompareTo(arg2.FontSize));
break;
case DisplayTagOrder.LargestToSmallest:
this.Sort(new LargestFirstComparer());
break;
case DisplayTagOrder.LargestInTheMiddle:
this.Sort(new LargestFirstComparer());
Reorder();
break;
case DisplayTagOrder.LargestOnTheEnds:
this.Sort();
Reorder();
break;
}
}
The appropriate data structure is a LinkedList because it allows you to efficiently add to either end:
LinkedList<int> result = new LinkedList<int>();
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
Array.Sort(array);
bool odd = true;
foreach (var x in array)
{
if (odd)
result.AddLast(x);
else
result.AddFirst(x);
odd = !odd;
}
foreach (int item in result)
Console.Write("{0} ", item);
No extra copying steps, no reversing steps, ... just a small overhead per node for storage.
C# Iterator version
(Very simple code to satisfy all conditions.)
One function to rule them all! Doesn't use intermediate storage collection (see yield keyword). Orders the large numbers either to the middle, or to the sides depending on the argument. It's implemented as a C# iterator
// Pass forward sorted array for large middle numbers,
// or reverse sorted array for large side numbers.
//
public static IEnumerable<long> CurveOrder(long[] nums) {
if (nums == null || nums.Length == 0)
yield break; // Nothing to do.
// Move forward every two.
for (int i = 0; i < nums.Length; i+=2)
yield return nums[i];
// Move backward every other two. Note: Length%2 makes sure we're on the correct offset.
for (int i = nums.Length-1 - nums.Length%2; i >= 0; i-=2)
yield return nums[i];
}
Example Usage
For example with array long[] nums = { 1,2,3,4,5,6,7,8,9,10,11 };
Start with forward sort order, to bump high numbers into the middle.
Array.Sort(nums); //forward sort
// Array argument will be: { 1,2,3,4,5,6,7,8,9,10,11 };
long[] arrLargeMiddle = CurveOrder(nums).ToArray();
Produces: 1 3 5 7 9 11 10 8 6 4 2
Or, Start with reverse sort order, to push high numbers to sides.
Array.Reverse(nums); //reverse sort
// Array argument will be: { 11,10,9,8,7,6,5,4,3,2,1 };
long[] arrLargeSides = CurveOrder(nums).ToArray();
Produces: 11 9 7 5 3 1 2 4 6 8 10
Significant namespaces are:
using System;
using System.Collections.Generic;
using System.Linq;
Note: The iterator leaves the decision up to the caller about whether or not to use intermediate storage. The caller might simply be issuing a foreach loop over the results instead.
Extension Method Option
Optionally change the static method header to use the this modifier public static IEnumerable<long> CurveOrder(this long[] nums) { and put it inside a static class in your namespace;
Then call the order method directly on any long[ ] array instance like so:
Array.Reverse(nums); //reverse sort
// Array argument will be: { 11,10,9,8,7,6,5,4,3,2,1 };
long[] arrLargeSides = nums.CurveOrder().ToArray();
Just some (unneeded) syntactic sugar to mix things up a bit for fun. This can be applied to any answers to your question that take an array argument.
I might go for something like this
static T[] SortFromMiddleOut<T, U>(IList<T> list, Func<T, U> orderSelector, bool largestInside) where U : IComparable<U>
{
T[] sortedArray = new T[list.Count];
bool add = false;
int index = (list.Count / 2);
int iterations = 0;
IOrderedEnumerable<T> orderedList;
if (largestInside)
orderedList = list.OrderByDescending(orderSelector);
else
orderedList = list.OrderBy(orderSelector);
foreach (T item in orderedList)
{
sortedArray[index] = item;
if (add)
index += ++iterations;
else
index -= ++iterations;
add = !add;
}
return sortedArray;
}
Sample invocations:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int[] sortedArray = SortFromMiddleOut(array, i => i, false);
foreach (int item in sortedArray)
Console.Write("{0} ", item);
Console.Write("\n");
sortedArray = SortFromMiddleOut(array, i => i, true);
foreach (int item in sortedArray)
Console.Write("{0} ", item);
With it being generic, it could be a list of Foo and the order selector could be f => f.Name or whatever you want to throw at it.
The fastest (but not the clearest) solution is probably to simply calculate the new index for each element:
Array.Sort(array);
int length = array.Length;
int middle = length / 2;
int[] result2 = new int[length];
for (int i = 0; i < array.Length; i++)
{
result2[middle + (1 - 2 * (i % 2)) * ((i + 1) / 2)] = array[i];
}
Something like this?
public IEnumerable<int> SortToMiddle(IEnumerable<int> input)
{
var sorted = new List<int>(input);
sorted.Sort();
var firstHalf = new List<int>();
var secondHalf = new List<int>();
var sendToFirst = true;
foreach (var current in sorted)
{
if (sendToFirst)
{
firstHalf.Add(current);
}
else
{
secondHalf.Add(current);
}
sendToFirst = !sendToFirst;
}
//to get the highest values on the outside just reverse
//the first list instead of the second
secondHalf.Reverse();
return firstHalf.Concat(secondHalf);
}
For your specific (general) case (assuming unique keys):
public static IEnumerable<T> SortToMiddle<T, TU>(IEnumerable<T> input, Func<T, TU> getSortKey)
{
var sorted = new List<TU>(input.Select(getSortKey));
sorted.Sort();
var firstHalf = new List<TU>();
var secondHalf = new List<TU>();
var sendToFirst = true;
foreach (var current in sorted)
{
if (sendToFirst)
{
firstHalf.Add(current);
}
else
{
secondHalf.Add(current);
}
sendToFirst = !sendToFirst;
}
//to get the highest values on the outside just reverse
//the first list instead of the second
secondHalf.Reverse();
sorted = new List<TU>(firstHalf.Concat(secondHalf));
//This assumes the sort keys are unique - if not, the implementation
//needs to use a SortedList<TU, T>
return sorted.Select(s => input.First(t => s.Equals(getSortKey(t))));
}
And assuming non-unique keys:
public static IEnumerable<T> SortToMiddle<T, TU>(IEnumerable<T> input, Func<T, TU> getSortKey)
{
var sendToFirst = true;
var sorted = new SortedList<TU, T>(input.ToDictionary(getSortKey, t => t));
var firstHalf = new SortedList<TU, T>();
var secondHalf = new SortedList<TU, T>();
foreach (var current in sorted)
{
if (sendToFirst)
{
firstHalf.Add(current.Key, current.Value);
}
else
{
secondHalf.Add(current.Key, current.Value);
}
sendToFirst = !sendToFirst;
}
//to get the highest values on the outside just reverse
//the first list instead of the second
secondHalf.Reverse();
return(firstHalf.Concat(secondHalf)).Select(kvp => kvp.Value);
}
Simplest solution - order the list descending, create two new lists, into the first place every odd-indexed item, into the other every even indexed item. Reverse the first list then append the second to the first.
Okay, I'm not going to question your sanity here since I'm sure you wouldn't be asking the question if there weren't a good reason :-)
Here's how I'd approach it. Create a sorted list, then simply create another list by processing the keys in order, alternately inserting before and appending, something like:
sortedlist = list.sort (descending)
biginmiddle = new list()
state = append
foreach item in sortedlist:
if state == append:
biginmiddle.append (item)
state = prepend
else:
biginmiddle.insert (0, item)
state = append
This will give you a list where the big items are in the middle. Other items will fan out from the middle (in alternating directions) as needed:
1, 3, 5, 7, 9, 10, 8, 6, 4, 2
To get a list where the larger elements are at the ends, just replace the initial sort with an ascending one.
The sorted and final lists can just be pointers to the actual items (since you state they're not simple integers) - this will minimise both extra storage requirements and copying.
Maybe its not the best solution, but here's a nifty way...
Let Product[] parr be your array.
Disclaimer It's java, my C# is rusty.
Untested code, but you get the idea.
int plen = parr.length
int [] indices = new int[plen];
for(int i = 0; i < (plen/2); i ++)
indices[i] = 2*i + 1; // Line1
for(int i = (plen/2); i < plen; i++)
indices[i] = 2*(plen-i); // Line2
for(int i = 0; i < plen; i++)
{
if(i != indices[i])
swap(parr[i], parr[indices[i]]);
}
The second case, Something like this?
int plen = parr.length
int [] indices = new int[plen];
for(int i = 0; i <= (plen/2); i ++)
indices[i] = (plen^1) - 2*i;
for(int i = 0; i < (plen/2); i++)
indices[i+(plen/2)+1] = 2*i + 1;
for(int i = 0; i < plen; i++)
{
if(i != indices[i])
swap(parr[i], parr[indices[i]]);
}