Related
This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 12 months ago.
I am attempting to split a list into a series of smaller lists.
My Problem: My function to split lists doesn't split them into lists of the correct size. It should split them into lists of size 30 but instead it splits them into lists of size 114?
How can I make my function split a list into X number of Lists of size 30 or less?
public static List<List<float[]>> splitList(List <float[]> locations, int nSize=30)
{
List<List<float[]>> list = new List<List<float[]>>();
for (int i=(int)(Math.Ceiling((decimal)(locations.Count/nSize))); i>=0; i--) {
List <float[]> subLocat = new List <float[]>(locations);
if (subLocat.Count >= ((i*nSize)+nSize))
subLocat.RemoveRange(i*nSize, nSize);
else subLocat.RemoveRange(i*nSize, subLocat.Count-(i*nSize));
Debug.Log ("Index: "+i.ToString()+", Size: "+subLocat.Count.ToString());
list.Add (subLocat);
}
return list;
}
If I use the function on a list of size 144 then the output is:
Index: 4, Size: 120
Index: 3, Size: 114
Index: 2, Size: 114
Index: 1, Size: 114
Index: 0, Size: 114
I would suggest to use this extension method to chunk the source list to the sub-lists by specified chunk size:
/// <summary>
/// Helper methods for the lists.
/// </summary>
public static class ListExtensions
{
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / chunkSize)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
}
For example, if you chunk the list of 18 items by 5 items per chunk, it gives you the list of 4 sub-lists with the following items inside: 5-5-5-3.
NOTE: at the upcoming improvements to LINQ in .NET 6 chunking
will come out of the box like this:
const int PAGE_SIZE = 5;
IEnumerable<Movie[]> chunks = movies.Chunk(PAGE_SIZE);
public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)
{
var list = new List<List<float[]>>();
for (int i = 0; i < locations.Count; i += nSize)
{
list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
}
return list;
}
Generic version:
public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)
{
for (int i = 0; i < locations.Count; i += nSize)
{
yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
}
}
how about:
while(locations.Any())
{
list.Add(locations.Take(nSize).ToList());
locations= locations.Skip(nSize).ToList();
}
Library MoreLinq have method called Batch
List<int> ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; // 10 elements
int counter = 1;
foreach(var batch in ids.Batch(2))
{
foreach(var eachId in batch)
{
Console.WriteLine("Batch: {0}, Id: {1}", counter, eachId);
}
counter++;
}
Result is
Batch: 1, Id: 1
Batch: 1, Id: 2
Batch: 2, Id: 3
Batch: 2, Id: 4
Batch: 3, Id: 5
Batch: 3, Id: 6
Batch: 4, Id: 7
Batch: 4, Id: 8
Batch: 5, Id: 9
Batch: 5, Id: 0
ids are splitted into 5 chunks with 2 elements.
Update for .NET 6
var originalList = new List<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
// split into arrays of no more than three
IEnumerable<int[]> chunks = originalList.Chunk(3);
Prior to .NET 6
public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
(this IEnumerable<T> source, int itemsPerSet)
{
var sourceList = source as List<T> ?? source.ToList();
for (var index = 0; index < sourceList.Count; index += itemsPerSet)
{
yield return sourceList.Skip(index).Take(itemsPerSet);
}
}
Serj-Tm solution is fine, also this is the generic version as extension method for lists (put it into a static class):
public static List<List<T>> Split<T>(this List<T> items, int sliceSize = 30)
{
List<List<T>> list = new List<List<T>>();
for (int i = 0; i < items.Count; i += sliceSize)
list.Add(items.GetRange(i, Math.Min(sliceSize, items.Count - i)));
return list;
}
I find accepted answer (Serj-Tm) most robust, but I'd like to suggest a generic version.
public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
{
var list = new List<List<T>>();
for (int i = 0; i < locations.Count; i += nSize)
{
list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
}
return list;
}
Addition after very useful comment of mhand at the end
Original answer
Although most solutions might work, I think they are not very efficiently. Suppose if you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.
The following will at utmost enumerate twice: once for the Take and once for the Skip. It won't enumerate over any more elements than you will use:
public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
(this IEnumerable<TSource> source, int chunkSize)
{
while (source.Any()) // while there are elements left
{ // still something to chunk:
yield return source.Take(chunkSize); // return a chunk of chunkSize
source = source.Skip(chunkSize); // skip the returned chunk
}
}
How many times will this Enumerate the sequence?
Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you'll only enumerate the first M elements.
While(source.Any())
{
...
}
the Any will get the Enumerator, do 1 MoveNext() and returns the returned value after Disposing the Enumerator. This will be done N times
yield return source.Take(chunkSize);
According to the reference source this will do something like:
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
{
return TakeIterator<TSource>(source, count);
}
static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
foreach (TSource element in source)
{
yield return element;
if (--count == 0) break;
}
}
This doesn't do a lot until you start enumerating over the fetched Chunk. If you fetch several Chunks, but decide not to enumerate over the first Chunk, the foreach is not executed, as your debugger will show you.
If you decide to take the first M elements of the first chunk then the yield return is executed exactly M times. This means:
get the enumerator
call MoveNext() and Current M times.
Dispose the enumerator
After the first chunk has been yield returned, we skip this first Chunk:
source = source.Skip(chunkSize);
Once again: we'll take a look at reference source to find the skipiterator
static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
using (IEnumerator<TSource> e = source.GetEnumerator())
{
while (count > 0 && e.MoveNext()) count--;
if (count <= 0)
{
while (e.MoveNext()) yield return e.Current;
}
}
}
As you see, the SkipIterator calls MoveNext() once for every element in the Chunk. It doesn't call Current.
So per Chunk we see that the following is done:
Any(): GetEnumerator; 1 MoveNext(); Dispose Enumerator;
Take():
nothing if the content of the chunk is not enumerated.
If the content is enumerated: GetEnumerator(), one MoveNext and one Current per enumerated item, Dispose enumerator;
Skip(): for every chunk that is enumerated (NOT the contents of the chunk):
GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose enumerator
If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and only calls to Current for the TSource items you actually decide to access.
If you take N Chunks of size chunkSize, then calls to MoveNext()
N times for Any()
not yet any time for Take, as long as you don't enumerate the Chunks
N times chunkSize for Skip()
If you decide to enumerate only the first M elements of every fetched chunk, then you need to call MoveNext M times per enumerated Chunk.
The total
MoveNext calls: N + N*M + N*chunkSize
Current calls: N*M; (only the items you really access)
So if you decide to enumerate all elements of all chunks:
MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
Current: every item is accessed exactly once
Whether MoveNext is a lot of work or not, depends on the type of source sequence. For lists and arrays it is a simple index increment, with maybe an out of range check.
But if your IEnumerable is the result of a database query, make sure that the data is really materialized on your computer, otherwise the data will be fetched several times. DbContext and Dapper will properly transfer the data to local process before it can be accessed. If you enumerate the same sequence several times it is not fetched several times. Dapper returns an object that is a List, DbContext remembers that the data is already fetched.
It depends on your Repository whether it is wise to call AsEnumerable() or ToLists() before you start to divide the items in Chunks
While plenty of the answers above do the job, they all fail horribly on a never ending sequence (or a really long sequence). The following is a completely on-line implementation which guarantees best time and memory complexity possible. We only iterate the source enumerable exactly once and use yield return for lazy evaluation. The consumer could throw away the list on each iteration making the memory footprint equal to that of the list w/ batchSize number of elements.
public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
{
using (var enumerator = enumerable.GetEnumerator())
{
List<T> list = null;
while (enumerator.MoveNext())
{
if (list == null)
{
list = new List<T> {enumerator.Current};
}
else if (list.Count < batchSize)
{
list.Add(enumerator.Current);
}
else
{
yield return list;
list = new List<T> {enumerator.Current};
}
}
if (list?.Count > 0)
{
yield return list;
}
}
}
EDIT: Just now realizing the OP asks about breaking a List<T> into smaller List<T>, so my comments regarding infinite enumerables aren't applicable to the OP, but may help others who end up here. These comments were in response to other posted solutions that do use IEnumerable<T> as an input to their function, yet enumerate the source enumerable multiple times.
I have a generic method that would take any types include float, and it's been unit-tested, hope it helps:
/// <summary>
/// Breaks the list into groups with each group containing no more than the specified group size
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="values">The values.</param>
/// <param name="groupSize">Size of the group.</param>
/// <returns></returns>
public static List<List<T>> SplitList<T>(IEnumerable<T> values, int groupSize, int? maxCount = null)
{
List<List<T>> result = new List<List<T>>();
// Quick and special scenario
if (values.Count() <= groupSize)
{
result.Add(values.ToList());
}
else
{
List<T> valueList = values.ToList();
int startIndex = 0;
int count = valueList.Count;
int elementCount = 0;
while (startIndex < count && (!maxCount.HasValue || (maxCount.HasValue && startIndex < maxCount)))
{
elementCount = (startIndex + groupSize > count) ? count - startIndex : groupSize;
result.Add(valueList.GetRange(startIndex, elementCount));
startIndex += elementCount;
}
}
return result;
}
As of .NET 6.0, you can use the LINQ extension Chunk<T>() to split enumerations into chunks. Docs
var chars = new List<char>() { 'h', 'e', 'l', 'l', 'o', 'w','o','r' ,'l','d' };
foreach (var batch in chars.Chunk(2))
{
foreach (var ch in batch)
{
// iterates 2 letters at a time
}
}
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
{
return items.Select((item, index) => new { item, index })
.GroupBy(x => x.index / maxItems)
.Select(g => g.Select(x => x.item));
}
How about this one? The idea was to use only one loop. And, who knows, maybe you're using only IList implementations thorough your code and you don't want to cast to List.
private IEnumerable<IList<T>> SplitList<T>(IList<T> list, int totalChunks)
{
IList<T> auxList = new List<T>();
int totalItems = list.Count();
if (totalChunks <= 0)
{
yield return auxList;
}
else
{
for (int i = 0; i < totalItems; i++)
{
auxList.Add(list[i]);
if ((i + 1) % totalChunks == 0)
{
yield return auxList;
auxList = new List<T>();
}
else if (i == totalItems - 1)
{
yield return auxList;
}
}
}
}
In .NET 6 you can just use source.Chunk(chunkSize)
A more generic version based on the accepted answer by Serj-Tm.
public static IEnumerable<IEnumerable<T>> Split<T>(IEnumerable<T> source, int size = 30)
{
var count = source.Count();
for (int i = 0; i < count; i += size)
{
yield return source
.Skip(Math.Min(size, count - i))
.Take(size);
}
}
One more
public static IList<IList<T>> SplitList<T>(this IList<T> list, int chunkSize)
{
var chunks = new List<IList<T>>();
List<T> chunk = null;
for (var i = 0; i < list.Count; i++)
{
if (i % chunkSize == 0)
{
chunk = new List<T>(chunkSize);
chunks.Add(chunk);
}
chunk.Add(list[i]);
}
return chunks;
}
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
var result = new List<List<T>>();
for (int i = 0; i < source.Count; i += chunkSize)
{
var rows = new List<T>();
for (int j = i; j < i + chunkSize; j++)
{
if (j >= source.Count) break;
rows.Add(source[j]);
}
result.Add(rows);
}
return result;
}
I had encountered this same need, and I used a combination of Linq's Skip() and Take() methods. I multiply the number I take by the number of iterations this far, and that gives me the number of items to skip, then I take the next group.
var categories = Properties.Settings.Default.MovementStatsCategories;
var items = summariesWithinYear
.Select(s => s.sku).Distinct().ToList();
//need to run by chunks of 10,000
var count = items.Count;
var counter = 0;
var numToTake = 10000;
while (count > 0)
{
var itemsChunk = items.Skip(numToTake * counter).Take(numToTake).ToList();
counter += 1;
MovementHistoryUtilities.RecordMovementHistoryStatsBulk(itemsChunk, categories, nLogger);
count -= numToTake;
}
Based on Dimitry Pavlov answere I would remove .ToList(). And also avoid the anonymous class.
Instead I like to use a struct which does not require a heap memory allocation. (A ValueTuple would also do job.)
public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
if (source is null)
{
throw new ArgumentNullException(nameof(source));
}
if (chunkSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(chunkSize), chunkSize, "The argument must be greater than zero.");
}
return source
.Select((x, i) => new ChunkedValue<TSource>(x, i / chunkSize))
.GroupBy(cv => cv.ChunkIndex)
.Select(g => g.Select(cv => cv.Value));
}
[StructLayout(LayoutKind.Auto)]
[DebuggerDisplay("{" + nameof(ChunkedValue<T>.ChunkIndex) + "}: {" + nameof(ChunkedValue<T>.Value) + "}")]
private struct ChunkedValue<T>
{
public ChunkedValue(T value, int chunkIndex)
{
this.ChunkIndex = chunkIndex;
this.Value = value;
}
public int ChunkIndex { get; }
public T Value { get; }
}
This can be used like the following which only iterates over the collection once and
also does not allocate any significant memory.
int chunkSize = 30;
foreach (var chunk in collection.ChunkBy(chunkSize))
{
foreach (var item in chunk)
{
// your code for item here.
}
}
If a concrete list is actually needed then I would do it like this:
int chunkSize = 30;
var chunkList = new List<List<T>>();
foreach (var chunk in collection.ChunkBy(chunkSize))
{
// create a list with the correct capacity to be able to contain one chunk
// to avoid the resizing (additional memory allocation and memory copy) within the List<T>.
var list = new List<T>(chunkSize);
list.AddRange(chunk);
chunkList.Add(list);
}
List<int> orginalList =new List<int>(){1,2,3,4,5,6,7,8,9,10,12};
Dictionary<int,List<int>> dic = new Dictionary <int,List<int>> ();
int batchcount = orginalList.Count/2; //To List into two 2 parts if you
want three give three
List<int> lst = new List<int>();
for (int i=0;i<orginalList.Count; i++)
{
lst.Add(orginalList[i]);
if (i % batchCount == 0 && i!=0)
{
Dic.Add(threadId, lst);
lst = new List<int>();**strong text**
threadId++;
}
}
if(lst.Count>0)
Dic.Add(threadId, lst); //in case if any dayleft
foreach(int BatchId in Dic.Keys)
{
Console.Writeline("BatchId:"+BatchId);
Console.Writeline('Batch Count:"+Dic[BatchId].Count);
}
in case you wanna split it with condition instead of fixed number :
///<summary>
/// splits a list based on a condition (similar to the split function for strings)
///</summary>
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, Func<T, bool> pred)
{
var list = new List<T>();
foreach(T item in src)
{
if(pred(item))
{
if(list != null && list.Count > 0)
yield return list;
list = new List<T>();
}
else
{
list.Add(item);
}
}
}
You can simply try the following code with only using LINQ :
public static IList<IList<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
I wrote skip last method. When I call it with int array, I expect to only get 2 elements back, not 4.
What is wrong?
Thanks
public static class myclass
{
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
return source.Reverse().Skip(n).Reverse();
}
}
class Program
{
static void Main(string[] args)
{
int [] a = new int[] {5, 6, 7, 8};
ArrayList a1 = new ArrayList();
a.SkipLast(2);
for( int i = 0; i <a.Length; i++)
{
Console.Write(a[i]);
}
}
}
you need to call as
var newlist = a.SkipLast(2);
for( int i = 0; i <newlist.Count; i++)
{
Console.Write(newlist[i]);
}
your method returning skipped list, but your original list will not update
if you want to assign or update same list you can set the returned list back to original as a = a.SkipLast(2).ToArray();
You should assign the result, not just put a.SkipLast(2):
a = a.SkipLast(2).ToArray(); // <- if you want to change "a" and loop on a
for( int i = 0; i <a.Length; i++) { ...
When you do a.SkipLast(2) it creates IEnumerable<int> and then discards it;
The most readable solution, IMHO, is to use foreach which is very convenient with LINQ:
...
int [] a = new int[] {5, 6, 7, 8};
foreach(int item in a.SkipLast(2))
Console.Write(item);
The other replies have answered your question, but wouldn't a more efficient implementation be this (which doesn't involve making two copies of the array in order to reverse it twice). It does iterate the collection twice (or rather, once and then count-n accesses) though:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
n = source.Count() - n;
return source.TakeWhile(_ => n-- > 0);
}
Actually, if source is a type that implements Count without iteration (such as an array or a List) this will only access the elements count-n times, so it will be extremely efficient for those types.
Here is a better solution that only iterates the sequence once. It's data requirements are such that it only needs a buffer with n elements, which makes it very efficient if n is small compared with the size of the sequence:
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int n)
{
int count = 0;
T[] buffer = new T[n];
var iter = source.GetEnumerator();
while (iter.MoveNext())
{
if (count >= n)
yield return buffer[count%n];
buffer[count++%n] = iter.Current;
}
}
Change your code to,
foreach (var r in a.SkipLast(2))
{
Console.Write(r);
}
for three reasons,
The SkipLast function returns the mutated sequence, it doesn't change it directly.
What is the point of using an indexer with IEnumerable? It imposes a needless count.
This code is easy to read, easier to type and shows intent.
For a more efficient generic SkipLast see Matthew's buffer with enumerator.
Your example could use a more specialised SkipLast,
public static IEnumerable<T> SkipLast<T>(this IList<T> source, int n = 1)
{
for (var i = 0; i < (source.Count - n); i++)
{
yield return source[i];
}
}
Is there a more elegant way to implement going 5 items at a time than a for loop like this?
var q = Campaign_stats.OrderByDescending(c=>c.Leads).Select(c=>c.PID).Take(23);
var count = q.Count();
for (int i = 0; i < (count/5)+1; i++)
{
q.Skip(i*5).Take(5).Dump();
}
for(int i = 0; i <= count; i+=5)
{
}
So you want to efficiently call Dump() on every 5 items in q.
The solution you have now will re-iterate the IEnumerable<T> every time through the for loop. It may be more efficient to do something like this: (I don't know what your type is so I'm using T)
const int N = 5;
T[] ar = new T[N]; // Temporary array of N items.
int i=0;
foreach(var item in q) { // Just one iterator.
ar[i++] = item; // Store a reference to this item.
if (i == N) { // When we have N items,
ar.Dump(); // dump them,
i = 0; // and reset the array index.
}
}
// Dump the remaining items
if (i > 0) {
ar.Take(i).Dump();
}
This only uses one iterator. Considering your variable is named q, I'm assuming that is short for "query", which implies this is against a database. So using just one iterator may be very beneficial.
I may keep this code, and wrap it up in an extension method. How about "clump"?
public static IEnumerable<IEnumerable<T>> Clump<T>(this IEnumerable<T> items, int clumpSize) {
T[] ar = new T[clumpSize];
int i=0;
foreach(var item in items) {
ar[i++] = item;
if (i == clumpSize) {
yield return ar;
i = 0;
}
}
if (i > 0)
yield return ar.Take(i);
}
Calling it in the context of your code:
foreach (var clump in q.Clump(5)) {
clump.Dump();
}
try iterating by 5 instead!
for(int i = 0; i < count; i += 5)
{
//etc
}
Adding more LINQ with GroupBy and Zip:
q
// add indexes
.Zip(Enumerable.Range(0, Int32.MaxValue),(a,index)=> new {Index=index, Value=a})
.GroupBy(m=>m.Index /5) // divide in groups by 5 items each
.Select(k => {
k.Select(v => v.Value).Dump(); // Perform operation on 5 elements
return k.Key; // return something to satisfy Select.
});
Given a collection, is there a way to get the last N elements of that collection? If there isn't a method in the framework, what would be the best way to write an extension method to do this?
collection.Skip(Math.Max(0, collection.Count() - N));
This approach preserves item order without a dependency on any sorting, and has broad compatibility across several LINQ providers.
It is important to take care not to call Skip with a negative number. Some providers, such as the Entity Framework, will produce an ArgumentException when presented with a negative argument. The call to Math.Max avoids this neatly.
The class below has all of the essentials for extension methods, which are: a static class, a static method, and use of the this keyword.
public static class MiscExtensions
{
// Ex: collection.TakeLast(5);
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int N)
{
return source.Skip(Math.Max(0, source.Count() - N));
}
}
A brief note on performance:
Because the call to Count() can cause enumeration of certain data structures, this approach has the risk of causing two passes over the data. This isn't really a problem with most enumerables; in fact, optimizations exist already for Lists, Arrays, and even EF queries to evaluate the Count() operation in O(1) time.
If, however, you must use a forward-only enumerable and would like to avoid making two passes, consider a one-pass algorithm like Lasse V. Karlsen or Mark Byers describe. Both of these approaches use a temporary buffer to hold items while enumerating, which are yielded once the end of the collection is found.
coll.Reverse().Take(N).Reverse().ToList();
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> coll, int N)
{
return coll.Reverse().Take(N).Reverse();
}
UPDATE: To address clintp's problem: a) Using the TakeLast() method I defined above solves the problem, but if you really want the do it without the extra method, then you just have to recognize that while Enumerable.Reverse() can be used as an extension method, you aren't required to use it that way:
List<string> mystring = new List<string>() { "one", "two", "three" };
mystring = Enumerable.Reverse(mystring).Take(2).Reverse().ToList();
.NET Core 2.0+ provides the LINQ method TakeLast():
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.takelast
example:
Enumerable
.Range(1, 10)
.TakeLast(3) // <--- takes last 3 items
.ToList()
.ForEach(i => System.Console.WriteLine(i))
// outputs:
// 8
// 9
// 10
Note: I missed your question title which said Using Linq, so my answer does not in fact use Linq.
If you want to avoid caching a non-lazy copy of the entire collection, you could write a simple method that does it using a linked list.
The following method will add each value it finds in the original collection into a linked list, and trim the linked list down to the number of items required. Since it keeps the linked list trimmed to this number of items the entire time through iterating through the collection, it will only keep a copy of at most N items from the original collection.
It does not require you to know the number of items in the original collection, nor iterate over it more than once.
Usage:
IEnumerable<int> sequence = Enumerable.Range(1, 10000);
IEnumerable<int> last10 = sequence.TakeLast(10);
...
Extension method:
public static class Extensions
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> collection,
int n)
{
if (collection == null)
throw new ArgumentNullException(nameof(collection));
if (n < 0)
throw new ArgumentOutOfRangeException(nameof(n), $"{nameof(n)} must be 0 or greater");
LinkedList<T> temp = new LinkedList<T>();
foreach (var value in collection)
{
temp.AddLast(value);
if (temp.Count > n)
temp.RemoveFirst();
}
return temp;
}
}
Here's a method that works on any enumerable but uses only O(N) temporary storage:
public static class TakeLastExtension
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
T[] result = new T[takeCount];
int i = 0;
int sourceCount = 0;
foreach (T element in source)
{
result[i] = element;
i = (i + 1) % takeCount;
sourceCount++;
}
if (sourceCount < takeCount)
{
takeCount = sourceCount;
i = 0;
}
for (int j = 0; j < takeCount; ++j)
{
yield return result[(i + j) % takeCount];
}
}
}
Usage:
List<int> l = new List<int> {4, 6, 3, 6, 2, 5, 7};
List<int> lastElements = l.TakeLast(3).ToList();
It works by using a ring buffer of size N to store the elements as it sees them, overwriting old elements with new ones. When the end of the enumerable is reached the ring buffer contains the last N elements.
I am surprised that no one has mentioned it, but SkipWhile does have a method that uses the element's index.
public static IEnumerable<T> TakeLastN<T>(this IEnumerable<T> source, int n)
{
if (source == null)
throw new ArgumentNullException("Source cannot be null");
int goldenIndex = source.Count() - n;
return source.SkipWhile((val, index) => index < goldenIndex);
}
//Or if you like them one-liners (in the spirit of the current accepted answer);
//However, this is most likely impractical due to the repeated calculations
collection.SkipWhile((val, index) => index < collection.Count() - N)
The only perceivable benefit that this solution presents over others is that you can have the option to add in a predicate to make a more powerful and efficient LINQ query, instead of having two separate operations that traverse the IEnumerable twice.
public static IEnumerable<T> FilterLastN<T>(this IEnumerable<T> source, int n, Predicate<T> pred)
{
int goldenIndex = source.Count() - n;
return source.SkipWhile((val, index) => index < goldenIndex && pred(val));
}
Use EnumerableEx.TakeLast in RX's System.Interactive assembly. It's an O(N) implementation like #Mark's, but it uses a queue rather than a ring-buffer construct (and dequeues items when it reaches buffer capacity).
(NB: This is the IEnumerable version - not the IObservable version, though the implementation of the two is pretty much identical)
If you are dealing with a collection with a key (e.g. entries from a database) a quick (i.e. faster than the selected answer) solution would be
collection.OrderByDescending(c => c.Key).Take(3).OrderBy(c => c.Key);
If you don't mind dipping into Rx as part of the monad, you can use TakeLast:
IEnumerable<int> source = Enumerable.Range(1, 10000);
IEnumerable<int> lastThree = source.AsObservable().TakeLast(3).AsEnumerable();
I tried to combine efficiency and simplicity and end up with this :
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
{
if (source == null) { throw new ArgumentNullException("source"); }
Queue<T> lastElements = new Queue<T>();
foreach (T element in source)
{
lastElements.Enqueue(element);
if (lastElements.Count > count)
{
lastElements.Dequeue();
}
}
return lastElements;
}
About
performance : In C#, Queue<T> is implemented using a circular buffer so there is no object instantiation done each loop (only when the queue is growing up). I did not set queue capacity (using dedicated constructor) because someone might call this extension with count = int.MaxValue . For extra performance you might check if source implement IList<T> and if yes, directly extract the last values using array indexes.
If using a third-party library is an option, MoreLinq defines TakeLast() which does exactly this.
It is a little inefficient to take the last N of a collection using LINQ as all the above solutions require iterating across the collection. TakeLast(int n) in System.Interactive also has this problem.
If you have a list a more efficient thing to do is slice it using the following method
/// Select from start to end exclusive of end using the same semantics
/// as python slice.
/// <param name="list"> the list to slice</param>
/// <param name="start">The starting index</param>
/// <param name="end">The ending index. The result does not include this index</param>
public static List<T> Slice<T>
(this IReadOnlyList<T> list, int start, int? end = null)
{
if (end == null)
{
end = list.Count();
}
if (start < 0)
{
start = list.Count + start;
}
if (start >= 0 && end.Value > 0 && end.Value > start)
{
return list.GetRange(start, end.Value - start);
}
if (end < 0)
{
return list.GetRange(start, (list.Count() + end.Value) - start);
}
if (end == start)
{
return new List<T>();
}
throw new IndexOutOfRangeException(
"count = " + list.Count() +
" start = " + start +
" end = " + end);
}
with
public static List<T> GetRange<T>( this IReadOnlyList<T> list, int index, int count )
{
List<T> r = new List<T>(count);
for ( int i = 0; i < count; i++ )
{
int j=i + index;
if ( j >= list.Count )
{
break;
}
r.Add(list[j]);
}
return r;
}
and some test cases
[Fact]
public void GetRange()
{
IReadOnlyList<int> l = new List<int>() { 0, 10, 20, 30, 40, 50, 60 };
l
.GetRange(2, 3)
.ShouldAllBeEquivalentTo(new[] { 20, 30, 40 });
l
.GetRange(5, 10)
.ShouldAllBeEquivalentTo(new[] { 50, 60 });
}
[Fact]
void SliceMethodShouldWork()
{
var list = new List<int>() { 1, 3, 5, 7, 9, 11 };
list.Slice(1, 4).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
list.Slice(1, -2).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
list.Slice(1, null).ShouldBeEquivalentTo(new[] { 3, 5, 7, 9, 11 });
list.Slice(-2)
.Should()
.BeEquivalentTo(new[] {9, 11});
list.Slice(-2,-1 )
.Should()
.BeEquivalentTo(new[] {9});
}
I know it's to late to answer this question. But if you are working with collection of type IList<> and you don't care about an order of the returned collection, then this method is working faster. I've used Mark Byers answer and made a little changes. So now method TakeLast is:
public static IEnumerable<T> TakeLast<T>(IList<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
if (source.Count > takeCount)
{
for (int z = source.Count - 1; takeCount > 0; z--)
{
takeCount--;
yield return source[z];
}
}
else
{
for(int i = 0; i < source.Count; i++)
{
yield return source[i];
}
}
}
For test I have used Mark Byers method and kbrimington's andswer. This is test:
IList<int> test = new List<int>();
for(int i = 0; i<1000000; i++)
{
test.Add(i);
}
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
IList<int> result = TakeLast(test, 10).ToList();
stopwatch.Stop();
Stopwatch stopwatch1 = new Stopwatch();
stopwatch1.Start();
IList<int> result1 = TakeLast2(test, 10).ToList();
stopwatch1.Stop();
Stopwatch stopwatch2 = new Stopwatch();
stopwatch2.Start();
IList<int> result2 = test.Skip(Math.Max(0, test.Count - 10)).Take(10).ToList();
stopwatch2.Stop();
And here are results for taking 10 elements:
and for taking 1000001 elements results are:
Here's my solution:
public static class EnumerationExtensions
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> input, int count)
{
if (count <= 0)
yield break;
var inputList = input as IList<T>;
if (inputList != null)
{
int last = inputList.Count;
int first = last - count;
if (first < 0)
first = 0;
for (int i = first; i < last; i++)
yield return inputList[i];
}
else
{
// Use a ring buffer. We have to enumerate the input, and we don't know in advance how many elements it will contain.
T[] buffer = new T[count];
int index = 0;
count = 0;
foreach (T item in input)
{
buffer[index] = item;
index = (index + 1) % buffer.Length;
count++;
}
// The index variable now points at the next buffer entry that would be filled. If the buffer isn't completely
// full, then there are 'count' elements preceding index. If the buffer *is* full, then index is pointing at
// the oldest entry, which is the first one to return.
//
// If the buffer isn't full, which means that the enumeration has fewer than 'count' elements, we'll fix up
// 'index' to point at the first entry to return. That's easy to do; if the buffer isn't full, then the oldest
// entry is the first one. :-)
//
// We'll also set 'count' to the number of elements to be returned. It only needs adjustment if we've wrapped
// past the end of the buffer and have enumerated more than the original count value.
if (count < buffer.Length)
index = 0;
else
count = buffer.Length;
// Return the values in the correct order.
while (count > 0)
{
yield return buffer[index];
index = (index + 1) % buffer.Length;
count--;
}
}
}
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> input, int count)
{
if (count <= 0)
return input;
else
return input.SkipLastIter(count);
}
private static IEnumerable<T> SkipLastIter<T>(this IEnumerable<T> input, int count)
{
var inputList = input as IList<T>;
if (inputList != null)
{
int first = 0;
int last = inputList.Count - count;
if (last < 0)
last = 0;
for (int i = first; i < last; i++)
yield return inputList[i];
}
else
{
// Aim to leave 'count' items in the queue. If the input has fewer than 'count'
// items, then the queue won't ever fill and we return nothing.
Queue<T> elements = new Queue<T>();
foreach (T item in input)
{
elements.Enqueue(item);
if (elements.Count > count)
yield return elements.Dequeue();
}
}
}
}
The code is a bit chunky, but as a drop-in reusable component, it should perform as well as it can in most scenarios, and it'll keep the code that's using it nice and concise. :-)
My TakeLast for non-IList`1 is based on the same ring buffer algorithm as that in the answers by #Mark Byers and #MackieChan further up. It's interesting how similar they are -- I wrote mine completely independently. Guess there's really just one way to do a ring buffer properly. :-)
Looking at #kbrimington's answer, an additional check could be added to this for IQuerable<T> to fall back to the approach that works well with Entity Framework -- assuming that what I have at this point does not.
Honestly I'm not super proud of the answer, but for small collections you could use the following:
var lastN = collection.Reverse().Take(n).Reverse();
A bit hacky but it does the job ;)
My solution is based on ranges, introduced in C# version 8.
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int N)
{
return source.ToArray()[(source.Count()-N)..];
}
After running a benchmark with most rated solutions (and my humbly proposed solution):
public static class TakeLastExtension
{
public static IEnumerable<T> TakeLastMarkByers<T>(this IEnumerable<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
T[] result = new T[takeCount];
int i = 0;
int sourceCount = 0;
foreach (T element in source)
{
result[i] = element;
i = (i + 1) % takeCount;
sourceCount++;
}
if (sourceCount < takeCount)
{
takeCount = sourceCount;
i = 0;
}
for (int j = 0; j < takeCount; ++j)
{
yield return result[(i + j) % takeCount];
}
}
public static IEnumerable<T> TakeLastKbrimington<T>(this IEnumerable<T> source, int N)
{
return source.Skip(Math.Max(0, source.Count() - N));
}
public static IEnumerable<T> TakeLastJamesCurran<T>(this IEnumerable<T> source, int N)
{
return source.Reverse().Take(N).Reverse();
}
public static IEnumerable<T> TakeLastAlex<T>(this IEnumerable<T> source, int N)
{
return source.ToArray()[(source.Count()-N)..];
}
}
Test
[MemoryDiagnoser]
public class TakeLastBenchmark
{
[Params(10000)]
public int N;
private readonly List<string> l = new();
[GlobalSetup]
public void Setup()
{
for (var i = 0; i < this.N; i++)
{
this.l.Add($"i");
}
}
[Benchmark]
public void Benchmark1_MarkByers()
{
var lastElements = l.TakeLastMarkByers(3).ToList();
}
[Benchmark]
public void Benchmark2_Kbrimington()
{
var lastElements = l.TakeLastKbrimington(3).ToList();
}
[Benchmark]
public void Benchmark3_JamesCurran()
{
var lastElements = l.TakeLastJamesCurran(3).ToList();
}
[Benchmark]
public void Benchmark4_Alex()
{
var lastElements = l.TakeLastAlex(3).ToList();
}
}
Program.cs:
var summary = BenchmarkRunner.Run(typeof(TakeLastBenchmark).Assembly);
Command dotnet run --project .\TestsConsole2.csproj -c Release --logBuildOutput
The results were following:
// * Summary *
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
AMD Ryzen 5 5600X, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.401
[Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
Method
N
Mean
Error
StdDev
Gen0
Gen1
Allocated
Benchmark1_MarkByers
10000
89,390.53 ns
1,735.464 ns
1,704.457 ns
-
-
248 B
Benchmark2_Kbrimington
10000
46.15 ns
0.410 ns
0.363 ns
0.0076
-
128 B
Benchmark3_JamesCurran
10000
2,703.15 ns
46.298 ns
67.862 ns
4.7836
0.0038
80264 B
Benchmark4_Alex
10000
2,513.48 ns
48.661 ns
45.517 ns
4.7607
-
80152 B
Turns out the solution proposed by #Kbrimington to be the most efficient in terms of memory alloc as well as raw performance.
Below the real example how to take last 3 elements from a collection (array):
// split address by spaces into array
string[] adrParts = adr.Split(new string[] { " " },StringSplitOptions.RemoveEmptyEntries);
// take only 3 last items in array
adrParts = adrParts.SkipWhile((value, index) => { return adrParts.Length - index > 3; }).ToArray();
Using This Method To Get All Range Without Error
public List<T> GetTsRate( List<T> AllT,int Index,int Count)
{
List<T> Ts = null;
try
{
Ts = AllT.ToList().GetRange(Index, Count);
}
catch (Exception ex)
{
Ts = AllT.Skip(Index).ToList();
}
return Ts ;
}
Little different implementation with usage of circular buffer. The benchmarks show that the method is circa two times faster than ones using Queue (implementation of TakeLast in System.Linq), however not without a cost - it needs a buffer which grows along with the requested number of elements, even if you have a small collection you can get huge memory allocation.
public IEnumerable<T> TakeLast<T>(IEnumerable<T> source, int count)
{
int i = 0;
if (count < 1)
yield break;
if (source is IList<T> listSource)
{
if (listSource.Count < 1)
yield break;
for (i = listSource.Count < count ? 0 : listSource.Count - count; i < listSource.Count; i++)
yield return listSource[i];
}
else
{
bool move = true;
bool filled = false;
T[] result = new T[count];
using (var enumerator = source.GetEnumerator())
while (move)
{
for (i = 0; (move = enumerator.MoveNext()) && i < count; i++)
result[i] = enumerator.Current;
filled |= move;
}
if (filled)
for (int j = i; j < count; j++)
yield return result[j];
for (int j = 0; j < i; j++)
yield return result[j];
}
}
//detailed code for the problem
//suppose we have a enumerable collection 'collection'
var lastIndexOfCollection=collection.Count-1 ;
var nthIndexFromLast= lastIndexOfCollection- N;
var desiredCollection=collection.GetRange(nthIndexFromLast, N);
---------------------------------------------------------------------
// use this one liner
var desiredCollection=collection.GetRange((collection.Count-(1+N)), N);
I'm using .NET 3.5 and would like to be able to obtain every *n*th item from a List. I'm not bothered as to whether it's achieved using a lambda expression or LINQ.
Edit
Looks like this question provoked quite a lot of debate (which is a good thing, right?). The main thing I've learnt is that when you think you know every way to do something (even as simple as this), think again!
return list.Where((x, i) => i % nStep == 0);
I know it's "old school," but why not just use a for loop with stepping = n?
Sounds like
IEnumerator<T> GetNth<T>(List<T> list, int n) {
for (int i=0; i<list.Count; i+=n)
yield return list[i]
}
would do the trick. I do not see the need to use Linq or a lambda expressions.
EDIT:
Make it
public static class MyListExtensions {
public static IEnumerable<T> GetNth<T>(this List<T> list, int n) {
for (int i=0; i<list.Count; i+=n)
yield return list[i];
}
}
and you write in a LINQish way
from var element in MyList.GetNth(10) select element;
2nd Edit:
To make it even more LINQish
from var i in Range(0, ((myList.Length-1)/n)+1) select list[n*i];
You can use the Where overload which passes the index along with the element
var everyFourth = list.Where((x,i) => i % 4 == 0);
For Loop
for(int i = 0; i < list.Count; i += n)
//Nth Item..
I think if you provide a linq extension, you should be able to operate on the least specific interface, thus on IEnumerable. Of course, if you are up for speed especially for large N you might provide an overload for indexed access. The latter removes the need of iterating over large amounts of not needed data, and will be much faster than the Where clause. Providing both overloads lets the compiler select the most suitable variant.
public static class LinqExtensions
{
public static IEnumerable<T> GetNth<T>(this IEnumerable<T> list, int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException("n");
if (n > 0)
{
int c = 0;
foreach (var e in list)
{
if (c % n == 0)
yield return e;
c++;
}
}
}
public static IEnumerable<T> GetNth<T>(this IList<T> list, int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException("n");
if (n > 0)
for (int c = 0; c < list.Count; c += n)
yield return list[c];
}
}
I'm not sure if it's possible to do with a LINQ expression, but I know that you can use the Where extension method to do it. For example to get every fifth item:
List<T> list = originalList.Where((t,i) => (i % 5) == 0).ToList();
This will get the first item and every fifth from there. If you want to start at the fifth item instead of the first, you compare with 4 instead of comparing with 0.
Imho no answer is right. All solutions begins from 0. But I want to have the real nth element
public static IEnumerable<T> GetNth<T>(this IList<T> list, int n)
{
for (int i = n - 1; i < list.Count; i += n)
yield return list[i];
}
#belucha I like this, because the client code is very readable and the Compiler chooses the most efficient Implementation. I would build upon this by reducing the requirements to IReadOnlyList<T> and to save the Division for high-performance LINQ:
public static IEnumerable<T> GetNth<T>(this IEnumerable<T> list, int n) {
if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n), n, null);
int i = n;
foreach (var e in list) {
if (++i < n) { //save Division
continue;
}
i = 0;
yield return e;
}
}
public static IEnumerable<T> GetNth<T>(this IReadOnlyList<T> list, int n
, int offset = 0) { //use IReadOnlyList<T>
if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n), n, null);
for (var i = offset; i < list.Count; i += n) {
yield return list[i];
}
}
private static readonly string[] sequence = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15".Split(',');
static void Main(string[] args)
{
var every4thElement = sequence
.Where((p, index) => index % 4 == 0);
foreach (string p in every4thElement)
{
Console.WriteLine("{0}", p);
}
Console.ReadKey();
}
output