String.Where Comparatively Poor Performance - c#

I have two methods that take a string and remove any 'invalid' characters (characters contained in a hashset). One method uses Linq.Where, another uses a loop w/ char array.
The Linq method takes nearly twice as long (208756.9 ticks) as the loop (108688.2 ticks)
Linq:
string Linq(string field)
{
var c = field.Where(p => !hashChar.Contains(p));
return new string(c.ToArray());
}
Loop:
string CharArray(string field)
{
char[] c = new char[field.Length];
int count = 0;
for (int i = 0; i < field.Length; i++)
if (!hashChar.Contains(field[i]))
{
c[count] = field[i];
count++;
}
if (count == 0)
return field;
char[] f = new char[count];
Buffer.BlockCopy(c, 0, f, 0, count * sizeof(char));
return new string(f);
}
My expectation would be that LINQ would beat, or at least be comparable to, the loop method. The loop method isn't even optimized. I must be missing something here.
How does Linq.Where work under the hood, and why does it lose to my method?

If the source code of ToArray in Mono is any indication, your implementation wins because it performs fewer allocations (scroll down to line 2874 to see the method).
Like many methods of LINQ, the ToArray method contains separate code paths for collections and for other enumerables:
TSource[] array;
var collection = source as ICollection<TSource>;
if (collection != null) {
...
return array;
}
In your case, this branch is not taken, so the code proceeds to this loop:
int pos = 0;
array = EmptyOf<TSource>.Instance;
foreach (var element in source) {
if (pos == array.Length) {
if (pos == 0)
array = new TSource [4];
else
// If the number of returned character is significant,
// this method will be called multiple times
Array.Resize (ref array, pos * 2);
}
array[pos++] = element;
}
if (pos != array.Length)
Array.Resize (ref array, pos);
return array;
As you can see, LINQ's version may allocate and re-allocate the array several times. Your implementation, on the other hand, does just two allocations - the upfront one of the max size, and the final one, where the data is copied. That's why your code is faster.

Related

List<string> to List<List<string>> with linq or lambda [duplicate]

This question already has answers here:
Split List into Sublists with LINQ
(34 answers)
Closed 12 months ago.
I am attempting to split a list into a series of smaller lists.
My Problem: My function to split lists doesn't split them into lists of the correct size. It should split them into lists of size 30 but instead it splits them into lists of size 114?
How can I make my function split a list into X number of Lists of size 30 or less?
public static List<List<float[]>> splitList(List <float[]> locations, int nSize=30)
{
List<List<float[]>> list = new List<List<float[]>>();
for (int i=(int)(Math.Ceiling((decimal)(locations.Count/nSize))); i>=0; i--) {
List <float[]> subLocat = new List <float[]>(locations);
if (subLocat.Count >= ((i*nSize)+nSize))
subLocat.RemoveRange(i*nSize, nSize);
else subLocat.RemoveRange(i*nSize, subLocat.Count-(i*nSize));
Debug.Log ("Index: "+i.ToString()+", Size: "+subLocat.Count.ToString());
list.Add (subLocat);
}
return list;
}
If I use the function on a list of size 144 then the output is:
Index: 4, Size: 120
Index: 3, Size: 114
Index: 2, Size: 114
Index: 1, Size: 114
Index: 0, Size: 114
I would suggest to use this extension method to chunk the source list to the sub-lists by specified chunk size:
/// <summary>
/// Helper methods for the lists.
/// </summary>
public static class ListExtensions
{
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / chunkSize)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}
}
For example, if you chunk the list of 18 items by 5 items per chunk, it gives you the list of 4 sub-lists with the following items inside: 5-5-5-3.
NOTE: at the upcoming improvements to LINQ in .NET 6 chunking
will come out of the box like this:
const int PAGE_SIZE = 5;
IEnumerable<Movie[]> chunks = movies.Chunk(PAGE_SIZE);
public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)
{
var list = new List<List<float[]>>();
for (int i = 0; i < locations.Count; i += nSize)
{
list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
}
return list;
}
Generic version:
public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)
{
for (int i = 0; i < locations.Count; i += nSize)
{
yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
}
}
how about:
while(locations.Any())
{
list.Add(locations.Take(nSize).ToList());
locations= locations.Skip(nSize).ToList();
}
Library MoreLinq have method called Batch
List<int> ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; // 10 elements
int counter = 1;
foreach(var batch in ids.Batch(2))
{
foreach(var eachId in batch)
{
Console.WriteLine("Batch: {0}, Id: {1}", counter, eachId);
}
counter++;
}
Result is
Batch: 1, Id: 1
Batch: 1, Id: 2
Batch: 2, Id: 3
Batch: 2, Id: 4
Batch: 3, Id: 5
Batch: 3, Id: 6
Batch: 4, Id: 7
Batch: 4, Id: 8
Batch: 5, Id: 9
Batch: 5, Id: 0
ids are splitted into 5 chunks with 2 elements.
Update for .NET 6
var originalList = new List<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
// split into arrays of no more than three
IEnumerable<int[]> chunks = originalList.Chunk(3);
Prior to .NET 6
public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
(this IEnumerable<T> source, int itemsPerSet)
{
var sourceList = source as List<T> ?? source.ToList();
for (var index = 0; index < sourceList.Count; index += itemsPerSet)
{
yield return sourceList.Skip(index).Take(itemsPerSet);
}
}
Serj-Tm solution is fine, also this is the generic version as extension method for lists (put it into a static class):
public static List<List<T>> Split<T>(this List<T> items, int sliceSize = 30)
{
List<List<T>> list = new List<List<T>>();
for (int i = 0; i < items.Count; i += sliceSize)
list.Add(items.GetRange(i, Math.Min(sliceSize, items.Count - i)));
return list;
}
I find accepted answer (Serj-Tm) most robust, but I'd like to suggest a generic version.
public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
{
var list = new List<List<T>>();
for (int i = 0; i < locations.Count; i += nSize)
{
list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
}
return list;
}
Addition after very useful comment of mhand at the end
Original answer
Although most solutions might work, I think they are not very efficiently. Suppose if you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.
The following will at utmost enumerate twice: once for the Take and once for the Skip. It won't enumerate over any more elements than you will use:
public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
(this IEnumerable<TSource> source, int chunkSize)
{
while (source.Any()) // while there are elements left
{ // still something to chunk:
yield return source.Take(chunkSize); // return a chunk of chunkSize
source = source.Skip(chunkSize); // skip the returned chunk
}
}
How many times will this Enumerate the sequence?
Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you'll only enumerate the first M elements.
While(source.Any())
{
...
}
the Any will get the Enumerator, do 1 MoveNext() and returns the returned value after Disposing the Enumerator. This will be done N times
yield return source.Take(chunkSize);
According to the reference source this will do something like:
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
{
return TakeIterator<TSource>(source, count);
}
static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
foreach (TSource element in source)
{
yield return element;
if (--count == 0) break;
}
}
This doesn't do a lot until you start enumerating over the fetched Chunk. If you fetch several Chunks, but decide not to enumerate over the first Chunk, the foreach is not executed, as your debugger will show you.
If you decide to take the first M elements of the first chunk then the yield return is executed exactly M times. This means:
get the enumerator
call MoveNext() and Current M times.
Dispose the enumerator
After the first chunk has been yield returned, we skip this first Chunk:
source = source.Skip(chunkSize);
Once again: we'll take a look at reference source to find the skipiterator
static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
using (IEnumerator<TSource> e = source.GetEnumerator())
{
while (count > 0 && e.MoveNext()) count--;
if (count <= 0)
{
while (e.MoveNext()) yield return e.Current;
}
}
}
As you see, the SkipIterator calls MoveNext() once for every element in the Chunk. It doesn't call Current.
So per Chunk we see that the following is done:
Any(): GetEnumerator; 1 MoveNext(); Dispose Enumerator;
Take():
nothing if the content of the chunk is not enumerated.
If the content is enumerated: GetEnumerator(), one MoveNext and one Current per enumerated item, Dispose enumerator;
Skip(): for every chunk that is enumerated (NOT the contents of the chunk):
GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose enumerator
If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and only calls to Current for the TSource items you actually decide to access.
If you take N Chunks of size chunkSize, then calls to MoveNext()
N times for Any()
not yet any time for Take, as long as you don't enumerate the Chunks
N times chunkSize for Skip()
If you decide to enumerate only the first M elements of every fetched chunk, then you need to call MoveNext M times per enumerated Chunk.
The total
MoveNext calls: N + N*M + N*chunkSize
Current calls: N*M; (only the items you really access)
So if you decide to enumerate all elements of all chunks:
MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
Current: every item is accessed exactly once
Whether MoveNext is a lot of work or not, depends on the type of source sequence. For lists and arrays it is a simple index increment, with maybe an out of range check.
But if your IEnumerable is the result of a database query, make sure that the data is really materialized on your computer, otherwise the data will be fetched several times. DbContext and Dapper will properly transfer the data to local process before it can be accessed. If you enumerate the same sequence several times it is not fetched several times. Dapper returns an object that is a List, DbContext remembers that the data is already fetched.
It depends on your Repository whether it is wise to call AsEnumerable() or ToLists() before you start to divide the items in Chunks
While plenty of the answers above do the job, they all fail horribly on a never ending sequence (or a really long sequence). The following is a completely on-line implementation which guarantees best time and memory complexity possible. We only iterate the source enumerable exactly once and use yield return for lazy evaluation. The consumer could throw away the list on each iteration making the memory footprint equal to that of the list w/ batchSize number of elements.
public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
{
using (var enumerator = enumerable.GetEnumerator())
{
List<T> list = null;
while (enumerator.MoveNext())
{
if (list == null)
{
list = new List<T> {enumerator.Current};
}
else if (list.Count < batchSize)
{
list.Add(enumerator.Current);
}
else
{
yield return list;
list = new List<T> {enumerator.Current};
}
}
if (list?.Count > 0)
{
yield return list;
}
}
}
EDIT: Just now realizing the OP asks about breaking a List<T> into smaller List<T>, so my comments regarding infinite enumerables aren't applicable to the OP, but may help others who end up here. These comments were in response to other posted solutions that do use IEnumerable<T> as an input to their function, yet enumerate the source enumerable multiple times.
I have a generic method that would take any types include float, and it's been unit-tested, hope it helps:
/// <summary>
/// Breaks the list into groups with each group containing no more than the specified group size
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="values">The values.</param>
/// <param name="groupSize">Size of the group.</param>
/// <returns></returns>
public static List<List<T>> SplitList<T>(IEnumerable<T> values, int groupSize, int? maxCount = null)
{
List<List<T>> result = new List<List<T>>();
// Quick and special scenario
if (values.Count() <= groupSize)
{
result.Add(values.ToList());
}
else
{
List<T> valueList = values.ToList();
int startIndex = 0;
int count = valueList.Count;
int elementCount = 0;
while (startIndex < count && (!maxCount.HasValue || (maxCount.HasValue && startIndex < maxCount)))
{
elementCount = (startIndex + groupSize > count) ? count - startIndex : groupSize;
result.Add(valueList.GetRange(startIndex, elementCount));
startIndex += elementCount;
}
}
return result;
}
As of .NET 6.0, you can use the LINQ extension Chunk<T>() to split enumerations into chunks. Docs
var chars = new List<char>() { 'h', 'e', 'l', 'l', 'o', 'w','o','r' ,'l','d' };
foreach (var batch in chars.Chunk(2))
{
foreach (var ch in batch)
{
// iterates 2 letters at a time
}
}
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
{
return items.Select((item, index) => new { item, index })
.GroupBy(x => x.index / maxItems)
.Select(g => g.Select(x => x.item));
}
How about this one? The idea was to use only one loop. And, who knows, maybe you're using only IList implementations thorough your code and you don't want to cast to List.
private IEnumerable<IList<T>> SplitList<T>(IList<T> list, int totalChunks)
{
IList<T> auxList = new List<T>();
int totalItems = list.Count();
if (totalChunks <= 0)
{
yield return auxList;
}
else
{
for (int i = 0; i < totalItems; i++)
{
auxList.Add(list[i]);
if ((i + 1) % totalChunks == 0)
{
yield return auxList;
auxList = new List<T>();
}
else if (i == totalItems - 1)
{
yield return auxList;
}
}
}
}
In .NET 6 you can just use source.Chunk(chunkSize)
A more generic version based on the accepted answer by Serj-Tm.
public static IEnumerable<IEnumerable<T>> Split<T>(IEnumerable<T> source, int size = 30)
{
var count = source.Count();
for (int i = 0; i < count; i += size)
{
yield return source
.Skip(Math.Min(size, count - i))
.Take(size);
}
}
One more
public static IList<IList<T>> SplitList<T>(this IList<T> list, int chunkSize)
{
var chunks = new List<IList<T>>();
List<T> chunk = null;
for (var i = 0; i < list.Count; i++)
{
if (i % chunkSize == 0)
{
chunk = new List<T>(chunkSize);
chunks.Add(chunk);
}
chunk.Add(list[i]);
}
return chunks;
}
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
var result = new List<List<T>>();
for (int i = 0; i < source.Count; i += chunkSize)
{
var rows = new List<T>();
for (int j = i; j < i + chunkSize; j++)
{
if (j >= source.Count) break;
rows.Add(source[j]);
}
result.Add(rows);
}
return result;
}
I had encountered this same need, and I used a combination of Linq's Skip() and Take() methods. I multiply the number I take by the number of iterations this far, and that gives me the number of items to skip, then I take the next group.
var categories = Properties.Settings.Default.MovementStatsCategories;
var items = summariesWithinYear
.Select(s => s.sku).Distinct().ToList();
//need to run by chunks of 10,000
var count = items.Count;
var counter = 0;
var numToTake = 10000;
while (count > 0)
{
var itemsChunk = items.Skip(numToTake * counter).Take(numToTake).ToList();
counter += 1;
MovementHistoryUtilities.RecordMovementHistoryStatsBulk(itemsChunk, categories, nLogger);
count -= numToTake;
}
Based on Dimitry Pavlov answere I would remove .ToList(). And also avoid the anonymous class.
Instead I like to use a struct which does not require a heap memory allocation. (A ValueTuple would also do job.)
public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
if (source is null)
{
throw new ArgumentNullException(nameof(source));
}
if (chunkSize <= 0)
{
throw new ArgumentOutOfRangeException(nameof(chunkSize), chunkSize, "The argument must be greater than zero.");
}
return source
.Select((x, i) => new ChunkedValue<TSource>(x, i / chunkSize))
.GroupBy(cv => cv.ChunkIndex)
.Select(g => g.Select(cv => cv.Value));
}
[StructLayout(LayoutKind.Auto)]
[DebuggerDisplay("{" + nameof(ChunkedValue<T>.ChunkIndex) + "}: {" + nameof(ChunkedValue<T>.Value) + "}")]
private struct ChunkedValue<T>
{
public ChunkedValue(T value, int chunkIndex)
{
this.ChunkIndex = chunkIndex;
this.Value = value;
}
public int ChunkIndex { get; }
public T Value { get; }
}
This can be used like the following which only iterates over the collection once and
also does not allocate any significant memory.
int chunkSize = 30;
foreach (var chunk in collection.ChunkBy(chunkSize))
{
foreach (var item in chunk)
{
// your code for item here.
}
}
If a concrete list is actually needed then I would do it like this:
int chunkSize = 30;
var chunkList = new List<List<T>>();
foreach (var chunk in collection.ChunkBy(chunkSize))
{
// create a list with the correct capacity to be able to contain one chunk
// to avoid the resizing (additional memory allocation and memory copy) within the List<T>.
var list = new List<T>(chunkSize);
list.AddRange(chunk);
chunkList.Add(list);
}
List<int> orginalList =new List<int>(){1,2,3,4,5,6,7,8,9,10,12};
Dictionary<int,List<int>> dic = new Dictionary <int,List<int>> ();
int batchcount = orginalList.Count/2; //To List into two 2 parts if you
want three give three
List<int> lst = new List<int>();
for (int i=0;i<orginalList.Count; i++)
{
lst.Add(orginalList[i]);
if (i % batchCount == 0 && i!=0)
{
Dic.Add(threadId, lst);
lst = new List<int>();**strong text**
threadId++;
}
}
if(lst.Count>0)
Dic.Add(threadId, lst); //in case if any dayleft
foreach(int BatchId in Dic.Keys)
{
Console.Writeline("BatchId:"+BatchId);
Console.Writeline('Batch Count:"+Dic[BatchId].Count);
}
in case you wanna split it with condition instead of fixed number :
///<summary>
/// splits a list based on a condition (similar to the split function for strings)
///</summary>
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, Func<T, bool> pred)
{
var list = new List<T>();
foreach(T item in src)
{
if(pred(item))
{
if(list != null && list.Count > 0)
yield return list;
list = new List<T>();
}
else
{
list.Add(item);
}
}
}
You can simply try the following code with only using LINQ :
public static IList<IList<T>> Split<T>(IList<T> source)
{
return source
.Select((x, i) => new { Index = i, Value = x })
.GroupBy(x => x.Index / 3)
.Select(x => x.Select(v => v.Value).ToList())
.ToList();
}

Switch 2 spots in a list [duplicate]

Is there a LINQ way to swap the position of two items inside a List<T>?
Check the answer from Marc from C#: Good/best implementation of Swap method.
public static void Swap<T>(IList<T> list, int indexA, int indexB)
{
T tmp = list[indexA];
list[indexA] = list[indexB];
list[indexB] = tmp;
}
which can be linq-i-fied like
public static IList<T> Swap<T>(this IList<T> list, int indexA, int indexB)
{
T tmp = list[indexA];
list[indexA] = list[indexB];
list[indexB] = tmp;
return list;
}
var lst = new List<int>() { 8, 3, 2, 4 };
lst = lst.Swap(1, 2);
Maybe someone will think of a clever way to do this, but you shouldn't. Swapping two items in a list is inherently side-effect laden but LINQ operations should be side-effect free. Thus, just use a simple extension method:
static class IListExtensions {
public static void Swap<T>(
this IList<T> list,
int firstIndex,
int secondIndex
) {
Contract.Requires(list != null);
Contract.Requires(firstIndex >= 0 && firstIndex < list.Count);
Contract.Requires(secondIndex >= 0 && secondIndex < list.Count);
if (firstIndex == secondIndex) {
return;
}
T temp = list[firstIndex];
list[firstIndex] = list[secondIndex];
list[secondIndex] = temp;
}
}
List<T> has a Reverse() method, however it only reverses the order of two (or more) consecutive items.
your_list.Reverse(index, 2);
Where the second parameter 2 indicates we are reversing the order of 2 items, starting with the item at the given index.
Source: https://msdn.microsoft.com/en-us/library/hf2ay11y(v=vs.110).aspx
Starting with C# 7 you can do
public static IList<T> Swap<T>(IList<T> list, int indexA, int indexB)
{
(list[indexA], list[indexB]) = (list[indexB], list[indexA]);
return list;
}
There is no existing Swap-method, so you have to create one yourself. Of course you can linqify it, but that has to be done with one (unwritten?) rules in mind: LINQ-operations do not change the input parameters!
In the other "linqify" answers, the (input) list is modified and returned, but this action brakes that rule. If would be weird if you have a list with unsorted items, do a LINQ "OrderBy"-operation and than discover that the input list is also sorted (just like the result). This is not allowed to happen!
So... how do we do this?
My first thought was just to restore the collection after it was finished iterating. But this is a dirty solution, so do not use it:
static public IEnumerable<T> Swap1<T>(this IList<T> source, int index1, int index2)
{
// Parameter checking is skipped in this example.
// Swap the items.
T temp = source[index1];
source[index1] = source[index2];
source[index2] = temp;
// Return the items in the new order.
foreach (T item in source)
yield return item;
// Restore the collection.
source[index2] = source[index1];
source[index1] = temp;
}
This solution is dirty because it does modify the input list, even if it restores it to the original state. This could cause several problems:
The list could be readonly which will throw an exception.
If the list is shared by multiple threads, the list will change for the other threads during the duration of this function.
If an exception occurs during the iteration, the list will not be restored. (This could be resolved to write an try-finally inside the Swap-function, and put the restore-code inside the finally-block).
There is a better (and shorter) solution: just make a copy of the original list. (This also makes it possible to use an IEnumerable as a parameter, instead of an IList):
static public IEnumerable<T> Swap2<T>(this IList<T> source, int index1, int index2)
{
// Parameter checking is skipped in this example.
// If nothing needs to be swapped, just return the original collection.
if (index1 == index2)
return source;
// Make a copy.
List<T> copy = source.ToList();
// Swap the items.
T temp = copy[index1];
copy[index1] = copy[index2];
copy[index2] = temp;
// Return the copy with the swapped items.
return copy;
}
One disadvantage of this solution is that it copies the entire list which will consume memory and that makes the solution rather slow.
You might consider the following solution:
static public IEnumerable<T> Swap3<T>(this IList<T> source, int index1, int index2)
{
// Parameter checking is skipped in this example.
// It is assumed that index1 < index2. Otherwise a check should be build in and both indexes should be swapped.
using (IEnumerator<T> e = source.GetEnumerator())
{
// Iterate to the first index.
for (int i = 0; i < index1; i++)
yield return source[i];
// Return the item at the second index.
yield return source[index2];
if (index1 != index2)
{
// Return the items between the first and second index.
for (int i = index1 + 1; i < index2; i++)
yield return source[i];
// Return the item at the first index.
yield return source[index1];
}
// Return the remaining items.
for (int i = index2 + 1; i < source.Count; i++)
yield return source[i];
}
}
And if you want to input parameter to be IEnumerable:
static public IEnumerable<T> Swap4<T>(this IEnumerable<T> source, int index1, int index2)
{
// Parameter checking is skipped in this example.
// It is assumed that index1 < index2. Otherwise a check should be build in and both indexes should be swapped.
using(IEnumerator<T> e = source.GetEnumerator())
{
// Iterate to the first index.
for(int i = 0; i < index1; i++)
{
if (!e.MoveNext())
yield break;
yield return e.Current;
}
if (index1 != index2)
{
// Remember the item at the first position.
if (!e.MoveNext())
yield break;
T rememberedItem = e.Current;
// Store the items between the first and second index in a temporary list.
List<T> subset = new List<T>(index2 - index1 - 1);
for (int i = index1 + 1; i < index2; i++)
{
if (!e.MoveNext())
break;
subset.Add(e.Current);
}
// Return the item at the second index.
if (e.MoveNext())
yield return e.Current;
// Return the items in the subset.
foreach (T item in subset)
yield return item;
// Return the first (remembered) item.
yield return rememberedItem;
}
// Return the remaining items in the list.
while (e.MoveNext())
yield return e.Current;
}
}
Swap4 also makes a copy of (a subset of) the source. So worst case scenario, it is as slow and memory consuming as function Swap2.
If order matters, you should keep a property on the "T" objects in your list that denotes sequence. In order to swap them, just swap the value of that property, and then use that in the .Sort(comparison with sequence property)

How can I decrease the amount of memory and operations involved in the ""Find all the pairs that add to N" interview challenge?

I was asked in an interview to write a function for finding all pairs of ints in an array that add up to N. My answer was kinda bulky:
HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
{
HashSet<int> arrhash = new HashShet<int> (arr);
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
for ( int i in arrhash )
{
int j = N - i;
if ( arrhash.Contains(j) ) result.Add(new Tuple<int,int> (i,j));
}
return result;
}
I'm a beginner to C#, come from a C++ background, and I have a few questions about how to make this better:
Is it innefficient to iterate through a HashSet? In other words, would my procedure be more efficient (although less compact) if I changed it to
HashSet<Tuple<int,int>> PairsThatSumToN ( int [] arr, int N )
{
HashSet<int> arrhash = new HashShet<int> ();
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
for ( int i in arr )
{
int j = N - i;
if ( arrhash.Contains(j) ) result.Add(new Type<int,int> (i,j));
arrHash.Add(i);
}
return result;
}
?????
I realize that Add is more like an "Add if not already in there", so I have a useless operation whenever I run result.Add(new Tuple<int,int> (i,j)) for an i,j pair that is already in the set. The more repeated pairs in the array, the more useless operations, and there's all the overhead of allocating the new Tuple that may never be used. Is there a way to optimize this by checking whether the pair i,j is a Tuple in the set before creating a new Tuple out of said pair and trying to add it?
Speaking of the above allocation of a new Tuple on the heap, do I need to free this memory if I don't end up adding that Tuple to the result? Potential memory leak here?
There has to be some way of combining the two sets
HashSet<int> arrhash = new HashShet<int> (arr);
HashSet<Tuple<int,int>> result = new HashSet<Tuple<int,int>>();
In a sense, they contain redundant information since every int in the second one is also in the first one. Something feels "wrong" about having to sets here, yet I can't think of a better way to do it.
Better yet, does the .NET library have any way of doing a 1-line solution for the problem? ;)
Paging Dr. Skeet.
This is what I would try
public Dictionary<int, int> Pairs(int[] arr, int N)
{
// int N asssumes no arr > int32 max / 2
int len = arr.Length < N ? arr.Length / 2 : N / 2;
Dictionary<int, int> d = new Dictionary<int, int>(len);
// add is O(1) if count <= capacity
if(arr.Length == 0) return d;
Array.Sort(arr); // so it is O(n log n) I still take my chances with it
// that is n * log(n)
int start = 0;
int end = arr.Length - 1;
do
{
int ttl = arr[start] + arr[end];
if (ttl == N)
{
if(!d.ContainsKey(arr[start]))
d.Add(arr[start], arr[end]);
// if start <= end then pair uniquely defined by either
// and a perfect hash (zero collisions)
start++;
end--;
}
else if (ttl > N)
end--;
else
start++;
if(start >= end)
return d;
} while (true);
}
Even with a HashSet based solution still use Dictionary(N/2) with Key <= Value
Or use Dictionary(arr.Length / 2)
If you need a neat solution for your problem, here it is, implemented with LINQ.
The performance however, is 4 times worse than your second solution.
Since you have asked for a one liner, here it is anyway.
NOTE: I would appreciate any improvements especially to get rid of that Distinct() since it takes the 50% of the overall cpu time
static List<Pair> PairsThatSumToN(int[] arr, int N)
{
return
(
from x in arr join y in arr on N - x equals y select new Pair(x, y)
)
.Distinct()
.ToList();
}
public class Pair : Tuple<int, int>
{
public Pair(int item1, int item2) : base(item1, item2) { }
public override bool Equals(object pair)
{
Pair dest = pair as Pair;
return dest.Item1 == Item1 || dest.Item2 == Item1;
}
public override int GetHashCode()
{
return Item1 + Item2;
}
}
First of all HashSet removes duplicate items. So iterating through HashSet or Array may yield different results since the array may have duplicate items.
Iterating through HashSet is ok. but note that it should not be used for only iterating purpose. BTW using HashSet is best option here because of O(1) for finding items.
Tuples are compared by reference inside HashSet. That means two different tuples with same items are never equal by default. since they always have different reference. (Sorry my mistake.) it seems tuples are compared by their items. But it compares only x.item1 to y.item1 and x.item2 to y.item2. so 1,2 and 2,1 are not equal. you can make them equal by setting another IEqualityComparer to hashset.
You should not be worry about memory leaks. when HashSet fails to add tuple the garbage collector will remove that tuple when the reference of that tuple is gone. Not immediately but when its needed.
static HashSet<Tuple<int, int>> PairsThatSumToN(int[] arr, int N)
{
HashSet<int> hash = new HashSet<int>(arr);
HashSet<Tuple<int, int>> result = new HashSet<Tuple<int, int>>(new IntTupleComparer());
foreach(int i in arr)
{
int j = N - i;
if (hash.Contains(j)) result.Add(new Tuple<int, int>(i, j));
}
return result;
}
public class IntTupleComparer : IEqualityComparer<Tuple<int, int>>
{
public bool Equals(Tuple<int, int> x, Tuple<int, int> y)
{
return (x.Item1 == y.Item1 && x.Item2 == y.Item2) || (x.Item1 == y.Item2 && x.Item2 == y.Item1);
}
public int GetHashCode(Tuple<int, int> obj)
{
return (obj.Item1 + obj.Item2).GetHashCode();
}
}
If the input set contains unique numbers, or the function must return only unique pairs, I think your second algorithm is the best. Just the result doesn't need to be a HashSet<Tuple<int, int>>, because the uniqueness is guaranteed by the algorithm - a simple List<Tuple<int, int>> would do the same, and better abstraction would be IEnumerable<Tuple<int, int>>. Here is how it looks implemented with C# iterator function:
static IEnumerable<Tuple<int, int>> UniquePairsThatSumToN(int[] source, int N)
{
var set = new HashSet<int>();
for (int i = 0; i < source.Length; i++)
{
var a = source[i];
var b = N - a;
if (set.Add(a) && set.Contains(b))
yield return Tuple.Create(b, a);
}
}
The key point is the line if (set.Add(a) && set.Contains(b)). Since both HashSet<T>.Add and HashSet<T>.Contains are O(1), the whole algorithm is therefore O(N).
With a relatively small modification we can make a function that returns all pairs (not only unique) like this
static IEnumerable<Tuple<int, int>> AllPairsThatSumToN(int[] source, int N)
{
var countMap = new Dictionary<int, int>(source.Length);
for (int i = 0; i < source.Length; i++)
{
var a = source[i];
var b = N - a;
int countA;
countMap.TryGetValue(a, out countA);
countMap[a] = ++countA;
int countB;
if (countMap.TryGetValue(b, out countB))
while (--countB >= 0)
yield return Tuple.Create(b, a);
}
}

Extract the k maximum elements of a list

Let's say I have a collection of some type, e.g.
IEnumerable<double> values;
Now I need to extract the k highest values from that collection, for some parameter k. This is a very simple way to do this:
values.OrderByDescending(x => x).Take(k)
However, this (if I understand this correctly) first sorts the entire list, then picks the first k elements. But if the list is very large, and k is comparatively small (smaller than log n), this is not very efficient - the list is sorted in O(nlog n), but I figure selecting the k highest values from a list should be more like O(nk).
So, does anyone have any suggestion for a better, more efficient way to do this?
This gives a bit of a performance increase. Note that it's ascending rather than descending but you should be able to repurpose it (see comments):
static IEnumerable<double> TopNSorted(this IEnumerable<double> source, int n)
{
List<double> top = new List<double>(n + 1);
using (var e = source.GetEnumerator())
{
for (int i = 0; i < n; i++)
{
if (e.MoveNext())
top.Add(e.Current);
else
throw new InvalidOperationException("Not enough elements");
}
top.Sort();
while (e.MoveNext())
{
double c = e.Current;
int index = top.BinarySearch(c);
if (index < 0) index = ~index;
if (index < n) // if (index != 0)
{
top.Insert(index, c);
top.RemoveAt(n); // top.RemoveAt(0)
}
}
}
return top; // return ((IEnumerable<double>)top).Reverse();
}
Consider the below method:
static IEnumerable<double> GetTopValues(this IEnumerable<double> values, int count)
{
var maxSet = new List<double>(Enumerable.Repeat(double.MinValue, count));
var currentMin = double.MinValue;
foreach (var t in values)
{
if (t <= currentMin) continue;
maxSet.Remove(currentMin);
maxSet.Add(t);
currentMin = maxSet.Min();
}
return maxSet.OrderByDescending(i => i);
}
And the test program:
static void Main()
{
const int SIZE = 1000000;
const int K = 10;
var random = new Random();
var values = new double[SIZE];
for (var i = 0; i < SIZE; i++)
values[i] = random.NextDouble();
// Test values
values[SIZE/2] = 2.0;
values[SIZE/4] = 3.0;
values[SIZE/8] = 4.0;
IEnumerable<double> result;
var stopwatch = new Stopwatch();
stopwatch.Start();
result = values.OrderByDescending(x => x).Take(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
result = values.GetTopValues(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
On my machine results are 1002 and 14.
Another way of doing this (haven't been around C# for years, so pseudo-code it is, sorry) would be:
highestList = []
lowestValueOfHigh = 0
for every item in the list
if(lowestValueOfHigh > item) {
delete highestList[highestList.length - 1] from list
do insert into list with binarysearch
if(highestList[highestList.length - 1] > lowestValueOfHigh)
lowestValueOfHigh = highestList[highestList.length - 1]
}
I wouldn't state anything about performance without profiling. In this answer I'll just try to implement O(n*k) take-one-enumeration-for-one-max-value approach. Personally I think that ordering approach is superior. Anyway:
public static IEnumerable<double> GetMaxElements(this IEnumerable<double> source)
{
var usedIndices = new HashSet<int>();
while (true)
{
var enumerator = source.GetEnumerator();
int index = 0;
int maxIndex = 0;
double? maxValue = null;
while(enumerator.MoveNext())
{
if((!maxValue.HasValue||enumerator.Current>maxValue)&&!usedIndices.Contains(index))
{
maxValue = enumerator.Current;
maxIndex = index;
}
index++;
}
usedIndices.Add(maxIndex);
if (!maxValue.HasValue) break;
yield return maxValue.Value;
}
}
Usage:
var biggestElements = values.GetMaxElements().Take(3);
Downsides:
Method assumes that source IEnumerable has an order
Method uses additional memory/operations to save used indices.
Advantage:
You can be sure that it takes one enumeration to get next max value.
See it running
Here is a Linqy TopN operator for enumerable sequences, based on the PriorityQueue<TElement, TPriority> collection:
/// <summary>
/// Selects the top N elements from the source sequence. The selected elements
/// are returned in descending order.
/// </summary>
public static IEnumerable<T> TopN<T>(this IEnumerable<T> source, int n,
IComparer<T> comparer = default)
{
ArgumentNullException.ThrowIfNull(source);
if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
PriorityQueue<bool, T> top = new(comparer);
foreach (var item in source)
{
if (top.Count < n)
top.Enqueue(default, item);
else
top.EnqueueDequeue(default, item);
}
List<T> topList = new(top.Count);
while (top.TryDequeue(out _, out var item)) topList.Add(item);
for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
}
Usage example:
IEnumerable<double> topValues = values.TopN(k);
The topValues sequence contains the k maximum values in the values, in descending order. In case there are duplicate values in the topValues, the order of the equal values is undefined (non-stable sort).
For a SortedSet<T>-based implementation that compiles on .NET versions earlier than .NET 6, you could look at the 5th revision of this answer.
An operator PartialSort with similar functionality exists in the MoreLinq package. It's not implemented optimally though (source code). It performs invariably a binary search for each item, instead of comparing it with the smallest item in the top list, resulting in many more comparisons than necessary.
Surprisingly the LINQ itself is well optimized for the OrderByDescending+Take combination, resulting in excellent performance. It's only slightly slower than the TopN operator above. This applies to all versions of the .NET Core and later (.NET 5 and .NET 6). It doesn't apply to the .NET Framework platform, where the complexity is O(n*log n) as expected.
A demo that compares 4 different approaches can be found here. It compares:
values.OrderByDescending(x => x).Take(k).
values.OrderByDescending(x => x).HideIdentity().Take(k), where HideIdentity is a trivial LINQ propagator that hides the identity of the underlying enumerable, and so it effectively disables the LINQ optimizations.
values.PartialSort(k, MoreLinq.OrderByDirection.Descending) (MoreLinq).
values.TopN(k)
Below is a typical output of the demo, running in Release mode on .NET 6:
.NET 6.0.0-rtm.21522.10
Extract the 100 maximum elements from 2,000,000 random values, and calculate the sum.
OrderByDescending+Take Duration: 156 msec, Comparisons: 3,129,640, Sum: 99.997344
OrderByDescending+HideIdentity+Take Duration: 1,415 msec, Comparisons: 48,602,298, Sum: 99.997344
MoreLinq.PartialSort Duration: 277 msec, Comparisons: 13,999,582, Sum: 99.997344
TopN Duration: 62 msec, Comparisons: 2,013,207, Sum: 99.997344

Using Linq to get the last N elements of a collection?

Given a collection, is there a way to get the last N elements of that collection? If there isn't a method in the framework, what would be the best way to write an extension method to do this?
collection.Skip(Math.Max(0, collection.Count() - N));
This approach preserves item order without a dependency on any sorting, and has broad compatibility across several LINQ providers.
It is important to take care not to call Skip with a negative number. Some providers, such as the Entity Framework, will produce an ArgumentException when presented with a negative argument. The call to Math.Max avoids this neatly.
The class below has all of the essentials for extension methods, which are: a static class, a static method, and use of the this keyword.
public static class MiscExtensions
{
// Ex: collection.TakeLast(5);
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int N)
{
return source.Skip(Math.Max(0, source.Count() - N));
}
}
A brief note on performance:
Because the call to Count() can cause enumeration of certain data structures, this approach has the risk of causing two passes over the data. This isn't really a problem with most enumerables; in fact, optimizations exist already for Lists, Arrays, and even EF queries to evaluate the Count() operation in O(1) time.
If, however, you must use a forward-only enumerable and would like to avoid making two passes, consider a one-pass algorithm like Lasse V. Karlsen or Mark Byers describe. Both of these approaches use a temporary buffer to hold items while enumerating, which are yielded once the end of the collection is found.
coll.Reverse().Take(N).Reverse().ToList();
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> coll, int N)
{
return coll.Reverse().Take(N).Reverse();
}
UPDATE: To address clintp's problem: a) Using the TakeLast() method I defined above solves the problem, but if you really want the do it without the extra method, then you just have to recognize that while Enumerable.Reverse() can be used as an extension method, you aren't required to use it that way:
List<string> mystring = new List<string>() { "one", "two", "three" };
mystring = Enumerable.Reverse(mystring).Take(2).Reverse().ToList();
.NET Core 2.0+ provides the LINQ method TakeLast():
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.takelast
example:
Enumerable
.Range(1, 10)
.TakeLast(3) // <--- takes last 3 items
.ToList()
.ForEach(i => System.Console.WriteLine(i))
// outputs:
// 8
// 9
// 10
Note: I missed your question title which said Using Linq, so my answer does not in fact use Linq.
If you want to avoid caching a non-lazy copy of the entire collection, you could write a simple method that does it using a linked list.
The following method will add each value it finds in the original collection into a linked list, and trim the linked list down to the number of items required. Since it keeps the linked list trimmed to this number of items the entire time through iterating through the collection, it will only keep a copy of at most N items from the original collection.
It does not require you to know the number of items in the original collection, nor iterate over it more than once.
Usage:
IEnumerable<int> sequence = Enumerable.Range(1, 10000);
IEnumerable<int> last10 = sequence.TakeLast(10);
...
Extension method:
public static class Extensions
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> collection,
int n)
{
if (collection == null)
throw new ArgumentNullException(nameof(collection));
if (n < 0)
throw new ArgumentOutOfRangeException(nameof(n), $"{nameof(n)} must be 0 or greater");
LinkedList<T> temp = new LinkedList<T>();
foreach (var value in collection)
{
temp.AddLast(value);
if (temp.Count > n)
temp.RemoveFirst();
}
return temp;
}
}
Here's a method that works on any enumerable but uses only O(N) temporary storage:
public static class TakeLastExtension
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
T[] result = new T[takeCount];
int i = 0;
int sourceCount = 0;
foreach (T element in source)
{
result[i] = element;
i = (i + 1) % takeCount;
sourceCount++;
}
if (sourceCount < takeCount)
{
takeCount = sourceCount;
i = 0;
}
for (int j = 0; j < takeCount; ++j)
{
yield return result[(i + j) % takeCount];
}
}
}
Usage:
List<int> l = new List<int> {4, 6, 3, 6, 2, 5, 7};
List<int> lastElements = l.TakeLast(3).ToList();
It works by using a ring buffer of size N to store the elements as it sees them, overwriting old elements with new ones. When the end of the enumerable is reached the ring buffer contains the last N elements.
I am surprised that no one has mentioned it, but SkipWhile does have a method that uses the element's index.
public static IEnumerable<T> TakeLastN<T>(this IEnumerable<T> source, int n)
{
if (source == null)
throw new ArgumentNullException("Source cannot be null");
int goldenIndex = source.Count() - n;
return source.SkipWhile((val, index) => index < goldenIndex);
}
//Or if you like them one-liners (in the spirit of the current accepted answer);
//However, this is most likely impractical due to the repeated calculations
collection.SkipWhile((val, index) => index < collection.Count() - N)
The only perceivable benefit that this solution presents over others is that you can have the option to add in a predicate to make a more powerful and efficient LINQ query, instead of having two separate operations that traverse the IEnumerable twice.
public static IEnumerable<T> FilterLastN<T>(this IEnumerable<T> source, int n, Predicate<T> pred)
{
int goldenIndex = source.Count() - n;
return source.SkipWhile((val, index) => index < goldenIndex && pred(val));
}
Use EnumerableEx.TakeLast in RX's System.Interactive assembly. It's an O(N) implementation like #Mark's, but it uses a queue rather than a ring-buffer construct (and dequeues items when it reaches buffer capacity).
(NB: This is the IEnumerable version - not the IObservable version, though the implementation of the two is pretty much identical)
If you are dealing with a collection with a key (e.g. entries from a database) a quick (i.e. faster than the selected answer) solution would be
collection.OrderByDescending(c => c.Key).Take(3).OrderBy(c => c.Key);
If you don't mind dipping into Rx as part of the monad, you can use TakeLast:
IEnumerable<int> source = Enumerable.Range(1, 10000);
IEnumerable<int> lastThree = source.AsObservable().TakeLast(3).AsEnumerable();
I tried to combine efficiency and simplicity and end up with this :
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
{
if (source == null) { throw new ArgumentNullException("source"); }
Queue<T> lastElements = new Queue<T>();
foreach (T element in source)
{
lastElements.Enqueue(element);
if (lastElements.Count > count)
{
lastElements.Dequeue();
}
}
return lastElements;
}
About
performance : In C#, Queue<T> is implemented using a circular buffer so there is no object instantiation done each loop (only when the queue is growing up). I did not set queue capacity (using dedicated constructor) because someone might call this extension with count = int.MaxValue . For extra performance you might check if source implement IList<T> and if yes, directly extract the last values using array indexes.
If using a third-party library is an option, MoreLinq defines TakeLast() which does exactly this.
It is a little inefficient to take the last N of a collection using LINQ as all the above solutions require iterating across the collection. TakeLast(int n) in System.Interactive also has this problem.
If you have a list a more efficient thing to do is slice it using the following method
/// Select from start to end exclusive of end using the same semantics
/// as python slice.
/// <param name="list"> the list to slice</param>
/// <param name="start">The starting index</param>
/// <param name="end">The ending index. The result does not include this index</param>
public static List<T> Slice<T>
(this IReadOnlyList<T> list, int start, int? end = null)
{
if (end == null)
{
end = list.Count();
}
if (start < 0)
{
start = list.Count + start;
}
if (start >= 0 && end.Value > 0 && end.Value > start)
{
return list.GetRange(start, end.Value - start);
}
if (end < 0)
{
return list.GetRange(start, (list.Count() + end.Value) - start);
}
if (end == start)
{
return new List<T>();
}
throw new IndexOutOfRangeException(
"count = " + list.Count() +
" start = " + start +
" end = " + end);
}
with
public static List<T> GetRange<T>( this IReadOnlyList<T> list, int index, int count )
{
List<T> r = new List<T>(count);
for ( int i = 0; i < count; i++ )
{
int j=i + index;
if ( j >= list.Count )
{
break;
}
r.Add(list[j]);
}
return r;
}
and some test cases
[Fact]
public void GetRange()
{
IReadOnlyList<int> l = new List<int>() { 0, 10, 20, 30, 40, 50, 60 };
l
.GetRange(2, 3)
.ShouldAllBeEquivalentTo(new[] { 20, 30, 40 });
l
.GetRange(5, 10)
.ShouldAllBeEquivalentTo(new[] { 50, 60 });
}
[Fact]
void SliceMethodShouldWork()
{
var list = new List<int>() { 1, 3, 5, 7, 9, 11 };
list.Slice(1, 4).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
list.Slice(1, -2).ShouldBeEquivalentTo(new[] { 3, 5, 7 });
list.Slice(1, null).ShouldBeEquivalentTo(new[] { 3, 5, 7, 9, 11 });
list.Slice(-2)
.Should()
.BeEquivalentTo(new[] {9, 11});
list.Slice(-2,-1 )
.Should()
.BeEquivalentTo(new[] {9});
}
I know it's to late to answer this question. But if you are working with collection of type IList<> and you don't care about an order of the returned collection, then this method is working faster. I've used Mark Byers answer and made a little changes. So now method TakeLast is:
public static IEnumerable<T> TakeLast<T>(IList<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
if (source.Count > takeCount)
{
for (int z = source.Count - 1; takeCount > 0; z--)
{
takeCount--;
yield return source[z];
}
}
else
{
for(int i = 0; i < source.Count; i++)
{
yield return source[i];
}
}
}
For test I have used Mark Byers method and kbrimington's andswer. This is test:
IList<int> test = new List<int>();
for(int i = 0; i<1000000; i++)
{
test.Add(i);
}
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
IList<int> result = TakeLast(test, 10).ToList();
stopwatch.Stop();
Stopwatch stopwatch1 = new Stopwatch();
stopwatch1.Start();
IList<int> result1 = TakeLast2(test, 10).ToList();
stopwatch1.Stop();
Stopwatch stopwatch2 = new Stopwatch();
stopwatch2.Start();
IList<int> result2 = test.Skip(Math.Max(0, test.Count - 10)).Take(10).ToList();
stopwatch2.Stop();
And here are results for taking 10 elements:
and for taking 1000001 elements results are:
Here's my solution:
public static class EnumerationExtensions
{
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> input, int count)
{
if (count <= 0)
yield break;
var inputList = input as IList<T>;
if (inputList != null)
{
int last = inputList.Count;
int first = last - count;
if (first < 0)
first = 0;
for (int i = first; i < last; i++)
yield return inputList[i];
}
else
{
// Use a ring buffer. We have to enumerate the input, and we don't know in advance how many elements it will contain.
T[] buffer = new T[count];
int index = 0;
count = 0;
foreach (T item in input)
{
buffer[index] = item;
index = (index + 1) % buffer.Length;
count++;
}
// The index variable now points at the next buffer entry that would be filled. If the buffer isn't completely
// full, then there are 'count' elements preceding index. If the buffer *is* full, then index is pointing at
// the oldest entry, which is the first one to return.
//
// If the buffer isn't full, which means that the enumeration has fewer than 'count' elements, we'll fix up
// 'index' to point at the first entry to return. That's easy to do; if the buffer isn't full, then the oldest
// entry is the first one. :-)
//
// We'll also set 'count' to the number of elements to be returned. It only needs adjustment if we've wrapped
// past the end of the buffer and have enumerated more than the original count value.
if (count < buffer.Length)
index = 0;
else
count = buffer.Length;
// Return the values in the correct order.
while (count > 0)
{
yield return buffer[index];
index = (index + 1) % buffer.Length;
count--;
}
}
}
public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> input, int count)
{
if (count <= 0)
return input;
else
return input.SkipLastIter(count);
}
private static IEnumerable<T> SkipLastIter<T>(this IEnumerable<T> input, int count)
{
var inputList = input as IList<T>;
if (inputList != null)
{
int first = 0;
int last = inputList.Count - count;
if (last < 0)
last = 0;
for (int i = first; i < last; i++)
yield return inputList[i];
}
else
{
// Aim to leave 'count' items in the queue. If the input has fewer than 'count'
// items, then the queue won't ever fill and we return nothing.
Queue<T> elements = new Queue<T>();
foreach (T item in input)
{
elements.Enqueue(item);
if (elements.Count > count)
yield return elements.Dequeue();
}
}
}
}
The code is a bit chunky, but as a drop-in reusable component, it should perform as well as it can in most scenarios, and it'll keep the code that's using it nice and concise. :-)
My TakeLast for non-IList`1 is based on the same ring buffer algorithm as that in the answers by #Mark Byers and #MackieChan further up. It's interesting how similar they are -- I wrote mine completely independently. Guess there's really just one way to do a ring buffer properly. :-)
Looking at #kbrimington's answer, an additional check could be added to this for IQuerable<T> to fall back to the approach that works well with Entity Framework -- assuming that what I have at this point does not.
Honestly I'm not super proud of the answer, but for small collections you could use the following:
var lastN = collection.Reverse().Take(n).Reverse();
A bit hacky but it does the job ;)
My solution is based on ranges, introduced in C# version 8.
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int N)
{
return source.ToArray()[(source.Count()-N)..];
}
After running a benchmark with most rated solutions (and my humbly proposed solution):
public static class TakeLastExtension
{
public static IEnumerable<T> TakeLastMarkByers<T>(this IEnumerable<T> source, int takeCount)
{
if (source == null) { throw new ArgumentNullException("source"); }
if (takeCount < 0) { throw new ArgumentOutOfRangeException("takeCount", "must not be negative"); }
if (takeCount == 0) { yield break; }
T[] result = new T[takeCount];
int i = 0;
int sourceCount = 0;
foreach (T element in source)
{
result[i] = element;
i = (i + 1) % takeCount;
sourceCount++;
}
if (sourceCount < takeCount)
{
takeCount = sourceCount;
i = 0;
}
for (int j = 0; j < takeCount; ++j)
{
yield return result[(i + j) % takeCount];
}
}
public static IEnumerable<T> TakeLastKbrimington<T>(this IEnumerable<T> source, int N)
{
return source.Skip(Math.Max(0, source.Count() - N));
}
public static IEnumerable<T> TakeLastJamesCurran<T>(this IEnumerable<T> source, int N)
{
return source.Reverse().Take(N).Reverse();
}
public static IEnumerable<T> TakeLastAlex<T>(this IEnumerable<T> source, int N)
{
return source.ToArray()[(source.Count()-N)..];
}
}
Test
[MemoryDiagnoser]
public class TakeLastBenchmark
{
[Params(10000)]
public int N;
private readonly List<string> l = new();
[GlobalSetup]
public void Setup()
{
for (var i = 0; i < this.N; i++)
{
this.l.Add($"i");
}
}
[Benchmark]
public void Benchmark1_MarkByers()
{
var lastElements = l.TakeLastMarkByers(3).ToList();
}
[Benchmark]
public void Benchmark2_Kbrimington()
{
var lastElements = l.TakeLastKbrimington(3).ToList();
}
[Benchmark]
public void Benchmark3_JamesCurran()
{
var lastElements = l.TakeLastJamesCurran(3).ToList();
}
[Benchmark]
public void Benchmark4_Alex()
{
var lastElements = l.TakeLastAlex(3).ToList();
}
}
Program.cs:
var summary = BenchmarkRunner.Run(typeof(TakeLastBenchmark).Assembly);
Command dotnet run --project .\TestsConsole2.csproj -c Release --logBuildOutput
The results were following:
// * Summary *
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.1889/21H2/November2021Update)
AMD Ryzen 5 5600X, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.401
[Host] : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.9 (6.0.922.41905), X64 RyuJIT AVX2
Method
N
Mean
Error
StdDev
Gen0
Gen1
Allocated
Benchmark1_MarkByers
10000
89,390.53 ns
1,735.464 ns
1,704.457 ns
-
-
248 B
Benchmark2_Kbrimington
10000
46.15 ns
0.410 ns
0.363 ns
0.0076
-
128 B
Benchmark3_JamesCurran
10000
2,703.15 ns
46.298 ns
67.862 ns
4.7836
0.0038
80264 B
Benchmark4_Alex
10000
2,513.48 ns
48.661 ns
45.517 ns
4.7607
-
80152 B
Turns out the solution proposed by #Kbrimington to be the most efficient in terms of memory alloc as well as raw performance.
Below the real example how to take last 3 elements from a collection (array):
// split address by spaces into array
string[] adrParts = adr.Split(new string[] { " " },StringSplitOptions.RemoveEmptyEntries);
// take only 3 last items in array
adrParts = adrParts.SkipWhile((value, index) => { return adrParts.Length - index > 3; }).ToArray();
Using This Method To Get All Range Without Error
public List<T> GetTsRate( List<T> AllT,int Index,int Count)
{
List<T> Ts = null;
try
{
Ts = AllT.ToList().GetRange(Index, Count);
}
catch (Exception ex)
{
Ts = AllT.Skip(Index).ToList();
}
return Ts ;
}
Little different implementation with usage of circular buffer. The benchmarks show that the method is circa two times faster than ones using Queue (implementation of TakeLast in System.Linq), however not without a cost - it needs a buffer which grows along with the requested number of elements, even if you have a small collection you can get huge memory allocation.
public IEnumerable<T> TakeLast<T>(IEnumerable<T> source, int count)
{
int i = 0;
if (count < 1)
yield break;
if (source is IList<T> listSource)
{
if (listSource.Count < 1)
yield break;
for (i = listSource.Count < count ? 0 : listSource.Count - count; i < listSource.Count; i++)
yield return listSource[i];
}
else
{
bool move = true;
bool filled = false;
T[] result = new T[count];
using (var enumerator = source.GetEnumerator())
while (move)
{
for (i = 0; (move = enumerator.MoveNext()) && i < count; i++)
result[i] = enumerator.Current;
filled |= move;
}
if (filled)
for (int j = i; j < count; j++)
yield return result[j];
for (int j = 0; j < i; j++)
yield return result[j];
}
}
//detailed code for the problem
//suppose we have a enumerable collection 'collection'
var lastIndexOfCollection=collection.Count-1 ;
var nthIndexFromLast= lastIndexOfCollection- N;
var desiredCollection=collection.GetRange(nthIndexFromLast, N);
---------------------------------------------------------------------
// use this one liner
var desiredCollection=collection.GetRange((collection.Count-(1+N)), N);

Categories

Resources