Cheapest way to copy an IEnumerable<T>?


I've got an IEnumerable<T>, and I need a copy of it. Anything that implements IEnumerable<T> will do just fine. What's the cheapest way to copy it? .ToArray() maybe?

ToArray is not necessarily faster than ToList. Just use ToList.
The point is that as long as you don't know the number of elements in the original sequence before enumerating it, you end up resizing an array and adding elements to it the way a List<T> does, so ToArray has to do the same work a List<T> does anyway. Besides, ToList gives you a List<T>, which is nicer than a raw array.
Of course, if you know the concrete type of the IEnumerable<T> instance, there can be faster methods, but that's not germane to the point.
Side note: using an array (unless you have to) is arguably a micro-optimization and should be avoided most of the time.
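A minimal sketch of the ToList approach (CopyDemo and its methods are hypothetical names for illustration). It shows that the result is a separate collection: changes made to the source after copying do not show up in the copy. Note that this is a shallow copy; for reference types both collections point at the same objects.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CopyDemo
{
    // Takes a snapshot of any IEnumerable<T>. ToList enumerates the source once
    // and stores the elements in a new List<T> (a shallow copy).
    public static List<T> Snapshot<T>(IEnumerable<T> source) => source.ToList();

    // Returns true if the snapshot is unaffected by later changes to the source.
    public static bool SnapshotIsIndependent()
    {
        var source = new List<int> { 1, 2, 3 };
        List<int> copy = Snapshot(source);

        source.Add(4);      // mutate the original after copying
        source[0] = 100;

        return copy.Count == 3 && copy[0] == 1;
    }
}
```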

Enumerable.ToArray and Enumerable.ToList ultimately use the same technique: they receive elements from the source into an internal array buffer, and once that buffer is full they allocate a new buffer of double the size, copy the existing elements over and continue adding, repeating this until enumeration of the source is complete. The difference is that ToArray, which uses an internal Buffer<T> implementation, must then allocate an exactly sized array and copy the elements into it before returning the result, whereas ToList just returns the List<T> with a potentially (likely) only partially filled array buffer inside it.
Both implementations also have an optimization: if the source IEnumerable<T> is an ICollection<T>, they allocate exactly the right buffer size up front using ICollection<T>.Count and then fill it with ICollection<T>.CopyTo.
In the end you will find that they perform nearly identically in most situations, but List<T> is technically a "heavier" object to hang on to, and ToArray has that extra allocation + copy at the end (if the source isn't an ICollection<T>) to hand back an exactly sized array. I usually stick with ToList myself unless I know I need to pass the result to something that requires an array, such as Task.WaitAll.
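To make the technique above concrete, here is a simplified sketch of it, assuming nothing beyond the base class library. ToArraySketch is a hypothetical name, not the actual Enumerable.ToArray source: it takes the exact-size path for ICollection<T> and otherwise doubles a buffer with Array.Resize, trimming it at the end.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayCopySketch
{
    // Hypothetical helper mirroring the technique described above.
    public static T[] ToArraySketch<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Fast path: the count is known, so allocate once and let CopyTo fill it.
        if (source is ICollection<T> collection)
        {
            var exact = new T[collection.Count];
            collection.CopyTo(exact, 0);
            return exact;
        }

        // Slow path: grow a buffer by doubling whenever it is full.
        T[] buffer = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                // Allocates a new array of double the size and copies the elements over.
                Array.Resize(ref buffer, checked(buffer.Length * 2));
            }
            buffer[count++] = item;
        }

        // Trim to the exact size: the extra allocation + copy that ToList avoids.
        if (count != buffer.Length)
        {
            Array.Resize(ref buffer, count);
        }
        return buffer;
    }
}
```

The final trim is the extra allocation + copy mentioned above; a List<T>-based copy simply skips it and keeps the slack capacity.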

I was about to suggest the possibility of using .AsParallel().ToList() if you have TPL at your disposal, but informal testing on my dual-core laptop shows it to be 7x slower than just .ToList(). So, stick with Mehrdad's answer.

The second-to-cheapest way is to say new List<T>(myEnumerable).ToArray(). The cheapest way is to use either .ToArray() (from LINQ) or, if you don't have .NET 3.5, to create your own buffer and add to it while doubling its size, then trim it at the end.

Related

How to get the size of an ImmutableQueue in less than linear time

I have an Immutable Queue declared like this:
public ImmutableQueue<JObject> MyImmutableQueue =>
ImmutableQueue.CreateRange<JObject>(myConcurrentQueue);
I want to reference it from my test class like this:
myClass.MyImmutableQueue.Count
However, I see that the Count method does not exist.
I checked the API documentation and there seems to be no member such as Count, Size or Length.
If there is no such member, I will probably write an extension method that iterates and counts the elements, but that's inefficient.
So, is there some method or some (less than linear time complexity) way of counting the elements of an ImmutableQueue?

There is no way to do that, because ImmutableQueue<T> is implemented in a way that requires a full traversal to count its elements. Even reflection won't help.
If you look at the source code, you will see that internally it uses two ImmutableStack<T> instances, and an ImmutableStack<T> is represented as a head and a tail (where the tail is another ImmutableStack<T>). So to find the number of elements, you have to count the elements in both stacks, and to do that you have to traverse them completely. Counting is therefore always linear for this implementation of ImmutableQueue<T>.
So if you absolutely need the count, just use the Enumerable.Count() extension method, because you cannot do any better anyway.
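A minimal sketch of the linear count, assuming System.Collections.Immutable is available (it ships with .NET Core and .NET 5+, and as a NuGet package for .NET Framework). The class and method names are hypothetical; both methods walk the whole queue, which is exactly the O(n) cost described above.

```csharp
using System.Collections.Immutable;
using System.Linq;

public static class ImmutableQueueCount
{
    // Counts by walking every element: O(n).
    public static int CountByEnumeration<T>(ImmutableQueue<T> queue)
    {
        int count = 0;
        foreach (T _ in queue)
        {
            count++;
        }
        return count;
    }

    // LINQ equivalent. ImmutableQueue<T> does not expose a stored count,
    // so Enumerable.Count() also has to enumerate the whole queue.
    public static int CountWithLinq<T>(ImmutableQueue<T> queue) => queue.Count();
}
```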

Does array resizing invoke the GC?

I looked into the implementation of Array.Resize() and noticed that a new array is created and returned. I'm aiming for zero memory allocation during gameplay and so I need to avoid creating any new reference types. Does resizing an array trigger the Garbage Collector on the previous array? I'm creating my own 2D array resizer, but it essentially functions in the same way as the .NET Resize() method.
If the new array is smaller than the previous one, but excess objects have already been placed back into a generic object pool, will this invoke the GC?
Arrays will constantly be created in my game loop, so I need to make this as efficient as possible. I'm trying to create an array pool so that there's no need to keep creating them in-game. However, if the resize method does the same thing anyway, then there is little point in having the pool rather than just instantiating a new array.
Thanks for the help

Array.Resize doesn't actually change the original array at all - anyone who still has a reference to it will be able to use it as before. Therefore there's no optimization possible. Frankly it's a badly named method, IMO :(
From the docs:
This method allocates a new array with the specified size, copies elements from the old array to the new one, and then replaces the old array with the new one.
So no, it's not going to reuse the original memory or anything like that. It's just creating a shallow copy with a different size.
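A small sketch illustrating the quoted behavior (ResizeDemo is a hypothetical name): after Array.Resize, a second reference to the original array still sees the old, unchanged 3-element array, while the variable passed by ref now points at a new 5-element array.

```csharp
using System;

public static class ResizeDemo
{
    // Shows that Array.Resize allocates a new array and leaves the old one untouched.
    public static (int oldLength, int newLength, bool sameObject) Run()
    {
        int[] numbers = { 1, 2, 3 };
        int[] alias = numbers;           // a second reference to the original array

        Array.Resize(ref numbers, 5);    // 'numbers' now points at a new 5-element array

        return (alias.Length, numbers.Length, ReferenceEquals(alias, numbers));
    }
}
```

The old array stays alive as long as something (here, alias) still references it; once nothing does, it becomes garbage like any other allocation.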

Yes, using Array.Resize causes a new array to be allocated and the old one to eventually be collected (unless there are still references to it somewhere).
A more low-level array resizer could possibly do some minor optimization in some cases (for example when the array is being made smaller or there happens to be memory available right after the array), but .NET's implementation doesn't do that.

Implicitly yes.
Explicitly no.

Any allocation will eventually be cleaned up by the GC when no more references exist, so yes.
If you want to avoid resizing your arrays, the best thing you could do would be to preallocate with a large enough size to avoid having to reallocate at all. In that case, you might as well just use a collection class that takes an initial capacity in its constructor, such as List<T>.
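A minimal sketch of the preallocation idea, with a hypothetical helper name. Creating the List<T> with the expected capacity means Add never has to replace the backing array while filling it. This avoids reallocations during the loop, not the one up-front allocation of the list and its array.

```csharp
using System.Collections.Generic;

public static class PreallocateDemo
{
    // Adds 'count' items to a list created with that capacity and reports
    // whether the backing array ever had to be replaced (i.e. Capacity changed).
    public static bool GrewWhileFilling(int count)
    {
        var list = new List<int>(count);   // backing array of exactly 'count' slots
        int initialCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);                   // never exceeds capacity, so no reallocation
        }
        return list.Capacity != initialCapacity;
    }
}
```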

Efficiency: Creating an array of doubles incrementally?

Consider the following code:
List<double> l = new List<double>();
// add an unknown number of values to the list
l.Add(0.1); // assume we don't have these values ahead of time
l.Add(0.11);
l.Add(0.1);
double[] values = l.ToArray(); // ultimately we want an array of doubles
Anything wrong with this approach? Is there a more appropriate way to build an array, without knowing the size, or elements ahead of time?

There's nothing wrong with your approach. You are using the correct data type for the purpose.

After some observations you can get a better idea of the total elements in that list. Then you can create a new list with an initial capacity in the constructor:
List<double> l = new List<double>(capacity);
Other than this, it's the proper technique and data structure.
UPDATE:
If you:
Need only the Add and ToArray functions of the List<T> structure,
And you can't really predict the total capacity
And you end up with more than 1K elements
And better performance is really really (really!) your goal
Then you might want to write your own interface:
public interface IArrayBuilder<T>
{
    void Add(T item);
    T[] ToArray();
}
And then write your own implementation, which might be better than List<T>. Why? Because List<T> holds a single array internally and increases its size when needed. Growing the inner array costs performance, since it allocates new memory and copies the elements from the old array to the new one. However, if all of the conditions described above are true, all you need is to build an array at the end; you don't really need all of the data to be stored in a single array in the meantime.
I know it's a long shot, but I think it's better sharing such thoughts...
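A rough sketch of such an implementation, under the conditions listed above. ChunkedArrayBuilder<T> and its 1024-element chunk size are assumptions for illustration, not a framework type: items are written into fixed-size chunks, so growing never copies existing elements, and each element is copied exactly once, into the exact-size result, when ToArray is called.

```csharp
using System;
using System.Collections.Generic;

public interface IArrayBuilder<T>
{
    void Add(T item);
    T[] ToArray();
}

// Hypothetical implementation: a list of fixed-size chunks instead of one growing array.
public sealed class ChunkedArrayBuilder<T> : IArrayBuilder<T>
{
    private const int ChunkSize = 1024;                // assumed chunk size; tune as needed
    private readonly List<T[]> _chunks = new List<T[]>();
    private int _countInLastChunk = ChunkSize;         // forces a new chunk on the first Add
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        if (_countInLastChunk == ChunkSize)
        {
            _chunks.Add(new T[ChunkSize]);             // new chunk; existing chunks are not copied
            _countInLastChunk = 0;
        }
        _chunks[_chunks.Count - 1][_countInLastChunk++] = item;
        _count++;
    }

    public T[] ToArray()
    {
        var result = new T[_count];
        int offset = 0;
        foreach (T[] chunk in _chunks)
        {
            // The last chunk may be only partially filled.
            int length = Math.Min(ChunkSize, _count - offset);
            Array.Copy(chunk, 0, result, offset, length);
            offset += length;
        }
        return result;
    }
}
```

The List<T[]> of chunks still grows, but it only copies chunk references, not elements. Whether this actually beats List<T> plus ToArray depends on the workload, so it is worth measuring before adopting it.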

As others have already pointed out: This is the correct approach. I'll just add that if you can somehow avoid the array and use List<T> directly or perhaps IEnumerable<T>, you'll avoid copying the array as ToArray actually copies the internal array of the list instance.
Eric Lippert has a great post about arrays, that you may find relevant.

A dynamic data structure like a List<T> is the correct way to implement this. Arrays don't have an access-time advantage here: List<T> is backed by an array, so indexed access is O(1) for both. The flexibility more than makes up for the small overhead of the wrapper, IMHO.

A very basic auto-expanding list/array

I have a method which returns an array of fixed type objects (let's say MyObject).
The method creates a new empty Stack<MyObject>. Then it does some work and pushes some number of MyObjects onto the Stack. Finally, it returns Stack.ToArray().
It does not change already added items or their properties, nor remove them. The number of elements added is large enough that performance matters. There is no need to sort/order the elements.
Is Stack a best thing to use? Or must I switch to Collection or List to ensure better performance and/or lower memory cost?

Stack<T> will not be any faster than List<T>.
For optimal performance, you should use a List<T> and set the Capacity to a number larger than or equal to the number of items you plan to add.

If the ordering doesn't matter and your method doesn't need to add/remove/edit items that have already been processed then why not return IEnumerable<MyObject> and just yield each item as you go?
Then your calling code can either use the IEnumerable<MyObject> sequence directly, or call ToArray, ToList etc as required.
For example...
// use the sequence directly
foreach (MyObject item in GetObjects())
{
    Console.WriteLine(item.ToString());
}
// ...
// convert to an array
MyObject[] myArray = GetObjects().ToArray();
// ...
// convert to a list
List<MyObject> myList = GetObjects().ToList();
// ...
public IEnumerable<MyObject> GetObjects()
{
    foreach (MyObject foo in GetObjectsFromSomewhereElse())
    {
        MyObject bar = DoSomeProcessing(foo);
        yield return bar;
    }
}

Stack<T> is not any faster than List<T> in this case, so I would probably use List, unless something about what you are doing is "stack-like". List<T> is the more standard data structure to use when what you want is basically a growable array, whereas stacks are usually used when you need LIFO behavior for the collection.
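One practical difference worth knowing if you do keep the Stack<T>: Stack<T>.ToArray returns the items in pop order (last pushed first), whereas List<T>.ToArray keeps insertion order. A minimal sketch (OrderDemo is a hypothetical name):

```csharp
using System.Collections.Generic;

public static class OrderDemo
{
    // Pushes/adds the same values to a Stack<int> and a List<int>
    // and returns both ToArray results for comparison.
    public static (int[] fromStack, int[] fromList) Run()
    {
        var stack = new Stack<int>();
        var list = new List<int>();
        foreach (int value in new[] { 1, 2, 3 })
        {
            stack.Push(value);
            list.Add(value);
        }
        return (stack.ToArray(), list.ToArray());
    }
}
```

OrderDemo.Run() returns { 3, 2, 1 } for the stack and { 1, 2, 3 } for the list. Since the question says ordering doesn't matter, this is harmless there, but it can surprise callers elsewhere.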

For this purpose, there are no other collections in the framework that will perform considerably better than a Stack<T>.
However, both Stack<T> and List<T> automatically grow their internal array of items when the current capacity is exceeded. This involves creating a new, larger array and copying all items into it, which costs some performance.
If you know the number of items beforehand, initialize your collection to that capacity to avoid auto-growth. If you don't know exactly, choose a capacity that is unlikely to be insufficient.
Most of the built in collections take the initial capacity as a constructor argument:
var stack = new Stack<T>(200); // Initial capacity of 200 items.

Use a LinkedList maybe?
Though LinkedLists are only useful with sequential data.

You don't need Stack<> if all you're going to do is append. You can use List<>.Add (http://msdn.microsoft.com/en-us/library/d9hw1as6.aspx) and then ToArray.
(You'll also want to set initial capacity, as others have pointed out.)

If you need the semantics of a stack (last-in first-out), then the answer is, without any doubt, yes, a stack is your best solution. If you know from the start how many elements it will end up with, you can avoid the cost of automatic resizing by calling the constructor that receives a capacity.
If you're worried about the memory cost of copying the stack into an array, and you only need sequential access to the result, you can return the Stack<T> as an IEnumerable<T> instead of an array and iterate it with foreach.
All that said, unless this code proves to be problematic in terms of performance (i.e., by looking at data from a profiler), I wouldn't worry much about it and would choose based on the semantics you need.

C# List<T>.ToArray performance is bad?

I'm using .NET 3.5 (C#) and I've heard that the performance of List<T>.ToArray is "bad", since it copies all elements into a new array. Is that true?

No that's not true. Performance is good since all it does is memory copy all elements (*) to form a new array.
Of course it depends on what you define as "good" or "bad" performance.
(*) references for reference types, values for value types.
EDIT
In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.
public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}
Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").
EDIT 2
In response to the comment: "No re-construction of objects?" :
No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.
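A minimal sketch of that distinction, using hypothetical Item (reference type) and Point (value type) declarations: the array produced by ToArray holds the same Item object as the list, while the copied Point can be modified without affecting the list.

```csharp
using System.Collections.Generic;

public sealed class Item      // hypothetical reference type
{
    public int Value;
}

public struct Point           // hypothetical value type
{
    public int X;
}

public static class ShallowCopyDemo
{
    public static (bool sameItem, int listPointX) Run()
    {
        var items = new List<Item> { new Item { Value = 1 } };
        Item[] itemArray = items.ToArray();
        // Reference types: the array holds the same object references as the list.
        bool sameItem = ReferenceEquals(items[0], itemArray[0]);

        var points = new List<Point> { new Point { X = 1 } };
        Point[] pointArray = points.ToArray();
        // Value types: each value was copied, so changing the copy leaves the list alone.
        pointArray[0].X = 42;

        return (sameItem, points[0].X);
    }
}
```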

Reasons to call ToArray()
If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
If you know you will need to pass the returned value to a third-party function that expects an array.
Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).
Reasons not to call ToArray()
If the caller ever does need to add or remove elements, a List<> is absolutely required.
The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
The caller can always convert the list to an array themselves.
taken from here

Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.
A List contains an array internally to hold all the elements. The array grows if the capacity is no longer sufficient for the list. Any time that happens, the list will copy all elements into a new array. That happens all the time, and for most people that is no performance problem.
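E.g. a list created with the default constructor starts with capacity 0; the first .Add() allocates room for 4 elements, and the capacity doubles each time it runs out (4, 8, 16, 32, ...). So when you .Add() the 17th element, it creates a new array of size 32, copies the 16 old values and adds the 17th.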
The size difference is also the reason why ToArray() returns a new array instance instead of passing the private reference.
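A minimal sketch that records the capacity changes (GrowthDemo is a hypothetical name). The starting capacity of 4 and the doubling are implementation details of current .NET versions, not a documented guarantee.

```csharp
using System.Collections.Generic;

public static class GrowthDemo
{
    // Records each distinct Capacity value seen while adding 'count' items
    // to a List<int> created with the default constructor.
    public static List<int> CapacityHistory(int count)
    {
        var list = new List<int>();
        var history = new List<int> { list.Capacity };   // capacity before any Add
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != history[history.Count - 1])
            {
                history.Add(list.Capacity);              // the backing array was replaced
            }
        }
        return history;
    }
}
```

On those implementations, CapacityHistory(17) returns 0, 4, 8, 16, 32: each step after the first is one allocation plus a copy of all existing elements.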

This is what Microsoft's official documentation says about List.ToArray's time complexity
The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.
Then, looking at Array.Copy, we see that it is usually not cloning the data but instead using references:
If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.
So in conclusion, this is a pretty efficient way of getting an array from a list.

It creates a new array of references, but that is the only thing the method could, and should, do...

Performance has to be understood in relative terms. Converting a List to an array involves copying the list's elements, and the cost of that depends on the size of the list. But you have to compare that cost to the other things your program is doing. How did you obtain the information to put into the list in the first place? If it was by reading from the disk, a network connection, or a database, then an in-memory array copy is very unlikely to make a detectable difference to the time taken.

For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start.
T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;
If your source is just an IEnumerable<T> (not a List/ICollection<T>), the code is roughly:
items = new TElement[4];
..
if (count == items.Length) // no more space
{
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;
}
It starts at size 4 and grows exponentially, doubling each time it runs out of space. Each time it doubles, it has to reallocate memory and copy the data over.
If we know the source-data size, we can avoid this slight overhead. However, in most cases (e.g. arrays of 1024 elements or fewer), it executes so quickly that we don't even need to think about this implementation detail.
References: Enumerable.cs, List.cs (F12ing into them), Joe's answer
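To put a number on the "slight overhead", here is a small simulation of the doubling strategy above (GrowthCost is a hypothetical name; the final trim to an exact-size array is not counted). For 1024 items it performs 4 + 8 + ... + 512 = 1020 element copies, and in general the total stays below 2n, so the growth cost is amortized O(1) per element.

```csharp
public static class GrowthCost
{
    // Counts the element copies made while buffering 'n' items with a buffer
    // that starts at 4 slots and doubles whenever it is full.
    public static long CopiesDuringGrowth(int n)
    {
        long copies = 0;
        long capacity = 4;
        for (long i = 0; i < n; i++)
        {
            if (i == capacity)   // buffer full: allocate double and copy everything over
            {
                copies += capacity;
                capacity *= 2;
            }
        }
        return copies;
    }
}
```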
