C# List<T>.ToArray performance is bad? - c#

I'm using .NET 3.5 (C#) and I've heard the performance of C# List<T>.ToArray is "bad", since it copies all elements in memory to form a new array. Is that true?

No that's not true. Performance is good since all it does is memory copy all elements (*) to form a new array.
Of course it depends on what you define as "good" or "bad" performance.
(*) references for reference types, values for value types.
EDIT
In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.
public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}
Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").
EDIT 2
In response to the comment: "No re-construction of objects?" :
No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.
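To make the reference-versus-value point concrete, here is a minimal sketch. The Person class and the demo method names are made up for this illustration; only List<T>.ToArray() comes from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference type, used only for this illustration.
public sealed class Person
{
    public string Name;
    public Person(string name) { Name = name; }
}

public static class ToArrayCopyDemo
{
    // True when the array returned by ToArray() points at the very same
    // Person instance that the list holds (no object is re-created).
    public static bool SharesElementInstances()
    {
        var people = new List<Person> { new Person("Ann") };
        Person[] copy = people.ToArray();
        return object.ReferenceEquals(people[0], copy[0]);
    }

    // For a value type such as int the array gets its own copy of each value,
    // so writing to the array leaves the list untouched.
    public static int ListValueAfterArrayWrite()
    {
        var numbers = new List<int> { 1, 2, 3 };
        int[] copy = numbers.ToArray();
        copy[0] = 99;       // changes only the array
        return numbers[0];  // the list still holds 1
    }
}
```

So the array itself is always new, but for reference types the objects it points to are shared with the list.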

Reasons to call ToArray()
If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
If you know you will need to pass the returned value to a third-party function that expects an array.
Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).
Reasons not to call ToArray()
If the caller ever does need to add or remove elements, a List<> is absolutely required.
The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
The caller can always convert the list to an array themselves.
taken from here

Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.
A List contains an array internally to hold all the elements. The array grows if the capacity is no longer sufficient for the list. Any time that happens, the list will copy all elements into a new array. That happens all the time, and for most people that is no performance problem.
E.g. a list created with the default constructor starts with an empty internal array (capacity 0); the first .Add() allocates room for 4 elements, and when you .Add() the 5th element, it creates a new array of size 8, copies the 4 old values and adds the 5th.
The size difference is also the reason why ToArray() returns a new array instance instead of passing the private reference.
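A small sketch of that size difference, assuming nothing beyond the public Count and Capacity properties (the class and method names are invented for the example):

```csharp
using System.Collections.Generic;

public static class SizeVersusCapacityDemo
{
    // Returns { Count, Capacity, ToArray().Length } for a list filled with itemCount items.
    public static int[] Measure(int itemCount)
    {
        var list = new List<int>();
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
        }

        int[] copy = list.ToArray();
        return new[] { list.Count, list.Capacity, copy.Length };
    }
}
```

For five items, Count and the length of the copy are both 5, while Capacity is whatever the internal array has grown to (at least 5). Handing out the internal array directly would expose those unused trailing slots.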

This is what Microsoft's official documentation says about List.ToArray's time complexity
The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.
Then, looking at Array.Copy, we see that for reference types it does not clone the objects but copies only the references:
If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.
So in conclusion, this is a pretty efficient way of getting an array from a list.

It creates a new array holding the same references, but that's the only thing that method could and should do...

Performance has to be understood in relative terms. Converting a List to an array involves copying the list's internal array, and the cost of that will depend on the size of the list. But you have to compare that cost to the other things your program is doing. How did you obtain the information to put into the list in the first place? If it was by reading from the disk, or a network connection, or a database, then an array copy in memory is very unlikely to make a detectable difference to the time taken.

For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start.
T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;
If your source type is IEnumerable (not a List/Collection) then the source is:
items = new TElement[4];
// ...
if (items.Length == count) { // no more space
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;
}
It starts at size 4 and grows exponentially, doubling each time it runs out of space. Each time it doubles, it has to reallocate memory and copy the data over.
If we know the source-data size, we can avoid this slight overhead. However, in most cases (e.g. array size <= 1024) it will execute so quickly that we don't even need to think about this implementation detail.
References: Enumerable.cs, List.cs (F12ing into them), Joe's answer
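Putting the two paths together, a simplified sketch of the strategy might look like the following. This is an illustration of the idea, not the framework's actual source; BufferSketch, ToArrayByDoubling and CountTo are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class BufferSketch
{
    public static T[] ToArrayByDoubling<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");

        // Known size: allocate once and let the collection copy itself.
        var collection = source as ICollection<T>;
        if (collection != null)
        {
            var exact = new T[collection.Count];
            collection.CopyTo(exact, 0);
            return exact;
        }

        // Unknown size: start at 4 and double whenever the buffer is full.
        T[] items = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == items.Length)
            {
                T[] bigger = new T[checked(count * 2)];
                Array.Copy(items, 0, bigger, 0, count);
                items = bigger;
            }
            items[count++] = item;
        }

        // Trim to the exact length before returning.
        if (count == items.Length) return items;
        T[] result = new T[count];
        Array.Copy(items, 0, result, 0, count);
        return result;
    }

    // Hypothetical lazy source whose length is not known up front.
    public static IEnumerable<int> CountTo(int n)
    {
        for (int i = 1; i <= n; i++) yield return i;
    }
}
```

Calling ToArrayByDoubling(CountTo(10)) goes through the doubling path (4, 8, 16) plus a final trim, while passing a List<int> takes the single-allocation path.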

Related

Arrays/Double Arrays vs Lists/Dictionaries [duplicate]

MyClass[] array;
List<MyClass> list;
What are the scenarios when one is preferable over the other? And why?
It is rare, in reality, that you would want to use an array. Definitely use a List<T> any time you want to add/remove data, since resizing arrays is expensive. If you know the data is fixed length, and you want to micro-optimise for some very specific reason (after benchmarking), then an array may be useful.
List<T> offers a lot more functionality than an array (although LINQ evens it up a bit), and is almost always the right choice. Except for params arguments, of course. ;-p
As a counter - List<T> is one-dimensional, whereas you can have rectangular (etc) arrays like int[,] or string[,,] - but there are other ways of modelling such data (if you need) in an object model.
See also:
How/When to abandon the use of Arrays in c#.net?
Arrays, What's the point?
That said, I make a lot of use of arrays in my protobuf-net project; entirely for performance:
it does a lot of bit-shifting, so a byte[] is pretty much essential for encoding;
I use a local rolling byte[] buffer which I fill before sending down to the underlying stream (and v.v.); quicker than BufferedStream etc;
it internally uses an array-based model of objects (Foo[] rather than List<Foo>), since the size is fixed once built, and needs to be very fast.
But this is definitely an exception; for general line-of-business processing, a List<T> wins every time.
Really just answering to add a link which I'm surprised hasn't been mentioned yet: Eric Lippert's blog entry on "Arrays considered somewhat harmful."
You can judge from the title that it's suggesting using collections wherever practical - but as Marc rightly points out, there are plenty of places where an array really is the only practical solution.
Notwithstanding the other answers recommending List<T>, you'll want to use arrays when handling:
image bitmap data
other low-level data-structures (e.g. network protocols)
Unless you are really concerned with performance, and by that I mean, "Why are you using .Net instead of C++?" you should stick with List<>. It's easier to maintain and does all the dirty work of resizing an array behind the scenes for you. (List<> is pretty smart about choosing array sizes, so it usually doesn't need to resize.)
Arrays should be used in preference to List when the immutability of the collection itself is part of the contract between the client & provider code (not necessarily immutability of the items within the collection) AND when IEnumerable is not suitable.
For example,
var str = "This is a string";
var strChars = str.ToCharArray(); // returns array
It is clear that modification of "strChars" will not mutate the original "str" object, irrespective of implementation-level knowledge of "str"'s underlying type.
But suppose that
var str = "This is a string";
var strChars = str.ToCharList(); // returns List<char>
strChars.Insert(0, 'X');
In this case, it's not clear from that code snippet alone whether the Insert method will mutate the original "str" object. It requires implementation-level knowledge of String to make that determination, which breaks the Design by Contract approach. In the case of String, it's not a big deal, but it can be a big deal in almost every other case. Setting the List to read-only does help but results in run-time errors, not compile-time ones.
If I know exactly how many elements I'm going to need, say I need 5 elements and only ever 5 elements then I use an array. Otherwise I just use a List<T>.
Arrays vs. Lists is a classic maintainability vs. performance problem. The rule of thumb that nearly all developers follow is that you should shoot for both, but when they come into conflict, choose maintainability over performance. The exception to that rule is when performance has already proven to be an issue. If you carry this principle into Arrays vs. Lists, then what you get is this:
Use strongly typed lists until you hit performance problems. If you hit a performance problem, make a decision as to whether dropping out to arrays will benefit your solution with performance more than it will be a detriment to your solution in terms of maintenance.
Most of the time, using a List will suffice. A List uses an internal array to hold its data, and automatically resizes that array when you add more elements than its current capacity allows, which makes it easier to use than an array, where you need to know the capacity beforehand.
See http://msdn.microsoft.com/en-us/library/ms379570(v=vs.80).aspx#datastructures20_1_topic5 for more information about Lists in C# or just decompile System.Collections.Generic.List<T>.
If you need multidimensional data (for example using a matrix or in graphics programming), you would probably go with an array instead.
As always, if memory or performance is an issue, measure it! Otherwise you could be making false assumptions about the code.
Another situation not yet mentioned is when one will have a large number of items, each of which consists of a fixed bunch of related-but-independent variables stuck together (e.g. the coordinates of a point, or the vertices of a 3d triangle). An array of exposed-field structures will allow its elements to be efficiently modified "in place"--something which is not possible with any other collection type. Because an array of structures holds its elements consecutively in RAM, sequential accesses to array elements can be very fast. In situations where code will need to make many sequential passes through an array, an array of structures may outperform an array or other collection of class object references by a factor of 2:1; further, the ability to update elements in place may allow an array of structures to outperform any other kind of collection of structures.
Although arrays are not resizable, it is not difficult to have code store an array reference along with the number of elements that are in use, and replace the array with a larger one as required. Alternatively, one could easily write code for a type which behaved much like a List<T> but exposed its backing store, thus allowing one to say either MyPoints.Add(nextPoint); or MyPoints.Items[23].X += 5;. Note that the latter would not necessarily throw an exception if code tried to access beyond the end of the list, but usage would otherwise be conceptually quite similar to List<T>.
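A minimal sketch of the "in place" difference, using a made-up exposed-field struct named Point (the type and the demo methods are assumptions for illustration only):

```csharp
using System.Collections.Generic;

// Hypothetical exposed-field struct for the illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class InPlaceUpdateDemo
{
    public static int UpdateArrayElement()
    {
        var points = new Point[3];
        points[1].X += 5;       // writes directly into the array's storage
        return points[1].X;
    }

    public static int UpdateListElement()
    {
        var points = new List<Point> { new Point(), new Point(), new Point() };

        // points[1].X += 5; would not compile here: the List<T> indexer
        // returns a copy of the struct, not a reference to the stored element.
        Point p = points[1];    // copy the element out
        p.X += 5;               // modify the copy
        points[1] = p;          // copy it back
        return points[1].X;
    }
}
```

Both methods end up with X equal to 5, but the list version pays for two struct copies per update, which is the overhead the answer above is referring to.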
Rather than going through a comparison of the features of each data type, I think the most pragmatic answer is "the differences probably aren't that important for what you need to accomplish, especially since they both implement IEnumerable, so follow popular convention and use a List until you have a reason not to, at which point you probably will have your reason for using an array over a List."
Most of the time in managed code you're going to want to favor collections being as easy to work with as possible over worrying about micro-optimizations.
Lists in .NET are wrappers over arrays, and use an array internally. The time complexity of operations on lists is the same as would be with arrays, however there is a little more overhead with all the added functionality / ease of use of lists (such as automatic resizing and the methods that come with the list class). Pretty much, I would recommend using lists in all cases unless there is a compelling reason not to do so, such as if you need to write extremely optimized code, or are working with other code that is built around arrays.
Since no one has mentioned it: in C#, an array is a list. MyClass[] and List<MyClass> both implement IList<MyClass>. (e.g. void Foo(IList<int> foo) can be called like Foo(new[] { 1, 2, 3 }) or Foo(new List<int> { 1, 2, 3 }) )
So, if you are writing a method that accepts a List<MyClass> as an argument, but uses only a subset of its features, you may want to declare the parameter as IList<MyClass> instead for callers' convenience.
Details:
Why array implements IList?
How do arrays in C# partially implement IList<T>?
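As a rough illustration of what "partially" means in practice, here is a sketch with two invented helper methods. Reading through IList<int> works for both arrays and lists, while Add on an array throws NotSupportedException because the array has a fixed size.

```csharp
using System;
using System.Collections.Generic;

public static class IListDemo
{
    // Works for both int[] and List<int>, since both implement IList<int>.
    public static int Sum(IList<int> values)
    {
        int total = 0;
        for (int i = 0; i < values.Count; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Returns false when the underlying collection rejects Add,
    // which is what a fixed-size array does.
    public static bool AddIsSupported(IList<int> values)
    {
        try
        {
            values.Add(0);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }
}
```

So a method that only reads or overwrites elements can safely take IList<T>; one that adds or removes should keep asking for a List<T> (or check IsReadOnly first).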
They may be unpopular, but I am a fan of Arrays in game projects.
- Iteration speed can be important in some cases, foreach on an Array has significantly less overhead if you are not doing much per element
- Adding and removing is not that hard with helper functions
- It's slower, but in cases where you only build it once it may not matter
- In most cases, less extra memory is wasted (only really significant with Arrays of structs)
- Slightly less garbage and pointers and pointer chasing
That being said, I use List far more often than Arrays in practice, but they each have their place.
It would be nice if List were a built-in type so that they could optimize out the wrapper and enumeration overhead.
Populating a list is easier than populating an array. For arrays, you need to know the exact length of the data up front, but a list can hold any number of items. And you can always convert a list into an array.
List<URLDTO> urls = new List<URLDTO>();
urls.Add(new URLDTO()
{
    key = "wiki",
    url = "https://...",
});
urls.Add(new URLDTO()
{
    key = "url",
    url = "http://...",
});
urls.Add(new URLDTO()
{
    key = "dir",
    url = "https://...",
});
// convert a list into an array: URLDTO[]
return urls.ToArray();
Keep in mind that with a List it is not possible to do this:
List<string> arr = new List<string>();
arr.Add("string a");
arr.Add("string b");
arr.Add("string c");
arr.Add("string d");
arr[10] = "new string";
It throws an ArgumentOutOfRangeException, because index 10 is beyond the list's current Count of 4.
Instead with arrays:
string[] strArr = new string[20];
strArr[0] = "string a";
strArr[1] = "string b";
strArr[2] = "string c";
strArr[3] = "string d";
strArr[10] = "new string";
But arrays are not resized automatically. You have to manage that manually or with the Array.Resize method.
A trick could be to initialize a List from a pre-sized array (whose elements are all null).
List<string> arr = new List<string>(new string[100]);
arr[10] = "new string";
But in this case, if you put in a new element using the Add method, it will be appended at the end of the List.
List<string> arr = new List<string>(new string[100]);
arr[10] = "new string";
arr.Add("bla bla bla"); // this will be in the end of List
It completely depends on the context in which the data structure is needed. For example, if you are creating items to be used by other functions or services, using a List is the perfect way to accomplish it.
Now if you have a list of items and you just want to display them, say on a web page, an array is the container you need to use.

Why most of the data structures in generic collections use array despite of Large Object Heap fragmentation?

I can see that CoreCLR and CoreFX implicitly use arrays for most of the generic collections. What is the main driving factor for going with arrays, and how do they handle the side effects of LOH fragmentation?
What other than arrays should collections be?
More importantly, what other than arrays could collections be?
In use, collections boil down to "arrays, and stuff we wrap around arrays for ease of use":
The pure thing (arrays), which do offer some conveniences like bounds checks in C#/.NET
Self growing arrays (Lists)
Two synchronized arrays that allow the mapping of any input to any element (Dictionaries key/value pair)
Three synchronized arrays: Key, Value and a hash value to quickly identify non-matching keys (Hashtable).
Under the hood - regardless of how hard .NET makes it to use pointers - it all boils down to some code doing C/C++ style pointer arithmetic to get the next element.
Edit 1: As I learned in another place, .NET Dictionaries are actually implemented as hash tables. The Hashtable class is just the pre-generics version. Object has a GetHashCode function with sensible default behavior which can be used, but also fully overridden.
Fragmentation-wise the "best" would be an array of references. It can be as small as the reference width (a pointer or slightly bigger) and the GC can move around the instances to defragment memory. Of course then you get the slight overhead of accessing references rather than just counting/mathing up a pointer, so as usual it is a memory vs speed tradeoff. However this might go into Speed Rant Territory of detail.
Edit 2: As Markus Appel pointed out in the comments, there is something even better for fragmentation avoidance: Linked lists. Even that single array of references - if you just make it big enough - will take quite some memory in one indivisible chunk. So it might run into object size limits or array indexer limits. A linked list will do neither. But as a result the performance is around a disk that was never defragmented.
Generics are just a convenience to have type safety in collections/other places. They save you from having to use the dreaded Object as the type, which ruins all compile-time type safety. Afaik they add nothing else to this situation. List<string> works the same as a StringList would.
Array access is faster as it is a linear storage. If Arrays can solve a problem well enough they are a better storage for traversal rather than always identifying where the next object is stored. For Large data structures this performance benefit will also be amplified.
Using arrays can cause fragmentation if used carelessly. In the general case though, the performance gains outweigh the cost.
When the buffer runs out, the collection allocates a new one with double the size. If the code inserts a lot of items without specifying a capacity, this results in log2(N) reallocations. If the code does specify a capacity though, even a very rough approximation, there may be no fragmentation issues at all.
Removal is another expensive case as the collection will have to move the items after the deleted item(s) to the left.
In general, array storage offers far better performance than other storage structures, for reading, inserting and allocating memory alike. Deletions are rare in most cases.
For example, inserting N items in a linked list requires allocating N objects to hold that value and storing N pointers. That cost will be paid for every insertion, while the GC will have a lot more objects to track and collect. Inserting 100K items in a linked list would allocate 100K node objects that would need tracking.
With an array there won't be any allocations unless the buffer runs out. In the majority of cases insertion means simply writing to a buffer location and updating a count. When the buffer runs out there will be a single reallocation and an (expensive) copy operation. For 100K items, that's roughly log2(100K), i.e. about 17 allocations (slightly fewer in practice, because the first buffer already holds several items). In most cases, that's an acceptable cost.
To reduce or even get rid of allocations, the code can specify a capacity that's used as the initial buffer size. Specifying even a very rough estimate can reduce allocations a lot. Specifying 1024 as the initial capacity for 100K items would reduce reallocations to 7.
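The effect of an initial capacity can be observed from the outside by watching List<T>.Capacity change while adding items. The following is only a sketch; ReallocationCounter and CountCapacityChanges are names invented for the example.

```csharp
using System.Collections.Generic;

public static class ReallocationCounter
{
    // Counts how many times the list's internal buffer is replaced
    // while itemCount items are added, starting from initialCapacity.
    public static int CountCapacityChanges(int itemCount, int initialCapacity)
    {
        var list = new List<int>(initialCapacity);
        int changes = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                changes++;
                lastCapacity = list.Capacity;
            }
        }
        return changes;
    }
}
```

With the doubling policy described above, I would expect CountCapacityChanges(100000, 0) to report 16 buffer allocations, CountCapacityChanges(100000, 1024) to report 7, and CountCapacityChanges(100000, 100000) to report none. The exact numbers depend on the runtime's growth policy, so treat them as indicative and measure on your own target.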

Cheapest way to copy an IEnumerable<T>?

I've got an IEnumerable<T>, and I need a copy of it. Anything that implements IEnumerable<T> will do just fine. What's the cheapest way to copy it? .ToArray() maybe?
ToArray is not necessarily faster than ToList. Just use ToList.
The point is as long as you don't know the number of elements of the original sequence before enumerating, you end up with resizing an array and adding elements to it like a List<T> does, so ToArray will have to do the same thing a List<T> does anyway. Besides, ToList gives you a List<T> and that's nicer than a raw array.
Of course, if you know the concrete type of the IEnumerable<T> instance, there can be faster methods, but that's not germane to the point.
Side note: using an array (unless you have to) is arguably a micro-optimization and should be avoided most of the time.
Enumerable::ToArray and Enumerable::ToList ultimately use the same technique to receive elements from the source into an internal array buffer and, once the size of that buffer is reached, they will allocate a new buffer double the size, memcpy over and continue adding elements, repeating this process until enumeration over the source is complete. The difference in the end is that ToArray, which uses a Buffer<T> implementation internally, must then allocate an exactly sized Array and copy the elements into it before returning the result. On the other hand, ToList just needs to return the List<T> with a potentially (likely) only partially filled array buffer inside of it.
Both implementations also have an optimization where if the source IEnumerable is an ICollection they will actually allocate the exact right buffer size to begin with using ICollection::Count and then use ICollection::CopyTo from the source to fill their buffers.
In the end you will find that they perform nearly identically in most situations, but the List<T> is technically a "heavier" class to hang on to in the end and the ToArray has that extra allocate + memcpy at the end (if the source isn't an ICollection) to be able to hand back the exactly right sized array. I usually stick with ToList myself unless I know I need to pass the result to something that requires an array like say maybe Task::WaitAll.
I was about to suggest the possibility of using .AsParallel().ToList() if you have TPL at your disposal, but informal testing on my dual-core laptop shows it to be 7x slower than just .ToList(). So, stick with Mehrdad's answer.
The second-to-cheapest way is to say new List<T>(myEnumerable).ToArray(). The cheapest way is to use either .ToArray() (from LINQ) or, if you don't have .NET 3.5 (and therefore no LINQ), to create your own buffer and add to it while doubling its size, then trim it at the end.
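Whichever method you pick, the important property is that the copy is a snapshot that no longer follows the source. A minimal sketch using LINQ's ToList (the class and method names are invented for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns { items seen through the lazy view, items in the copy }
    // after the underlying source has been modified.
    public static int[] CountsAfterSourceChange()
    {
        var source = new List<int> { 1, 2, 3 };
        IEnumerable<int> view = source.Where(x => x > 0); // lazy: re-reads source on each enumeration
        List<int> copy = view.ToList();                   // snapshot taken here

        source.Add(4);                                    // modify the source afterwards

        return new[] { view.Count(), copy.Count };
    }
}
```

The lazy view sees the newly added element, while the copy made by ToList still contains only the three elements that existed when it was taken. ToArray behaves the same way in this respect.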

Does array resizing invoke the GC?

I looked into the implementation of Array.Resize() and noticed that a new array is created and returned. I'm aiming for zero memory allocation during gameplay and so I need to avoid creating any new reference types. Does resizing an array trigger the Garbage Collector on the previous array? I'm creating my own 2D array resizer, but it essentially functions in the same way as the .NET Resize() method.
If the new array is smaller than the previous one, but excess objects have already been placed back into a generic object pool, will this invoke the GC?
Arrays will constantly be created in my game loop, so I need to try and make it as efficient as possible. I'm trying to create an array pool as such, so that there's no need to keep creating them ingame. However, if the resize method does the same thing, then it makes little sense to not just instantiate a new array instead of having the pool.
Thanks for the help
Array.Resize doesn't actually change the original array at all - anyone who still has a reference to it will be able to use it as before. Therefore there's no optimization possible. Frankly it's a badly named method, IMO :(
From the docs:
This method allocates a new array with
the specified size, copies elements
from the old array to the new one, and
then replaces the old array with the
new one.
So no, it's not going to reuse the original memory or anything like that. It's just creating a shallow copy with a different size.
Yes, using Array.Resize causes a new array to be allocated and the old one to eventually be collected (unless there are still references to it somewhere).
A more low-level array resizer could possibly do some minor optimization in some cases (for example when the array is being made smaller or there happens to be memory available right after the array), but .NET's implementation doesn't do that.
Implicitly yes.
Explicitly no.
Any allocation will eventually be cleaned up by the GC when no more references exist, so yes.
If you want to avoid resizing your arrays, the best thing you could do would be to preallocate with a large enough size to avoid having to reallocate at all. In that case, you might as well just use a collection class with an initial capacity specified in the constructor, such as List.
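A short sketch that shows the behaviour discussed above: Array.Resize hands back a different array object and leaves the original untouched, so the old array becomes garbage once nothing references it. ResizeDemo and its method are names made up for this example.

```csharp
using System;

public static class ResizeDemo
{
    // True when Array.Resize produced a new array and left the original as it was.
    public static bool ResizeAllocatesNewArray()
    {
        int[] original = { 1, 2, 3 };
        int[] resized = original;        // second reference to the same array

        Array.Resize(ref resized, 5);    // 'resized' now refers to a newly allocated array

        return !object.ReferenceEquals(original, resized)
            && original.Length == 3      // the old array is unchanged
            && resized.Length == 5
            && resized[2] == 3;          // existing elements were copied over
    }
}
```

For a game loop that aims at zero allocations, this suggests keeping buffers at a fixed, generous size and tracking the number of used elements separately rather than resizing.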
