C#: ToArray performance [duplicate]


This question already has answers here:
Is it better to call ToList() or ToArray() in LINQ queries?
(16 answers)
Closed 9 years ago.
Background:
I admit I did not attempt to benchmark this, but I'm curious...
What are the CPU/memory characteristics of the Enumerable.ToArray<T> (and its cousin Enumerable.ToList<T>)?
Since IEnumerable does not advertise in advance how many elements it has, I (perhaps naively) presume ToArray would have to "guess" an initial array size, then resize/reallocate the array if the first guess turns out to be too small, then resize it yet again if the second guess is also too small, and so on, which would give worse-than-linear performance.
I can imagine better approaches involving (hybrid) lists, but these would still require more than one allocation (though not reallocation) and quite a bit of copying, though they could be linear overall despite the overhead.
Question:
Is there any "magic" taking place behind the scenes, that avoids the need for this repetitive resizing, and makes ToArray linear in space and time?
More generally, is there an "official" documentation on BCL performance characteristics?

No magic. Resizing happens if required.
Note that it is not always required. If the IEnumerable<T> being .ToArrayed also implements ICollection<T>, then the .Count property is used to pre-allocate the array (making the algorithm linear in space and time.) If not, however, the following (rough) code is executed:
TElement[] array = null;
int num = 0;
foreach (TElement current in source)
{
    if (array == null)
    {
        array = new TElement[4];
    }
    else if (array.Length == num)
    {
        // Doubling happens *here*
        TElement[] array2 = new TElement[checked(num * 2)];
        Array.Copy(array, 0, array2, 0, num);
        array = array2;
    }
    array[num] = current;
    num++;
}
Note the doubling when the array fills.
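Doubling is what keeps this linear overall: with capacities 4, 8, 16, ..., the elements copied at each resize add up to less than the final capacity, which is roughly 2n at most, so the total work stays O(n) (amortized O(1) per element) rather than worse-than-linear. The sketch below re-creates that strategy under stated assumptions (it is not the BCL source; ToArraySketch and ToArrayWithStats are hypothetical names, and it allocates the first 4-element array eagerly) and counts the element copies spent on resizing and on the final trim.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToArraySketch
{
    // Hypothetical helper mimicking the strategy described above (not the real BCL code).
    // 'copies' counts elements moved during resizes and the final trim;
    // the ICollection<T> fast path reports 0.
    public static T[] ToArrayWithStats<T>(IEnumerable<T> source, out long copies)
    {
        copies = 0;
        if (source is ICollection<T> collection)
        {
            // Count is known up front: one exact-size allocation, no resizing.
            var exact = new T[collection.Count];
            collection.CopyTo(exact, 0);
            return exact;
        }

        T[] array = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == array.Length)
            {
                // Buffer full: double it and copy the existing elements across.
                var bigger = new T[checked(array.Length * 2)];
                Array.Copy(array, bigger, count);
                copies += count;
                array = bigger;
            }
            array[count++] = item;
        }

        // Trim to the exact length (one more copy) unless the buffer is already full.
        if (count == array.Length) return array;
        var result = new T[count];
        Array.Copy(array, result, count);
        copies += count;
        return result;
    }
}
```

For 1,000 items from a non-collection source, the resizes copy 4 + 8 + ... + 512 = 1,020 elements and the final trim copies 1,000, i.e. about 2n in total.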
Regardless, it's generally good practice to avoid calling .ToArray() and .ToList() unless you absolutely require it. Interrogating the query directly when needed is often a better choice.

I extracted the code behind the .ToArray() method using .NET Reflector:
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    return buffer.ToArray();
}
and Buffer.ToArray:
internal TElement[] ToArray()
{
    if (this.count == 0)
    {
        return new TElement[0];
    }
    if (this.items.Length == this.count)
    {
        return this.items;
    }
    TElement[] destinationArray = new TElement[this.count];
    Array.Copy(this.items, 0, destinationArray, 0, this.count);
    return destinationArray;
}
Inside the Buffer constructor, it loops through all the elements to calculate the real count and to collect the elements into its items array.

IIRC, it uses a doubling algorithm.
Remember that for most types, all you need to store are references. It's not like you're allocating enough memory to copy the entire object (unless of course you're using a lot of structs... tsk tsk).
It's still a good idea to avoid using .ToArray() or .ToList() until the last possible moment. Most of the time you can just keep using IEnumerable<T> all the way up until you either run a foreach loop or assign it to a data source.
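The point about references can be seen directly: for a reference type, ToArray produces a new array of references to the same objects (a shallow copy), so only the references are duplicated. The sketch below is illustrative; ShallowCopyDemo and its nested Order class are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShallowCopyDemo
{
    // Hypothetical type used only for this illustration.
    public sealed class Order { public int Id; }

    // ToArray copies the references held by the list, not the Order objects themselves.
    public static bool SharesInstances()
    {
        var orders = new List<Order> { new Order { Id = 1 }, new Order { Id = 2 } };
        Order[] snapshot = orders.ToArray();
        snapshot[0].Id = 42; // mutates the single object both collections point to
        return object.ReferenceEquals(orders[0], snapshot[0]) && orders[0].Id == 42;
    }
}
```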

Related

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

This question already has answers here:
Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?
(4 answers)
Closed 9 months ago.
I have a very large enumeration and am preparing an expensive deferred operation on it (e.g. sorting it). I'm then passing this into a function which may or may not consume the IEnumerable, depending on some logic of its own.
Here's an illustration:
IEnumerable<Order> expensiveEnumerable = fullCatalog.OrderBy(c => Prioritize(c));
MaybeFullFillSomeOrders(expensiveEnumerable);
// Elsewhere... (example use-case for multiple enumerations, not real code)
void MaybeFullFillSomeOrders(IEnumerable<Order> nextUpOrders) {
    if (notAGoodTime())
        return;
    foreach (var order in nextUpOrders)
        collectSomeInfo(order);
    processInfo();
    foreach (var order in nextUpOrders) {
        maybeFulfill(order);
        if (atCapacity())
            break;
    }
}
I would like to prepare my input to the other function such that:
1. If they do not consume the enumerable, the performance price of sorting is not paid.
   This already precludes calling e.g. ToList() or ToArray() on it.
2. If they choose to enumerate multiple times (perhaps not realizing how expensive it would be in this case), I want some defence in place to prevent the multiple enumeration.
3. Ideally, the result is still an IEnumerable<T>.
The best solution I've come up with is to use Lazy<>:
var expensive = new Lazy<List<Order>>(
    () => fullCatalog.OrderBy(c => Prioritize(c)).ToList());
This appears to satisfy criteria 1 and 2, but has a couple of drawbacks:
1. I have to change the interface of all downstream usages to expect a Lazy.
2. The full list (which in this case was built up from a SelectMany() on several smaller partitions) would need to be allocated as a new single contiguous list in memory. I'm not sure there's an easy way around this if I want to "cache" the sort result, but if you know of one I'm all ears.
One idea I had to solve the first problem was to wrap Lazy<> in some custom class that either implements or can implicitly be converted to an IEnumerable<T>, but I'm hoping someone knows of a more elegant approach.
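The wrapper idea in the last paragraph could look roughly like the sketch below. It is a hypothetical class (CachedEnumerable<T> is an illustrative name, not a BCL type) that keeps Lazy<List<T>> as an implementation detail, so callers still receive a plain IEnumerable<T>: nothing is sorted until the first enumeration, and later enumerations reuse the cached list. It does not address the second drawback, since the whole sorted result is still materialized as one List<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: defers materialization until the first enumeration,
// then serves every later enumeration from the cached List<T>.
public sealed class CachedEnumerable<T> : IEnumerable<T>
{
    private readonly Lazy<List<T>> _items;

    public CachedEnumerable(Func<IEnumerable<T>> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        // Lazy<T>'s default mode (ExecutionAndPublication) runs the factory at most once,
        // even if several threads enumerate concurrently.
        _items = new Lazy<List<T>>(() => factory().ToList());
    }

    public IEnumerator<T> GetEnumerator() => _items.Value.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the class implements IEnumerable<T>, downstream signatures such as MaybeFullFillSomeOrders(IEnumerable<Order>) would not need to change.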

You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.
Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):
If you've already read the next element within another iterator, you can yield it from your buffer
If you've already discovered that there is no next element, you can return that immediately
Otherwise, you need to ask the original iterator for the next element, and remember it for all the other wrapped iterators.
The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.
As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:
using System.Collections;
using System.Collections.Generic;

public class LazyEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> iterator;
    // Must be initialized, or the first buffer.Count throws NullReferenceException.
    private readonly List<T> buffer = new List<T>();
    private bool completed = false;

    public LazyEnumerable(IEnumerable<T> original)
    {
        // TODO: You could be even lazier, only calling
        // GetEnumerator when you first need an element
        iterator = original.GetEnumerator();
    }

    // Explicit implementation of the non-generic interface.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // If we already have the element, yield it
            if (index < buffer.Count)
            {
                yield return buffer[index];
            }
            // If we've yielded everything in the buffer and some
            // other iterator has come to the end of the original,
            // we're done.
            else if (completed)
            {
                yield break;
            }
            // Otherwise, see if there's anything left in the original
            // iterator.
            else
            {
                bool hasNext = iterator.MoveNext();
                if (hasNext)
                {
                    var current = iterator.Current;
                    buffer.Add(current);
                    yield return current;
                }
                else
                {
                    // Note: this sketch never disposes the underlying iterator.
                    completed = true;
                    yield break;
                }
            }
            index++;
        }
    }
}

Why does LINQ orderby consume more memory?

I want to know why OrderBy consumes more memory than simply copying the list and sorting it.
void printMemoryUsage()
{
    long memory = GC.GetTotalMemory(true);
    long mb = 1024 * 1024;
    Console.WriteLine("memory: " + memory/mb + " MB");
}
var r = new Random();
var list = Enumerable.Range(0, 20*1024*1024).OrderBy(x => r.Next()).ToList();
printMemoryUsage();
var lsitCopy = list.OrderBy(x => x);
foreach (var v in lsitCopy)
{
    printMemoryUsage();
    break;
}
Console.ReadKey();
The result I got is:
memory: 128 MB
memory: 288 MB
But copying the list and sorting it consumes less memory:
void printMemoryUsage()
{
    long memory = GC.GetTotalMemory(true);
    long mb = 1024 * 1024;
    Console.WriteLine("memory: " + memory/mb + " MB");
}
var r = new Random();
var list = Enumerable.Range(0, 20*1024*1024).OrderBy(x => r.Next()).ToList();
printMemoryUsage();
var lsitCopy = list.ToList();
printMemoryUsage();
lsitCopy.Sort();
printMemoryUsage();
Console.ReadKey();
Results are:
memory: 128 MB
memory: 208 MB
memory: 208 MB
More testing shows that memory consumed by orderBy is twice the list size.

It's not too surprising when you dive into how the two approaches are implemented internally. Take a look at the Reference Source for .NET.
In your second approach where you call the Sort() method on the list, the internal array in the List object is passed to the TrySZSort method that is written in native code outside of C#, which means no work for the garbage collector.
private static extern bool TrySZSort(Array keys, Array items, int left, int right);
Now, in your first approach you're using LINQ to sort the enumerable. What really happens when you call .OrderBy() is that an OrderedEnumerable<T> object is constructed. Just calling OrderBy doesn't sort the list; it is only sorted when it is enumerated, i.e. when its GetEnumerator method is called. GetEnumerator is implicitly called behind the scenes when you call ToList or when you enumerate using a construct like foreach.
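One way to observe this deferral is to count key-selector calls before and after enumeration. The sketch below is illustrative (DeferredOrderByDemo is a hypothetical name) and relies only on OrderBy's documented deferred execution.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredOrderByDemo
{
    // Counts key-selector calls: OrderBy alone does no sorting work; the buffering
    // and sorting happen only when the result is enumerated (here, by ToList).
    public static (int BeforeEnumeration, int AfterEnumeration, List<int> Sorted) Run()
    {
        int keyCalls = 0;
        IOrderedEnumerable<int> ordered =
            new[] { 5, 3, 4, 1, 2 }.OrderBy(x => { keyCalls++; return x; });
        int before = keyCalls;              // only an OrderedEnumerable has been built
        List<int> sorted = ordered.ToList(); // enumeration triggers the sort
        return (before, keyCalls, sorted);
    }
}
```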
You're actually sorting the list twice since you're enumerating the list once on this line:
var list = Enumerable.Range(0, 20*1024*1024).OrderBy(x => r.Next()).ToList();
and again when you enumerate via foreach on this line:
var lsitCopy = list.OrderBy(x => x);
foreach(var v in lsitCopy)
Since these LINQ methods are not using native code, they rely on the garbage collector to pick up after them. Each of the classes is also creating a bunch of objects (e.g. OrderedEnumerable creates a Buffer<TElement> with another copy of the array). All of these objects consume RAM.

I had to do some research on this one, and found some interesting information. The default List.Sort function performs an in-place sort (not a second copy), but does so via a call to Array.Sort, which ultimately calls through to TrySZSort, a heavily optimized native, unmanaged CLR function that selects the specific sort algorithm based on the input type, but in most cases performs what's called an Introspective Sort, which combines the best use cases of QuickSort, HeapSort, and InsertionSort for maximum efficiency. This is done in unmanaged code, meaning it's generally faster and more efficient.
If you're interested in going down the rabbit hole, the Array Sort source is here and the TrySZSort implementation is here. Ultimately, though, the use of unmanaged code means the garbage collector doesn't get involved, and thus less memory is used.
The implementation used by OrderBy is a standard QuickSort, and the OrderedEnumerable actually creates a second copy of the keys used in the sort (in your case the only field, though that doesn't have to be the case if you consider a larger class object with a single property or two used as the sort key), leading to exactly what you observed: additional usage equal to the size of the collection for the second copy. Assuming you then converted that to a List or Array (rather than keeping an OrderedEnumerable) and waited for or forced a garbage collection, you should recover most of that memory. The Enumerable.OrderBy method source is here if you want to dig into it.

The source of extra memory used can be found in the implementation of OrderedEnumerable which is created on the line
IOrderedEnumerable<int> lsitCopy = list.OrderBy(x => x);
OrderedEnumerable is a generic implementation that sorts by any criteria you provide it, which is distinctly different from the implementation of List.Sort, which sorts elements only by value. If you follow the code of OrderedEnumerable you will find it creates a buffer into which your values are copied, accounting for an extra 80MB (4*20*1024*1024) of memory. The additional 40MB (2*20*1024*1024) is associated with structures created to sort the list by the keys.
Another thing to note is that not only does OrderBy(x => x) use more memory, it also uses a lot more processing power: in my testing, calling Sort is about 6 times faster than using OrderBy(x => x).
The List.Sort() method is backed by a heavily optimised native implementation for sorting elements by their value, whereas the LINQ OrderBy method is far more versatile and consequently less optimised for simply sorting the list by value...
IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
P.S. I would suggest you stop using var instead of the actual variable types, as it hides valuable information from the reader about how the code actually works. I recommend developers only use the var keyword for anonymous types.

Connor's answer gave a clue about what is happening here. The implementation of OrderedEnumerable makes it clearer. The GetEnumerator of OrderedEnumerable is:
public IEnumerator<TElement> GetEnumerator() {
    Buffer<TElement> buffer = new Buffer<TElement>(source);
    if (buffer.count > 0) {
        EnumerableSorter<TElement> sorter = GetEnumerableSorter(null);
        int[] map = sorter.Sort(buffer.items, buffer.count);
        sorter = null;
        for (int i = 0; i < buffer.count; i++) yield return buffer.items[map[i]];
    }
}
buffer is another copy of the original data, and map keeps the mapping of the order. So, if the code is
// memory_foot_print_1
var sortedList = originalList.OrderBy(v => v);
foreach (var v in sortedList)
{
    // memory_foot_print_2
    ...
}
Here, memory_foot_print_2 will be equal to memory_foot_print_1 + size_of(originalList) + size_of(new int[count_of(originalList)]) (assuming no GC)
Thus, if originalList is a list of ints of size 80 MB, memory_foot_print_2 - memory_foot_print_1 = 80 + 80 = 160 MB. And if originalList is a list of longs of size 80 MB, memory_foot_print_2 - memory_foot_print_1 = 80 + 40 (size of map) = 120 MB (assuming ints are 4 bytes and longs 8 bytes), which is what I was observing.
This raises another question: does it make sense to use OrderBy for larger objects?
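The buffer-plus-map idea can be sketched in a few lines to see where the extra arrays come from. This is a hypothetical simplification, not the real EnumerableSorter: it uses Array.Sort(keys, items) to permute an index map alongside the keys, and unlike OrderBy, Array.Sort is not stable, so equal keys may come out in a different order. Besides the buffer and the int[] map, it also holds a TKey[] keys array during the sort.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapSortSketch
{
    // Hypothetical re-creation of the buffer + map approach (unstable, unlike OrderBy).
    // As an iterator method, none of this runs until the result is enumerated.
    public static IEnumerable<TElement> OrderByViaMap<TElement, TKey>(
        IEnumerable<TElement> source, Func<TElement, TKey> keySelector)
    {
        // Extra copy #1: the buffer, same length as the source.
        TElement[] buffer = source.ToArray();

        // Extra array: one computed key per element.
        TKey[] keys = new TKey[buffer.Length];
        for (int i = 0; i < buffer.Length; i++) keys[i] = keySelector(buffer[i]);

        // Extra array: map[i] = buffer index of the i-th smallest key (4 bytes per element).
        int[] map = new int[buffer.Length];
        for (int i = 0; i < map.Length; i++) map[i] = i;

        // Sorts keys and applies the same permutation to map.
        Array.Sort(keys, map);

        for (int i = 0; i < map.Length; i++) yield return buffer[map[i]];
    }
}
```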

How to truncate an array in place in C#

I mean is it really possible? MSDN says that arrays are fixed-size and the only way to resize is "copy-to-new-place". But maybe it is possible with unsafe/some magic with internal CLR structures, they all are written in C++ where we have a full memory control and can call realloc and so on.
I have no code provided for this question, because I don't even know if it can exist.
I'm not talking about Array.Resize and similar methods, because they obviously do not have the needed behaviour.
Assume that we have a standard x86 process with 2GB ram, and I have 1.9GB filled by single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
And do not get OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and will fail with 1.9+1 > 2GB OutOfMemory.

You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);

realloc will attempt to do the inplace resize - but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you so that the change is propagated throughout all of the references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
realloc doesn't really make sense in the kind of memory model .NET has anyway - the heap is continuously collected and compacted over time. So if you're trying to use it to avoid the copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, the whole memory above your array is going to be moved to fill in the blanks. Even if it were possible to do the realloc, the only benefit you have over simply copying the array is that you would keep your array in the old-living heap - and that isn't necessarily what you want anyway.
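The List<T> behaviour described in this answer can be checked directly: RemoveRange only changes Count, while setting Capacity allocates a new, smaller backing array and copies into it. The helper below (TrimDemo is a hypothetical name) just records Count and Capacity after each step.

```csharp
using System.Collections.Generic;

public static class TrimDemo
{
    // Returns (Count, Capacity) after each step so the behaviour can be inspected.
    public static (int Count, int Capacity)[] Run()
    {
        var list = new List<int>(new int[1000]); // Count == Capacity == 1000
        var afterCreate = (list.Count, list.Capacity);

        list.RemoveRange(500, 500);               // only Count changes; same backing array
        var afterRemove = (list.Count, list.Capacity);

        list.Capacity = list.Count;               // allocates a 500-element array and copies
        var afterTrim = (list.Count, list.Capacity);

        return new[] { afterCreate, afterRemove, afterTrim };
    }
}
```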

No array type in the BCL supports what you want. That being said, you can implement your own type that supports what you need. It can be backed by a standard array, but implement its own Length and indexer properties that 'hide' a portion of the array from you.
using System;

public class MyTruncatableArray<T>
{
    private T[] _array;
    private int _length;
    public MyTruncatableArray(int size)
    {
        _array = new T[size];
        _length = size;
    }
    public T this[int index]
    {
        get
        {
            CheckElementIndex(index);
            return _array[index];
        }
        set
        {
            CheckElementIndex(index);
            _array[index] = value;
        }
    }
    public int Length
    {
        get { return _length; }
        set
        {
            // A new length may be anything from 0 up to the original size.
            if (value < 0 || value > _array.Length)
            {
                throw new ArgumentOutOfRangeException(nameof(value), "New array length must be non-negative and lower than or equal to the original size");
            }
            _length = value;
        }
    }
    private void CheckElementIndex(int index)
    {
        // Valid element indices are 0 .. _length - 1; index == _length is out of range.
        if (index < 0 || index >= _length)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}
It really depends on what exactly you need. (E.g. do you need to truncate just so that it's easier to use from your code? Or is perf/GC/memory consumption a concern? If the latter, did you perform any measurements that prove the standard Array.Resize method unusable for your case?)

Sugar coated arrays (dynamically resizable and set any element at random)

I want my cake and to eat it.
I like the way Lists in C# dynamically expand when you go beyond the initial capacity of the array. However this is not enough. I want to be able to do something like this:
int[] n = new int[]; // Note how I'm NOT defining how big the array is.
n[5] = 9
Yes, there'll be some sacrifice in speed, because behind the scenes, .NET would need to check to see if the default capacity has been exceeded. If it has, then it could expand the array by 5x or so.
Unfortunately with Lists, you're not really meant to set an arbitrary element, and although it is possible if you do this, it still isn't possible to set say, the fifth element straight away without initially setting the size of the List, let alone have it expand dynamically when trying.
For any solution, I'd like to be able to keep the simple square bracket syntax (rather than using a relatively verbose-looking method call), and have it relatively fast (preferably almost as fast as standard arrays) when it's not expanding the array.

Note that I don't necessarily advocate inheriting List, but if you really want this:
public class MyList<T> : List<T>
{
    // 'new' hides List<T>'s indexer; it only applies when accessed through a MyList<T> reference.
    public new T this[int i]
    {
        get {
            while (i >= this.Count) this.Add(default(T));
            return base[i];
        }
        set {
            while (i >= this.Count) this.Add(default(T));
            base[i] = value;
        }
    }
}
I'll add that if you expect most of the values of your "array" to remain empty over the life of your program, you'll get much greater efficiency by using a Dictionary<int, T>, especially as the size of the collection grows large.

A simple solution to the problem is to inherit from Dictionary<TKey, TValue> and just use the value generic:
public class MyCoolType<T> : Dictionary<int, T> { }
Then you would be able to use it like:
MyCoolType<int> n = new MyCoolType<int>();
n[5] = 9;
And a note on performance.
For insertions, this is much faster than a list since it does not require you to resize or insert elements at arbitrary positions in an array. List<T> uses an array as a backing field, and resizing it is expensive. (Edit: Lists have a default size and you are not always resizing, but when you do, it's expensive.)
For look-ups, this is very nearly O(1) (source), so comparable to an Array look-up. (Indexing into a List<T> is also O(1); what grows with the index is the cost of padding the list out to reach a distant index, as MyList does.)
Sparsely packing is much more memory efficient than using a List with dense packing as it doesn't require you to use empty items just to reach a specific index.
Other Notes:
In the other solutions, try inserting an item at index 570442959 for example, you'll get an OutOfMemoryException thrown (under 32 bit, but even 64-bit has problems). With this solution you can use any conceivable index that the int type supports, up to int.MaxValue.
Lists don't allow negative indexes, this will.
MyCoolType.Count is the equivalent of the array Length property here.
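One caveat with inheriting from Dictionary<int, T> directly: its indexer throws KeyNotFoundException when you read an index that was never assigned, which differs from array behaviour, and Count reports the number of assigned entries rather than highest index + 1. A thin wrapper can return default(T) for missing indices instead. SparseArray<T> below is a hypothetical sketch, not a BCL type.

```csharp
using System.Collections.Generic;

// Hypothetical dictionary-backed "sparse array": reads of unassigned indices
// return default(T) instead of throwing KeyNotFoundException.
public class SparseArray<T>
{
    private readonly Dictionary<int, T> _items = new Dictionary<int, T>();

    public T this[int index]
    {
        get => _items.TryGetValue(index, out T value) ? value : default(T);
        set => _items[index] = value;
    }

    // Number of indices actually assigned (not the highest index + 1).
    public int AssignedCount => _items.Count;
}
```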
Here are the results of my performance test:
Inserting 1 million elements into MyList: 29.4294424 seconds
Inserting 1 million elements into CoolType: 0.127499 seconds
Looking up 1 million random elements MyList: 1.6330562 seconds
Looking up 1 million random elements CoolType: 1.304348 seconds
Full source to tests here: http://pastebin.com/kEdLgFaw
Note, to run these tests I had to set to X64 build, debug, and had to add the following to the app.config file:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>

Here is your pi
static public List<int> AddToList(int index, int value, List<int> input)
{
    if (index >= input.Count)
    {
        // Pad with zeros up to and including the target index.
        int[] temparray = new int[index - input.Count + 1];
        input.AddRange(temparray);
    }
    input[index] = value;
    return input;
}

You can define an extension method on List:
public static class ExtensionMethods {
    public static void Set<T>(this List<T> list, int index, T element) {
        if (index < list.Count) {
            list[index] = element;
        } else {
            for (int i = list.Count; i < index; i++) {
                list.Add(default(T));
            }
            list.Add(element);
        }
    }
}
and call list.Set(12, 1024) if you want the element at index 12 (the 13th element) to be 1024.

How to initialize a List<T> to a given size (as opposed to capacity)?

.NET offers a generic list container whose performance is almost identical to that of arrays (see the Performance of Arrays vs. Lists question). However, they are quite different in initialization.
Arrays are very easy to initialize with a default value, and by definition they already have certain size:
string[] Ar = new string[10];
Which allows one to safely assign random items, say:
Ar[5]="hello";
With lists, things are trickier. I can see two ways of doing the same initialization, neither of which is what you would call elegant:
List<string> L = new List<string>(10);
for (int i=0;i<10;i++) L.Add(null);
or
string[] Ar = new string[10];
List<string> L = new List<string>(Ar);
What would be a cleaner way?
EDIT: The answers so far refer to capacity, which is something else than pre-populating a list. For example, on a list just created with a capacity of 10, one cannot do L[2]="somevalue"
EDIT 2: People wonder why I want to use lists this way, as it is not the way they are intended to be used. I can see two reasons:
One could quite convincingly argue that lists are the "next generation" arrays, adding flexibility with almost no penalty. Therefore one should use them by default. I'm pointing out they might not be as easy to initialize.
What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advance and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.

List<string> L = new List<string> ( new string[10] );

I can't say I need this very often - could you give more details as to why you want this? I'd probably put it as a static method in a helper class:
public static class Lists
{
    public static List<T> RepeatedDefault<T>(int count)
    {
        return Repeated(default(T), count);
    }
    public static List<T> Repeated<T>(T value, int count)
    {
        List<T> ret = new List<T>(count);
        ret.AddRange(Enumerable.Repeat(value, count));
        return ret;
    }
}
You could use Enumerable.Repeat(default(T), count).ToList() but that would be inefficient due to buffer resizing.
Note that if T is a reference type, it will store count copies of the reference passed for the value parameter - so they will all refer to the same object. That may or may not be what you want, depending on your use case.
EDIT: As noted in comments, you could make Repeated use a loop to populate the list if you wanted to. That would be slightly faster too. Personally I find the code using Repeat more descriptive, and suspect that in the real world the performance difference would be irrelevant, but your mileage may vary.
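The loop-based variant mentioned in the EDIT might look like this. ListsLoop is an illustrative name; the behaviour is the same as Repeated above, minus the Enumerable.Repeat iterator, and the same shared-reference caveat applies for reference types.

```csharp
using System;
using System.Collections.Generic;

public static class ListsLoop
{
    // Pre-size the capacity, then Add 'count' copies of 'value' in a plain loop.
    public static List<T> Repeated<T>(T value, int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        var ret = new List<T>(count);
        for (int i = 0; i < count; i++) ret.Add(value);
        return ret;
    }

    public static List<T> RepeatedDefault<T>(int count) => Repeated(default(T), count);
}
```

Unlike a list created with only a capacity, the result already has Count == count, so assignments such as list[2] = "somevalue" work immediately.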

Use the constructor which takes an int ("capacity") as an argument:
List<string> list = new List<string>(10);
EDIT: I should add that I agree with Frederik. You are using the List in a way that goes against the entire reasoning behind using it in the first place.
EDIT 2: Regarding the question's EDIT 2 ("What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advance and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list."):
Why would anyone need to know the size of a List with all null values? If there are no real values in the list, I would expect the length to be 0. Anyhow, the fact that this is kludgy demonstrates that it is going against the intended use of the class.

Create an array with the number of items you want first and then convert the array into a List.
int[] fakeArray = new int[10];
List<int> list = fakeArray.ToList();

If you want to initialize the list with N elements of some fixed value:
public List<T> InitList<T>(int count, T initValue)
{
    return Enumerable.Repeat(initValue, count).ToList();
}

Why are you using a List if you want to initialize it with a fixed value ?
I can understand that -for the sake of performance- you want to give it an initial capacity, but isn't one of the advantages of a list over a regular array that it can grow when needed ?
When you do this:
List<int> list = new List<int>(100);
You create a list whose capacity is 100 integers. This means that your List won't need to 'grow' until you add the 101st item.
The underlying array of the list will be initialized with a length of 100.

This is an old question, but I have two solutions. One is fast and dirty reflection; the other is a solution that actually answers the question (set the size not the capacity) while still being performant, which none of the answers here do.
Reflection
This is quick and dirty, and should be pretty obvious what the code does. If you want to speed it up, cache the result of GetField, or create a DynamicMethod to do it:
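public static void SetSize<T>(this List<T> l, int newSize) {
    if (l.Capacity < newSize) l.Capacity = newSize; // _size must never exceed the backing array's length
    l.GetType().GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(l, newSize);
}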
Obviously a lot of people will be hesitant to put such code into production.
ICollection<T>
This solution is based around the fact that the constructor List(IEnumerable<T> collection) optimizes for ICollection<T> and immediately adjusts the size to the correct amount, without iterating it. It then calls the collection's CopyTo to do the copy.
The code for the List<T> constructor is as follows:
public List(IEnumerable<T> collection) {
    ....
    if (collection is ICollection<T> c)
    {
        int count = c.Count;
        if (count == 0)
        {
            _items = s_emptyArray;
        }
        else
        {
            _items = new T[count];
            c.CopyTo(_items, 0);
            _size = count;
        }
    }
So we can completely optimally pre-initialize the List to the correct size, without any extra copying.
How so? By creating an ICollection<T> object that does nothing other than return a Count. Specifically, we will not implement anything in CopyTo which is the only other function called.
private struct SizeCollection<T> : ICollection<T>
{
    public SizeCollection(int size) =>
        Count = size;
    public void Add(T i) {}
    public void Clear() {}
    public bool Contains(T i) => true;
    // Deliberately empty: the list's new backing array stays filled with default(T).
    public void CopyTo(T[] a, int i) {}
    public bool Remove(T i) => true;
    public int Count { get; }
    public bool IsReadOnly => true;
    public IEnumerator<T> GetEnumerator() => null;
    IEnumerator IEnumerable.GetEnumerator() => null;
}
public List<T> InitializedList<T>(int size) =>
    new List<T>(new SizeCollection<T>(size));
We could in theory do the same thing for AddRange/InsertRange on an existing list, which also account for ICollection<T>, but the code there creates a new array for the supposed items, then copies them in. In that case, it would be faster to just call Add in an empty loop:
// Extension methods must be declared in a static class.
public static void SetSize<T>(this List<T> l, int size)
{
    if (size < l.Count)
        l.RemoveRange(size, l.Count - size);
    else
        for (size -= l.Count; size > 0; size--)
            l.Add(default(T));
}

Initializing the contents of a list like that isn't really what lists are for. Lists are designed to hold objects. If you want to map particular numbers to particular objects, consider using a key-value pair structure like a hash table or dictionary instead of a list.

You seem to be emphasizing the need for a positional association with your data, so wouldn't an associative array be more fitting?
Dictionary<int, string> foo = new Dictionary<int, string>();
foo[2] = "string";

The accepted answer (the one with the green check mark) has an issue.
The problem:
var result = Lists.Repeated(new MyType(), sizeOfList);
// each item in the list references the same MyType() object
// if you edit item 1 in the list, you are also editing item 2 in the list
I recommend changing the line above to perform a copy of the object. There are many different articles about that:
String.MemberwiseClone() method called through reflection doesn't work, why?
https://code.msdn.microsoft.com/windowsdesktop/CSDeepCloneObject-8a53311e
If you want to initialize every item in your list with the default constructor, rather than NULL, then add the following method:
public static List<T> RepeatedDefaultInstance<T>(int count)
{
    List<T> ret = new List<T>(count);
    for (var i = 0; i < count; i++)
    {
        ret.Add((T)Activator.CreateInstance(typeof(T)));
    }
    return ret;
}
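If what you actually want is a distinct instance per slot, another option is to pass a factory delegate instead of calling Activator.CreateInstance. This is a hypothetical helper (ListsFactory.Repeated is an illustrative name); it also works for types without a public parameterless constructor, since the caller decides how each element is created.

```csharp
using System;
using System.Collections.Generic;

public static class ListsFactory
{
    // Calls the factory once per slot, so every element is a separate object.
    public static List<T> Repeated<T>(Func<T> factory, int count)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        var ret = new List<T>(count);
        for (int i = 0; i < count; i++) ret.Add(factory());
        return ret;
    }
}
```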

You can use LINQ to cleverly initialize your list with a default value. (Similar to David B's answer.)
var defaultStrings = (new int[10]).Select(_ => "my value").ToList();
Go one step further and initialize each string with distinct values "string 1", "string 2", "string 3", etc.:
int x = 1;
var numberedStrings = (new int[10]).Select(_ => "string " + x++).ToList();
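A side-effect-free variant derives each value from its position instead of mutating a captured counter, so the result does not depend on the sequence being enumerated exactly once. NumberedStrings is an illustrative name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NumberedStrings
{
    // "string 1", "string 2", ... computed from the position, with no captured state.
    public static List<string> Create(int count) =>
        Enumerable.Range(1, count).Select(i => "string " + i).ToList();
}
```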

string [] temp = new string[] {"1","2","3"};
List<string> temp2 = temp.ToList();

After thinking again, I found the non-reflection answer to the OP's question, but Charlieface beat me to it. So I believe that the correct and complete answer is https://stackoverflow.com/a/65766955/4572240
My old answer:
If I understand correctly, you want the List<T> version of new T[size], without the overhead of adding values to it.
If you are not afraid the implementation of List<T> will change dramatically in the future (and in this case I believe the probability is close to 0), you can use reflection:
public static List<T> NewOfSize<T>(int size) {
    var list = new List<T>(size);
    var sizeField = list.GetType().GetField("_size", BindingFlags.Instance | BindingFlags.NonPublic);
    sizeField.SetValue(list, size);
    return list;
}
Note that this takes into account the default functionality of the underlying array to prefill with the default value of the item type. All int arrays will have values of 0 and all reference type arrays will have values of null. Also note that for a list of reference types, only the space for the pointer to each item is created.
If you, for some reason, decide on not using reflection, I would have liked to offer an option of AddRange with a generator method, but underneath List<T> just calls Insert a zillion times, which doesn't serve.
I would also like to point out that the Array class has a static method called Resize, if you want to go the other way around and start from an Array.
To end, I really hate when I ask a question and everybody points out that it's the wrong question. Maybe it is, and thanks for the info, but I would still like an answer, because you have no idea why I am asking it. That being said, if you want to create a framework that has an optimal use of resources, List<T> is a pretty inefficient class for anything other than holding and adding stuff to the end of a collection.

A note about IList:
MSDN IList Remarks:
"IList implementations fall into three categories: read-only, fixed-size, and variable-size. (...). For the generic version of this interface, see
System.Collections.Generic.IList<T>."
IList<T> does NOT inherit from IList (but List<T> does implement both IList<T> and IList), but is always variable-size.
Since .NET 4.5, we have also IReadOnlyList<T> but AFAIK, there is no fixed-size generic List which would be what you are looking for.

This is a sample I used for my unit test. I created a list of class objects. Then I used a for loop to add 'X' number of objects that I am expecting from the service.
This way you can add/initialize a List for any given size.
public void TestMethod1()
{
    var expected = new List<DotaViewer.Interface.DotaHero>();
    for (int i = 0; i < 22; i++) // You add empty initialization here
    {
        var temp = new DotaViewer.Interface.DotaHero();
        expected.Add(temp);
    }
    var nw = new DotaHeroCsvService();
    var items = nw.GetHero();
    CollectionAssert.AreEqual(expected, items);
}
Hope I was of help to you guys.

A bit late, but the first solution you proposed seems far cleaner to me: you don't allocate memory twice.
Even the List constructor needs to loop through the array in order to copy it; it doesn't even know in advance that there are only null elements inside.
1.
- allocate N
- loop N
Cost: 1 * allocate(N) + N * loop_iteration
2.
- allocate N
- allocate N + loop ()
Cost : 2 * allocate(N) + N * loop_iteration
However, List's allocation and loops might be faster since List is a built-in class, but C# is JIT-compiled sooo...

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
