Efficiency of List creation from array

I need to create a list from an array that was created immediately beforehand and exists only to be converted to the list. So I could let the list adopt the array without making a copy, but the constructor makes a copy. I can understand the motivation for this. However, there are cases where I can guarantee that nothing else holds, or will ever hold, a reference to the array apart from the code that created it.
Is there any way to make this construction more efficient and use the array internally in the list? I know there are implications if I misuse it.
The most obvious example for this is to get the result of a string.Split(). If you need a list your only obvious way out would be to do this conversion. For now I'm not considering writing a method to split directly into a list.

As far as I know, there is no official way to do that, but it is still possible using System.Reflection. Looking at the source code of List<T> in .NET Framework 4.7.2, the two important fields are _items and _size. There is also _version, but that one changes only when you modify the List<T>. Modifications include Add, AddRange, Remove, etc., but also Reverse and Sort. So let's assume this is the same operation as creating the list from an IEnumerable<T>, where _version stays zero.
using System.Collections.Generic;
using System.Reflection;
public static class ListExtensions
{
    // Makes the list adopt the given array as its backing store, without copying.
    public static void SetUnderlyingArray<T>(this List<T> list, T[] array)
    {
        lock (list)
        {
            SetInternalArray(list, array);
            SetInternalArraySize(list, array.Length);
        }
    }
    // Overwrites the private _size field, which backs Count.
    private static void SetInternalArraySize<T>(this List<T> list, int size)
    {
        var field = list.GetType().GetField(
            "_size",
            BindingFlags.NonPublic | BindingFlags.Instance);
        field.SetValue(list, size);
    }
    // Overwrites the private _items field, which holds the backing array.
    private static void SetInternalArray<T>(this List<T> list, T[] array)
    {
        var field = list.GetType().GetField(
            "_items",
            BindingFlags.NonPublic | BindingFlags.Instance);
        field.SetValue(list, array);
    }
}
and then set the underlying array
int[] array = Enumerable.Repeat(1, 1000000).ToArray();
List<int> list = new List<int>();
list.SetUnderlyingArray(array);
Note: This solution is highly dependent on implementation details and might break if something changes in the List<T> internals, but it gives insight into how it could be accomplished.

If you don't specifically need a List<T>, you could create a new class that implements IList<T> and doesn't make a copy.
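As a minimal sketch of that idea, assuming the consuming code only needs IList<T> rather than List<T>: a plain T[] already implements IList<T>, so the result of string.Split can be handed over without any copy. The class and method names below are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NoCopyExample
{
    // Hypothetical helper: exposes the array produced by Split through IList<string>.
    public static IList<string> SplitAsList(string text, char separator)
    {
        // string.Split returns a string[]; arrays implement IList<T>,
        // so returning the array here does not copy any elements.
        return text.Split(separator);
    }
}
```

The resulting view is fixed-size: elements can be read and replaced through the indexer, but Add and Remove throw NotSupportedException, so this only helps when the list never needs to grow.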

Related

Replace items in a list -- in place

I'm facing an issue with some simple C# code which I would easily fix in C/C++.
I guess I'm missing something.
I want to do the following (modifying items in a list -- in place):
//pseudocode
void modify<T>(List<T> a) {
foreach(var item in a) {
if(condition(item)) {
item = somethingElse;
}
}
}
I understand that foreach loops on a collection viewed as immutable, so the code above can't work.
I therefore tried the following :
void modify<T>(List<T> a) {
using (var sequenceEnum = a.GetEnumerator())
{
while (sequenceEnum.MoveNext())
{
var m = sequenceEnum.Current;
if(condition(m)) {
sequenceEnum.Current = somethingElse;
}
}
}
}
Naively thinking that Enumerator was some kind of pointer to my Element. Apparently enumerators are also immutable.
In C++ I would write something like that:
template<typename T>
struct Node {
    T* value;
    Node* next;
};
being then able to modify *value without touching anything in Node and therefore in the parent collection:
Node<T>* current = a->head;
while (current != nullptr) {
    if (condition(*current->value)) {
        *current->value = ...
    }
    current = current->next;
}
Do I really have to use unsafe code?
Or am I stuck with the awfulness of calling the subscript operator inside the loop?

You could also use a simple for loop.
void modify<T>(List<T> a)
{
for (int i = 0; i < a.Count; i++)
{
if (condition(a[i]))
{
a[i] = default(T);
}
}
}

In short: do not modify lists. You can achieve the desired effect with
a = a.Select(x => <some logic> ? x : default(T)).ToList()
In general, a list in C# must not be modified while it is being enumerated. You can, however, use .RemoveAll or similar methods.

As described in the documentation here, System.Collections.Generic.List<T> is a generic implementation of System.Collections.ArrayList, which has O(1) complexity for indexed access. Very much like C++ std::vector<>, the cost of inserting or appending elements is unpredictable (an append may trigger a reallocation of the backing array), but access is constant-time (complexity-wise, setting aside caching effects, etc.).
The equivalent of your C++ code snippet would be LinkedList<T>.
As for the immutability of your collection during iteration, it is clearly stated in the documentation of GetEnumerator method here. Indeed, during enumeration (within a foreach, or using IEnumerator.MoveNext directly):
Enumerators can be used to read the data in the collection, but they
cannot be used to modify the underlying collection.
Moreover, modifying the list will invalidate the enumerator and usually throw an exception:
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and its behavior is undefined.
I believe this interface contract consistency between various types of collections leads to your misunderstanding: it would be possible to implement a list that is mutable during enumeration, but it is not required by the interface contract.
Imagine you want to implement a list that is mutable during enumeration. Would the enumerator hold a reference to (or a way to retrieve) the entry, or the entry itself? Holding the entry itself would make it immutable; a reference could become invalid when inserting elements into a linked list, for example.
The simple for loop proposed by @Igor seems to be the best way to go if you want to use the standard collections library. Otherwise, you may need to implement your own collection.
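For comparison, the pointer-style traversal from the question can be sketched with LinkedList<T>, whose nodes expose a settable Value property. This is only an illustration under that assumption; ReplaceWhere is a hypothetical helper name, not a framework method.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListExample
{
    // Hypothetical helper: walks the nodes directly (no enumerator involved)
    // and overwrites the value of every node that satisfies the condition.
    public static void ReplaceWhere<T>(LinkedList<T> list, Func<T, bool> condition, T replacement)
    {
        for (LinkedListNode<T> node = list.First; node != null; node = node.Next)
        {
            if (condition(node.Value))
            {
                node.Value = replacement; // changes the element in place
            }
        }
    }
}
```

Because the loop follows First and Next instead of using foreach, no enumerator is created, so assigning to node.Value cannot invalidate one.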

Use something like this:
List<T> GetModified<T>(List<T> list, Func<T, bool> condition, Func<T> replacement)
{
    // Keep items that satisfy the condition; substitute a replacement for the rest.
    return list.Select(m => condition(m) ? m : replacement()).ToList();
}
Usage:
originalList = GetModified(originalList, i => i.IsAwesome(), () => null);
But this can also get you into trouble with cross-thread operations. Try to use immutable instances where possible, especially with IEnumerable.
If you really really want to modify the instance of list:
//if you ever want to also remove items, this is magic (why I iterate backwards)
for (int i = list.Count - 1; i >= 0; i--)
{
if (condition)
{
list[i] = whateverYouWant;
}
}

Why are there two completely different versions of Reverse for List and IEnumerable?

For the List object, we have a method called Reverse().
It reverses the order of the list 'in place' and doesn't return anything.
For the IEnumerable object, we have an extension method called Reverse().
It returns another IEnumerable.
I need to iterate in reverse order through a list, so I can't directly use the second method, because I have a List and I don't want to reverse it, just iterate backwards.
So I can either do this :
for(int i = list.Count - 1; i >=0; i--)
Or
foreach(var item in list.AsEnumerable().Reverse())
I find both less readable than what I could write if I had an IEnumerable:
foreach(var item in list.Reverse())
I can't understand why these 2 methods have been implemented this way, with the same name. It is pretty annoying and confusing.
Why is there not an extension called BackwardsIterator() in place of Reverse(), working for all IEnumerable?
I'm very interested in the historical reason for this choice, more than the 'how to do it' stuff!

It is worth noting that the list method is a lot older than the extension method. The naming was likely kept the same as Reverse seems more succinct than BackwardsIterator.
If you want to bypass the list version and go to the extension method, you need to treat the list like an IEnumerable<T>:
var numbers = new List<int>();
numbers.Reverse(); // hits list
(numbers as IEnumerable<int>).Reverse(); // hits extension
Or call the extension method as a static method:
Enumerable.Reverse(numbers);
Note that the Enumerable version will need to iterate the underlying enumerable entirely in order to start iterating it in reverse. If you plan on doing this multiple times over the same enumerable, consider permanently reversing the order and iterating it normally.
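To make the difference concrete, here is a small sketch (ReverseExample and Demo are hypothetical names used only for this illustration). It shows that the extension method leaves the list untouched while the instance method reorders it in place.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReverseExample
{
    public static string Demo()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Extension method: returns a lazily evaluated reversed sequence.
        IEnumerable<int> backwards = Enumerable.Reverse(numbers);
        string viaLinq = string.Join(",", backwards);
        string untouched = string.Join(",", numbers); // the list itself is unchanged

        // Instance method: reverses the list in place and returns nothing.
        numbers.Reverse();
        string inPlace = string.Join(",", numbers);

        return viaLinq + " | " + untouched + " | " + inPlace;
    }
}
```

Demo() returns "3,2,1 | 1,2,3 | 3,2,1".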

Write your own BackwardsIterator then!
public static IEnumerable<T> BackwardsIterator<T>(this List<T> lst)
{
    for (int i = lst.Count - 1; i >= 0; i--)
    {
        yield return lst[i];
    }
}

The existence of List<T>.Reverse long preceded the existence of IEnumerable<T>.Reverse. The reason they are named the same is ... incompetence. It's a horrible botch; clearly the Linq IEnumerable<T> function should have been given a different name ... e.g., Backwards ... since they have quite different semantics. As it is, it lays an awful trap for programmers -- someone might change the type of list from List<T> to, e.g., Collection<T>, and suddenly list.Reverse();, rather than reversing list in place, simply returns an IEnumerable<T> that is discarded. It cannot be overstated just how incompetent it was of MS to give these methods the same name.
To avoid the problem you can define your own extension method
public static IEnumerable<T> Backwards<T>(this IEnumerable<T> source) => source.Reverse();
You can even add a special case for efficient processing of indexable lists:
public static IEnumerable<T> Backwards<T>(this IEnumerable<T> source) =>
source is IList<T> list ? Backwards<T>(list) : source.Reverse();
public static IEnumerable<T> Backwards<T>(this IList<T> list)
{
for (int x = list.Count; --x >= 0;)
yield return list[x];
}

What is the need for indexers in C#?

Today I went through what indexers are, but I am a bit confused. Is there really a need for indexers? What are the advantages of using an indexer? Thanks in advance.

I guess the simplest answer is to look at how you'd use (say) List<T> otherwise. Would you rather write:
string foo = list[10];
or
string foo = list.Get(10);
Likewise for dictionaries, would you rather use:
map["foo"] = "bar";
or
map.Put("foo", "bar");
?
Just like properties, there's no real need for them compared with just named methods following a convention... but they make code easier to understand, in my view - and that's one of the most important things a feature can do.

Indexers let you get a reference to an object in a collection without having to traverse the whole collection.
Say you have several thousand objects, and you need the one before last. Instead of iterating over all of the items in the collection, you simply use the index of the object you want.
Indexers do not have to be integers, so you can use a string, for example (though you can use any object, so long as the collection supports it), as an index. This lets you "name" objects in a collection for later retrieval, which is also quite useful.
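As a rough sketch of a non-integer indexer (the Settings class is hypothetical and exists only for this example), a type can expose a string-keyed indexer on top of an internal dictionary:

```csharp
using System.Collections.Generic;

public class Settings
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    // String-keyed indexer: reads return "" for unknown names instead of throwing.
    public string this[string name]
    {
        get
        {
            string found;
            return _values.TryGetValue(name, out found) ? found : "";
        }
        set
        {
            _values[name] = value;
        }
    }
}
```

With this in place, settings["theme"] = "dark"; stores a value and settings["theme"] reads it back. Returning an empty string for a missing name is a design choice of this sketch, not something indexers require.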

I think zedo got closest to the real reason IMHO that they have added this feature. It's for convenience in the same way that we have properties.
The code is easier to type and easier to read, with a simple abstraction to help you understand.
For instance:
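string[] array = { "a", "b" };
string first = array[0];
List<string> list = new List<string> { "a", "b" };
string firstInList = list[0]; // Abstracts the list lookup to a call similar to an array access.
Dictionary<string, int> map = new Dictionary<string, int> { ["KeyName"] = 1 };
int mapped = map["KeyName"]; // The indexer is defined for string keys.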

Indexers allow you to reference your class in the same way as an array which is useful when creating a collection class, but giving a class array-like behavior can be useful in other situations as well, such as when dealing with a large file or abstracting a set of finite resources.

Yes, they are very useful.
You can use indexers to get the indexed object.
Taken from MSDN:
Indexers are most frequently implemented in types whose primary purpose is to encapsulate an internal collection or array.

An indexer lets you define a meaningful index for storing or mapping your data, so that the data can later be retrieved elsewhere using that same meaningful index.

using System;
/* Here is a simple program. I think this will help you to understand */
namespace Indexers
{
class Demo
{
int[] a = new int[10];
public int Lengths
{
get
{
return a.Length;
}
}
public int this[int index]
{
get
{
return a[index];
}
set
{
a[index] = value;
}
}
}
class Program
{
static void Main(string[] args)
{
Demo d = new Demo(); // Notice here, this is a simple object
//but you can use this like an array
for (int i = 0; i < d.Lengths; i++)
{
d[i] = i;
}
for (int i = 0; i < d.Lengths; i++)
{
Console.WriteLine(d[i]);
}
Console.ReadKey();
}
}
}
/*Output:
0
1
2
3
4
5
6
7
8
9
*/

In Java what do arrays inherit from? Can I do this?

Sorry for the newbie question, I'm used to C# so my Java framework knowledge is not so good.
I have a couple of arrays:
int[] numbers = new int[10];
String[] names = new String[10];
//populate the arrays
Now I want to make a generic function which will print out the values in these arrays, something like the following (this should work in C#)
private void PrintAll(IEnumerable items)
{
foreach(object item in items)
Console.WriteLine(item.ToString());
}
All I would have to do now is to
PrintAll(names);
PrintAll(numbers);
How can I do this in Java? What is the inheritance tree for the array in Java?
Many thanks
Bones

Arrays only implement Serializable and Cloneable in Java [1]; so there is no generic way to do this. You'd have to implement a separate method for each type of array (since primitive arrays like int[] cannot be cast to Object[]).
But in this case, you don't have to because Arrays can do it for you:
System.out.println(Arrays.toString(names));
System.out.println(Arrays.toString(numbers));
This will yield something like:
[Tom, Dick, Harry]
[1, 2, 3, 4]
If that's not good enough, you're stuck having to implement a version of your method for each possible array type, like Arrays does.
public static void printAll(Object[] items) {
for (Object o : items)
System.out.println(o);
}
public static void printAll(int[] items) {
for (int i : items)
System.out.println(i);
}
public static void printAll(double[] items) {
for (double d : items)
System.out.println(d);
}
// ...
Note that the above only applies to arrays. Collection implements Iterable, so you can use:
public static <T> void printAll(Iterable<T> items) {
for (T t : items)
System.out.println(t);
}
[1] See JLS §10.7, Array Members.

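As the other answers state, int[] and String[] have no common superclass that will let you do it. One thing you can do is wrap the arrays in a list before passing them to your PrintAll() function. This is easily done using Arrays.asList(myArray). Then your PrintAll() function can take in a Collection or Iterable and iterate it that way. Note that this works for String[] but not directly for primitive arrays: Arrays.asList(numbers) on an int[] yields a List<int[]> with a single element, so the int[] would first have to be converted to an Integer[].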

You could try the following.
(It won't work for type int as it is a primitive type. You could use the object Integer instead.)
public void print(Object[] objects){
for (Object o: objects){
System.out.println(o);
}
}

Here's a way to find which is the superclass of an array (which is a normal Object)
String[] array = {"just", "a", "test"};
Object obj = array; // not really needed, just as example
System.out.println("class: " + obj.getClass());
System.out.println("super: " + obj.getClass().getSuperclass());
not the solution but answer to the question (title at least).
(I would suggest Arrays.toString as already done by mmyers)

To answer the question as to what class: have a look at the docs. java.lang.Object is the answer.
In terms of things you should know about for iteration, have a look at the Java enhanced for-each statement and the Iterable<E> interface. As others have commented, unfortunately arrays do not implement Iterable<E>.

How to initialize a List<T> to a given size (as opposed to capacity)?

.NET offers a generic list container whose performance is almost identical to that of arrays (see the Performance of Arrays vs. Lists question). However, the two are quite different in initialization.
Arrays are very easy to initialize with a default value, and by definition they already have certain size:
string[] Ar = new string[10];
Which allows one to safely assign random items, say:
Ar[5]="hello";
With lists, things are trickier. I can see two ways of doing the same initialization, neither of which is what you would call elegant:
List<string> L = new List<string>(10);
for (int i=0;i<10;i++) L.Add(null);
or
string[] Ar = new string[10];
List<string> L = new List<string>(Ar);
What would be a cleaner way?
EDIT: The answers so far refer to capacity, which is something else than pre-populating a list. For example, on a list just created with a capacity of 10, one cannot do L[2]="somevalue"
EDIT 2: People wonder why I want to use lists this way, as it is not the way they are intended to be used. I can see two reasons:
1. One could quite convincingly argue that lists are the "next generation" arrays, adding flexibility with almost no penalty. Therefore one should use them by default. I'm pointing out they might not be as easy to initialize.
2. What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advance and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.
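The difference between capacity and size mentioned in the first edit can be checked with a short sketch (CapacityExample is a hypothetical name used only here). A list constructed with a capacity still has a Count of zero, so its indexer rejects every index.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityExample
{
    // Returns true if assigning through the indexer of a freshly created,
    // capacity-only list throws, which is the behavior described above.
    public static bool IndexerThrowsOnEmptyList()
    {
        var list = new List<string>(10); // Capacity is 10, Count is still 0
        try
        {
            list[2] = "somevalue";
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```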

List<string> L = new List<string> ( new string[10] );

I can't say I need this very often - could you give more details as to why you want this? I'd probably put it as a static method in a helper class:
public static class Lists
{
public static List<T> RepeatedDefault<T>(int count)
{
return Repeated(default(T), count);
}
public static List<T> Repeated<T>(T value, int count)
{
List<T> ret = new List<T>(count);
ret.AddRange(Enumerable.Repeat(value, count));
return ret;
}
}
You could use Enumerable.Repeat(default(T), count).ToList() but that would be inefficient due to buffer resizing.
Note that if T is a reference type, it will store count copies of the reference passed for the value parameter - so they will all refer to the same object. That may or may not be what you want, depending on your use case.
EDIT: As noted in comments, you could make Repeated use a loop to populate the list if you wanted to. That would be slightly faster too. Personally I find the code using Repeat more descriptive, and suspect that in the real world the performance difference would be irrelevant, but your mileage may vary.

Use the constructor which takes an int ("capacity") as an argument:
List<string> list = new List<string>(10);
EDIT: I should add that I agree with Frederik. You are using the List in a way that goes against the entire reasoning behind using it in the first place.
EDIT2: Regarding EDIT 2 of the question:
"What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advance and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list."
Why would anyone need to know the size of a List with all null values? If there are no real values in the list, I would expect the length to be 0. Anyhow, the fact that this is kludgy demonstrates that it is going against the intended use of the class.

Create an array with the number of items you want first and then convert the array in to a List.
int[] fakeArray = new int[10];
List<int> list = fakeArray.ToList();

If you want to initialize the list with N elements of some fixed value:
public List<T> InitList<T>(int count, T initValue)
{
return Enumerable.Repeat(initValue, count).ToList();
}

Why are you using a List if you want to initialize it with a fixed value ?
I can understand that -for the sake of performance- you want to give it an initial capacity, but isn't one of the advantages of a list over a regular array that it can grow when needed ?
When you do this:
List<int> list = new List<int>(100);
You create a list whose capacity is 100 integers. This means that your List won't need to 'grow' until you add the 101st item.
The underlying array of the list will be initialized with a length of 100.

This is an old question, but I have two solutions. One is fast and dirty reflection; the other is a solution that actually answers the question (set the size not the capacity) while still being performant, which none of the answers here do.
Reflection
This is quick and dirty, and should be pretty obvious what the code does. If you want to speed it up, cache the result of GetField, or create a DynamicMethod to do it:
public static void SetSize<T>(this List<T> l, int newSize) =>
l.GetType().GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(l, newSize);
Obviously a lot of people will be hesitant to put such code into production.
ICollection<T>
This solution is based around the fact that the constructor List(IEnumerable<T> collection) optimizes for ICollection<T> and immediately adjusts the size to the correct amount, without iterating it. It then calls the collection's CopyTo to do the copy.
The code for the List<T> constructor is as follows:
public List(IEnumerable<T> collection) {
....
if (collection is ICollection<T> c)
{
int count = c.Count;
if (count == 0)
{
_items = s_emptyArray;
}
else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
}
So we can completely optimally pre-initialize the List to the correct size, without any extra copying.
How so? By creating an ICollection<T> object that does nothing other than return a Count. Specifically, we will not implement anything in CopyTo which is the only other function called.
private struct SizeCollection<T> : ICollection<T>
{
public SizeCollection(int size) =>
Count = size;
public void Add(T i){}
public void Clear(){}
public bool Contains(T i)=>true;
public void CopyTo(T[]a, int i){}
public bool Remove(T i)=>true;
public int Count {get;}
public bool IsReadOnly=>true;
public IEnumerator<T> GetEnumerator()=>null;
IEnumerator IEnumerable.GetEnumerator()=>null;
}
public List<T> InitializedList<T>(int size) =>
new List<T>(new SizeCollection<T>(size));
We could in theory do the same thing for AddRange/InsertRange for an existing array, which also accounts for ICollection<T>, but the code there creates a new array for the supposed items, then copies them in. In such case, it would be faster to just empty-loop Add:
public static void SetSize<T>(this List<T> l, int size)
{
if(size < l.Count)
l.RemoveRange(size, l.Count - size);
else
for(size -= l.Count; size > 0; size--)
l.Add(default(T));
}
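As a further option, and only under the assumption that the code targets .NET 8 or later, the framework offers CollectionsMarshal.SetCount, which sets the Count of a list directly. The sketch below wraps it in a hypothetical helper (SizedList.Create is not a framework name).

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class SizedList
{
    // Hypothetical helper: creates a list whose Count equals size.
    // Requires .NET 8 or later, where CollectionsMarshal.SetCount exists.
    public static List<T> Create<T>(int size)
    {
        var list = new List<T>(size);            // allocates the backing array once
        CollectionsMarshal.SetCount(list, size); // raises Count without adding items one by one
        return list;
    }
}
```

After var l = SizedList.Create<string>(10); an assignment such as l[5] = "hello"; is valid. The documentation for SetCount warns that growing the count can expose uninitialized data, so it seems safest to assign every element before reading it.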

Initializing the contents of a list like that isn't really what lists are for. Lists are designed to hold objects. If you want to map particular numbers to particular objects, consider using a key-value pair structure like a hash table or dictionary instead of a list.

You seem to be emphasizing the need for a positional association with your data, so wouldn't an associative array be more fitting?
Dictionary<int, string> foo = new Dictionary<int, string>();
foo[2] = "string";

The accepted answer (the one with the green check mark) has an issue.
The problem:
var result = Lists.Repeated(new MyType(), sizeOfList);
// each item in the list references the same MyType() object
// if you edit item 1 in the list, you are also editing item 2 in the list
I recommend changing the line above to perform a copy of the object. There are many different articles about that:
String.MemberwiseClone() method called through reflection doesn't work, why?
https://code.msdn.microsoft.com/windowsdesktop/CSDeepCloneObject-8a53311e
If you want to initialize every item in your list with the default constructor, rather than NULL, then add the following method:
public static List<T> RepeatedDefaultInstance<T>(int count)
{
List<T> ret = new List<T>(count);
for (var i = 0; i < count; i++)
{
ret.Add((T)Activator.CreateInstance(typeof(T)));
}
return ret;
}
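If T is known to have a public parameterless constructor, a generic constraint can replace the Activator call. This is a sketch under that assumption; ListFactory and RepeatedNew are hypothetical names.

```csharp
using System.Collections.Generic;

public static class ListFactory
{
    // Hypothetical helper: fills the list with count distinct instances of T.
    // The new() constraint lets the compiler verify that T can be constructed.
    public static List<T> RepeatedNew<T>(int count) where T : new()
    {
        var result = new List<T>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(new T()); // a separate object per slot, not a shared reference
        }
        return result;
    }
}
```

Unlike Repeated(new MyType(), count), every slot here refers to its own object, so editing one item does not affect the others.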

You can use Linq to cleverly initialize your list with a default value. (Similar to David B's answer.)
var defaultStrings = (new int[10]).Select(x => "my value").ToList();
Go one step farther and initialize each string with distinct values "string 1", "string 2", "string 3", etc:
int n = 1;
var numberedStrings = (new int[10]).Select(_ => "string " + n++).ToList();

string [] temp = new string[] {"1","2","3"};
List<string> temp2 = temp.ToList();

After thinking again, I had found the non-reflection answer to the OP question, but Charlieface beat me to it. So I believe that the correct and complete answer is https://stackoverflow.com/a/65766955/4572240
My old answer:
If I understand correctly, you want the List<T> version of new T[size], without the overhead of adding values to it.
If you are not afraid the implementation of List<T> will change dramatically in the future (and in this case I believe the probability is close to 0), you can use reflection:
public static List<T> NewOfSize<T>(int size) {
var list = new List<T>(size);
var sizeField = list.GetType().GetField("_size",BindingFlags.Instance|BindingFlags.NonPublic);
sizeField.SetValue(list, size);
return list;
}
Note that this takes into account the default functionality of the underlying array to prefill with the default value of the item type. All int arrays will have values of 0 and all reference type arrays will have values of null. Also note that for a list of reference types, only the space for the pointer to each item is created.
If you, for some reason, decide on not using reflection, I would have liked to offer an option of AddRange with a generator method, but underneath List<T> just calls Insert a zillion times, which doesn't serve.
I would also like to point out that the Array class has a static method called Resize (Array.Resize), if you want to go the other way around and start from an array.
To end, I really hate when I ask a question and everybody points out that it's the wrong question. Maybe it is, and thanks for the info, but I would still like an answer, because you have no idea why I am asking it. That being said, if you want to create a framework that has an optimal use of resources, List<T> is a pretty inefficient class for anything other than holding and adding stuff to the end of a collection.

A notice about IList:
MSDN IList Remarks:
"IList implementations fall into three categories: read-only, fixed-size, and variable-size. (...). For the generic version of this interface, see
System.Collections.Generic.IList<T>."
IList<T> does NOT inherit from IList (although List<T> implements both IList<T> and IList), but is always variable-size.
Since .NET 4.5, we have also IReadOnlyList<T> but AFAIK, there is no fixed-size generic List which would be what you are looking for.

This is a sample I used for my unit test. I created a list of class objects. Then I used a for loop to add 'X' number of objects that I am expecting from the service.
This way you can add/initialize a List for any given size.
public void TestMethod1()
{
var expected = new List<DotaViewer.Interface.DotaHero>();
for (int i = 0; i < 22; i++)//You add empty initialization here
{
var temp = new DotaViewer.Interface.DotaHero();
expected.Add(temp);
}
var nw = new DotaHeroCsvService();
var items = nw.GetHero();
CollectionAssert.AreEqual(expected,items);
}
Hope I was of help to you guys.

A bit late, but the first solution you proposed seems far cleaner to me: you don't allocate memory twice.
Even the List constructor needs to loop through the array in order to copy it; it doesn't know in advance that there are only null elements inside.
1.
- allocate N
- loop N
Cost: 1 * allocate(N) + N * loop_iteration
2.
- allocate N
- allocate N + loop ()
Cost : 2 * allocate(N) + N * loop_iteration
However, List's allocation and loops might be faster since List is a built-in class, but C# is JIT-compiled, so...
