Significant differences in Array vs Array List? [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
When to use ArrayList over array[] in c#?
From the perspective of memory or processor costs, is there a significant difference between an array and an ArrayList object?

An array is a low-level data structure that essentially maps to a region in memory. An ArrayList is a variable length list implemented as an array of object that is re-allocated as the list grows.
ArrayList therefore has some overhead related to managing the size of the internal array, and more overhead related to casting objects to the correct type when you access the list.
Also, storing everything as object means that value types get boxed on write and unboxed on read, which is extremely detrimental to performance. List<T>, a similar but strongly typed variable-size list, avoids this issue.
In fact, ArrayList has been practically deprecated in favor of List<T> since .NET 2.0.

An array is a contiguous block of memory of fixed size, whereas an ArrayList (though you should prefer List<T> since .NET 2.0) wraps an array to provide dynamically resizable storage.
The difference between them, as far as their interfaces are concerned, is that an ArrayList is resizable and an array isn't. As far as the implementation is concerned: because an ArrayList wraps (and reallocates) arrays, it requires slightly more memory than an array (it has to track the current number of elements separately from its capacity). An ArrayList also requires CPU time to reallocate and copy its internal array whenever it reaches its internal capacity.
However, instantiating an ArrayList is no more expensive than allocating an array. The only difference there being the handful of instructions needed to initialize the ArrayList's state. The difference is negligible and not worth worrying about.
If you find yourself reallocating an array by hand to create a resizable collection, you're better off using ArrayList/List<T>, as it has been thoroughly tested.

Related

Memory usage difference between Generic and Non-generic collections in .NET

I have been reading about collections in .NET. As is well known, generic collections have some advantages over non-generic ones: they are type-safe, and there is no casting and no boxing/unboxing. That's why generic collections have a performance advantage.
If we consider that non-generic collections store every member as object, then we might think that generics also have a memory advantage. However, I didn't find any information about the difference in memory usage.
Can anyone clarify this point?
Sure. Let's consider an ArrayList that contains ints vs a List<int>. Let's suppose there are 1000 ints in each list.
In both, the collection type is a thin wrapper around an array -- hence the name ArrayList. In the case of ArrayList, there's an underlying object[] that contains at least 1000 boxed ints. In the case of List<int>, there's an underlying int[] that contains at least 1000 ints.
Why did I say "at least"? Because both use a double-when-full strategy. If you set the capacity of a collection when you create it then it allocates enough space for that many things. If you don't, then the collection has to guess, and if it guesses wrong and you need more capacity, then it doubles its capacity. So, best case, our collection arrays are exactly the right size. Worst case, they are possibly twice as big as they need to be; there could be room for 2000 objects or 2000 ints in the arrays.
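The effect of a capacity hint can be observed through List<T>.Capacity. The following is a minimal sketch, not a benchmark; the variable names are made up for illustration, and the exact growth sequence is an implementation detail of the runtime, so the code only counts how often the capacity changes.

```csharp
using System;
using System.Collections.Generic;

// No capacity hint: the list has to grow its backing array as items arrive.
var grown = new List<int>();
int reallocations = 0;
int lastCapacity = grown.Capacity;

for (int i = 0; i < 1000; i++)
{
    grown.Add(i);
    if (grown.Capacity != lastCapacity)
    {
        // The capacity changed, so a new, larger backing array was allocated.
        reallocations++;
        lastCapacity = grown.Capacity;
    }
}

// Capacity hint: the backing array is sized once, up front.
var presized = new List<int>(1000);
for (int i = 0; i < 1000; i++)
{
    presized.Add(i);
}

Console.WriteLine($"grown:    Count={grown.Count}, Capacity={grown.Capacity}, reallocations={reallocations}");
Console.WriteLine($"presized: Count={presized.Count}, Capacity={presized.Capacity}");
```

The pre-sized list ends with a capacity of exactly 1000, while the grown list typically ends with some spare room and a handful of reallocations behind it.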
But let's suppose for simplicity that we're lucky and there are about 1000 in each.
To start with, what's the memory burden of just the array? An object[1000] takes up 4000 bytes on a 32 bit system and 8000 bytes on a 64 bit system, just for the references, which are pointer sized. An int[1000] takes up 4000 bytes regardless. (There are also a few extra bytes taken up by array bookkeeping, but these costs are small compared to the marginal costs.)
So already we see that the non-generic solution possibly consumes twice as much memory just for the array. What about the contents of the array?
Well, the thing about value types is they are stored right there in their own variable. There is no additional space beyond those 4000 bytes used to store the 1000 integers; they get packed right into the array. So the additional cost is zero for the generic case.
For the object[] case, each member of the array is a reference, and that reference refers to an object; in this case, a boxed integer. What's the size of a boxed integer?
An unboxed value type doesn't need to store any information about its type, because its type is determined by the type of the storage it's in, and that's known to the runtime. A boxed value type needs to store the type of the thing in the box somewhere, and that takes space. It turns out that the bookkeeping overhead for an object in 32 bit .NET is 8 bytes, and 16 on 64 bit systems. That's just the overhead; we of course need 4 bytes for the int. But wait, it gets worse: on 64 bit systems, the box must be aligned to an 8 byte boundary, so we need another 4 bytes of padding on 64 bit systems.
Add it all up: our int[] takes about 4KB on both 64 and 32 bit systems. Our object[] containing 1000 ints takes about 16KB on 32 bit systems, and 32KB on 64 bit systems. So the non-generic object[] uses either 4 or 8 times as much memory as the int[].
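The totals above can be reproduced with a few lines of arithmetic. This sketch only plugs in the per-object figures quoted in the answer (8 or 16 bytes of overhead, 4 bytes of padding on 64 bit); it does not measure anything, and real layouts may differ between runtime versions.

```csharp
using System;

const int count = 1000;
const int intSize = 4;

// int[1000]: the values are packed directly into the array.
int plainBytes = count * intSize;

// object[1000] on 32 bit: 4-byte references plus a box of 8 bytes overhead + 4 bytes payload.
int boxed32 = count * 4 + count * (8 + intSize);

// object[1000] on 64 bit: 8-byte references plus a box of 16 bytes overhead + 4 bytes payload + 4 bytes padding.
int boxed64 = count * 8 + count * (16 + intSize + 4);

Console.WriteLine($"int[]           : {plainBytes} bytes");                               // 4000
Console.WriteLine($"object[], 32 bit: {boxed32} bytes ({boxed32 / plainBytes}x)");   // 16000 (4x)
Console.WriteLine($"object[], 64 bit: {boxed64} bytes ({boxed64 / plainBytes}x)");   // 32000 (8x)
```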
But wait, it gets even worse. That's just size. What about access time?
To access an integer from an array of integers, the runtime must:
verify that the array is valid
verify that the index is valid
fetch the value from the variable at the given index
To access an integer from an array of boxed integers, the runtime must:
verify that the array is valid
verify that the index is valid
fetch the reference from the variable at the given index
verify that the reference is not null
verify that the reference is a boxed integer
extract the integer from the box
That's a lot more steps, so it takes a lot longer.
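In source code the two access paths look almost identical; the extra steps hide behind the cast. A small sketch, with illustrative variable names, showing that the type check on unboxing is real:

```csharp
using System;

int[] plain = { 10, 20, 30 };
object[] boxed = { 10, 20, 30 };   // each element is a reference to a boxed int

// Bounds check, then the value is read straight out of the array.
int a = plain[1];

// Bounds check and reference fetch, then the cast performs the null check,
// the type check, and the copy out of the box.
int b = (int)boxed[1];

// Unboxing to a different type fails at run time, even though int converts to long.
bool wrongTypeRejected = false;
try
{
    _ = (long)boxed[1];            // the box holds an int, not a long
}
catch (InvalidCastException)
{
    wrongTypeRejected = true;
}

Console.WriteLine($"a={a}, b={b}, wrongTypeRejected={wrongTypeRejected}");
```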
BUT WAIT IT GETS WORSE.
Modern processors use caches on the chip itself to avoid going back to main memory. An array of 1000 plain integers is highly likely to end up in the cache so that accesses to the first, second, third, etc, members of the array in quick succession are all pulled from the same cache line; this is insanely fast. But boxed integers can be all over the heap, which increases cache misses, which greatly slows down access even further.
Hopefully that sufficiently clarifies your understanding of the boxing penalty.
What about non-boxed types? Is there a significant difference between an array list of strings, and a List<string>?
Here the penalty is much, much smaller, since an object[] and a string[] have similar performance characteristics and memory layouts. The only additional penalty in this case is (1) not catching your bugs until runtime, (2) making the code harder to read and edit, and (3) the slight penalty of a run-time type check.
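Points (1) and (3) can be illustrated with strings. This is a minimal sketch with made-up contents: the ArrayList accepts a stray int without complaint, and the mistake only surfaces when the element is cast back.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var untyped = new ArrayList { "alpha", "beta" };
untyped.Add(42);                        // compiles: an ArrayList accepts any object

var typed = new List<string> { "alpha", "beta" };
// typed.Add(42);                       // would be rejected by the compiler

string first = (string)untyped[0];      // cast required, checked at run time
string second = typed[1];               // no cast needed

bool failedAtRuntime = false;
try
{
    _ = (string)untyped[2];             // element 2 is a boxed int, not a string
}
catch (InvalidCastException)
{
    failedAtRuntime = true;
}

Console.WriteLine($"{first}, {second}, failedAtRuntime={failedAtRuntime}");
```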
then we can think that generics have also memory advantage
This assumption is false; it only applies to value types. So consider this:
new ArrayList { 1, 2, 3 };
This will implicitly convert every integer into an object (known as boxing) in order to store it in your ArrayList. This causes the memory overhead here, because an object is surely bigger than a simple int.
For reference types, however, there's no difference, as there's no need for boxing.
Choosing one or the other shouldn't be driven by performance or memory concerns. Instead, ask yourself what you want to do with the results. In particular, if you know the type(s) stored in your collection at compile time, there's no reason not to give this information to the compiler by using the right generic type argument.
In any case, you should always use generic collections instead of non-generic ones because of the type safety mentioned above.
EDIT: Your actual question, whether to use a non-generic collection or a generic version, is quite pointless: always use the generic one. But not because of its memory usage. See this:
ArrayList a = new ArrayList { 1, 2, 3};
vs.
List<object> a = new List<object> { 1, 2, 3 };
Both lists will consume the same amount of memory, although the second one is generic. That's because they both box your integers into object. So the answer to the question has nothing to do with memory.
On the other hand, for reference types there's no memory difference at all:
ArrayList a = new ArrayList { myInstance, anotherInstance };
vs.
List<MyClass> a = new List<MyClass> { myInstance, anotherInstance };
will produce the same memory outcome. However, the second one is far easier to maintain, as you can work with the instances directly without casting them.
Let's assume we have this statement:
int valueType = 1;
so now we have a value on the stack as follows:
stack
valueType = 1
Now consider that we do this:
object boxingObject = valueType;
Now we have two values stored in memory: the reference boxingObject on the stack, and a boxed copy of the value 1 on the heap (valueType itself is still on the stack):
stack
valueType = 1
boxingObject (reference)
heap
1
So boxing a value type uses extra memory, as the Microsoft docs state:
Boxing a value type allocates an object instance on the heap and copies the value into the new object.
See this link for full information.
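That the box holds a copy, rather than pointing at the original variable, can be checked directly. A short sketch reusing the variable names from the answer (secondBox is added here for illustration):

```csharp
using System;

int valueType = 1;
object boxingObject = valueType;    // allocates a box on the heap and copies 1 into it

valueType = 2;                      // changes only the original variable, not the box

int unboxed = (int)boxingObject;    // copies the value back out of the box: still 1

// Boxing again allocates a separate object each time.
object secondBox = valueType;
bool sameBox = object.ReferenceEquals(boxingObject, secondBox);

Console.WriteLine($"valueType={valueType}, unboxed={unboxed}, sameBox={sameBox}");
```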

Where the List<int> and int[] are allocated? [duplicate]

I'm learning C# and basically know the difference between arrays and Lists: the latter is generic and can grow dynamically. But I'm wondering:
are List elements located sequentially in the heap, like an array, or is each element located "randomly" in a different location?
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
Let's look at the second and third questions first:
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?
There is only a single type of "native" collection in .NET (by .NET I mean the CLR, so the runtime): the array. (Technically, if you consider a string a type of collection, then there are two native types of collections :-) Also technically, not all the arrays you think of as arrays are "native" arrays: only one-dimensional, zero-based arrays are. Arrays of type T[,] aren't, and neither are arrays whose first element doesn't have an index of 0.) Every other collection (other than LinkedList<>) is built atop it. If you look at List<T> with ILSpy you'll see that at its base there is a T[] with an added int for the Count (the T[].Length is the Capacity). Clearly an array is a little faster than a List<T> because using it takes one less indirection (you access the array directly, instead of accessing the list, which in turn accesses the array).
Let's look at the first question:
are List elements located sequentially in the heap, like an array, or is each element located randomly in a different location?
Being based on an array internally, the List<> clearly stores its elements like an array does, in a contiguous block of memory. Be aware, though, that with a List<SomeObject> where SomeObject is a reference type, the list is a list of references, not of objects, so it is the references that are put in a contiguous block of memory. (We will ignore that, with the advanced memory management of modern computers, "contiguous block of memory" isn't exact; it would be better to say "a contiguous block of addresses".)
(Yes, even Dictionary<> and HashSet<> are built atop arrays. Conversely, a tree-like collection could be built without using an array, because it is more similar to a LinkedList<>.)
Some additional details: there are four groups of instructions in the CIL language (the intermediate language used in compiled .NET programs) that are used with "native" arrays:
Newarr
Ldelem and family Ldelem_*
Stelem and family Stelem_*
ReadOnly (don't ask me its use, I don't know, and the documentation isn't clear)
if you look at OpCodes.Newarr you'll see this comment in the XML documentation:
// Summary:
// Pushes an object reference to a new zero-based, one-dimensional array whose
// elements are of a specific type onto the evaluation stack.
Yes, elements in a List are stored contiguously, just like an array. A List actually uses arrays internally, but that is an implementation detail that you shouldn't really need to be concerned with.
Of course, in order to get the correct impression from that statement, you also have to understand a bit about memory management in .NET. Namely, the difference between value types and reference types, and how objects of those types are stored. Value types will be stored in contiguous memory. With reference types, the references will be stored in contiguous memory, but not the instances themselves.
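The value/reference distinction is visible from ordinary code, even though the memory layout itself is not. A minimal sketch, using int as the value type and StringBuilder as an arbitrary reference type chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Value type: the list's backing array holds the ints themselves.
int number = 1;
var values = new List<int> { number };
number = 99;                        // the list keeps its own copy, so values[0] is still 1

// Reference type: the backing array holds references; the object lives elsewhere on the heap.
var builder = new StringBuilder("a");
var references = new List<StringBuilder> { builder };
builder.Append("b");                // visible through the list, because it is the same object

Console.WriteLine($"values[0]={values[0]}, references[0]={references[0]}");
```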
The advantage of using a List is that the logic inside of the class handles allocating and managing the items for you. You can add elements anywhere, remove elements from anywhere, and grow the entire size of the collection without having to do any extra work. This is, of course, also what makes a List slightly slower than an array. If any reallocation has to happen in order to comply with your request, there'll be a performance hit as a new, larger-sized array is allocated and the elements are copied to it. But it won't be any slower than if you wrote the code to do it manually with a raw array.
If your length requirement is fixed (i.e., you never need to grow/expand the total capacity of the array), you can go ahead and use a raw array. It might even be marginally faster than a List because it avoids the extra overhead and indirection (although that is subject to being optimized out by the JIT compiler).
If you need to be able to dynamically resize the collection, or you need any of the other features provided by the List class, just use a List. The performance difference will be virtually imperceptible.

C# equivalent of C++ vector, with contiguous memory?

What's the C# equivalent of C++ vector?
I am searching for this feature:
To have a dynamic array of contiguously stored memory that has no performance penalty for access vs. standard arrays.
I was searching and they say .NET equivalent to the vector in C++ is the ArrayList, so:
Do ArrayList have that contiguous memory feature?
You could use a List<T>. When T is a value type, the elements are stored in contiguous memory; when T is a reference type, only the references are contiguous, not the objects they point to.
Example:
List<int> integers = new List<int>();
integers.Add(1);
integers.Add(4);
integers.Add(7);
int someElement = integers[1];
Use List<T>. Internally it uses arrays, and arrays do use contiguous memory.
C# has a lot of reference types. Even if a container stores the references contiguously, the objects themselves may be scattered through the heap.
First of all, stay away from ArrayList or Hashtable. Those classes are to be considered deprecated in favor of generics. They are still in the framework for legacy purposes.
Now, what you are looking for is the List<T> class. Note that if T is a value type you will have contiguous memory, but not if T is a reference type, because then the list holds only references to objects stored elsewhere on the heap.
It looks like CLR / C# might be getting better support for Vector<> soon.
http://blogs.msdn.com/b/dotnet/archive/2014/04/07/the-jit-finally-proposed-jit-and-simd-are-getting-married.aspx

c# array vs generic list [duplicate]

This question already has answers here:
Array versus List<T>: When to use which?
(16 answers)
Closed 9 years ago.
I basically want to know the differences or advantages of using a generic list instead of an array in the scenario below:
class Employee
{
    private string _empName;
    public string EmpName
    {
        get { return _empName; }
        set { _empName = value; }
    }
}
1. Employee[] emp
2. List<Employee> emp
Can anyone please tell me the advantages or disadvantages, and which one to prefer?
One big difference is that List<Employee> can be expanded (you can call Add on it) or contracted (you can call Remove on it) whereas Employee[] is fixed in size. Thus, Employee[] is tougher to work with unless the need calls for it.
The biggest difference is that arrays can't be made longer or shorter once they're created. List instances, however can have elements added or removed. There are other diffs too (e.g. different sets of methods available) but add/remove is the big difference.
I like List unless there's a really good reason to use an Array, since the flexibility of List is nice and the perf penalty is very small relative to the cost of most other things your code is usually doing.
If you want to dive into a lot of interesting technical detail, check out this StackOverflow thread which delves into the List vs. Array question in more depth.
With the generic list, you can Add / Remove etc cheaply (at least, at the far end). Resizing an array (to add/remove) is more expensive. The obvious downside is that a list has spare capacity so maybe wastes a few bytes - not worth worrying about in most cases, though (and you can trim it).
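The "you can trim it" remark refers to List<T>.TrimExcess. A small sketch, with an arbitrary over-sized capacity chosen for illustration; note that TrimExcess is documented to do nothing when the list is already close to full.

```csharp
using System;
using System.Collections.Generic;

var items = new List<int>(1000);    // reserve room for 1000 elements
for (int i = 0; i < 10; i++)
{
    items.Add(i);                   // appending at the end is cheap while capacity remains
}

int before = items.Capacity;        // the capacity requested in the constructor
items.TrimExcess();                 // reallocate the backing array down toward Count
int after = items.Capacity;

items.RemoveAt(items.Count - 1);    // removing the last element needs no shifting

Console.WriteLine($"before={before}, after={after}, Count={items.Count}");
```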
Generally, prefer lists unless you know your data never changes size.
API-wise, since LINQ there is little to choose between them (i.e. the extra methods on List<T> are largely duplicated by LINQ, so arrays get them for free).
Another advantage is that with a list you don't need to expose a setter:
private readonly List<Foo> items = new List<Foo>();
public List<Foo> Items { get { return items; } }
eliminating a range of null bugs, and allowing you to keep control over the data (especially if you use a different IList<> implementation that supports inspection / validation when changing the contents).
If you are exposing a collection in a public interface, the .NET Framework Guidelines advise using a List rather than T[]. (In fact, a BindingList<T>.)
Internally, an array can be more appropriate if you have a collection which is a fixed, known size. Resizing an array is expensive compared to adding an element to the end of a List.
You need to know the size of an array at the time that it is created, and you cannot change its size after it has been created.
So, it uses dynamic memory allocation for the array at creation time. (This differs from static memory allocation as used for C++ arrays, where the size must be known at compile time.)
A list can grow dynamically AFTER it has been created, and it has the .Add() function to do that.
- from MSDN
Generics vs. ArrayLists (SO): general comparison.
Generic List vs. Arrays (SO): why is a generic list slower than an array?
Which one to prefer? List<T>.
If you know the number of elements, an array is a good choice. If not, use the list. Internally List<T> uses an array of T, so they are actually more alike than you may think.
With a List, you don't need to know the size of the array beforehand. You can dynamically add new Employees based on the needs of your implementation.

What are the implications of performing a shallow copy on an array in order to resize it?

If my understanding of deep and shallow copying is correct my question is an impossible one.
If you have an array (a[10]) and perform a shallow copy (b[20]), wouldn't this be impossible, as the data in b wouldn't be contiguous?
If I've got this completely wrong, could someone advise a fast way to imitate (in C#) C++'s ability to do a realloc in order to resize an array?
NOTE
I'm looking at the .Clone() and .Copy() members of the System.Array object.
You can't resize an existing array, however, you can use:
Array.Resize(ref arr, newSize);
This allocates a new array, copies the data from the old array into the new array, and updates the arr variable (which is passed by-ref in this case). Is that what you mean?
However, any other references still pointing at the old array will not be updated. A better option might be to work with List<T> - then you don't need to resize it manually, and you don't have the issue of out-of-date references. You just Add/Remove etc. Generally, you don't tend to use arrays directly very often. They have their uses, but they aren't the default case.
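The stale-reference issue is easy to reproduce. A minimal sketch, with illustrative names (alias and listAlias stand for any second reference held elsewhere in a program):

```csharp
using System;
using System.Collections.Generic;

int[] arr = { 1, 2, 3 };
int[] alias = arr;                  // a second reference to the same array

Array.Resize(ref arr, 5);           // allocates a new array, copies, and repoints arr only
arr[0] = 42;

// alias still refers to the old 3-element array and does not see the change.
Console.WriteLine($"arr.Length={arr.Length}, alias.Length={alias.Length}, alias[0]={alias[0]}");

var list = new List<int> { 1, 2, 3 };
List<int> listAlias = list;         // a second reference to the same List<int>
list.Add(4);
list.Add(5);
list[0] = 42;

// Both variables refer to one list object, so the growth and the write are visible through either.
Console.WriteLine($"listAlias.Count={listAlias.Count}, listAlias[0]={listAlias[0]}");
```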
Re your comments;
boxing: List<T> doesn't box. That is one of the points about generics; under the hood, List<T> is a wrapper around T[], so a List<int> has an int[] - no boxing. The older ArrayList is a wrapper around object[], so that does box; of course, boxing isn't as bad as you might assume anyway.
workings of Array.Resize: if I recall, it finds the size of T, then uses Buffer.BlockCopy to blit the contents; the actual details are hidden by an internal call. Essentially, after allocating a new array, it is a blit (memcpy) of the data between the two arrays, so it should be pretty quick. Note that for reference types this only copies the reference, not the object on the heap. However, if you are resizing regularly, List<T> would usually be a lot simpler (and quicker, unless you basically re-implement what List<T> does with spare capacity to minimise the number of resizes).
