Recently I asked a question on SO that mentioned the possible use of a C# ArrayList for a solution. A comment was made that using an ArrayList is bad. I would like to know more about this. I have never heard this statement before about ArrayLists.
Could somebody bring me up to speed on the possible performance problems with using ArrayLists?
The main problem with ArrayList is that it uses object - it means you have to cast to and from whatever you are encapsulating. It is a remnant of the days before generics and is probably around for backwards compatibility only.
You do not have the type safety with ArrayList that you have with a generic list. The performance issue is in the need to cast objects back to the original (or have implicit boxing happen).
Implicit boxing will happen whenever you use a value type - it will be boxed when put into the ArrayList and unboxed when referenced.
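A minimal sketch of what that looks like with ArrayList (the variable names are only for illustration): the int is boxed once when it is added, every read hands back that same box typed as object, and it is the cast that unboxes it.

```csharp
using System;
using System.Collections;

var list = new ArrayList();
list.Add(5);                 // the int is boxed here, once

object first = list[0];      // reading returns the stored box, typed as object
object second = list[0];
Console.WriteLine(ReferenceEquals(first, second)); // True: no new box per read

int value = (int)list[0];    // the cast unboxes: type check, then copy the value out
Console.WriteLine(value + 1); // 6
```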
The issue is not just that of performance, but also of readability and correctness. Since generics came in, this object has become obsolete and would only be needed in .NET 1.0/1.1 code.
If you're storing a value type (int, float, double, etc - or any struct), ArrayList will cause boxing on every storage and unboxing on every element access. This can be a significant hit to performance.
In addition, there is a complete lack of type safety with ArrayList. Since everything is stored as "object", there's an extra burden on you, as a developer, to keep it safe.
In addition, if you want the behavior of storing objects, you can always use List<object>. There is no disadvantage to this over ArrayList, and it has one large (IMO) advantage: It makes your intent (storing an untyped object) clear from the start.
ArrayList really only exists, and should only be used, for .NET 1.1 code. There really is no reason to use it in .NET 2+.
ArrayList is not a generic type, so it must store all items you place in it as objects. This is bad for two reasons. First, when putting value types in the ArrayList, you force each value type to be boxed into a reference type, which can be costly. Second, you now have to cast everything you pull out of the array list. This is bad since you now need to make sure you know what objects are in there.
List<T> avoids these issues since it is constructed with the proper type.
For example:
```csharp
using System.Collections.Generic;

List<int> ints = new List<int>();
ints.Add(5);       // no boxing
int num = ints[0]; // no casting
```
The generic List<T> is preferred since it is generic, which provides additional type information and removes the need to box/unbox value types added to it.
In addition to the performance issues, it's a matter of moving errors from runtime to compile time. Casting objects retrieved from ArrayLists must happen at runtime, and any type errors will happen during execution. With a generic List<T>, all types are checked at compile time.
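A small sketch of that runtime-versus-compile-time difference, using made-up values: the ArrayList happily accepts a string among ints, and the mistake only surfaces as an InvalidCastException when the loop reaches it. The commented-out List<int> line is the same mistake, which the compiler rejects instead.

```csharp
using System;
using System.Collections;

var values = new ArrayList { 1, 2, "three" }; // compiles: ArrayList accepts any object

int sum = 0;
bool failed = false;
try
{
    foreach (object item in values)
        sum += (int)item; // the string element throws InvalidCastException here, at run time
}
catch (InvalidCastException)
{
    failed = true;
}

Console.WriteLine($"sum before failure: {sum}, failed: {failed}"); // sum before failure: 3, failed: True

// The equivalent List<int> mistake does not compile at all:
// var typed = new System.Collections.Generic.List<int> { 1, 2, "three" };
```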
All the boxing and unboxing can be expensive and fragile. Microsoft made some nice improvements in terms of typing and performance in .NET 2.0 generics.
Here are some good reads:
Boxing and Unboxing of Value Types : What You Need to Know? at http://www.c-sharpcorner.com/uploadfile/stuart_fujitani/boxnunbox11192005055746am/boxnunbox.aspx
Performance: ArrayList vs List<> at http://allantech.blogspot.com/2007/03/performance-arraylist-vs-list.html
The documentation for List<T> states:
If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used...
A question was raised in the comments about exactly what "the compiler" refers to here. That's tangential to the question, which is about what else "the compiler" (whatever that may mean) does this to.
Is this true of any other collection type? If it's only List<T>, is it good practice to always use List<T> for value types even when some other collection like Queue<T> better expresses your intent?
Short answer: use the most appropriate generic type that expresses your intent. Queue<T> is just fine if you want to represent a queue, for example.
Longer answer, quite honestly: that documentation is vague. When it mentions "the compiler", it isn't talking about the C# (build-time) compiler (which translates C# to IL), but rather about the JIT (runtime) compiler, which translates IL to CPU instructions (appropriately for your specific CPU and environment).
The feature it is using here is simply a feature of generics, which applies equally to any generic usage; it isn't specific to List<T> - the same ideas apply to any <T> type (or multi-generic-parameter types, too), including arrays.
The docs are also ... a little woolly and imprecise. The details probably don't really matter to most people, but it doesn't really do this per-T; or at least, not all T. Every value-type T (or permutation involving value-types, for multi-generic-parameter scenarios) gets a bespoke JIT, but all the reference-type T share a single implementation. This is tied to the fact that only value-type usages involve boxing, and that boxing is per-T; for reference-type usages, a type check is sufficient. And even for value-type scenarios, often the box step can be avoided via "constrained" calls.
If I was trying to be generous: the document is perhaps trying to contrast against non-generic collections (which you shouldn't really be using in %current year%), in a way that might make sense to someone more familiar with .NET 1.1; in doing so, they're... very far from precise.
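One rough way to check that this is not specific to List<T> is to count heap allocations while pushing ints through the generic Queue<T> and through the old non-generic Queue. This sketch assumes .NET Core 3.0 or later (for GC.GetAllocatedBytesForCurrentThread); the exact byte counts vary by runtime, so it only expects the boxing version to allocate noticeably more.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

const int Count = 1_000;
var typed = new Queue<int>(Count); // generic: values live in an int[]
var untyped = new Queue(Count);    // non-generic: values live in an object[]

// Each round trip enqueues then dequeues Count ints; both queues are pre-sized so they never grow.
void RoundTripTyped()
{
    for (int i = 0; i < Count; i++) typed.Enqueue(i);
    for (int i = 0; i < Count; i++) _ = typed.Dequeue();
}

void RoundTripUntyped()
{
    for (int i = 0; i < Count; i++) untyped.Enqueue(i);         // boxes each int
    for (int i = 0; i < Count; i++) _ = (int)untyped.Dequeue(); // unboxes each int
}

// Warm up first so JIT work is not counted, then measure one more round trip each.
RoundTripTyped();
RoundTripUntyped();

long before = GC.GetAllocatedBytesForCurrentThread();
RoundTripTyped();
long typedBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
RoundTripUntyped();
long untypedBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"Queue<int>: {typedBytes} bytes, Queue: {untypedBytes} bytes");
```

The same experiment could be repeated with Stack<T>, Dictionary<TKey, TValue> and so on; the point is that the value-type specialization comes from generics as such, not from anything special about List<T>.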
I saw this reply from Jon on Initialize generic object with unknown type:
If you want a single collection to contain multiple unrelated types of values, however, you will have to use List<object>
I'm not comparing ArrayList vs List<>, but ArrayList vs List<object>, as both will be exposing elements of type object. What would be the benefit of using either one in this case?
EDIT: Type safety is no concern here, since both classes expose object as their item type. One still needs to cast from object to the desired type. I'm more interested in anything other than type safety.
EDIT: Thanks Marc Gravell and Sean for the answers. Sorry, I can only pick one as the answer, so I'll upvote both.
You'll be able to use the LINQ extension methods directly with List<object>, but not with ArrayList, unless you inject a Cast<object>() / OfType<object>() (thanks to IEnumerable<object> vs IEnumerable). That's worth quite a bit, even if you don't need type safety etc.
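A sketch of that difference with made-up contents: List<object> is an IEnumerable<object>, so LINQ operators apply directly, while the ArrayList first has to be adapted with Cast<object>().

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

var objectList = new List<object> { 1, "two", 3.0 };
var arrayList = new ArrayList { 1, "two", 3.0 };

// List<object> implements IEnumerable<object>, so Where is available directly.
List<object> fromList = objectList.Where(o => o is string).ToList();

// ArrayList only implements the non-generic IEnumerable, so it needs Cast<object>() first;
// arrayList.Where(...) on its own does not compile.
List<object> fromArrayList = arrayList.Cast<object>().Where(o => o is string).ToList();

Console.WriteLine($"{fromList.Count} {fromArrayList.Count}"); // 1 1
```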
The speed will be about the same; structs will still be boxed, etc - so there isn't much else to tell them apart. Except that I tend to see ArrayList as "oops, somebody is writing legacy code again..." ;-p
One big benefit to using List<object> is that these days most code is written to use the generic classes/interfaces. I suspect that these days most people would write a method that takes an IList<object> instead of an IList. Since ArrayList doesn't implement IList<object>, you wouldn't be able to use an ArrayList in these scenarios.
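A minimal sketch of that scenario; CountStrings is a hypothetical method standing in for any API written against IList<object>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical API written against the generic interface.
int CountStrings(IList<object> items)
{
    int count = 0;
    foreach (object item in items)
        if (item is string) count++;
    return count;
}

var objectList = new List<object> { "a", 1, "b" };
var arrayList = new ArrayList { "a", 1, "b" };

Console.WriteLine(CountStrings(objectList));   // 2
// CountStrings(arrayList);                    // does not compile: ArrayList is not an IList<object>
Console.WriteLine(arrayList is IList<object>); // False
```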
I tend to think of the non-generic classes/interfaces as legacy code and avoid them whenever possible.
In this case (ArrayList vs. List<object>), you won't notice any difference in speed. There might be some differences in the actual methods available on each of these, particularly in .NET 3.5 and counting extension methods, but that has more to do with ArrayList being somewhat deprecated than anything else.
Yes, besides being type-safe, generic collections might actually be faster.
From the MSDN (http://msdn.microsoft.com/en-us/library/system.collections.generic.aspx)
The System.Collections.Generic namespace contains interfaces and classes that define generic collections, which allow users to create strongly typed collections that provide better type safety and performance than non-generic strongly typed collections.
Do some benchmarking and you will know what performs best. I guesstimate that the difference is very small.
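A crude Stopwatch sketch for that kind of benchmarking, assuming a fill-then-sum workload over one million ints is representative (it may not be for a given app). Timings vary by machine and runtime and are printed rather than asserted; for serious measurements a dedicated harness such as BenchmarkDotNet handles warm-up and statistics much better than this.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

const int N = 1_000_000;

// Fills an ArrayList (boxing each int) and sums it (unboxing each int).
long SumArrayList()
{
    var list = new ArrayList(N);
    for (int i = 0; i < N; i++) list.Add(i);
    long sum = 0;
    for (int i = 0; i < N; i++) sum += (int)list[i];
    return sum;
}

// Same work with List<int>: the values live directly in the list's int[] buffer.
long SumGenericList()
{
    var list = new List<int>(N);
    for (int i = 0; i < N; i++) list.Add(i);
    long sum = 0;
    for (int i = 0; i < N; i++) sum += list[i];
    return sum;
}

// Runs the work once to warm up the JIT, then times a second run.
(long Sum, double Ms) Time(Func<long> work)
{
    work();
    var sw = Stopwatch.StartNew();
    long sum = work();
    return (sum, sw.Elapsed.TotalMilliseconds);
}

var arrayListResult = Time(SumArrayList);
var genericResult = Time(SumGenericList);

Console.WriteLine($"ArrayList: {arrayListResult.Ms:F1} ms, List<int>: {genericResult.Ms:F1} ms");
```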
List<> is a typesafe version of ArrayList. It will guarantee that you will get the same object type in the collection.
From what I've read, a design decision was made for certain collections' enumerator types to be mutable structs instead of reference types for performance reasons. List<T>.Enumerator is the best known.
I was investigating some old code that used arrays, and was surprised to discover that C# Arrays return the type SZGenericArrayEnumerator as their generic enumerator type, which is a reference type.
I am wondering if anyone knows why Array's generic iterator was implemented as a reference type when so many other performance critical collections used mutable structs instead.
From what I've read, a design decision was made for certain collections' enumerator types to be mutable structs instead of reference types for performance reasons.
Good question.
First off, you are correct. Though in general, mutable value types are a bad code smell, in this case they are justified:
The mutation is almost entirely concealed from the user.
It is highly unlikely that anyone is going to use the enumerator in a confusing manner.
The use of a mutable value type actually does solve a realistic performance problem in an extremely common scenario.
I am wondering if anyone knows why Array's generic iterator was implemented as a reference type when so many other performance critical collections used mutable structs instead.
Because if you're the sort of person who is concerned about the performance of enumerating an array then why are you using an enumerator in the first place? It's an array for heaven's sake; just write a for loop that iterates over its indices like a normal person and never allocate the enumerator. (Or a foreach loop; the C# compiler will rewrite the foreach loop into the equivalent for loop if it knows that the loop collection is an array.)
The only reason why you'd obtain an enumerator from an array in the first place is if you are passing it to a method that takes an IEnumerator<T>, in which case if the enumerator is a struct then you're going to be boxing it anyway. Why take on the expense of making the value type and then boxing it? Just make it a reference type to begin with.
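A short sketch to observe the difference described above. The exact class returned as an array's enumerator is a runtime implementation detail, so the code only checks IsValueType rather than any type name.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };
int[] array = { 1, 2, 3 };

// List<T>.GetEnumerator() returns the mutable struct List<T>.Enumerator,
// so a foreach over a List<T> variable allocates nothing for the enumerator.
Console.WriteLine(typeof(List<int>.Enumerator).IsValueType); // True

// Handing that struct to code that wants an IEnumerator<int> boxes it;
// the heap object's runtime type is still the struct type.
IEnumerator<int> boxedListEnumerator = list.GetEnumerator();
Console.WriteLine(boxedListEnumerator.GetType().IsValueType); // True

// An array's generic enumerator is a reference type to begin with.
IEnumerator<int> arrayEnumerator = ((IEnumerable<int>)array).GetEnumerator();
Console.WriteLine(arrayEnumerator.GetType().IsValueType); // False

// foreach over an array-typed variable is compiled to an indexed loop; no enumerator is created.
int total = 0;
foreach (int x in array) total += x;
Console.WriteLine(total); // 6
```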
Arrays get some special treatment in the C# compiler. When you use foreach on them, the compiler translates it into a for loop. So there is no performance benefit in using struct enumerators.
List<T> on the other hand is a plain class without any special treatment, so using a struct results in better performance.
Possible Duplicate:
What is boxing and unboxing and what are the trade offs?
Ok I understand the basic concept of what happens when you box and unbox.
Box throws the value type (stack object) into a System.Object and stores it on the heap
Unbox unpackages that object on the heap holding that value type and throws it back on the stack so it can be used.
Here is what I don't understand:
Why would this need to be done? I'd like specific real-world examples.
Why are generics so efficient? They say it's because generics don't need to box or unbox, OK... but I don't get why. What's behind that in generics?
Why are generics better than, let's say, other types, for example other collections?
So all in all, I don't understand how this applies in the real world in terms of code, and, going further, how it makes generics better: why generics don't have to do any of this in the first place.
Boxing needs to be done whenever you want to hold an int in an object variable.
A generic collection of ints contains an int[] instead of an object[].
Putting an int into the object[] behind a non-generic collection requires you to box the int.
Putting an int into the int[] behind a generic collection does not involve any boxing.
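A minimal sketch of those two kinds of backing store. MakeStorage is a hypothetical helper standing in for a collection's internal buffer allocation; List<T> itself keeps its items in a T[].

```csharp
using System;

// Hypothetical helper: a generic container allocates its buffer as T[].
T[] MakeStorage<T>(int size) => new T[size];

object[] untyped = MakeStorage<object>(1); // the kind of buffer a non-generic collection uses
int[] typed = MakeStorage<int>(1);         // the kind of buffer List<int> uses

untyped[0] = 42; // boxing: a heap object holding 42 is created and its reference stored
typed[0] = 42;   // no boxing: the int is written straight into the array slot

Console.WriteLine(typed.GetType());      // System.Int32[]
Console.WriteLine(untyped[0].GetType()); // System.Int32 (the type of the boxed value)
```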
Firstly, the stack and heap are implementation details. A value type isn't defined by being on the stack, and there is nothing to say that the concept of stack and heap will be used by all systems able to host the CLR.
That aside:
when a value type is boxed, the data in that value type is read, an object is created, and the data is copied to the new object.
if you are boxing all the items in a collection, this is a lot of overhead.
if you have a collection of boxed value types and are iterating over them, each item then has to be unboxed (the reverse of the process) on every read, just to read a value!!
Generic collections are strongly typed to the type being stored in them, and therefore no boxing or unboxing needs to occur.
Here is a response around the unboxing/boxing portion.
I'm not sure how it is implemented in mono, but generic interfaces will help because the compiler creates a new function of the specific type for each different type used (internally, there are a few cases where it can utilize the same generated function). If a function of the specific type is generated, there is no need to box/unbox the type. This is why the Collections.Generic library was a big hit at .NET 2.0 because collections no longer required boxing and became significantly more efficient.
As for why generics are better than other collections outside the boxing/unboxing scope: they also enforce the type. No longer can you readily toss around a collection that can hold any type. This can prevent bugs at compile time, rather than letting you see them at run time.
MSDN has a nice article: Boxing and Unboxing (C# Programming Guide)
In relation to simple assignments, boxing and unboxing are computationally expensive processes. When a value type is boxed, a new object must be allocated and constructed. To a lesser degree, the cast required for unboxing is also expensive computationally.
Boxing is used to store value types in the garbage-collected heap. Boxing is an implicit conversion of a value type to the type object or to any interface type implemented by this value type. Boxing a value type allocates an object instance on the heap and copies the value into the new object.
Unboxing is an explicit conversion from the type object to a value type or from an interface type to a value type that implements the interface. An unboxing operation consists of:
Checking the object instance to make sure that it is a boxed value of the given value type.
Copying the value from the instance into the value-type variable.
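A small sketch of the first step, the type check: unboxing must name the exact type that was boxed, so a boxed int cannot be unboxed directly as a long, even though int converts to long implicitly.

```csharp
using System;

object boxed = 42;         // boxes an int

int asInt = (int)boxed;    // OK: the instance really is a boxed int; the value is copied out

bool threw = false;
try
{
    _ = (long)boxed;       // unboxing checks the exact type: a boxed int is not a boxed long
}
catch (InvalidCastException)
{
    threw = true;
}

long widened = (int)boxed; // unbox as int first, then the ordinary int-to-long conversion
Console.WriteLine($"{asInt} {threw} {widened}"); // 42 True 42
```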
Check also: Exploring C# Boxing
And read Jeffrey Richter's "Type fundamentals" chapter: he published two sample chapters plus the full TOC of "CLR via C#" (Microsoft Press, 2010) some time ago.
Also some notes from Jeffrey Richter's book CLR via C#:
It’s possible to convert a value type to a reference type by using a mechanism called boxing.
Internally, here’s what happens when an instance of a value type is boxed:
Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
The value type’s fields are copied to the newly allocated heap memory.
The address of the object is returned. This address is now a reference to an object; the value type is now a reference type. The C# compiler automatically produces the IL code necessary to box a value type instance, but you still need to understand what’s going on internally so that you’re aware of code size and performance issues.
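A minimal sketch of the copy step in practice: once boxed, the heap object holds its own copy of the value, so later changes to the original variable do not affect it, and boxing again produces a separate object.

```csharp
using System;

int original = 10;
object box = original;     // heap memory is allocated and the int's value is copied into it
original = 20;             // only the local changes; the box keeps its own copy

Console.WriteLine(box);      // 10
Console.WriteLine(original); // 20

object alias = box;        // copies the reference (the returned address), no new box
object another = original; // boxing again allocates a separate object
Console.WriteLine(ReferenceEquals(box, alias));   // True
Console.WriteLine(ReferenceEquals(box, another)); // False
```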
Note. It should be noted that the FCL now includes a new set of generic collection classes that make the non-generic collection classes obsolete. For example, you should use the System.Collections.Generic.List<T> class instead of the System.Collections.ArrayList class. The generic collection classes offer many improvements over the non-generic equivalents. For example, the API has been cleaned up and improved, and the performance of the collection classes has been greatly improved as well. But one of the biggest improvements is that the generic collection classes allow you to work with collections of value types without requiring that items in the collection be boxed/unboxed. This in itself greatly improves performance because far fewer objects will be created on the managed heap thereby reducing the number of garbage collections required by your application. Furthermore, you will get compile-time type safety, and your source code will be cleaner due to fewer casts. This will all be explained in further detail in Chapter 12, “Generics.”
I don't want to overquote the full chapter here. Read his book and you will gain some details on the process and get some answers. And BTW, there are quite a few answers to your question here on SO, around the Web and in many books. It is fundamental knowledge you certainly have to understand.
Here is an interesting read from Eric Lippert: "The truth about value types".
regarding your statement:
Box throws the value type (stack object) into a System.Object and stores it on the heap Unbox unpackages that object on the heap holding that value type and throws it back on the stack so it can be used.
This needs to be done because at the IL level there are different instructions for value types than for reference types (ldfld vs ldflda; check out the disassembly for a method that calls someValueType.ToString() vs someReferenceType.ToString() and you'll see that the instructions are different).
These instructions are not compatible, so when you need to pass a value type to a method as an object, that value needs to be wrapped in a reference type (boxing). This is inefficient because the runtime needs to copy the value type and then create a new box object in order to pass one value.
Generics are faster because value types can be stored as values and not references, so no boxing is needed. Take ArrayList vs List<int>. If you want to put 1 into an ArrayList, the CLR needs to box the int so that it can be stored in an object[]. List<T>, however, uses a T[] to store the list contents, so List<int> uses an int[], which means that 1 doesn't need to be boxed in order to put it in the array.
To put it simply: boxing and unboxing take a lot of time. Why? Because it's faster to use a known type from the start than to leave it to be handled at runtime.
A collection of objects can contain different items (string, int, double, etc.), and you must check every time that your operation on a variable is correct.
Converting from one type to another takes time.
Generics are much faster and you are encouraged to use them; the old collections exist for backward compatibility.
Suppose I want to store a bunch of variables of type Long in a List, but the system supported neither value-type generics nor boxing. The way to go about storing such values would be to define a new class "BoxedLong", which held a single field "Value" of type Long. Then to add a value to the list, one would create a new instance of a BoxedLong, set its Value field to the desired value, and store that in the list. To retrieve a value from the list, one would retrieve a BoxedLong object from the list, and take the value from its Value field.
When a value type is passed to something that expects an Object, the above is essentially what happens under the hood, except without the new identifier names.
When using generics with value types, the system doesn't use a value-holder class and pass it to routines which expect to work with objects. Instead, the system creates a new version of the routine that will work with the value type in question. If five different value types are passed to a generic routine, five different versions of the routine will be generated. In general, this will yield more code than would the use of a value-holder class, but the code will have to do less work every time a value is passed in or retrieved. Since most routines will have many values of each type passed in or out, the cost of generating different versions of the routine will be more than recouped by the elimination of boxing/unboxing operations.
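As indirect evidence for the "one version per value type" idea, this sketch compares allocations when a long is passed through an object parameter versus a generic T parameter. It assumes .NET Core 3.0 or later (for GC.GetAllocatedBytesForCurrentThread) and marks both methods NoInlining so the JIT cannot optimize the box away. It shows the effect, not the generated machine code, and since exact numbers vary it only expects the object version to allocate more.

```csharp
using System;
using System.Runtime.CompilerServices;

const int Calls = 1_000;

// Accepts object: a long argument has to be boxed at every call site.
[MethodImpl(MethodImplOptions.NoInlining)]
static object PassAsObject(object value) => value;

// Generic: the JIT compiles a long-specific version, so the long is passed as-is.
[MethodImpl(MethodImplOptions.NoInlining)]
static T PassAsGeneric<T>(T value) => value;

// Warm up so JIT compilation of both methods happens before anything is measured.
_ = PassAsObject(0L);
_ = PassAsGeneric(0L);

long total = 0;

long before = GC.GetAllocatedBytesForCurrentThread();
for (long i = 0; i < Calls; i++) total += (long)PassAsObject(i); // box going in, unbox coming out
long objectBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
for (long i = 0; i < Calls; i++) total += PassAsGeneric(i);      // no box
long genericBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"object parameter: {objectBytes} bytes, generic parameter: {genericBytes} bytes");
```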
I am running through some tests about using ArrayLists and List.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
List[i] = newX;
Using the arraylist seems much faster. Is that correct?
UPDATE:
Using the List[i] approach, for my List<T> approach I am using a lambda with FindIndex to find the index, e.g.:
```csharp
// ....
// List<T> version
int index = base.FindIndex(x => x.AlpaNumericString == "PopItem");
base[index] = UpdatedItem;
```
It is definitely slower than:
```csharp
// ArrayList version
int index = base.IndexOf("PopItem");
base[index] = UpdatedItem;
```
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) ints, so no boxing or unboxing needs to occur when adding/accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List than an ArrayList: although there's no boxing/unboxing going on, your Generic List is type-safe, and no implicit (or explicit) casts are required when retrieving your strongly-typed objects, unlike with the ArrayList's "collection" of object types.
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations
In deciding whether to use the List<T> or ArrayList class, both of which have similar functionality, remember that the List<T> class performs better in most cases and is type safe. If a reference type is used for type T of the List<T> class, the behavior of the two classes is identical. However, if a value type is used for type T, you need to consider implementation and boxing issues.
If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used, and after about 500 list elements are created the memory saved not boxing list elements is greater than the memory used to generate the class implementation.
Make certain the value type used for type T implements the IEquatable<T> generic interface. If not, methods such as Contains must call the Object.Equals(Object) method, which boxes the affected list element. If the value type implements the IComparable interface and you own the source code, also implement the IComparable<T> generic interface to prevent the BinarySearch and Sort methods from boxing list elements. If you do not own the source code, pass an IComparer<T> object to the BinarySearch and Sort methods.
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<(Of <(T>)>) class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
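A sketch of the like-for-like setup, using hypothetical data: the same strings go into both collections, and both are searched with IndexOf and updated by index, so any timing wrapped around it would compare the collections rather than a property lookup against a plain string comparison. Timing code is deliberately left out here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical data: identical string keys in both collections.
var keys = new List<string>();
var arrayList = new ArrayList();
for (int i = 0; i < 10_000; i++)
{
    string key = "Item" + i;
    keys.Add(key);
    arrayList.Add(key);
}

// Same operation on both: find by value with IndexOf, then assign by index.
int listIndex = keys.IndexOf("Item9999");
int arrayListIndex = arrayList.IndexOf("Item9999");
keys[listIndex] = "Updated";
arrayList[arrayListIndex] = "Updated";

Console.WriteLine($"{listIndex} {arrayListIndex}"); // 9999 9999
```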
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are value-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
Marc Gravell touched on this in his answer - I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottle neck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to its knees with a single badly chosen string concatenation.)