I want to clear up some concepts regarding memory allocation of ArrayList vs. generic List, both when the elements are value types and when they are reference types. Could anyone help clear this up?
The only difference in memory use is when you store a Value type. The ArrayList will have to Box (copy) the value. A boxed value will be placed on the Heap, consuming at least an extra header block (ca 20 bytes).
But this will only be significant when you store many millions of items, not something you do all the time.
They are both reference types. The only difference is that ArrayList is weakly typed. Value types such as int, bool etc that are stored in it are boxed into the object type. Then, you unbox them when you cast each item in the ArrayList.
Because everything is boxed into an object, you can store objects of different types in an ArrayList.
Generic List is strongly typed; that is, it can store only objects of the declared type. For value types there is no boxing, so it is more efficient.
The boxing process allocates extra memory to wrap the value in an object.
If you stored only objects of reference types in the ArrayList, then boxing is not used, rather another mechanism is used called reference conversion.
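The difference between boxing and reference conversion can be observed directly. The following is a minimal sketch, not taken from the original answers; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections;

public static class BoxingVsReferenceConversion
{
    // Storing an int in an ArrayList boxes it: each Add creates a new heap object.
    public static bool EachAddCreatesItsOwnBox()
    {
        ArrayList list = new ArrayList();
        int n = 42;
        list.Add(n);   // boxing: a copy of n is wrapped in an object
        list.Add(n);   // boxing again: a second, separate object
        return !object.ReferenceEquals(list[0], list[1]);
    }

    // Storing a reference type only converts the reference to object; nothing is copied.
    public static bool ReferenceIsStoredAsIs()
    {
        ArrayList list = new ArrayList();
        string s = "hello";
        list.Add(s);   // reference conversion: string -> object
        return object.ReferenceEquals(list[0], s);
    }

    public static void Main()
    {
        Console.WriteLine(EachAddCreatesItsOwnBox());  // True
        Console.WriteLine(ReferenceIsStoredAsIs());    // True
    }
}
```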
ArrayList is a reference type, but it is not type-safe and is less efficient.
List<T>, the generic list, is a reference type, and it is type-safe and efficient.
Here is the SO post on Memory Allocation of Reference Types
How memory is allocated to reference types in C#?
All types are derived from the Object class, but value types aren’t allocated on the heap. Value type variables actually contain their values. So how can these types be stored in arrays and used in methods that expect reference variables? Can somebody please explain how these value types are stored on the heap when they are part of an array?
Boxing and Unboxing. Also see Here for info pertaining to arrays specifically (part way down). Note this is for object arrays, a valuetype array (e.g. int[]) doesn't have any (un)boxing.
Have a look at this question:
Arrays, heap and stack and value types
You can pass the instance of a value type to a method expecting an object (ref class). In this case boxing and unboxing happens.
Value type arrays do not require boxing or unboxing!
The CLR handles arrays of value types specially. Of course an array is a reference type which is allocated on the heap, but the value type values are embedded into the heap record (not on the stack).
Similarly, when a reference type class contains a value type field, the value of the field is embedded into the record on the heap.
Value types may be allocated on stack.
This can happen only if they are parameters, local variables, or fields of another value type that is itself allocated on the stack.
Value types in arrays and in fields of classes are stored inline in the array or class, instead of a pointer being stored there. This gives better memory locality, which improves performance.
In an array of value types, value n sits right after value n-1 in memory. That is not guaranteed for the objects referenced by an array of reference types (including boxed values in an array of object, which also have no guarantee of contiguity). In arrays of reference types, it is the references that are contiguous.
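A small sketch of the inline-storage point, assuming a hypothetical Point struct (not from the original answers): elements of a Point[] can be modified in place because they live inside the array, while an object[] only holds a reference to a boxed copy.

```csharp
using System;

// Hypothetical value type used only for this illustration.
public struct Point
{
    public int X;
    public int Y;
}

public static class ArrayStorageDemo
{
    // Elements of a Point[] are stored inside the array itself.
    public static int MutateInPlace()
    {
        Point[] points = new Point[3];  // one heap object holding three Points inline
        points[1].X = 10;               // writes directly into the array's storage
        return points[1].X;
    }

    // Elements of an object[] are references; storing a Point boxes a copy of it.
    public static int BoxedCopyIsIndependent()
    {
        object[] boxes = new object[1];
        Point p = new Point();
        boxes[0] = p;                   // boxing: the array holds a reference to a copy
        p.X = 99;                       // changes the local variable only
        return ((Point)boxes[0]).X;
    }

    public static void Main()
    {
        Console.WriteLine(MutateInPlace());           // 10
        Console.WriteLine(BoxedCopyIsIndependent());  // 0
    }
}
```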
I know I can add an object of any type to an ArrayList instance. If I understand correctly, reference types are cast to object (value types are boxed). Also, does an ArrayList actually store a list of objects of the reference type object?
Internally the ArrayList class uses a fixed size array object[] (object array) for storage. When you add elements those elements are automatically copied to their respective indexes in the array. When the max size is reached a new array is created with a larger size and the elements are recopied. So it's just a convenience wrapper around a static object array.
An ArrayList does not store objects, but merely the references to those objects.
An ArrayList is essentially a wrapper around an object[], with functionality to track space in the array and grow it (double it) as necessary. Note that usually List<T> is preferred, but to answer the question: yes, it just stores the references to the objects, which may well be boxed value-types.
A List<T> is pretty much the same, but around a T[], which means value types can be stored without boxing. Reference-types are still stored as references. You also get more type safety; i.e. you can't add the wrong thing nor cast a retrieved item improperly.
ArrayList is similar to List<object> and was created before .NET had generics.
It can store anything that derives from object, which is all reference and value types. So you could use it to store a list of objects.
The internal storage for an ArrayList is an object array (object[]).
When storing reference types in the ArrayList, the reference is just cast to object and stored in the array. The reference type instance itself contains information about its type, so it's possible to cast it back to the actual type when you get it from the ArrayList.
Value types are boxed inside an object, and the reference to that object is stored in the array. The object contains information about what type the value is, so that it can be unboxed correctly when you get it from the ArrayList.
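To illustrate that every stored element carries its own type information, here is a short sketch (the class and method names are invented for this example):

```csharp
using System;
using System.Collections;

public static class StoredTypeInfoDemo
{
    public static string[] DescribeElements()
    {
        ArrayList list = new ArrayList();
        list.Add("text");  // reference conversion to object
        list.Add(7);       // boxed int
        list.Add(2.5);     // boxed double

        string[] names = new string[list.Count];
        for (int i = 0; i < list.Count; i++)
        {
            // The static type of list[i] is object, but the instance knows its real type.
            names[i] = list[i].GetType().Name;
        }
        return names;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", DescribeElements()));  // String, Int32, Double
    }
}
```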
They store any object.
Meaning it's possible to fill them with strings, integers, classes as long as they have been instantiated as an object.
The .NET 1.0 way of creating collection of integers (for example) was:
ArrayList list = new ArrayList();
list.Add(i); /* boxing */
int j = (int)list[0]; /* unboxing */
The penalty of using this is the lack of type safety and performance due to boxing and unboxing.
The .NET 2.0 way is to use generics:
List<int> list = new List<int>();
list.Add(i);
int j = list[0];
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
How does the use of generics overcome this? Does the stack-allocated integer stay on the stack and get pointed to from the heap (I guess this is not the case because of what would happen when it goes out of scope)? It seems like there is still a need to copy it somewhere else off the stack.
What is really going on?
When it comes to collections, generics make it possible to avoid boxing/unboxing by utilizing actual T[] arrays internally. List<T> for example uses a T[] array to store its contents.
The array, of course, is a reference type and is therefore (in the current version of the CLR, yada yada) stored on the heap. But since it's a T[] and not an object[], the array's elements can be stored "directly": that is, they're still on the heap, but they're on the heap in the array instead of being boxed and having the array contain references to the boxes.
So for a List<int>, for example, what you'd have in the array would "look" like this:
[ 1 2 3 ]
Compare this to an ArrayList, which uses an object[] and would therefore "look" something like this:
[ *a *b *c ]
...where *a, etc. are references to objects (boxed integers):
*a -> 1
*b -> 2
*c -> 3
Excuse those crude illustrations; hopefully you know what I mean.
Your confusion is a result of misunderstanding what the relationship is between the stack, the heap, and variables. Here's the correct way to think about it.
A variable is a storage location that has a type.
The lifetime of a variable can either be short or long. By "short" we mean "until the current function returns or throws" and by "long" we mean "possibly longer than that".
If the type of a variable is a reference type then the contents of the variable is a reference to a long-lived storage location. If the type of a variable is a value type then the contents of the variable is a value.
As an implementation detail, a storage location which is guaranteed to be short-lived can be allocated on the stack. A storage location which might be long-lived is allocated on the heap. Notice that this says nothing about "value types are always allocated on the stack." Value types are not always allocated on the stack:
int[] x = new int[10];
x[1] = 123;
x[1] is a storage location. It is long-lived; it might live longer than this method. Therefore it must be on the heap. The fact that it contains an int is irrelevant.
You correctly say why a boxed int is expensive:
The price of boxing is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
Where you go wrong is to say "the stack allocated integer". It doesn't matter where the integer was allocated. What matters was that its storage contained the integer, instead of containing a reference to a heap location. The price is the need to create the object and do the copy; that's the only cost that is relevant.
So why isn't a generic variable costly? If you have a variable of type T, and T is constructed to be int, then you have a variable of type int, period. A variable of type int is a storage location, and it contains an int. Whether that storage location is on the stack or the heap is completely irrelevant. What is relevant is that the storage location contains an int, instead of containing a reference to something on the heap. Since the storage location contains an int, you do not have to take on the costs of boxing and unboxing: allocating new storage on the heap and copying the int to the new storage.
Is that now clear?
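The point that a variable of type T, with T constructed as int, simply is a variable of type int can be checked with reflection. This is only a sketch; Cell<T> is a hypothetical class invented for the example.

```csharp
using System;

// Hypothetical generic class with a single storage location of type T.
public class Cell<T>
{
    public T Value;
}

public static class GenericStorageDemo
{
    // For Cell<int>, the Value field is a storage location of type int.
    public static Type FieldTypeOfIntCell()
    {
        return typeof(Cell<int>).GetField("Value").FieldType;
    }

    // For Cell<object>, the same field holds a reference instead.
    public static Type FieldTypeOfObjectCell()
    {
        return typeof(Cell<object>).GetField("Value").FieldType;
    }

    public static void Main()
    {
        Console.WriteLine(FieldTypeOfIntCell());     // System.Int32
        Console.WriteLine(FieldTypeOfObjectCell());  // System.Object
    }
}
```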
Generics allows the list's internal array to be typed int[] instead of effectively object[], which would require boxing.
Here's what happens without generics:
You call Add(1).
The integer 1 is boxed into an object, which requires a new object to be constructed on the heap.
This object is passed to ArrayList.Add().
The boxed object is stuffed into an object[].
There are three levels of indirection here: ArrayList -> object[] -> object -> int.
With generics:
You call Add(1).
The int 1 is passed to List<int>.Add().
The int is stuffed into an int[].
So there are only two levels of indirection: List<int> -> int[] -> int.
A few other differences:
The non-generic method will require a sum of 8 or 12 bytes (one pointer, one int) to store the value, 4/8 in one allocation and 4 in the other. And this will probably be more due to alignment and padding. The generic method will require only 4 bytes of space in the array.
The non-generic method requires allocating a boxed int; the generic method does not. This is faster and reduces GC churn.
The non-generic method requires casts to extract values. This is not typesafe and it's a bit slower.
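The two sequences of steps above can be sketched with simplified stand-ins for the real collections. ObjectBag and TypedBag<T> are hypothetical classes written for this illustration; they are not the actual ArrayList or List<T> source, and they omit bounds checks.

```csharp
using System;

// Stand-in for ArrayList: the backing store is an object[].
public class ObjectBag
{
    private object[] items = new object[4];
    private int count;

    public void Add(object item)  // an int argument is boxed at the call site
    {
        if (count == items.Length) Array.Resize(ref items, items.Length * 2);
        items[count++] = item;
    }

    public object Get(int index) { return items[index]; }
}

// Stand-in for List<T>: the backing store is a T[].
public class TypedBag<T>
{
    private T[] items = new T[4];
    private int count;

    public void Add(T item)       // for T = int, the value is stored directly
    {
        if (count == items.Length) Array.Resize(ref items, items.Length * 2);
        items[count++] = item;
    }

    public T Get(int index) { return items[index]; }
}

public static class BagDemo
{
    public static int RoundTripObjectBag()
    {
        ObjectBag bag = new ObjectBag();
        bag.Add(1);                // boxing
        return (int)bag.Get(0);    // cast and unboxing
    }

    public static int RoundTripTypedBag()
    {
        TypedBag<int> bag = new TypedBag<int>();
        bag.Add(1);                // no boxing
        return bag.Get(0);         // no cast
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripObjectBag());  // 1
        Console.WriteLine(RoundTripTypedBag());   // 1
    }
}
```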
An ArrayList only handles the type object so to use this class requires casting to and from object. In the case of value types, this casting involves boxing and unboxing.
When you use a generic list, the runtime generates specialized code for that value type, so that the actual values are stored in the list rather than references to objects that contain the values. Therefore no boxing is required.
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
I think you are assuming that value types are always instantiated on the stack. This is not the case - they can be created either on the heap, on the stack or in registers. For more information about this please see Eric Lippert's article: The Truth About Value Types.
In .NET 1, when the Add method is called:
Space is allocated on the heap for a new object (the box)
The contents of the i variable are copied into that object
A reference to the object is put at the end of the list
In .NET 2:
A copy of the variable i is passed to the Add method
A copy of that copy is put at the end of the list
Yes, the i variable is still copied (after all, it's a value type, and value types are always copied - even if they're just method parameters). But there's no redundant copy made on the heap.
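The copy semantics can be seen in a few lines. This is an illustrative sketch with invented names:

```csharp
using System;
using System.Collections.Generic;

public static class CopyOnAddDemo
{
    public static int ListKeepsItsOwnCopy()
    {
        List<int> list = new List<int>();
        int i = 5;
        list.Add(i);     // the value 5 is copied into the list's storage
        i = 100;         // changing the local afterwards does not affect the list
        return list[0];
    }

    public static void Main()
    {
        Console.WriteLine(ListKeepsItsOwnCopy());  // 5
    }
}
```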
Why are you thinking in terms of WHERE the values\objects are stored? In C# value types can be stored on stack as well as heap depending upon what the CLR chooses.
Where generics make a difference is WHAT is stored in the collection. In case of ArrayList the collection contains references to boxed objects where as the List<int> contains int values themselves.
Ok I understand the basic concept of what happens when you box and unbox.
Box throws the value type (stack object) into a System.Object and stores it on the heap
Unbox unpackages that object on the heap holding that value type and throws it back on the stack so it can be used.
Here is what I don't understand:
Why would this need to be done? Specific real-world examples would help.
Why are generics so efficient? They say it is because generics don't need to box or unbox. OK, but I don't get why. What's behind that in generics?
Why are generics better than, say, other types? For example, other collections?
So all in all, I don't understand how this applies in real-world code, and further, how it makes generics better: why they don't have to do any of this in the first place.
Boxing needs to be done whenever you want to hold an int in an object variable.
A generic collection of ints contains an int[] instead of an object[].
Putting an int into the object[] behind a non-generic collection requires you to box the int.
Putting an int into the int[] behind a generic collection does not involve any boxing.
Firstly, the stack and heap are implementation details. A value type isn't defined by being on the stack. There is nothing to say that the concept of stack and heap will be used by all systems able to host the CLR:
Link
That aside:
When a value type is boxed, the data in that value type is read, an object is created, and the data is copied to the new object.
If you are boxing all the items in a collection, this is a lot of overhead.
If you have a non-generic collection of value types and are iterating over it, the items are unboxed (the reverse of the process) on each read, just to read a value!
Generic collections are strongly typed to the type being stored in them, and therefore no boxing or unboxing needs to occur.
Here is a response around the unboxing/boxing portion.
I'm not sure how it is implemented in mono, but generic interfaces will help because the compiler creates a new function of the specific type for each different type used (internally, there are a few cases where it can utilize the same generated function). If a function of the specific type is generated, there is no need to box/unbox the type.
This is why the Collections.Generic library was a big hit at .NET 2.0 because collections no longer required boxing and became significantly more efficient.
As for why generics are better than other collections beyond the boxing/unboxing issue: they also enforce type. You can no longer readily toss around a collection that can hold any type. This can catch bugs at compile time rather than at run time.
MSDN has a nice article: Boxing and Unboxing (C# Programming Guide)
In relation to simple assignments, boxing and unboxing are computationally expensive processes. When a value type is boxed, a new object must be allocated and constructed. To a lesser degree, the cast required for unboxing is also expensive computationally.
Boxing is used to store value types in the garbage-collected heap. Boxing is an implicit conversion of a value type to the type object or to any interface type implemented by this value type. Boxing a value type allocates an object instance on the heap and copies the value into the new object.
Unboxing is an explicit conversion from the type object to a value type or from an interface type to a value type that implements the interface. An unboxing operation consists of:
Checking the object instance to make sure that it is a boxed value of the given value type.
Copying the value from the instance into the value-type variable.
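The type check performed during unboxing is strict: the target must be the exact value type that was boxed. A brief sketch (the names are invented for illustration):

```csharp
using System;

public static class UnboxCheckDemo
{
    // Unboxing a boxed int as long fails the type check, even though
    // an int converts to long implicitly when no boxing is involved.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;  // boxed int
        try
        {
            long wrong = (long)boxed;
            Console.WriteLine(wrong);  // not reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox to the exact type first, then convert.
    public static long UnboxThenConvert()
    {
        object boxed = 42;
        return (long)(int)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(UnboxToWrongTypeThrows());  // True
        Console.WriteLine(UnboxThenConvert());        // 42
    }
}
```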
Check also: Exploring C# Boxing
And read Jeffrey Richter's Type fundamentals. Here Two sample chapters plus full TOC from Jeffrey Richter's "CLR via C#" (Microsoft Press, 2010) he published some time ago.
Also some notes from Jeffrey Richter's book CLR via C#:
It’s possible to convert a value type to a reference type by using a mechanism called boxing.
Internally, here’s what happens when an instance of a value type is boxed:
Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
The value type’s fields are copied to the newly allocated heap memory.
The address of the object is returned. This address is now a reference to an object; the value type is now a reference type. The C# compiler automatically produces the IL code necessary to box a value type instance, but you still need to understand what’s going on internally so that you’re aware of code size and performance issues.
Note. It should be noted that the FCL now includes a new set of generic collection classes that make the non-generic collection classes obsolete. For example, you should use the System.Collections.Generic.List<T> class instead of the System.Collections.ArrayList class. The generic collection classes offer many improvements over the non-generic equivalents. For example, the API has been cleaned up and improved, and the performance of the collection classes has been greatly improved as well. But one of the biggest improvements is that the generic collection classes allow you to work with collections of value types without requiring that items in the collection be boxed/unboxed. This in itself greatly improves performance because far fewer objects will be created on the managed heap thereby reducing the number of garbage collections required by your application. Furthermore, you will get compile-time type safety, and your source code will be cleaner due to fewer casts. This will all be explained in further detail in Chapter 12, “Generics.”
I don't want to over-quote the full chapter here. Read his book and you will learn the details of the process and get some answers. And by the way, there are quite a few answers to your question here on SO, around the Web, and in many books. It is fundamental knowledge you certainly have to understand.
Here is an interesting read from Eric Lippert (The truth about value types):
Link
regarding your statement:
Box throws the value type (stack object) into a System.Object and stores it on the heap Unbox unpackages that object on the heap holding that value type and throws it back on the stack so it can be used.
This needs to be done because at the IL level there are different instructions for value types than for reference types (ldfld vs. ldflda; check out the disassembly for a method that calls someValueType.ToString() vs. someReferenceType.ToString() and you'll see that the instructions are different).
These instructions are not compatible, so when you need to pass a value type to a method as an object, that value needs to be wrapped in a reference type (boxing). This is inefficient because the runtime needs to copy the value type and then create a new boxing type in order to pass one value.
Generics are faster because value types can be stored as values and not references, so no boxing is needed. Take ArrayList vs. List<int>. If you want to put 1 into an ArrayList, the CLR needs to box the int so that it can be stored in an object[]. List<T>, however, uses a T[] to store the list contents, so List<int> uses an int[], which means that 1 doesn't need to be boxed in order to put it in the array.
To put it simply, boxing and unboxing take a lot of time. Why? Because it's faster to use a known type from the start than to leave this to be handled at run time.
A collection of objects can contain different items (string, int, double, etc.), and you must check every time that your operation on the variable is correct.
Converting from one type to another takes time.
Generics are much faster, and you are encouraged to use them; the old collections exist for backward compatibility.
Suppose I want to store a bunch of variables of type Long in a List, but the system supported neither value-type generics nor boxing. The way to go about storing such values would be to define a new class "BoxedLong", which held a single field "Value" of type Long. Then to add a value to the list, one would create a new instance of a BoxedLong, set its Value field to the desired value, and store that in the list. To retrieve a value from the list, one would retrieve a BoxedLong object from the list, and take the value from its Value field.
When a value type is passed to something that expects an Object, the above is essentially what happens under the hood, except without the new identifier names.
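The hand-written value holder described above can be put side by side with the built-in boxing it imitates. This is a sketch only: BoxedLong follows the description in the text, and the demo class is invented for illustration.

```csharp
using System;
using System.Collections;

// The value-holder class from the description: one field named Value.
public class BoxedLong
{
    public long Value;
}

public static class BoxedLongDemo
{
    // Manual "boxing": create a holder, set its field, store the reference.
    public static long ManualRoundTrip()
    {
        ArrayList list = new ArrayList();
        BoxedLong holder = new BoxedLong();
        holder.Value = 123L;
        list.Add(holder);
        BoxedLong retrieved = (BoxedLong)list[0];
        return retrieved.Value;
    }

    // Built-in boxing: the holder object is created behind the scenes.
    public static long BuiltInRoundTrip()
    {
        ArrayList list = new ArrayList();
        list.Add(123L);         // boxing
        return (long)list[0];   // unboxing
    }

    public static void Main()
    {
        Console.WriteLine(ManualRoundTrip());   // 123
        Console.WriteLine(BuiltInRoundTrip());  // 123
    }
}
```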
When using generics with value types, the system doesn't use a value-holder class and pass it to routines which expect to work with objects. Instead, the system creates a new version of the routine that will work with the value type in question. If five different value types are passed to a generic routine, five different versions of the routine will be generated. In general, this will yield more code than would the use of a value-holder class, but the code will have to do less work every time a value is passed in or retrieved. Since most routines will have many values of each type passed in or out, the cost of generating different versions of the routine will be more than recouped by the elimination of boxing/unboxing operations.
There are cases when an instance of a value type needs to be treated as an instance of a reference type. For situations like this, a value type instance can be converted into a reference type instance through a process called boxing. When a value type instance is boxed, storage is allocated on the heap and the instance's value is copied into that space. A reference to this storage is placed on the stack. The boxed value is an object, a reference type that contains the contents of the value type instance.
Understanding .NET's Common Type System
In Wikipedia there is an example for Java. But in C#, what are some cases where one would have to box a value type? Or would a better/similar question be, why would one want to store a value type on the heap (boxed) rather than on the stack?
In general, you typically will want to avoid boxing your value types.
However, there are rare occurrences where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.
There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.
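As a sketch of the reflection case: PropertyInfo.GetValue returns object, so a value-type result comes back boxed and has to be unboxed by the caller. The demo class name is invented for illustration.

```csharp
using System;
using System.Reflection;

public static class ReflectionBoxingDemo
{
    // The Length property returns an int, but reflection hands it back as object.
    public static object ReadLengthViaReflection(string s)
    {
        PropertyInfo prop = typeof(string).GetProperty("Length");
        return prop.GetValue(s, null);  // the int result is boxed
    }

    public static void Main()
    {
        object boxed = ReadLengthViaReflection("hello");
        Console.WriteLine(boxed.GetType());  // System.Int32
        int length = (int)boxed;             // unboxing
        Console.WriteLine(length);           // 5
    }
}
```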
There is almost never a good reason to deliberately box a value type. Almost always, the reason to box a value type is to store it in some collection that is not type aware. The old ArrayList, for example, is a collection of objects, which are reference types. The only way to collect, say, integers, is to box them as objects and pass them to ArrayList.
Nowadays, we have generic collections, so this is less of an issue.
Boxing generally happens automatically in .NET whenever it has to, often when you pass a value type to something that expects a reference type. A common example is string.Format(). When you pass primitive value types to this method, they are boxed as part of the call. So:
int x = 10;
string s = string.Format( "The value of x is {0}", x ); // x is boxed here
This illustrates a simple scenario where a value type (x) is automatically boxed to be passed to a method that expects an object. Generally, you want to avoid boxing value types when possible ... but in some cases it's very useful.
As an interesting aside, when you use generics in .NET, value types are not boxed when used as parameters or members of the type. This makes generics more efficient than older C# code (such as ArrayList) that treats everything as object in order to be type agnostic. It adds one more reason to use generic collections, like List<T> or Dictionary<TKey,TValue>, over ArrayList or Hashtable.
I would recommend two nice articles by Eric Lippert:
http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
http://blogs.msdn.com/ericlippert/archive/2009/05/04/the-stack-is-an-implementation-detail-part-two.aspx
Here is the quote that I would 100% agree with
Using the stack for locals of value type is just an optimization that the CLR performs on your behalf.
The relevant feature of value types is that they have the semantics of being copied by value, not that sometimes their deallocation can be optimized by the runtime.
In 99% of applications, developers should not care about why value types are on the stack and not on the heap, or what performance gain that might bring. Just keep in mind some very simple rules:
Avoid boxing/unboxing when not necessary; use generic collections. Most problems occur not when you define your own types, but when you use existing types improperly (defined by Microsoft or your colleagues).
Make your value types simple. If you need a struct with 10-20 fields, I suppose you'd better create a class. Imagine: all those fields will be copied each time you happen to pass it to a function by value...
I don't think it is very useful to have value types with reference type fields inside, like a struct with String and object fields.
Define what type you need depending on the required functionality, not on where it should be stored. Structs have limited functionality compared to classes, so if a struct cannot provide the required functionality, like a default constructor, define a class.
If something can perform actions with the data of other types, it is usually defined as a class. For structs, operations with different types should be defined only if you can cast one type to another. Say, you can add an int to a double because you can cast int to double.
If something should be stateless, it is a class.
When you are hesitating, use reference types. :-)
Any rule allows exceptions in special cases, but do not try to over-optimize.
p.s.
I have met some ASP.NET developers with 2-3 years of experience who don't know the difference between the stack and the heap. :-( I would not hire such a person if I were the interviewer, but not because boxing/unboxing could be a bottleneck in any of the ASP.NET sites I've ever seen.
I think a good example of boxing in c# occurs in the non-generic collections like ArrayList.
One example would be when a method takes an object parameter and a value type must be passed in.
Below are some examples of boxing/unboxing:
ArrayList ints = new ArrayList();
ints.Add(1); // boxing
ints.Add(2); // boxing
int myInt = (int)ints[0]; // unboxing
Console.Write("Value is {0}", myInt); // boxing
One of the situations when this happens is, for example, if you have a method that expects a parameter of type object and you pass in one of the primitive types, such as int. Or if you define parameter as 'ref' of type int.
The code
int x = 42;
Console.WriteLine("The value of x is {0}", x);
actually boxes x, because WriteLine takes its format arguments as object. To avoid this you could do
int x = 42;
Console.WriteLine("The value of x is {0}", x.ToString());
Beware of subtle bugs!
You can declare your own value types by declaring your own type as struct. Imagine you declare a struct with lots of properties and then put some instances inside an ArrayList. This boxes them of course. Now reference one through the [] operator, casting it to the type and set a property. You just set a property on a copy. The one in the ArrayList is still unmodified.
For this reason, value types must always be immutable, i.e. make all member variables readonly so that they can only be set in the constructor and do not have any mutable types as members.
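The pitfall described above can be reproduced in a few lines. This is a sketch with a hypothetical Counter struct that uses a public field instead of a property to keep it short.

```csharp
using System;
using System.Collections;

// Hypothetical mutable value type, used only to demonstrate the pitfall.
public struct Counter
{
    public int Count;
}

public static class MutableStructPitfall
{
    public static int ModifyCopyFromArrayList()
    {
        ArrayList list = new ArrayList();
        list.Add(new Counter());          // boxed on the way in
        Counter c = (Counter)list[0];     // unboxing copies the value out
        c.Count = 5;                      // modifies the local copy only
        return ((Counter)list[0]).Count;  // the boxed original is unchanged
    }

    public static void Main()
    {
        Console.WriteLine(ModifyCopyFromArrayList());  // 0
    }
}
```

To actually change the stored element, the modified copy has to be written back, for example with list[0] = c, which boxes it again.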