Allocation of memory for an Array


All types are derived from the Object class, but value
types aren’t allocated on the heap; value-type variables actually contain
their values. How, then, can these types be stored in arrays and used in
methods that expect reference variables? Can somebody please explain how value types are stored on the heap when they are part of an array?

The mechanism is boxing and unboxing. Note that this applies to object arrays; a value-type array (e.g. int[]) involves no boxing or unboxing.

Have a look at this question:
Arrays, heap and stack and value types
You can pass the instance of a value type to a method expecting an object (ref class). In this case boxing and unboxing happens.
Value type arrays do not require boxing or unboxing!

The CLR handles arrays of value types specially. Of course an array is a reference type which is allocated on the heap, but the value type values are embedded into the heap record (not on the stack).
Similarly, when a reference type (a class) contains a value-type field, the value of the field is embedded into the record on the heap.

Value types may be allocated on the stack.
This can happen only if they are parameters, local variables, or fields of another value type that is itself on the stack.
Value types in arrays and in class fields are stored inline in the array or class, instead of a pointer being stored there. Value types therefore give more local memory access (a performance improvement),
and in an array value n sits right after value n-1 in memory. That is not guaranteed for the objects in an array of reference types (including boxed values in an array of object, where there is also no guarantee of contiguity). In arrays of reference types it is the references that are contiguous.
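The difference between inline storage and an array of boxes can be observed without inspecting memory. The following is a minimal sketch; the class and method names are made up for illustration. It relies only on the fact that every boxing conversion creates a new heap object, while an object[] slot keeps returning the same box.

```csharp
using System;

public static class ArrayLayoutDemo
{
    // An object[] stores references, so reading a slot twice
    // returns the same boxed object both times.
    public static bool SameBoxOnEachRead()
    {
        object[] boxed = { 1, 2, 3 };   // each int is boxed into its own heap object
        return ReferenceEquals(boxed[0], boxed[0]);
    }

    // An int[] stores the values themselves, so a box is created only
    // when an element is converted to object, and each conversion
    // produces a separate box.
    public static bool FreshBoxOnEachConversion()
    {
        int[] inline = { 1, 2, 3 };     // values live inside the array object
        object first = inline[0];       // boxing happens here
        object second = inline[0];      // a second, independent box
        return !ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(SameBoxOnEachRead());         // True
        Console.WriteLine(FreshBoxOnEachConversion());  // True
    }
}
```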

Related

ArrayList vs Generic List On Memory allocation in C#?

I want to clarify how memory is allocated for an ArrayList versus a generic List, both when the elements are value types and when they are reference types. Could anyone help clear this up?

The only difference in memory use is when you store a Value type. The ArrayList will have to Box (copy) the value. A boxed value will be placed on the Heap, consuming at least an extra header block (ca 20 bytes).
But this will only be significant when you store many millions of items, not something you do all the time.

They are both reference types. The only difference is that ArrayList is weakly typed. Value types such as int, bool etc that are stored in it are boxed into the object type. Then, you unbox them when you cast each item in the ArrayList.
Because everything is boxed into an object, you can store objects of different types in an ArrayList.
Generic List is strongly typed; that is, it can store only objects of one type. There's no boxing, so it's more efficient.
The boxing process allocates more memory to encapsulate the object into the weak type object.
If you stored only objects of reference types in the ArrayList, then boxing is not used, rather another mechanism is used called reference conversion.

ArrayList is a reference type, but it is not type-safe and is less efficient.
List<T>, the generic list, is a reference type, but it is type-safe and efficient.
Here is the SO post on Memory Allocation of Reference Types
How memory is allocated to reference types in C#?

Is Class a Reference Type and Struct a Value Type?

I understand this topic is answered a lot. My question is specific to the way it is said or asked.
So am I right to say, that code written with a class keyword will be on the managed heap and is a reference type, and code that is written with a struct will be on stack and is a value type?

I used to think like this as well. However, I recently had a nice discussion with Jon Skeet (he may provide more details), and he explained to me that a value type may be kept on the heap as well. The key is the lifetime of the storage location: a short-lived local variable of a value type can live on the stack, whereas a value that may outlive the current method, such as a field of a class instance or an array element, is stored on the heap.
IMO, the key difference between reference and value types relies on passing the object to another object or method. If it's a reference type, you are simply sharing its reference. If it's a value type, then you are making a copy of it.
On the subject of short-lived versus long-lived variables, here is the full picture:
in the Microsoft implementation of C# on the desktop CLR, value types
are stored on the stack when the value is a local variable or
temporary that is not a closed-over local variable of a lambda or
anonymous method, and the method body is not an iterator block, and
the jitter chooses to not enregister the value.
Source: The Truth About Value Types (it's also in the comments)

Any storage location (local variable, parameter, class field, struct field, or array slot) of a reference type will always either hold null, or else will hold a reference to an object on the heap. A storage location of a value type will hold all public and private fields of that type (a primitive value type is internally stored as a structure with one field, which is declared to be of that same primitive type; a little bit of compiler magic is used to recognize when special-case code must be used to work with that type). For every value type there is a corresponding heap-object type which has the same members; an attempt to store a value type in a reference-type storage location will create a new heap object of the appropriate heap type, copy the contents of the value-type fields to those of the new object, and store a reference to that new object in the requested storage location. This process is called "boxing". It's possible to copy the contents of a boxed heap object's fields to those of a value-type storage location, a process called "unboxing". Note that because boxed value types are accessed using reference-type storage locations, they behave like reference types rather than value types. C# tries to pretend that the type of a value-type storage location and the type of a boxed value-type instance are the same type, but the two types behave somewhat differently; pretending that they are the same simply adds confusion.
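The copy made by boxing can be seen in a few lines. This is only an illustrative sketch: the Counter struct and the BoxingCopyDemo class are hypothetical names, not part of any library.

```csharp
using System;

// Hypothetical value type used only for this illustration.
public struct Counter
{
    public int Count;
}

public static class BoxingCopyDemo
{
    public static int ValueInsideBox()
    {
        Counter c = new Counter { Count = 1 };
        object boxed = c;                   // boxing: fields are copied into a new heap object
        c.Count = 2;                        // changes only the original storage location
        Counter unboxed = (Counter)boxed;   // unboxing: fields are copied back out of the box
        return unboxed.Count;               // the box still holds the old value
    }

    public static void Main()
    {
        Console.WriteLine(ValueInsideBox());  // 1
    }
}
```

Changing c after the boxing conversion does not affect the box, because the box holds its own copy of the fields.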

How is an array of value types stored in .NET object heap?

In .NET, a value-type object such as an int is stored directly in the memory of its variable.
A reference-type object requires separate allocations of memory for the reference and for the object, and the object is stored in the .NET object heap.
An array is created on the heap, so how is an array of value types such as int[] stored on the heap? Does it mean a value-type object can be stored on the heap without boxing?

Yes, you are right. I suggest you read this:
https://ericlippert.com/2010/09/30/the-truth-about-value-types/
It's very very good, and it explains nearly everything you'll ever want to know.

Yes, an array is one way in which a value type value can be stored on the heap without boxing. Another is just having it in a normal class:
public class Foo
{
    int value1;
    string name;
    // etc
}
All the variables associated with an instance of Foo are stored on the heap. The value of value1 is just the int, whereas the value of name is a string reference.
This is why the claim that "value types are stored on the stack, reference types are stored on the heap" is so obviously incorrect.
However, as Eric Lippert is rightly fond of pointing out, the stack/heap distinction is an implementation detail. For example, a future version of the CLR could store some objects on the stack, if it could work out that they wouldn't be needed after the method terminated.

Yes, it means that no boxing is done for each element, because the entire array as a whole is "boxed" inside an Array object (although that's not what it's called).
There's really no requirement that says a value type has to be boxed before being placed on the heap. You can place a value type on the heap in three ways:
By wrapping it inside a regular object.
By boxing it.
By wrapping it inside an array object.
(There might be more ways but I don't think I've missed any.)

Just think of it this way, the object location in memory is defined by what kind of type it is and where it was declared. If the object is a value type, its value is stored where you declared the variable. If the object is a reference type, its reference is stored where you declared the variable while the actual object instance exists on the heap.
When you declare a local variable, you are declaring the variable on the stack. Therefore a value type's value will be on the stack. A reference type's reference will be on the stack, and the object instance is still on the heap.
If you declare an instance variable within a class (a reference type), you are effectively declaring the instance variables in the heap. A value type's value will be in the heap (in the object instance). A reference type's reference will also be in the heap (in the object instance), the object instance will be elsewhere in the heap.
If you declare an instance variable within a struct (a value type), where it resides depends on where the underlying struct was declared.
In the case of an array of int (int[]), arrays are reference types, and you can think of the int values as "fields" of that type, so your integers are effectively on the heap.

Array of structs - struct?

If we pass an array of structs as method parameter, in the method body do we have a reference to an array of structs, or a new array of structs?

You'll have a reference to an array of structs.
Array itself is a reference type, so an array of structs will be an object with the values stored inline.
If you pass an array to a method, you pass a reference to the array object. The reference itself is passed by value.
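A small sketch of both halves of that statement follows. Point and ArrayParameterDemo are hypothetical names chosen for the example. Writes through the parameter reach the caller's array, while reassigning the parameter affects only the method's own copy of the reference.

```csharp
using System;

// Hypothetical struct used only for this illustration.
public struct Point
{
    public int X;
}

public static class ArrayParameterDemo
{
    static void Mutate(Point[] points)
    {
        points[0].X = 42;         // same array object as the caller's: the write is visible outside
        points = new Point[1];    // rebinds only the local copy of the reference
        points[0].X = -1;         // modifies the new array, which the caller never sees
    }

    public static int Run()
    {
        Point[] data = new Point[1];
        Mutate(data);
        return data[0].X;
    }

    public static void Main()
    {
        Console.WriteLine(Run());  // 42
    }
}
```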

When you declare an array of value types, .NET allocates the memory on the heap, not the stack, so the array is always accessed through a reference.
The only exception is stackalloc, where a memory area is allocated on the stack and can be used in unsafe code; it is faster than heap access.

Array is a class in the .NET Framework, so an array of structs is still a reference type. I am not commenting on how and where it is stored, whether stack or heap, because that is purely an implementation detail, but in Microsoft's implementation reference types go on the heap.

Boxing and unboxing with generics

The .NET 1.0 way of creating collection of integers (for example) was:
ArrayList list = new ArrayList();
list.Add(i); /* boxing */
int j = (int)list[0]; /* unboxing */
The penalty of using this is the lack of type safety and performance due to boxing and unboxing.
The .NET 2.0 way is to use generics:
List<int> list = new List<int>();
list.Add(i);
int j = list[0];
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
How does the use of generics overcome this? Does the stack-allocated integer stay on the stack and get pointed to from the heap (I guess this is not the case, because of what would happen when it goes out of scope)? It seems like there is still a need to copy it somewhere else off the stack.
What is really going on?

When it comes to collections, generics make it possible to avoid boxing/unboxing by utilizing actual T[] arrays internally. List<T> for example uses a T[] array to store its contents.
The array, of course, is a reference type and is therefore (in the current version of the CLR, yada yada) stored on the heap. But since it's a T[] and not an object[], the array's elements can be stored "directly": that is, they're still on the heap, but they're on the heap in the array instead of being boxed and having the array contain references to the boxes.
So for a List<int>, for example, what you'd have in the array would "look" like this:
[ 1 2 3 ]
Compare this to an ArrayList, which uses an object[] and would therefore "look" something like this:
[ *a *b *c ]
...where *a, etc. are references to objects (boxed integers):
*a -> 1
*b -> 2
*c -> 3
Excuse those crude illustrations; hopefully you know what I mean.

Your confusion is a result of misunderstanding what the relationship is between the stack, the heap, and variables. Here's the correct way to think about it.
A variable is a storage location that has a type.
The lifetime of a variable can either be short or long. By "short" we mean "until the current function returns or throws" and by "long" we mean "possibly longer than that".
If the type of a variable is a reference type then the contents of the variable is a reference to a long-lived storage location. If the type of a variable is a value type then the contents of the variable is a value.
As an implementation detail, a storage location which is guaranteed to be short-lived can be allocated on the stack. A storage location which might be long-lived is allocated on the heap. Notice that this says nothing about "value types are always allocated on the stack." Value types are not always allocated on the stack:
int[] x = new int[10];
x[1] = 123;
x[1] is a storage location. It is long-lived; it might live longer than this method. Therefore it must be on the heap. The fact that it contains an int is irrelevant.
You correctly say why a boxed int is expensive:
The price of boxing is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
Where you go wrong is to say "the stack allocated integer". It doesn't matter where the integer was allocated. What matters was that its storage contained the integer, instead of containing a reference to a heap location. The price is the need to create the object and do the copy; that's the only cost that is relevant.
So why isn't a generic variable costly? If you have a variable of type T, and T is constructed to be int, then you have a variable of type int, period. A variable of type int is a storage location, and it contains an int. Whether that storage location is on the stack or the heap is completely irrelevant. What is relevant is that the storage location contains an int, instead of containing a reference to something on the heap. Since the storage location contains an int, you do not have to take on the costs of boxing and unboxing: allocating new storage on the heap and copying the int to the new storage.
Is that now clear?

Generics allows the list's internal array to be typed int[] instead of effectively object[], which would require boxing.
Here's what happens without generics:
You call Add(1).
The integer 1 is boxed into an object, which requires a new object to be constructed on the heap.
This object is passed to ArrayList.Add().
The boxed object is stuffed into an object[].
There are three levels of indirection here: ArrayList -> object[] -> object -> int.
With generics:
You call Add(1).
The int 1 is passed to List<int>.Add().
The int is stuffed into an int[].
So there are only two levels of indirection: List<int> -> int[] -> int.
A few other differences:
The non-generic method will require a sum of 8 or 12 bytes (one pointer, one int) to store the value, 4/8 in one allocation and 4 in the other. And this will probably be more due to alignment and padding. The generic method will require only 4 bytes of space in the array.
The non-generic method requires allocating a boxed int; the generic method does not. This is faster and reduces GC churn.
The non-generic method requires casts to extract values. This is not typesafe and it's a bit slower.
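The two paths can be put side by side in a short sketch; CollectionDemo and its method names are made up for illustration. The last method shows the type-safety point: unboxing must name the exact value type that was boxed, so the mistake surfaces only at run time.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CollectionDemo
{
    public static int ViaArrayList(int i)
    {
        ArrayList list = new ArrayList();
        list.Add(i);              // the int is boxed, because Add takes an object
        return (int)list[0];      // the cast unboxes it again
    }

    public static int ViaGenericList(int i)
    {
        List<int> list = new List<int>();
        list.Add(i);              // the int is copied into the list's int[]; no box
        return list[0];           // no cast needed
    }

    public static bool WrongCastThrows()
    {
        ArrayList list = new ArrayList();
        list.Add(1);              // boxed as an int
        try
        {
            Console.WriteLine((long)list[0]);  // unboxing an int as a long is invalid
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ViaArrayList(5));     // 5
        Console.WriteLine(ViaGenericList(5));   // 5
        Console.WriteLine(WrongCastThrows());   // True
    }
}
```

With List<int>, the equivalent mistake cannot compile into a failing unbox, because list[0] is already typed as int.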

An ArrayList only handles the type object so to use this class requires casting to and from object. In the case of value types, this casting involves boxing and unboxing.
When you use a generic list, the runtime generates specialized code for that value type, so that the actual values are stored in the list rather than references to objects that contain the values. Therefore no boxing is required.
The price of boxing (to my understanding) is the need to create an object on the heap, copy the stack allocated integer to the new object and vice-versa for unboxing.
I think you are assuming that value types are always instantiated on the stack. This is not the case - they can be created either on the heap, on the stack or in registers. For more information about this please see Eric Lippert's article: The Truth About Value Types.

In .NET 1, when the Add method is called:
Space is allocated on the heap for a new object (the box)
The contents of the i variable are copied into that new object
A copy of the reference to that object is put at the end of the list
In .NET 2:
A copy of the variable i is passed to the Add method
A copy of that copy is put at the end of the list
Yes, the i variable is still copied (after all, it's a value type, and value types are always copied - even if they're just method parameters). But there's no redundant copy made on the heap.

Why are you thinking in terms of WHERE the values or objects are stored? In C#, value types can be stored on the stack as well as on the heap, depending on what the CLR chooses.
Where generics make a difference is WHAT is stored in the collection. In the case of ArrayList, the collection contains references to boxed objects, whereas the List<int> contains the int values themselves.
