Boxing and Unboxing [duplicate]

OK, I understand the basic concept of what happens when you box and unbox.
Boxing puts the value type (a stack object) into a System.Object and stores it on the heap.
Unboxing unpacks the object on the heap that holds the value type and puts it back on the stack so it can be used.
Here is what I don't understand:
Why would this need to be done? I'm looking for specific real-world examples.
Why are generics so efficient? They say it's because generics don't need to box or unbox, but I don't get why. What's behind that in generics?
Why are generics better than, let's say, other types, for example other collections?
All in all, I don't understand how this applies in real-world code, and, going further, how it makes generics better: why don't generics have to do any of this in the first place?

Boxing needs to be done whenever you want to hold an int in an object variable.
A generic collection of ints contains an int[] instead of an object[].
Putting an int into the object[] behind a non-generic collection requires you to box the int.
Putting an int into the int[] behind a generic collection does not involve any boxing.
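A minimal sketch of the difference described above, assuming only the standard ArrayList and List<int> classes. The class and method names (BoxingInCollections, SumNonGeneric, SumGeneric) are made up for illustration.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingInCollections
{
    // Non-generic: every Add boxes the int, every read unboxes it.
    public static int SumNonGeneric()
    {
        ArrayList list = new ArrayList();
        list.Add(1);                 // Add(object): the int is boxed
        list.Add(2);                 // boxed again, a second heap object
        int sum = 0;
        foreach (object item in list)
        {
            sum += (int)item;        // explicit cast: unboxing
        }
        return sum;
    }

    // Generic: the ints are stored as ints, so nothing is boxed.
    public static int SumGeneric()
    {
        List<int> list = new List<int>();
        list.Add(1);                 // Add(int): no boxing
        list.Add(2);
        int sum = 0;
        foreach (int item in list)
        {
            sum += item;             // no cast, no unboxing
        }
        return sum;
    }
}
```

Both methods compute the same sum; only the non-generic version allocates a heap object per element.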

Firstly, the stack and heap are implementation details. A value type isn't defined by being on the stack, and there is nothing to say that the concept of stack and heap will be used on every system able to host the CLR:
Link
That aside:
When a value type is boxed, the data in that value type is read, an object is created, and the data is copied to the new object.
If you are boxing all the items in a collection, this is a lot of overhead.
If you have a non-generic collection of value types and are iterating over it, every item was boxed when it was stored, and each item is then unboxed (the reverse of the process) just to read a value!
Generic collections are strongly typed to the type being stored in them, and therefore no boxing or unboxing needs to occur.
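To illustrate the copy step described above, here is a small sketch (the names BoxCopyDemo and ValueInsideBoxAfterChange are hypothetical). It shows that the box holds its own copy of the data, so changing the original variable afterwards does not change the boxed value.

```csharp
public static class BoxCopyDemo
{
    public static int ValueInsideBoxAfterChange()
    {
        int original = 1;
        object boxed = original;   // boxing: a new object is created and the value is copied in
        original = 2;              // changes only the local variable, not the box
        return (int)boxed;         // unboxing copies the value back out: still 1
    }
}
```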

Here is a response around the unboxing/boxing portion.
I'm not sure how it is implemented in Mono, but generic interfaces will help because the compiler creates a new function of the specific type for each different type used (internally, there are a few cases where it can utilize the same generated function). If a function of the specific type is generated, there is no need to box/unbox the type.
This is why the Collections.Generic library was a big hit at .NET 2.0 because collections no longer required boxing and became significantly more efficient.
As for why generics are better than other collections outside the boxing/unboxing scope: they also enforce the element type. You can no longer readily toss around a collection that can hold any type. This can catch bugs at compile time, rather than seeing them at run time.

MSDN has a nice article: Boxing and Unboxing (C# Programming Guide)
In relation to simple assignments, boxing and unboxing are computationally expensive processes. When a value type is boxed, a new object must be allocated and constructed. To a lesser degree, the cast required for unboxing is also expensive computationally.
Boxing is used to store value types in the garbage-collected heap. Boxing is an implicit conversion of a value type to the type object or to any interface type implemented by this value type. Boxing a value type allocates an object instance on the heap and copies the value into the new object.
Unboxing is an explicit conversion from the type object to a value type or from an interface type to a value type that implements the interface. An unboxing operation consists of:
Checking the object instance to make sure that it is a boxed value of the given value type.
Copying the value from the instance into the value-type variable.
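The type check in the first unboxing step can be observed directly. The following sketch (class and method names are assumptions for illustration) unboxes a boxed int to the wrong value type, which fails at run time with InvalidCastException even though an int converts to long implicitly everywhere else.

```csharp
using System;

public static class UnboxingCheckDemo
{
    // Unboxing must name the exact value type that was boxed.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;              // a boxed int
        try
        {
            long wrong = (long)boxed;   // the box holds an int, not a long
            return wrong < 0;           // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unbox to the exact type first, then convert.
    public static long UnboxThenConvert()
    {
        object boxed = 42;
        return (long)(int)boxed;
    }
}
```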
Check also: Exploring C# Boxing
Also read Jeffrey Richter's chapter on type fundamentals. Some time ago he published two sample chapters plus the full table of contents of "CLR via C#" (Microsoft Press, 2010).
Also some notes from Jeffrey Richter's book CLR via C#:
It’s possible to convert a value type to a reference type by using a mechanism called boxing.
Internally, here’s what happens when an instance of a value type is boxed:
Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
The value type’s fields are copied to the newly allocated heap memory.
The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.
The C# compiler automatically produces the IL code necessary to box a value type instance, but you still need to understand what’s going on internally so that you’re aware of code size and performance issues.
Note. It should be noted that the FCL now includes a new set of generic collection classes that make the non-generic collection classes obsolete. For example, you should use the System.Collections.Generic.List<T> class instead of the System.Collections.ArrayList class. The generic collection classes offer many improvements over the non-generic equivalents. For example, the API has been cleaned up and improved, and the performance of the collection classes has been greatly improved as well. But one of the biggest improvements is that the generic collection classes allow you to work with collections of value types without requiring that items in the collection be boxed/unboxed. This in itself greatly improves performance because far fewer objects will be created on the managed heap thereby reducing the number of garbage collections required by your application. Furthermore, you will get compile-time type safety, and your source code will be cleaner due to fewer casts. This will all be explained in further detail in Chapter 12, “Generics.”
I don't want to quote the full chapter here. Read his book and you will get the details of the process and some answers. And by the way, your question has been answered quite a few times here on SO, around the Web, and in many books. It is fundamental knowledge you certainly have to understand.

Here is an interesting read from Eric Lippert (The truth about value types):
Link
regarding your statement:
Box throws the value type (stack object) into a System.Object and stores it on the heap Unbox unpackages that object on the heap holding that value type and throws it back on the stack so it can be used.

This needs to be done because at the IL level there are different instructions for value types than for reference types (ldfld vs ldflda; check out the disassembly for a method that calls someValueType.ToString() vs someReferenceType.ToString() and you'll see that the instructions are different).
These instructions are not compatible, so when you need to pass a value type to a method as an object, that value needs to be wrapped in a reference type (boxing). This is inefficient because the runtime needs to allocate a new box object and copy the value into it just to pass one value.
Generics are faster because value types can be stored as values and not references, so no boxing is needed. Take ArrayList vs List<int>. If you want to put 1 into an ArrayList, the CLR needs to box the int so that it can be stored in an object[]. List<T>, however, uses a T[] to store the list contents, so List<int> uses an int[], which means that 1 doesn't need to be boxed in order to put it in the array.
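The T[] idea can be sketched with a tiny container. TinyList<T> below is a hypothetical illustration, not the real List<T> implementation; its StorageElementType property exists only to show that the backing array for TinyList<int> really is an int[].

```csharp
using System;

// Hypothetical minimal container; the real List<T> is more elaborate.
public class TinyList<T>
{
    private T[] items = new T[4];   // for T = int this is an int[], not an object[]
    private int count;

    public void Add(T item)
    {
        if (count == items.Length)
        {
            Array.Resize(ref items, items.Length * 2);   // grow the backing array
        }
        items[count++] = item;      // stored as a T, so an int is stored unboxed
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return items[index];
        }
    }

    // Exposed only for illustration: the element type of the backing array.
    public Type StorageElementType
    {
        get { return items.GetType().GetElementType(); }
    }
}
```

For a TinyList<int>, StorageElementType is typeof(int); for a TinyList<object> it is typeof(object), which is the situation ArrayList is always in.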

To put it simply, boxing and unboxing take a lot of time. Why? Because it's faster to use a known type from the start than to leave this to be handled at run time.
A collection of objects can contain different items (string, int, double, etc.), and you must check every time that your operation on the variable is correct.
Converting from one type to another takes time.
Generics are much faster and you are encouraged to use them; the old collections exist for backward compatibility.

Suppose I want to store a bunch of variables of type Long in a List, but the system supported neither value-type generics nor boxing. The way to go about storing such values would be to define a new class "BoxedLong", which held a single field "Value" of type Long. Then to add a value to the list, one would create a new instance of a BoxedLong, set its Value field to the desired value, and store that in the list. To retrieve a value from the list, one would retrieve a BoxedLong object from the list, and take the value from its Value field.
When a value type is passed to something that expects an Object, the above is essentially what happens under the hood, except without the new identifier names.
When using generics with value types, the system doesn't use a value-holder class and pass it to routines which expect to work with objects. Instead, the system creates a new version of the routine that will work with the value type in question. If five different value types are passed to a generic routine, five different versions of the routine will be generated. In general, this will yield more code than would the use of a value-holder class, but the code will have to do less work every time a value is passed in or retrieved. Since most routines will have many values of each type passed in or out, the cost of generating different versions of the routine will be more than recouped by the elimination of boxing/unboxing operations.
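The hand-written value-holder described above might look like the following sketch. BoxedLong and its Value field come from the description; the demo class, the method name, and the use of List<object> as the stand-in for an object-based list are assumptions.

```csharp
using System.Collections.Generic;

// A hand-written "box": a reference type holding a single long.
public sealed class BoxedLong
{
    public long Value;
}

public static class BoxedLongDemo
{
    public static long StoreAndRetrieve(long input)
    {
        List<object> items = new List<object>();   // stand-in for an object-only list

        BoxedLong holder = new BoxedLong();        // "boxing" by hand: allocate a holder
        holder.Value = input;                      // copy the value into it
        items.Add(holder);                         // store the reference

        BoxedLong retrieved = (BoxedLong)items[0]; // "unboxing" by hand: cast the reference
        return retrieved.Value;                    // copy the value back out
    }
}
```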

Related

ArrayList vs Generic List On Memory allocation in C#?

I want to clear up my understanding of memory allocation in ArrayList vs a generic List, both when the elements are value types and when they are reference types. Could anyone help clear this up?

The only difference in memory use is when you store a Value type. The ArrayList will have to Box (copy) the value. A boxed value will be placed on the Heap, consuming at least an extra header block (ca 20 bytes).
But this will only be significant when you store many millions of items, not something you do all the time.

They are both reference types. The only difference is that ArrayList is weakly typed. Value types such as int, bool etc that are stored in it are boxed into the object type. Then, you unbox them when you cast each item in the ArrayList.
Because everything is boxed into an object, you can store objects of different types in an ArrayList.
A generic List is strongly typed, that is, it can store only objects of the one type it was declared with. There's no boxing, so it's more efficient.
The boxing process allocates extra memory to wrap the value in an object.
If you stored only objects of reference types in the ArrayList, then boxing is not used, rather another mechanism is used called reference conversion.
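A short sketch of the distinction between reference conversion and boxing, using object.ReferenceEquals to observe it (the class and method names are hypothetical). Storing a string keeps the same instance; storing the same int twice produces two separate boxes.

```csharp
using System.Collections;

public static class ReferenceConversionDemo
{
    // A reference type is stored as-is: only the static type of the reference changes.
    public static bool StringKeepsIdentity()
    {
        string text = "hello";
        ArrayList list = new ArrayList();
        list.Add(text);                                    // reference conversion, no new object
        return object.ReferenceEquals(text, list[0]);      // same instance
    }

    // A value type is boxed on each Add, so each element is a distinct heap object.
    public static bool EachBoxIsDistinct()
    {
        int number = 5;
        ArrayList list = new ArrayList();
        list.Add(number);                                  // first box
        list.Add(number);                                  // second, separate box
        return !object.ReferenceEquals(list[0], list[1]);  // two different objects
    }
}
```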

ArrayList is a reference type, but it is not type safe and is less efficient.
List<T>, the generic list, is a reference type, but it is type safe and efficient.
Here is the SO post on Memory Allocation of Reference Types
How memory is allocated to reference types in C#?

How do you store an int or other "C# value types" on the heap (with C#)?

I'm engaged in educating myself about C# via Troelsen's Pro C# book.
I'm familiar with the stack and heap and how C# stores these sorts of things. In C++, whenever we use new we receive a pointer to something on the heap. However, in C# the behavior of new seems different to me:
When used with value types like an int, new seems to merely call the int default constructor, yet the value of such an int would still be stored on the stack.
I understand that all objects/structs and such are stored on the heap, regardless of whether or not new is used.
So my question is: how can I instantiate an int on the heap? (And does this have something to do with 'boxing'?)

You can box any value type to System.Object type so it will be stored on the managed heap:
int number = 1;
object locatedOnTheHeap = number;
Another question is why you would need this.
This is a classic example from the must-know MSDN paper: Boxing and Unboxing (C# Programming Guide)
When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.
Boxing is used to store value types in the garbage-collected heap. Boxing is an implicit conversion of a value type to the type object or to any interface type implemented by this value type. Boxing a value type allocates an object instance on the heap and copies the value into the new object.
I understand that all objects/structs and such are stored on the heap
BTW, IIRC the JIT sometimes optimizes code so that values of types like int are kept in CPU registers rather than on the stack.

I do not know why you would want to do this; however, in theory you could indeed box your value. You would do this by boxing the int into an object (which is a reference type and will be placed on the heap):
object IAmARefSoIWillBeOnHeap = (object)1;
*As sll stated, you do not need the (object) as it will be an implicit cast. This is here merely for academic reasons, to show what is happening.
Here is a good article about reference versus value types, which is the difference of the heap versus the stack

A value type is "allocated" wherever it is declared:
As a local variable, typically on the stack (but to paraphrase Eric Lippert, the stack is an implementation detail, I suggest you read his excellent piece on his blog: The Truth About Value Types.)
As a field in a class, it expands the size of the instance with the size of the value type, and takes up space inside the instance
As such, this code:
var x = new SomeValueType();
does not allocate something on the heap by itself for that value type. If you close over it with an anonymous method or similar, the local variable will be transformed into the field of a class, and an instance of that class will be allocated on the heap, but in this case, the value type will be embedded into that class as a field.
The heap is for instances of reference types.
However, you've touched on something regarding boxing. You can box a value type value to make a copy of it and place that copy on the heap, wrapped in an object.
So this:
object x = new SomeValueType();
would first allocate the value type, then box it into an object, and store the reference to that object in x.

yet the value of such an int would still be stored on the stack
This is not necessarily true. Even when it is true, it's purely an implementation detail, and not part of the specification of the language. The main issue is that the type system does not necessarily correlate to the storage mechanism used by the runtime.
There are many cases where calling new on a struct will still result in an object that isn't on the stack. Boxing is a good example - when you box an object, you're basically pushing it into an object (effectively "copying" it to the heap), and referencing the object. Also, any time you're closing over a value type with a lambda, you'll end up "allocating on the heap."
That being said, I wouldn't focus on this at all - the issue really shouldn't be about stack vs. heap in allocations, but rather about value type vs. reference type semantics and usage. As such, I'd strongly recommend reading Eric Lippert's The Truth About Value Types and Jon Skeet's References and Values. Both of these articles focus on the important aspects of struct vs. class semantics instead of necessarily looking at the storage.
As for ways to force the storage of an int on the heap, here are a couple of simple ones:
object one = 1; // Boxing
int two = 2; // Gets closed over, so ends up "on the heap"
Action closeOverTwo = () => { Console.WriteLine(two); };
// Do stuff with two here...
var three = new { Three = 3 }; // Wrap in an anonymous type, which is a reference type...

If you want an int on the heap, you can do this:
object o = 4;
But basically, you shouldn't want that. C# is designed for you not to think about such things. Here's a good place to start on that: http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx

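So my question is: how can I instantiate an int on the heap? (And does this have something to do with 'boxing'?)
Your understanding of objects is correct: when you initialize an instance of a class, it goes on the heap. A struct, however, lives wherever it is declared, and ends up on the heap only as part of another object or when it is boxed, so yes, boxing is how you would put an int there.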

What's Wrong with an ArrayList?

Recently I asked a question on SO that mentioned the possible use of a C# ArrayList for a solution. A comment was made that using an ArrayList is bad. I would like to know more about this. I have never heard this statement before about ArrayLists.
Could somebody bring me up to speed on the possible performance problems with using ArrayLists?

The main problem with ArrayList is that it uses object: you have to cast to and from whatever you are encapsulating. It is a remnant of the days before generics and is probably around for backwards compatibility only.
You do not have the type safety with ArrayList that you have with a generic list. The performance issue is in the need to cast objects back to the original (or have implicit boxing happen).
Implicit boxing will happen whenever you use a value type: it will be boxed when put into the ArrayList and unboxed when it is read back and cast.
The issue is not just that of performance, but also of readability and correctness. Since generics came in, this object has become obsolete and would only be needed in .NET 1.0/1.1 code.

If you're storing a value type (int, float, double, etc - or any struct), ArrayList will cause boxing on every storage and unboxing on every element access. This can be a significant hit to performance.
In addition, there is a complete lack of type safety with ArrayList. Since everything is stored as "object", there's an extra burden on you, as a developer, to keep it safe.
In addition, if you want the behavior of storing objects, you can always use List<object>. There is no disadvantage to this over ArrayList, and it has one large (IMO) advantage: It makes your intent (storing an untyped object) clear from the start.
ArrayList really only exists, and should only be used, for .NET 1.1 code. There really is no reason to use it in .NET 2+.

ArrayList is not a generic type, so it must store all items you place in it as objects. This is bad for two reasons. First, when you put value types in the ArrayList, each one has to be boxed into a reference type, which can be costly. Second, you now have to cast everything you pull out of the ArrayList. This is bad since you now need to make sure you know what objects are in there.
List<T> avoids these issues since it is constructed with the proper type.
For example:
List<int> ints = new List<int>();
ints.Add(5); //no boxing
int num = ints[0]; // no casting

The generic List<T> is preferred since it is generic, which provides additional type information and removes the need to box/unbox value types added to it.

In addition to the performance issues, it's a matter of moving errors from runtime to compile time. Casting objects retrieved from ArrayLists must happen at runtime, and any type errors will happen during execution. Using a generic List<> all types are checked during compile time.
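As a sketch of moving the error from run time to compile time (class and method names are assumptions): the ArrayList version below compiles and fails only when the bad element is cast, while the equivalent List<int> line, shown as a comment, is rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // Compiles fine, but fails while running when the string is cast to int.
    public static bool ArrayListFailsAtRunTime()
    {
        ArrayList items = new ArrayList();
        items.Add(1);
        items.Add("two");                  // accepted: everything is an object
        int sum = 0;
        try
        {
            foreach (object item in items)
            {
                sum += (int)item;          // throws on "two"
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static int GenericListIsCheckedByCompiler()
    {
        List<int> items = new List<int>();
        items.Add(1);
        // items.Add("two");               // does not compile: a string is not an int
        return items.Count;
    }
}
```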

All the boxing and unboxing can be expensive and fragile. Microsoft made some nice improvements in terms of typing and performance with .NET 2.0 generics.
Here are some good reads:
Boxing and Unboxing of Value Types : What You Need to Know? at http://www.c-sharpcorner.com/uploadfile/stuart_fujitani/boxnunbox11192005055746am/boxnunbox.aspx
Performance: ArrayList vs List<> at http://allantech.blogspot.com/2007/03/performance-arraylist-vs-list.html

Why do we need struct? (C#)

To use a struct, we need to instantiate the struct and use it just like a class. Then why don't we just create a class in the first place?

A struct is a value type, so if you create a copy, it will actually physically copy the data, whereas with a class it will only copy the reference to the data.

A major difference between the semantics of class and struct is that structs have value semantics. What this means is that if you have two variables of the same type, they each have their own copy of the data. Thus if a variable of a given value type is set equal to another (of the same type), operations on one will not affect the other (that is, assignment of value types creates a copy). This is in sharp contrast to reference types.
There are other differences:
Value types are implicitly sealed (it is not possible to derive from a value type).
Value types can not be null.
Value types are given a default constructor that initializes the value type to its default value.
A variable of a value type is always a value of that type. Contrast this with classes, where a variable of type A could refer to an instance of type B if B derives from A.
Because of the difference in semantics, it is inappropriate to refer to structs as "lightweight classes."
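The copy-on-assignment behaviour described above can be sketched as follows. PointStruct, PointClass, and the demo methods are hypothetical names used only for this illustration.

```csharp
public struct PointStruct
{
    public int X;
}

public class PointClass
{
    public int X;
}

public static class CopySemanticsDemo
{
    // Assigning a struct copies the data, so the two variables are independent.
    public static int StructCopyIsIndependent()
    {
        PointStruct a = new PointStruct();
        a.X = 1;
        PointStruct b = a;   // copies the whole value
        b.X = 99;            // changes only b
        return a.X;          // a is unaffected
    }

    // Assigning a class variable copies the reference, so both see one object.
    public static int ClassCopySharesInstance()
    {
        PointClass a = new PointClass();
        a.X = 1;
        PointClass b = a;    // copies the reference only
        b.X = 99;            // changes the single shared object
        return a.X;          // a sees the change
    }
}
```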

All of the reasons I see in other answers are interesting and can be useful, but if you want to read about why they are required (at least by the VM) and why it was a mistake for the JVM to not support them (user-defined value types), read Demystifying Magic: High-level Low-level Programming. As it stands, C# shines in talking about the potential to bring safe, managed code to systems programming. This is also one of the reasons I think the CLI is a superior platform [than the JVM] for mobile computing. A few other reasons are listed in the linked paper.
It's important to note that you'll very rarely, if ever, see an observable performance improvement from using a struct. The garbage collector is extremely fast, and in many cases will actually outperform the structs. When you add in the nuances of them, they're certainly not a first-choice tool. However, when you do need them and have profiler results or system-level constructs to prove it, they get the job done.
Edit: If you wanted an answer of why we need them as opposed to what they do, ^^^

In C#, a struct is a value type, unlike classes which are reference types. This leads to a huge difference in how they are handled, or how they are expected to be used.
You should probably read up on structs from a book. Structs in C# aren't close cousins of class like in C++ or Java.

It is a myth that structs are always created on the heap.
OK, it is right that a struct is a value type and a class is a reference type. But remember that:
1. A reference type always goes on the heap.
2. Value types go where they were declared.
I will explain what that second line means with the examples below.
Consider the following method:
    public void DoCalculation()
    {
        int num;
        num = 2;
    }
Here num is a local variable, so it will be created on the stack.
Now consider the example below:
    public class TestClass
    {
        public int num;
    }
    public void DoCalculation()
    {
        TestClass myTestClass = new TestClass();
        myTestClass.num = 2;
    }
This time num is created on the heap, as part of the TestClass instance. In some cases value types perform better than reference types, as they don't require garbage collection.
Also remember:
The value of a value type is always a value of that type.
The value of a reference type is always a reference.
You also have to consider that if you expect a lot of instantiation, that means more heap space to deal with and more work for the garbage collector. In that case you can choose structs.

Structs have very different semantics from classes. The differences are many, but the primary reasons for their existence are:
They can be explicitly laid out in memory, which allows certain interop scenarios.
They may be allocated on the stack, which makes some sorts of high-performance code possible in a much simpler fashion.

The difference is that a struct is a value type.
I've found them useful in 2 situations:
1) Interop - you can specify the memory layout of a struct, so you can guarantee the layout when you invoke an unmanaged call.
2) Performance - in some (very limited) cases, structs can be faster than classes. In general, this requires structs to be small (I've heard 16 bytes or less) and not be changed often.

One of the main reasons is that, when used as local variables during a method call, structs are allocated on the stack.
Stack allocation is cheap, but the big difference is that de-allocation is also very cheap. In this situation, the garbage collector doesn't have to track structs -- they're removed when returning from the method that allocated them when the stack frame is popped.
edit - clarified my post re: Jon Skeet's comment.

A struct is a value type (like Int32), whereas a class is a reference type. As local variables, structs are typically created on the stack rather than the heap. Also, when a struct is passed to a method, a copy of the struct is passed, but when a class instance is passed, a reference is passed.
If you need to create your own datatype, say, then a struct is often a better choice than a class, as you can use it just like the built-in value types in the .NET framework. There are some good struct examples you can read here.

Use cases for boxing a value type in C#?

There are cases when an instance of a value type needs to be treated as an instance of a reference type. For situations like this, a value type instance can be converted into a reference type instance through a process called boxing. When a value type instance is boxed, storage is allocated on the heap and the instance's value is copied into that space. A reference to this storage is placed on the stack. The boxed value is an object, a reference type that contains the contents of the value type instance.
Understanding .NET's Common Type System
In Wikipedia there is an example for Java. But in C#, what are some cases where one would have to box a value type? Or would a better/similar question be, why would one want to store a value type on the heap (boxed) rather than on the stack?

In general, you typically will want to avoid boxing your value types.
However, there are rare occurrences where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.
There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.

There is almost never a good reason to deliberately box a value type. Almost always, the reason to box a value type is to store it in some collection that is not type aware. The old ArrayList, for example, is a collection of objects, which are reference types. The only way to collect, say, integers, is to box them as objects and pass them to ArrayList.
Nowadays, we have generic collections, so this is less of an issue.

Boxing generally happens automatically in .NET when it has to, often when you pass a value type to something that expects a reference type. A common example is string.Format(). When you pass primitive value types to this method, they are boxed as part of the call. So:
int x = 10;
string s = string.Format( "The value of x is {0}", x ); // x is boxed here
This illustrates a simple scenario where a value type (x) is automatically boxed to be passed to a method that expects an object. Generally, you want to avoid boxing value types when possible ... but in some cases it's very useful.
As an interesting aside, when you use generics in .NET, value types are not boxed when used as parameters or members of the type. This makes generics more efficient than older C# code (such as ArrayList) that treats everything as object in order to be type agnostic. This adds one more reason to use generic collections, like List<T> or Dictionary<TKey,TValue>, over ArrayList or Hashtable.

I would recommend two nice articles by Eric Lippert:
http://blogs.msdn.com/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
http://blogs.msdn.com/ericlippert/archive/2009/05/04/the-stack-is-an-implementation-detail-part-two.aspx
Here is the quote that I would 100% agree with
Using the stack for locals of value type is just an optimization that the CLR performs on your behalf.
The relevant feature of value types is that they have the semantics of being copied by value, not that sometimes their deallocation can be optimized by the runtime.
In 99% of applications, developers should not care about why value types are on the stack and not on the heap, or what performance gain that could bring. Just keep some very simple rules in mind:
Avoid boxing/unboxing when not necessary; use generic collections. Most problems occur not when you define your own types, but when you use existing types improperly (defined by Microsoft or your colleagues).
Make your value types simple. If you need a struct with 10-20 fields, I suppose you'd better create a class. Imagine: all those fields will be copied each time you pass it to a function by value...
I don't think it is very useful to have value types with reference type fields inside, like a struct with String and object fields.
Define what type you need depending on the required functionality, not on where it should be stored. Structs have limited functionality compared to classes, so if a struct cannot provide the required functionality, like a default constructor, define a class.
If something can perform any actions with the data of other types, it is usually defined as a class. For structs, operations with different types should be defined only if you can cast one type to another. Say, you can add int to double because you can cast int to double.
If something should be stateless, it is a class.
When you are hesitating, use reference types. :-)
Any rule allows exceptions in special cases, but do not try to over-optimize.
p.s.
I have met some ASP.NET developers with 2-3 years of experience who don't know the difference between the stack and the heap. :-( I would not hire such a person if I were the interviewer, but not because boxing/unboxing could be a bottleneck in any of the ASP.NET sites I've ever seen.

I think a good example of boxing in c# occurs in the non-generic collections like ArrayList.

One example would be when a method takes an object parameter and a value type must be passed in.

Below are some examples of boxing/unboxing:
ArrayList myInts = new ArrayList();
myInts.Add(1); // boxing
myInts.Add(2); // boxing
int myInt = (int)myInts[0]; // unboxing
Console.Write("Value is {0}", myInt); // boxing

One of the situations when this happens is, for example, if you have a method that expects a parameter of type object and you pass in one of the primitive types, int for example. Or if you define parameter as 'ref' of type int.

The code
int x = 42;
Console.WriteLine("The value of x is {0}", x);
actually boxes x, because this overload of WriteLine takes an object argument. To avoid this you could do
int x = 42;
Console.WriteLine("The value of x is {0}", x.ToString());
Beware of subtle bugs!
You can declare your own value types by declaring your own type as a struct. Imagine you declare a struct with lots of properties and then put some instances inside an ArrayList. This boxes them, of course. Now retrieve one through the [] operator, cast it to the struct type, and set a property. You just set a property on a copy. The one in the ArrayList is still unmodified.
For this reason, value types should be immutable, i.e. make all member variables readonly so that they can only be set in the constructor, and do not have any mutable types as members.
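The copy pitfall just described can be reproduced with a small sketch. Counter is a hypothetical mutable struct with a public field, used here only to show the effect.

```csharp
using System.Collections;

// Hypothetical mutable struct, deliberately written the way the text warns against.
public struct Counter
{
    public int Count;
}

public static class BoxedCopyBugDemo
{
    public static int CountAfterModifyingUnboxedCopy()
    {
        ArrayList list = new ArrayList();
        Counter original = new Counter();
        original.Count = 1;
        list.Add(original);                  // boxes a copy of the struct

        Counter copy = (Counter)list[0];     // unboxing produces another copy
        copy.Count = 5;                      // modifies only that local copy

        return ((Counter)list[0]).Count;     // the boxed element still holds 1
    }
}
```

Writing the assignment directly against the cast expression, as in ((Counter)list[0]).Count = 5, should be rejected by the C# compiler, because the result of an unboxing conversion is a temporary value rather than a variable.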
