Boxing and unboxing: when does it come up? - c#

So I understand what boxing and unboxing are. When do they come up in real-world code, or in what situations are they an issue? I can't imagine doing something like this example:
int i = 123;
object o = i; // Boxing
int j = (int)o; // Unboxing
...but that's almost certainly extremely oversimplified and I might have even done boxing/unboxing without knowing it before.

It's much less of an issue now than it was prior to generics. Now, for example, we can use:
List<int> x = new List<int>();
x.Add(10);
int y = x[0];
No boxing or unboxing required at all.
Previously, we'd have had:
ArrayList x = new ArrayList();
x.Add(10); // Boxing
int y = (int) x[0]; // Unboxing
That was my most common experience of boxing and unboxing, at least.
Without generics getting involved, I think I'd probably say that reflection is the most common cause of boxing in the projects I've worked on. The reflection APIs always use "object" for things like the return value for a method - because they have no other way of knowing what to use.
Another cause which could catch you out if you're not aware of it is if you use a value type which implements an interface, and pass that value to another method which has the interface type as its parameter. Again, generics make this less of a problem, but it can be a nasty surprise if you're not aware of it.
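To make the interface case concrete, here is a minimal sketch. The ICounter and Counter types and the method names are hypothetical, invented only for this illustration. Passing the struct to a method with an interface-typed parameter boxes a copy, so the caller's value never changes. The generic version, constrained to the interface, operates on the struct itself (ref is used here only so that the effect is observable).

```csharp
using System;

// Hypothetical types for illustration; they do not come from the thread.
interface ICounter
{
    int Count { get; }
    void Increment();
}

struct Counter : ICounter
{
    public int Count { get; private set; }
    public void Increment() { Count++; }
}

static class InterfaceBoxingDemo
{
    // Interface-typed parameter: a struct argument is boxed at the call site.
    public static void BumpBoxed(ICounter counter)
    {
        counter.Increment(); // mutates the boxed copy on the heap
    }

    // Generic parameter constrained to the interface: no box is created.
    public static void BumpGeneric<T>(ref T counter) where T : ICounter
    {
        counter.Increment(); // mutates the caller's struct in place
    }

    public static int CountAfterBoxedCall()
    {
        var c = new Counter();
        BumpBoxed(c);        // a boxed copy of c is passed
        return c.Count;      // the original is untouched
    }

    public static int CountAfterGenericCall()
    {
        var c = new Counter();
        BumpGeneric(ref c);
        return c.Count;
    }

    static void Main()
    {
        Console.WriteLine(CountAfterBoxedCall());   // 0
        Console.WriteLine(CountAfterGenericCall()); // 1
    }
}
```

The "nasty surprise" mentioned above is the first result: the increment happened, but on a heap copy that is thrown away.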

Boxing (in my experience) usually occurs in these cases:
A value type is passed to a method that accepts an argument of type Object.
A value type is added to a non-generic collection (like an ArrayList).
Boxing and unboxing also show up when you use reflection, because the .NET Framework's reflection API makes heavy use of Object.

Boxing/unboxing occurs when a value type (like a struct, int, long) is passed somewhere that accepts a reference type - such as object.
This occurs when you explicitly create a method that takes parameters of type object that will be passed value types. It also comes up when you use the older non-generic collections to store value types (typically primitives).
You will also see boxing occurring when you use String.Format() and pass primitives to it. This is because String.Format() accepts a params object[], so each value-type argument is boxed when it is passed in the call.
Using reflection to invoke methods can also result in boxing/unboxing, because the reflection APIs have no choice but to return object since the real type is not known at compile time (and the Reflection APIs cannot be generic).
The newer generic collections do not result in boxing/unboxing, and so are preferable to the older collections for this reason (eg ArrayList, Hashtable, etc). Not to mention they are typesafe.
You can avoid boxing concerns by changing methods that accept objects to be generic. For example:
public string Decorate( object a ) // passing a value type results in boxing
{
return a.ToString() + " Some other value";
}
vs:
public string Decorate<T>( T a ) // passing a value type does not box the argument
{
return a.ToString() + " some other value";
}

Here is a really nasty one :)
SqlCommand cmd = <a command that returns a scalar value stored as int>;
// This code works very well.
int result = (int)cmd.ExecuteScalar();
// This code will throw an exception.
uint result = (uint)cmd.ExecuteScalar();
The second statement fails because it tries to unbox an Int32 into a UInt32, which is not possible. So you have to unbox first and then cast.
uint result = (uint)(int)cmd.ExecuteScalar();
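The same behaviour can be reproduced without a database. In this sketch a plain boxed int stands in for the object returned by ExecuteScalar(); the class and method names are made up for the example. Convert.ToUInt32(object) is shown as one alternative that does not depend on the exact boxed type.

```csharp
using System;

static class UnboxDemo
{
    // Unbox to the exact type first, then convert the value.
    public static uint UnboxThenConvert(object boxed)
    {
        return (uint)(int)boxed;
    }

    // Unboxing a boxed int directly to uint fails at runtime.
    public static bool DirectUnboxThrows(object boxed)
    {
        try
        {
            uint unused = (uint)boxed;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        object boxed = 42; // stand-in for the result of ExecuteScalar()
        Console.WriteLine(UnboxThenConvert(boxed));  // 42
        Console.WriteLine(DirectUnboxThrows(boxed)); // True
        Console.WriteLine(Convert.ToUInt32(boxed));  // 42
    }
}
```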

Boxing and unboxing is really moving from value type to reference type. So, think of it as moving from the stack to the heap and back again.
There certainly are cases where this is relevant. The inclusion of generics in the 2.0 framework cut a lot of common boxing cases out of practice.

It happens all the time when people do not know what the implications are or simply don't care; sometimes one cannot help but accept boxing as the lesser evil.
Strongly typed datarows will box/unbox pretty much all the time when you access a value-type property.
Also, using a value type as an interface reference will box it as well. Or getting a delegate from an instance method of a value type. (The delegate's target is of type Object)

Since the advent of strongly typed lists and dictionaries using generics in C# 2.0 (Visual Studio 2005), I think the importance of keeping boxing/unboxing in mind has been greatly reduced. Add to that nullable types (int?, etc.) and the null-coalescing operator (??), and it really shouldn't be much of a concern at all; you would be unlikely to see it in any code that isn't written for the 1.1 Framework or earlier.

"The type parameter for an ArrayList must be a class, not a primitive type, so Java provides wrapper classes for the primitive types, like "Integer" for int, "Double" for double, etc.
For more explanation:
An array is a numbered sequence of elements, and each element acts like a separate variable."
Java provides a special syntax for "for" loops over the elements of arrays (and other collection types in Java). The simplified syntax is called a "for-each" loop. For example, the following statement prints each String in an array called "words".
for (String word : words) System.out.println(word);
An array has a number of elements that are set when the array object is created and cannot be changed. Java provides the "ArrayList" class for the functionality of a dynamic array, an array that can change in size. ArrayList is an example of a parameterized type, a type that depends on another type." Eck (2019)
Reference:
Eck (2019), Introduction to Programming Using Java, Chapter 7.

Related

Which is the best practice in C# for type casting? [duplicate]

Which approach is best practice for type casting and type checking?
Employee e = o as Employee;
if(e != null)
{
//DO stuff
}
OR
if(o is Employee)
{
Employee e = (Employee) o;
//DO stuff
}
There are at least two ways to cast, one way to check a type, and a combination of both called pattern matching. Each has its own purpose, and which one to use depends on the situation:
Hard cast
var myObject = (MyType)source;
You normally do this when you are absolutely sure that the given object is of that type. A typical situation is an event handler, where you cast the sender object to the correct type in order to work with it.
private void OnButtonClick(object sender, EventArgs e)
{
var button = (Button)sender;
button.Text = "Disabled";
button.Enabled = false;
}
Soft cast
var myObject = source as MyType;
if (myObject != null)
// Do Something
This is normally used when you can't know whether you really got an object of that type. You simply try the cast, and if it is not possible you get null back. A common example is doing something only if some interface is implemented:
var disposable = source as IDisposable;
if(disposable != null)
disposable.Dispose();
Also, the as operator can't be used with a non-nullable value type such as a plain struct. This is because the operator returns null when the cast fails, and a plain struct can never be null. A nullable value type such as int? is allowed as the target.
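One qualification on value types: as does work when the target is a nullable value type, because null is then a valid result. A small sketch (the class and method names are hypothetical):

```csharp
using System;

static class AsNullableDemo
{
    // Returns the unboxed int, or null when source is not a boxed int.
    public static int? TryGetInt(object source)
    {
        return source as int?;
    }

    static void Main()
    {
        Console.WriteLine(TryGetInt(5).HasValue);      // True
        Console.WriteLine(TryGetInt("five").HasValue); // False
    }
}
```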
Type check
var isMyType = source is MyType;
This is rarely correctly used. This type check is only useful if you only need to know if something is of a specific type, but you don't have to use that object.
if(source is MyType)
DoSomething();
else
DoSomethingElse();
Pattern matching
if (source is MyType myType)
DoSomething(myType);
Pattern matching (introduced in C# 7.0) is the most recent language feature that is relevant to casts. You can also handle more complicated cases by using the switch statement and the when clause:
switch (source)
{
case SpecialType s when s.SpecialValue > 5:
DoSomething(s);
break;
case AnotherType a when a.Foo == "Hello":
SomethingElse(a);
break;
}
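For reference, a compilable version of that kind of switch might look like the sketch below. SpecialType and AnotherType are given minimal hypothetical definitions here, and the returned strings are placeholders for the real work.

```csharp
using System;

// Minimal hypothetical definitions so the example compiles.
class SpecialType { public int SpecialValue; }
class AnotherType { public string Foo; }

static class PatternSwitchDemo
{
    public static string Describe(object source)
    {
        switch (source)
        {
            case SpecialType s when s.SpecialValue > 5:
                return "special: " + s.SpecialValue;
            case AnotherType a when a.Foo == "Hello":
                return "greeting";
            case null:
                return "null";       // type patterns never match null
            default:
                return "unhandled";  // wrong type, or the when clause was false
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(new SpecialType { SpecialValue = 7 })); // special: 7
        Console.WriteLine(Describe(new SpecialType { SpecialValue = 3 })); // unhandled
        Console.WriteLine(Describe(new AnotherType { Foo = "Hello" }));    // greeting
    }
}
```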
I think this is a good question that deserves a serious and detailed answer. Type casts in C# are actually a lot of different things.
Unlike C#, languages like C++ are very strict about these, so I'll use the naming there as reference. I always think it's best to understand how things work, so I'll break it all down here for you with the details. Here goes:
Dynamic casts and static casts
C# has value types and reference types. Reference types always follow an inheritance chain, starting with Object.
Basically if you do (Foo)myObject, you're actually doing a dynamic cast, and if you're doing (object)myFoo (or simply object o = myFoo) you're doing a static cast.
A dynamic cast requires you to do a type check, that is, the runtime will check if the object you are casting to will be of the type. After all, you're casting down the inheritance tree, so you might as well cast to something else completely. If this is the case, you'll end up with an InvalidCastException. Because of this, dynamic casts require runtime type information (e.g. it requires the runtime to know what object has what type).
A static cast doesn't require a type check. In this case we're casting up in the inheritance tree, so we already know that the type cast will succeed. No exception will be thrown, ever.
Value type casts are a special type of cast that converts different value types (f.ex. from float to int). I'll get into that later.
As, is, cast
In IL, the only things that are supported are castclass (cast) and isinst (as). The is operator is implemented as an as with a null check, and is nothing more than a convenient shorthand notation for the combination of the two. In C#, you could write is as: (myObject as MyFoo) != null.
as simply checks if an object is of a specific type and returns null if it's not. For the static cast case, we can determine this compile-time, for the dynamic cast case we have to check this at runtime.
(...) casts again check if the type is correct, and throw an exception if it's not. It's basically the same as as, but with a throw instead of a null result. This might make you wonder why as is not implemented as an exception handler -- well, that's probably because exceptions are relatively slow.
Boxing
A special type of cast happens when you box a value type into an object. What basically happens is that the .NET runtime copies your value type on the heap (with some type information) and returns the address as a reference type. In other words: it converts a value type to a reference type.
This happens when you have code like this:
int n = 5;
object o = n; // boxes n
int m = (int)o; // unboxes o
Unboxing requires you to specify a type. During the unboxing operation, the type is checked (like the dynamic cast case, but it's much simpler because the inheritance chain of a value type is trivial) and if the type matches, the value is copied back on the stack.
You might expect value type conversions to happen implicitly during unboxing; because of the above, they don't. The only unboxing operation that's allowed is unboxing to the exact value type. In other words:
sbyte m2 = (sbyte)o; // throws InvalidCastException: o holds a boxed int
Value type casts
If you're casting a float to an int, you're basically converting the value. For the basic types (IntPtr, (u)int 8/16/32/64, float, double) these conversions are pre-defined in IL as conv.* instructions, which cover sign or zero extension (int8 -> int16), truncation (int16 -> int8), and conversion (float -> int32).
There are some funny things going on here, by the way. The runtime seems to work with values of at least 32 bits on the stack, so you need conversions even in places where you wouldn't expect them. For example, consider:
sbyte sum = (sbyte)(sbyte1 + sbyte2); // requires a cast. Return type is int32!
int sum = int1 + int2; // no cast required, return type is int32.
Sign extension might be tricky to wrap your head around. Computers store signed integer values in two's complement. For an int8 in hex notation, this means that the value -1 is 0xFF. So what happens if we cast it to an int32? The two's complement value of -1 as an int32 is 0xFFFFFFFF, so we need to propagate the most significant bit to the rest of the 'added' bits. If we're doing an unsigned (zero) extension, we need to propagate zeros.
To illustrate this point, here's a simple test case:
byte b1 = 0xFF;
sbyte b2 = (sbyte)b1;
Console.WriteLine((int)b1);
Console.WriteLine((int)b2);
Console.ReadLine();
The first cast to int is here zero extended, the second cast to int is sign extended. You also might want to play with the "x8" format string to get the hex output.
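Here is the same test with hex output, as a small self-contained sketch (the class and helper names are made up). The unchecked keyword is only there so the byte-to-sbyte cast behaves the same if the project is compiled with overflow checking enabled.

```csharp
using System;

static class SignExtensionDemo
{
    // Formats a 32-bit value as eight hex digits.
    public static string Hex(int value)
    {
        return value.ToString("X8");
    }

    static void Main()
    {
        byte b1 = 0xFF;
        sbyte b2 = unchecked((sbyte)b1); // same bits, now interpreted as -1

        Console.WriteLine(Hex(b1));  // 000000FF (zero extended)
        Console.WriteLine(Hex(b2));  // FFFFFFFF (sign extended)
        Console.WriteLine((int)b2);  // -1
    }
}
```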
For the exact difference between bit casts, truncation and conversion, I refer to the LLVM documentation that explains the differences. Look for sext/zext/bitcast/fptosi and all the variants.
Implicit type conversion
One other category remains, and that's the conversion operators. MSDN details how you can overload the conversion operators. Basically what you can do is implement your own conversion, by overloading an operator. If you want the user to explicitly specify that you intend to cast, you add the explicit keyword; if you want implicit conversions to happen automagically, you add implicit. Basically you'll get:
public static implicit operator byte(Digit d) // implicit digit to byte conversion operator
{
return d.value; // implicit conversion
}
... after which you can do stuff like
Digit d = new Digit(123);
byte b = d;
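For completeness, a minimal Digit type that makes the snippet above compile could look like this. It is only a sketch: the field name and the extra explicit operator are my assumptions, not something the answer specifies.

```csharp
using System;

struct Digit
{
    private readonly byte value;

    public Digit(byte value)
    {
        this.value = value;
    }

    // Implicit: Digit -> byte happens without a cast.
    public static implicit operator byte(Digit d)
    {
        return d.value;
    }

    // Explicit: byte -> Digit requires the caller to write a cast.
    public static explicit operator Digit(byte b)
    {
        return new Digit(b);
    }
}

static class DigitDemo
{
    static void Main()
    {
        Digit d = new Digit(7);
        byte b = d;            // uses the implicit operator
        Digit back = (Digit)b; // uses the explicit operator
        Console.WriteLine(b);  // 7
    }
}
```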
Best practices
First off, understand the differences, which means implementing small test programs until you understand the distinction between all of the above. There's no surrogate for understanding How Stuff Works.
Then, I'd stick to these practices:
The shorthands are there for a reason. Use the notation that's the shortest, it's probably the best one.
Don't use casts for static casts; only use casts for dynamic casts.
Only use boxing if you need it. The details of this go well beyond this answer; basically what I'm saying is: use the correct type, don't wrap everything.
Notice compiler warnings about implicit conversions (f.ex. unsigned/signed) and always resolve them with explicit casts. You don't want to get surprises with strange values due to sign/zero extension.
In my opinion, unless you know exactly what you're doing, it's best to simply avoid the implicit/explicit conversion -- a simple method call is usually better. The reason for this is that you might end up with an exception on the loose, that you didn't see coming.
With the second method, if the cast fails an exception is thrown.
When casting using as, you can only use reference types (or nullable value types), so if you are casting to a non-nullable value type, you must still use the int e = (int) o; form.
A good rule of thumb is: if you can assign null as a value to the type, you can cast to it using as.
That said, a null comparison is faster than throwing and catching an exception, so in most cases using as should be faster.
I can't honestly say with certainty if this applies with your is check in place though. It could fail under some multi threading conditions where another thread changes the object you're casting.
I would use the as (safe-cast) operator if I need to use the object after casting. Then I check for null and work with the instance. This method is more efficient than is followed by an explicit cast.
In general, the as operator is more efficient because it actually returns the cast value if the cast can be made successfully. The is operator returns only a Boolean value. It can therefore be used when you just want to determine an object's type but do not have to actually cast it.
(more information here).
I am not sure about it but I think that is is using as under the hood and just returns if the object after casting is null (in case of reference types) / an exception was thrown (in case of value types) or not.
Well, it's a matter of taste and specifics of problem that you're dealing with. Let's have a look at two examples with generic methods.
For generic method with 'class' constraint (the safest approach with double cast):
public void MyMethod<T>(T myParameter) where T : class
{
if(myParameter is Employee)
{
// we can use 'as' operator because T is class
Employee e = myParameter as Employee;
//DO stuff
}
}
You can also do something like this (only one cast operation, but the variable is declared with a type that may or may not turn out to be correct):
public void MyMethod<T>(T myParameter) where T : class
{
Employee e;
if((e = myParameter as Employee) != null)
{
//DO stuff with e
}
}
For generic method with 'struct' constraint :
public void MyMethod<T>(T myParameter) where T : struct
{
if(myParameter is int)
{
// we can't use the 'as' operator here because a non-nullable value type cannot be null
// explicit conversion doesn't work either because T could be anything so :
int e = Convert.ToInt32(myParameter);
//DO stuff
}
}
Simple scenario with explicit cast:
int i = 5;
object o = (object)i; // boxing
int i2 = (int)o; // unboxing
We can use an explicit cast here because we are 100% sure which types are involved.

Why does ToList<Interface> not work for value types?

If I implement an interface for a value type and try to convert a list of that type to a List of its interface type, why does this result in an error, whereas the reference type converts just fine?
This is the error:
Cannot convert instance argument type
System.Collections.Generic.List<MyValueType> to
System.Collections.Generic.IEnumerable<MyInterfaceType>
I have to explicitly use the Cast<T> method to convert it. Why?
Since IEnumerable is a readonly enumeration through a collection, it doesn't make any sense to me that it cannot be cast directly.
Here's example code to demonstrate the issue:
public interface I{}
public class T : I{}
public struct V: I{}
public void test()
{
var listT = new List<T>();
var listV = new List<V>();
var listIT = listT.ToList<I>(); //OK
var listIV = listV.ToList<I>(); //FAILS to compile, why?
var listIV2 = listV.Cast<I>().ToList(); //OK
}
Variance (covariance or contravariance) doesn't work for value types, only reference types:
Variance applies only to reference types; if you specify a value type for a variant type parameter, that type parameter is invariant for the resulting constructed type. (MSDN)
The values contained inside reference type variables are references (for example, addresses) and data addresses have the same size and are interpreted the same way, without any required change in their bit patterns.
In contrast, the values contained inside value type variables do not have the same size or the same semantics. Using them as reference types requires boxing and boxing requires type-specific instructions to be emitted by the compiler. It's not practical or efficient (sometimes maybe not even possible) for the compiler to emit boxing instructions for any possible kind of value type, therefore variance is disallowed altogether.
Basically, variance is practical thanks to the extra layer of indirection (the reference) from the variable to the actual data. Because value types lack that layer of indirection, they lack variance capabilities.
Combine the above with how LINQ operations work:
A Cast operation upcasts/boxes all elements (by accessing them through the non-generic IEnumerable, as you pointed out) and then verifies that all elements in a sequence can be successfully cast/unboxed to the provided type and then does exactly that. The ToList operation enumerates the sequence and returns a list from that enumeration.
Each one has its own job. If (say) ToList did the job of both, it would have the performance overhead of both, which is undesirable for most other cases.
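As a sketch of the two explicit ways to do the boxing yourself, reusing the I and V names from the question (the class and method names are hypothetical): Cast<I>() boxes each element as it is enumerated, and Select with an explicit cast does the same thing with the conversion spelled out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface I { }
public struct V : I { }

static class CastDemo
{
    // Boxes each V while enumerating, then collects the boxes.
    public static List<I> BoxWithCast(List<V> values)
    {
        return values.Cast<I>().ToList();
    }

    // Equivalent result, with the boxing conversion written out per element.
    public static List<I> BoxWithSelect(List<V> values)
    {
        return values.Select(v => (I)v).ToList();
    }

    static void Main()
    {
        var listV = new List<V> { new V(), new V() };
        Console.WriteLine(BoxWithCast(listV).Count);   // 2
        Console.WriteLine(BoxWithSelect(listV).Count); // 2
    }
}
```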

Passing C# value type by reference to avoid boxing

One way to avoid boxing in C# is to pass the value type by reference. I have read that a generic method can also be used to avoid boxing. However, writing a generic method solely to avoid boxing seems a little extreme if the type will always be the same.
My question is - if writing code for the best performance and to avoid boxing, is it reasonable to pass all value types (like an int) by reference - even though the method in question is only working on the object and not creating it? Are there any drawbacks to this?
The best way to avoid boxing of value types is: just use them as values!
I think you have completely misread that reference. What it says is that using ref parameters does not cause boxing. It does not say that it is a way to avoid boxing.
Boxing happens when a value type is used in a reference context, such as being cast to an Object. This article says that passing parameters by reference must not be confused with the concept of reference types, but confusing them seems to be exactly what you've done.
Summary by 280Z28:
In other words, avoid the following two operations:
Casting or assigning the value to a variable of type object (or passing the value as an argument for a method parameter of type object).
Casting or assigning the value to a variable which is an interface type (such as IEnumerable), or passing the value as an argument for a method parameter which is an interface type.
There are exceptions to this rule (e.g. calling some generic methods), and there are cases where boxing can occur in other contexts, but these are the primary situations to be aware of when you are trying to avoid unnecessary boxing of value types.
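To illustrate the distinction between a ref parameter and a reference type, here is a small sketch with made-up names. The ref version aliases the caller's variable and creates no heap object; the object version boxes its argument and can never change the caller's variable.

```csharp
using System;

static class RefVersusBoxingDemo
{
    // ref passes an alias to the caller's int; nothing is boxed.
    public static void AddOne(ref int value)
    {
        value++;
    }

    // An object parameter forces the caller's int to be boxed.
    public static object AddOneBoxed(object value)
    {
        return (int)value + 1; // unbox, add, box the result
    }

    static void Main()
    {
        int n = 5;
        AddOne(ref n);
        Console.WriteLine(n);           // 6

        object result = AddOneBoxed(n); // boxes a copy of n
        Console.WriteLine(n);           // 6 (unchanged)
        Console.WriteLine(result);      // 7
    }
}
```

A plain by-value int parameter does not box either, so ref is only worth using when the method really needs to modify the caller's variable.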

Confusion regarding boxing of value types

In the following code...
int i=5;
object o = 5;
Console.WriteLine(o); //prints 5
I have three questions:
1) What additional/useful functionality is acquired by the 5 residing in the variable o that the 5 represented by the variable i does not have ?
2) If some code is expecting a value type then we can just pass it the int i , but if its expecting a reference type , its probably not interested in the 5 boxed in o anyway . So when are boxing conversions explicitly used in code ?
3) How come the Console.WriteLine(o) print out a 5 instead of System.Object ??
What additional/useful functionality is acquired by the 5 residing in the variable o that the 5 represented by the variable i does not have ?
It's rare that you want to box something, but occasionally it is necessary to do so. In older versions of .NET boxing was often necessary because some methods only worked with object (e.g. ArrayList's methods). This is much less of a problem now that there are generics, so boxing occurs less frequently in newer code.
If some code is expecting a value type then we can just pass it the int i, but if its expecting a reference type, its probably not interested in the 5 boxed in o anyway . So when are boxing conversions explicitly used in code ?
In practice boxing usually happens automatically for you. You could explicitly box a variable if you want to make it more clear to the reader of your code that boxing is happening. This might be relevant if performance could be an issue.
How come the Console.WriteLine(o) print out a 5 instead of System.Object ??
Because ToString is a virtual method on object which means that the implementation that is called depends on the runtime type, not the static type. Since int overrides ToString with its own implementation, it is int.ToString that is called, not the default implementation provided by object.
object o = 5;
Console.WriteLine(o.GetType()); // outputs System.Int32, not System.Object
1) On its own, there is not much point. But imagine you wish to store something in a generic way, and you don't know whether that thing is a value or an object. With boxing, you can convert the value into an object, and then treat everything as an object. Without it, you would need a special case to be able to hold a value or an object. (This is most useful in containers such as lists, allowing you to mix values like 5 with references to objects like a FileStream.)
2) Boxing conversions usually only happen implicitly, except in example code illustrating boxing.
3) The WriteLine code probably calls the virtual Object.ToString() method. If the class of the object it is called on does not override ToString, the base class (object) implementation is called, but most types override it to provide a more useful, type-specific result. That includes System.Int32: although int is a value type, it is still derived from System.Object.
What additional/useful functionality is acquired by the 5 residing in the variable o that the 5 represented by the variable i does not have ?
There is no additional functionality acquired by a boxed value type, apart from the fact that it can be passed as a reference to code that requires one.
So when are boxing conversions explicitly used in code ?
I can't spontaneously think of a scenario when you would need to explicitly box an int to an object, since there is always an implicit conversion in that direction (although I would not be surprised if there are cases when an explicit conversion is required).
How come the Console.WriteLine(o) print out a 5 instead of System.Object ??
It calls ToString on the object passed. In fact, it starts by trying to convert the object to an IFormattable and, if successful (which it will be in the case of an int) then calls the ToString overload that is defined in that interface. This will return "5".
Additional functionality: The object is a full-fledged object. You can call methods on it and use it as you would any other object:
System.Console.WriteLine("type: {0}", o.GetType());
System.Console.WriteLine("hash code: {0}", o.GetHashCode());
The int variable is a value type, not an object.
XXX: This is incorrect; see comments. I would venture instead that the one difference in how you might use the two is that object o = 5 is nullable (you can set o = null), while the value type is not - if int i = 5, then i is always an int.
Explicit boxing: As you said, the boxed version is used by code that manipulates objects as objects rather than as integers in particular. This is what enables non-type-safe generic data structures. Now that type-safe generic data structures are available, you are unlikely to be doing much casting and boxing/unboxing.
Why "5": Because the object knows how to print itself using ToString().

Contravariant Delegates Value Types

Can anyone shed light on why contravariance does not work with C# value types?
The below does not work
private delegate Asset AssetDelegate(int m);
internal string DoMe()
{
AssetDelegate aw = new AssetDelegate(DelegateMethod);
aw(32);
return "Class1";
}
private static House DelegateMethod(object m)
{
return null;
}
The problem is that an int is not an object.
An int can be boxed to an object. The resulting object (aka boxed int) is, of course, an object, but it's not exactly an int anymore.
Note that the "is" I'm using above is not the same as the C# operator is. My "is" means "is convertible to by implicit reference conversion". This is the meaning of "is" used when we talk about covariance and contravariance.
An int is implicitly convertible to an object, but this is not a reference conversion. It has to be boxed.
A House is implicitly convertible to an Asset through a reference conversion. There's no need to create or modify any objects.
Consider the example below. Both variables house and asset are referencing the very same object. The variables integer and boxedInt, on the other hand, hold the same value, but they reference different things.
House house = new House();
Asset asset = house;
int integer = 42;
object boxedInt = integer;
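The difference can be observed with object.ReferenceEquals. In this sketch Asset and House are given empty hypothetical definitions so that it compiles, and the demo class name is made up.

```csharp
using System;

class Asset { }
class House : Asset { }

static class IdentityDemo
{
    public static bool SameObject(object a, object b)
    {
        return object.ReferenceEquals(a, b);
    }

    static void Main()
    {
        House house = new House();
        Asset asset = house;  // reference conversion: same object
        Console.WriteLine(SameObject(house, asset));        // True

        int integer = 42;
        object boxedInt = integer;   // boxing: a new object holding a copy
        object boxedAgain = integer; // boxing again: another new object
        Console.WriteLine(SameObject(boxedInt, boxedAgain)); // False

        integer = 43;                // does not affect the earlier box
        Console.WriteLine(boxedInt); // 42
    }
}
```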
Boxing and unboxing are not as simple as they may look. They have many subtleties and might affect your code in unexpected ways. Mixing boxing with covariance and contravariance is an easy way to leave anyone dazed.
I agree with Anthony Pegram's comment - it is based on reference types having a different memory footprint than the value types: the CLR can implicitly use a class of one type as a class of its super type, but when you start using value types, the CLR will need to box your integer so it can work in the place of the object.
If you're looking to make it work anyway, I have a tendency to wrap the declaration in an expression:
AssetDelegate aw = new AssetDelegate((m) => DelegateMethod(m));
I don't know if that's good practice or not as far as syntax goes, but remember that boxing and unboxing is expensive.
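A compilable sketch of the lambda workaround follows, with empty hypothetical definitions for Asset and House and a made-up Run method. The delegate type and DelegateMethod signature mirror the question. The lambda works because the boxing of m happens as an ordinary argument conversion inside the lambda body, while the House-to-Asset return conversion is a reference conversion.

```csharp
using System;

class Asset { }
class House : Asset { }

static class DelegateDemo
{
    private delegate Asset AssetDelegate(int m);

    private static House DelegateMethod(object m)
    {
        return new House();
    }

    public static string Run()
    {
        // new AssetDelegate(DelegateMethod) does not compile here,
        // because int -> object is a boxing conversion, not a reference conversion.
        AssetDelegate aw = m => DelegateMethod(m); // m is boxed on each call
        Asset result = aw(32);
        return result.GetType().Name;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // House
    }
}
```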
