Why are ints and doubles immutable? What is the purpose of returning a new object each time you want to change the value?
The reason I ask is because I'm making a class: BoundedInt, which has a value and an upper and lower bound. So I was wondering: should I make this type immutable too? (Or should it be a struct?)
Firstly:
What is the purpose of returning a new object each time you want to change the value?
I think you might be mistaken about how value types work. This isn't some costly operation like you may be imagining; it's simply the overwriting of data (as opposed to, e.g., dynamic allocation of new memory).
Secondly: here's a very simple example of why numbers are immutable:
5.Increase(1);
Console.WriteLine(5); // What should happen here?
Granted, that is a contrived example. So let's consider a couple more involved ideas.
Mutable reference type
First, there's this one: what if Integer were a mutable reference type?
class Integer
{
public int Value;
}
Then we could have code like this:
class Something
{
public Integer Integer { get; set; }
}
And:
Integer x = new Integer { Value = 10 };
Something t1 = new Something();
t1.Integer = x;
Something t2 = new Something();
t2.Integer = t1.Integer;
t1.Integer.Value += 1;
Console.WriteLine(t2.Integer.Value); // Would output 11
This defies intuition. One would expect the line t2.Integer = t1.Integer to simply copy a value, leaving t2.Integer independent of t1.Integer. It does copy a value, but that "value" is in fact a reference, so both properties end up referring to the same object.
Mutable value type
This could be approached another way, of course, keeping Integer as a value type but maintaining its mutability:
struct Integer
{
public int Value;
// just for kicks
public static implicit operator Integer(int value)
{
return new Integer { Value = value };
}
}
But now let's say we do this:
Integer x = 10;
Something t = new Something();
t.Integer = x;
t.Integer.Value += 1; // This actually won't compile; but if it did,
// it would be modifying a copy of t.Integer, leaving
// the actual value at t.Integer unchanged.
Console.WriteLine(t.Integer.Value); // would still output 10
Basically, immutability of values is something that is highly intuitive. The opposite is highly unintuitive.
I guess that is subjective, though, in all fairness ;)
Integer variables are mutable. However, integer literals are constants, hence immutable.
int i = 0;
// Mutation coming!
i += 3;
// The following line will not compile.
3 += 7;
It's possible to make an integer field immutable, using readonly. Likewise, an integer property could be get-only.
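Applied to the BoundedInt from the question, the readonly/get-only idea could look roughly like the sketch below. The member names (Min, Max, WithValue) and the decision to throw on out-of-range values are assumptions made for illustration; the question does not specify them.

```csharp
using System;

// Hypothetical immutable BoundedInt: every "change" returns a new value.
public readonly struct BoundedInt
{
    public int Value { get; }
    public int Min { get; }
    public int Max { get; }

    public BoundedInt(int value, int min, int max)
    {
        if (min > max)
            throw new ArgumentException("min must not exceed max.");
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException(nameof(value));
        Value = value;
        Min = min;
        Max = max;
    }

    // Returns a new BoundedInt with the same bounds; the original is untouched.
    public BoundedInt WithValue(int newValue) => new BoundedInt(newValue, Min, Max);
}

public static class BoundedIntDemo
{
    public static void Main()
    {
        var a = new BoundedInt(5, 0, 10);
        var b = a.WithValue(7);
        Console.WriteLine(a.Value); // 5
        Console.WriteLine(b.Value); // 7
    }
}
```

Because the bounds check lives in the constructor, every BoundedInt created through it is valid; note that default(BoundedInt) bypasses the constructor and yields 0 with bounds 0..0.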
As a mutable object, you have to lock an int variable before you change it (in any multi-threaded code that writes to your int from separate threads).
Why? Let's say you were incrementing an int, like this:
myInt++
Under the hood, this is a 32-bit number. Theoretically, on a 32-bit computer you could add 1 to it, and this operation might be atomic; that is, it would be accomplished in one step, because it would be accomplished in a CPU register. Unfortunately, it's not; the increment is a read, an add, and a write back.
What if another thread mutated this number while it was in the middle of being incremented? One of the updates would be lost, and your number would be wrong.
However, if you make a thread-safe copy of your object before you increment it, operate on your thread-safe copy, and return a new object when your increment is complete, you guarantee that your increment is thread safe; it cannot be affected by any operations on the original object that take place on other threads, because you're no longer working with the original object. In effect, you have made your object immutable.
This is the basic principle behind functional programming; by making objects immutable, and returning new objects from functions, you get thread safety for free.
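The lost-update problem described above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it uses Interlocked.Increment as the atomic alternative rather than the copy-and-return approach the answer describes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IncrementDemo
{
    // Increments a shared counter from several workers without synchronization.
    // ++ is a read-modify-write sequence, so concurrent updates can be lost.
    public static int UnsafeCount(int workers, int perWorker)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++) counter++;
        });
        return counter;
    }

    // Same work, but each increment is a single atomic operation.
    public static int SafeCount(int workers, int perWorker)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++) Interlocked.Increment(ref counter);
        });
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(UnsafeCount(8, 100_000)); // may be less than 800000
        Console.WriteLine(SafeCount(8, 100_000));   // 800000
    }
}
```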
It makes sense to have BoundedInt as a mutable type because it represents a variable that at any point in time has a specific value and that value can be changed but only within a certain range.
However integers themselves aren't variables so they should not be mutable.
Anything with value semantics should be immutable in C#.
Mutable classes can't have value semantics because you can't override the assignment operator.
MyClass o1=new MyClass();
MyClass o2=o1;
o1.Mutate();
//o2 got mutated too
//=> no value but reference semantics
Mutable structs are ugly because you can easily call a mutating method on a temporary value. In particular, properties return temporary copies.
MyStruct S1;
MyStruct S2{get;set;}
S1.Mutate(); //Changes S1
S2.Mutate();//Doesn't change S2
That's why I don't like that most Vector libraries use mutating methods like Normalize in their Vector struct.
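As a sketch of the alternative, a vector struct can expose a non-mutating method that returns a new value. Vec2, Normalized and Body are hypothetical names invented for this example, not taken from any particular vector library.

```csharp
using System;

// Hypothetical immutable 2D vector: Normalized() returns a new value
// instead of mutating the receiver, so calling it on a temporary
// (such as a property's return value) cannot silently lose work.
public readonly struct Vec2
{
    public double X { get; }
    public double Y { get; }

    public Vec2(double x, double y) { X = x; Y = y; }

    public double Length => Math.Sqrt(X * X + Y * Y);

    public Vec2 Normalized()
    {
        double len = Length;
        return len == 0 ? this : new Vec2(X / len, Y / len);
    }
}

public class Body
{
    public Vec2 Direction { get; set; } = new Vec2(3, 4);
}

public static class Vec2Demo
{
    public static void Main()
    {
        var body = new Body();
        // The result must be assigned back explicitly; nothing is hidden.
        body.Direction = body.Direction.Normalized();
        Console.WriteLine(body.Direction.Length); // approximately 1
    }
}
```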
I'm working on an academic project with Neural Networks. These networks do heavy computation with doubles. I run it on amazon cloud for days on 32 core servers. When profiling the application, the top performance problem is allocation of double!!
It would be fair to have a dedicated namespace with mutable types. "unsafe" keywords can be enforced for additional precaution.
It took me ages to understand that boxing/unboxing isn't a process of copying variable ['s value] from stack to heap but just the process of conversion between value<->reference. All of this because all the examples I saw were like:
int i = 12;
object o = i;
int j = (int)o;
Accompanied by a terrible diagram (the same one in many different examples), which led me to the wrong conclusion that boxing is the process of moving from stack to heap with value->reference conversion happening (and vice versa).
Now I understand it is just the conversion process itself, but there are a few nuances I need in-depth help with:
1. How does it look in terms of memory schematics when boxing/unboxing happens with instance variables/class fields?
By default, all these variables are already allocated on the heap. Are there any examples of boxing in this scope, and how does it behave? No need to draw it if you don't want to; a written explanation will do.
2. What happens here, for example:
int i = 12;
object o = 12; // boxing? if so - why?
int i = (int)o; // unboxing?
int k = (int)o; // Same?
3. If boxing/unboxing is considered "bad" in terms of memory/performance, how do you handle it in cases where you can't avoid it? For example:
int i = 10;
ArrayList arrlst = new ArrayList();
arrlst.Add(i);
int j = (int)arrlst[0];
What is the proper solution here besides "use generics" (for a case where generics are not applicable, for example)?
Original Answer
Boxing/unboxing is not about moving to and from the heap, but about indirection. When a variable gets boxed, what you get is a new object (OK, it is on the heap, but that is an implementation detail) that holds a copy of the value.
Now, you take the object and read one of its fields... what happens? You get a value. The implementation detail is that it is loaded onto the stack[*]. That value can be boxed (you can create a new object that holds a copy of it).
[*]: You would then, for example, call a method (or an operator) which will read its parameters from the stack (The semantics in MSIL are stack manipulation).
By the way, when you get the field and box it, what is in the box is a copy. Think about it: what you boxed came from the stack (you first copy it from the heap to the stack, then box it; at least those are the semantics in MSIL). Example:
void Main()
{
var t = new test();
t.boxme = 1;
object box = t.boxme;
t.boxme = 2;
Console.WriteLine(box); // outputs 1
}
class test
{
public int boxme;
}
Tested on LINQPad.
Extended Answer
Here I will go over the points in the edited question...
1. How does it look in terms of memory schematics when boxing/unboxing happens with instance variables/class fields?
By default, all these variables are already allocated on the heap. Are there any examples of boxing in this scope, and how does it behave? No need to draw it if you don't want to; a written explanation will do.
I get you want an explanation of how boxing works on an instance field. Since the code above demonstrates a use of box on an instance field, I will go over that code.
Before diving into the code I want to mention that I use the word "stack" because, as I said in the original answer, that is the semantics of the language. Yet, it does not have to be a literal stack in practice. The jitter will very likely optimize the code to take advantage of CPU registers. Therefore, when you see that I say that we put things on the stack to take them out right away... yeah, the jitter will probably use a register there. In fact, we will be placing some things on the stack repeatedly; the jitter may decide that it is worth reusing a register for those things.
First off, we are using a very simple, not practical class test with a single field boxme:
class test
{
public int boxme;
}
The only other thing I have to say about this class is remind you that the compiler will generate a constructor, which takes no parameters. With that in mind, let us go over the code in Main line by line...
var t = new test();
This line does two operations:
Call the constructor of the class test. It will create a new object on the heap and push a reference to it on the stack.
Set the local variable t to what we pop from the stack.
t.boxme = 1;
This line does three operations:
Push the value of the local variable t on top of the stack.
Push the value 1 on top of the stack.
Set the field boxme, on the object whose reference we pop from the stack, to the value we pop from the stack (1).
object box = t.boxme;
As you may guess, this line is what we are here for. It does four operations total:
Push the value of the local variable t on top of the stack.
Push the value of the field boxme (of the object whose reference we pop from the stack) on top of the stack.
BOX: pop from the stack, copy the value (and the fact that it is an int) to a new object (created in the heap), push the reference to it on the stack.
Set the local variable box to what we pop from the stack.
t.boxme = 2;
Essentially the same as t.boxme = 1; but we push 2 instead of 1.
Console.WriteLine(box);
Push the value of the local variable box on top of the stack.
Call the method System.Console.WriteLine with whatever we pop from the stack as parameter.
The user sees "1".
2. What happens here, for example:
int i = 12;
object o = 12; // boxing? if so - why?
int i = (int)o; // unboxing?
int k = (int)o; // Same?
Yay, more code...
int i = 12;
Push the value 12 on top of the stack.
Set the local variable i to what we pop from the stack.
No surprises so far.
object o = 12; // boxing? if so - why?
Yes, boxing.
Push the value 12 on top of the stack.
BOX: pop from the stack, copy the value (and the fact that it is an int) to a new object (created in the heap), push the reference to it on the stack.
Set the local variable o to what we pop from the stack.
Why? Because the 32 bits which make the int look nothing like a reference type. If you want a reference type with the value of an int you need to put the value of int somewhere it can be referenced (put it on the heap) and then you can have your object.
int i = (int)o; // unboxing?
That line does not compile; the error is: A local variable named 'i' is already defined in this scope
I think you mean:
i = (int)o; // unboxing?
Yes, unboxing.
Push the value of the local variable o on the top of the stack.
Unbox: read the value of the object we pop from the stack, and push that value on the stack.
Set the local variable i to what we pop from the stack.
int k = (int)o; // Same?
Yes. Just a different local variable.
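Two consequences of the steps above can be checked with a small sketch: every box operation creates a separate object holding its own copy, and unboxing has to name the exact value type that was boxed. The class and method names here are made up for the example.

```csharp
using System;

public static class BoxingDemo
{
    // Boxing the same int twice produces two distinct objects with equal contents.
    public static bool BoxesAreDistinct()
    {
        int i = 12;
        object a = i;
        object b = i;
        return !object.ReferenceEquals(a, b) && a.Equals(b);
    }

    // Unboxing must name the exact value type that was boxed.
    public static bool UnboxToLongThrows()
    {
        object o = 12; // boxed int
        try
        {
            _ = (long)o; // throws InvalidCastException: the box holds an int
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinct());  // True
        Console.WriteLine(UnboxToLongThrows()); // True
    }
}
```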
3. If boxing/unboxing is considered "bad" in terms of memory/performance, how do you handle it in cases where you can't avoid it? For example:
int i = 10;
ArrayList arrlst = new ArrayList();
arrlst.Add(i);
int j = (int)arrlst[0];
1. Use generics
int i = 10;
var arrlst = new List<int>(); // List<int> stores the ints directly, no boxing
arrlst.Add(i);
int j = arrlst[0];
I have to admit, sometimes "use generics" is not the answer.
2. Use ref
C# 7.0 added ref returns and ref locals, which should cover some cases where we needed boxing/unboxing in the past.
By using ref, what you pass is a reference to the value where it is stored (for example, on the stack). Since the idea of ref is that you can modify the original, boxing it (copying the value to the heap) would defeat its purpose.
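A minimal sketch of that difference, with hypothetical method names: a ref parameter aliases the caller's variable, while an object parameter receives a box containing a copy.

```csharp
using System;

public static class RefDemo
{
    // Receives an alias to the caller's variable; no box, no copy.
    public static void AddOne(ref int value) => value += 1;

    // Receives a boxed copy; unboxing yields yet another copy,
    // so the caller's variable is unaffected.
    public static void AddOneBoxed(object boxed)
    {
        int copy = (int)boxed;
        copy += 1;
    }

    public static void Main()
    {
        int i = 10;
        AddOneBoxed(i);       // boxes i
        Console.WriteLine(i); // 10
        AddOne(ref i);
        Console.WriteLine(i); // 11
    }
}
```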
3. Keep an eye on box lifespan
You may try to reuse your references instead of unnecessarily boxing the same value multiple times. That could help to keep the number of boxes low, and the garbage collector will pick up on the fact that these are long-lived boxes and check them less often.
On the other hand, the garbage collector deals very efficiently with short-lived boxes. Thus, if you cannot avoid doing a lot of boxing/unboxing, try to make the boxes short-lived.
4. Try using reference types
If you are having performance problems because you have many long-lived boxes... you probably need to make some classes. If you are using reference types to begin with, there is no need to box them.
Although that can be problematic if you need structs for interop... hmm... probably not what you are looking for, but have a look at ref struct. Span<T> et al. can save you allocations in other ways.
5. Let it be
If you cannot do it without boxing, you cannot do it without boxing.
For example, if you need a generic container that makes atomic operations on the members of the generic type... but you also need to allow the generic type to be a value type... what do you do then? Well, you have to initialize the container with the type object when you need to store some non-atomic value type.
No, ref will not save you in that case, because ref does not guarantee atomicity.
Instead of working harder at getting a performance gain from optimizing the use of boxing/unboxing... look for other ways to improve the performance. For example, that generic container I was talking about can be expensive, but if it allows you to parallelize some algorithm and that gives you a performance boost greater than that cost, then it is justified.
Example:
// Potentially large struct.
struct Foo
{
public int A;
public int B;
// etc.
}
Foo[] arr = new Foo[100];
If Foo is a 100 byte structure, how many bytes will be copied in memory during execution of the following statement:
int x = arr[0].A;
That is, is arr[0] evaluated to some temporary variable (a 100-byte copy of an instance of Foo), followed by the copying of .A into variable x (a 4-byte copy)?
Or is some combination of the compiler, JITer and CLR able to optimise this statement such that the 4 bytes of A are copied directly into x?
If an optimisation is performed, does it still hold when the items are held in a List<Foo> or when an array is passed as an IList<Foo> or an ArraySegment<Foo>?
Value types are copied by value -- hence the name. So then we must consider at what times a copy must be made of a value. This comes down to analyzing correctly when a particular entity refers to a variable, or a value. If it refers to a value then that value was copied from somewhere. If it refers to a variable then it's just a variable, and can be treated like any other variable.
Suppose we have
struct Foo { public int A; public int B; }
Ignore for the moment the design flaws here; public fields are a bad code smell, as are mutable structs.
If you say
Foo f = new Foo();
what happens? The spec says:
A new eight byte variable f is created.
A temporary eight byte storage location temp is created.
temp is filled in with eight bytes of zeros.
temp is copied to f.
But that is not what actually happens; the compiler and runtime are smart enough to notice that there is no observable difference between the required workflow and the workflow "create f and fill it with zeros", so that happens. This is a copy elision optimization.
EXERCISE: devise a program in which the compiler cannot copy-elide, and the output makes it clear that the compiler does not perform a copy elision when initializing a variable of struct type.
Now if you say
f.A = 123;
then f is evaluated to produce a variable -- not a value -- and then from that A is evaluated to produce a variable, and four bytes are written to that variable.
If you say
int x = f.A;
then f is evaluated as a variable, A is evaluated as a variable, and the value of A is written to x.
If you say
Foo[] fs = new Foo[1];
then variable fs is allocated, the array is allocated and initialized with zeros, and the reference to the array is copied to fs. When you say
fs[0].A = 123;
Same as before. fs[0] is evaluated as a variable, so A is a variable, so 123 is copied to that variable.
When you say
int x = fs[0].A;
same as before: we evaluate fs[0] as a variable, fetch from that variable the value of A, and copy it.
But if you say
List<Foo> list = new List<Foo>();
list.Add(new Foo());
list[0].A = 123;
then you will get a compiler error, because list[0] is a value, not a variable. You can't change it.
If you say
int x = list[0].A;
then list[0] is evaluated as a value -- a copy of the value stored in the list is made -- and then a copy of A is made in x. So there is an extra copy here.
EXERCISE: Write a program that illustrates that list[0] is a copy of the value stored in the list.
It is for this reason that you should (1) not make big structs, and (2) make them immutable. Structs get copied by value, which can be expensive, and values are not variables, so it is hard to mutate them.
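One possible answer to the second exercise, as a hedged sketch: mutate what the list indexer hands back and observe that the stored element is unchanged, then compare with an array element. The helper names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public struct Foo { public int A; public int B; }

public static class ListCopyDemo
{
    // Returns the A stored in the list after mutating a copy of list[0].
    public static int MutateCopyFromList()
    {
        var list = new List<Foo> { new Foo() };
        Foo copy = list[0]; // the indexer's getter returns a copy
        copy.A = 123;       // changes only the local copy
        return list[0].A;
    }

    // Returns the A stored in the array after writing through fs[0].
    public static int MutateArrayElement()
    {
        var fs = new Foo[1];
        fs[0].A = 123;      // fs[0] is a variable, so this writes in place
        return fs[0].A;
    }

    public static void Main()
    {
        Console.WriteLine(MutateCopyFromList()); // 0
        Console.WriteLine(MutateArrayElement()); // 123
    }
}
```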
What makes array indexer return a variable but list indexer not? Is array treated in a special way?
Yes. Arrays are very special types that are built deeply into the runtime and have been since version 1.
The key feature here is that an array indexer logically produces an alias to the variable contained in the array; that alias can then be used as the variable itself.
All other indexers are actually pairs of get/set methods, where the get returns a value, not a variable.
Can I create my own class to behave the same as array in this regard
Before C# 7, not in C#. You could do it in IL, but of course then C# wouldn't know what to do with the returned alias.
C# 7 adds the ability for methods to return aliases to variables: ref returns. Remember, ref (and out) parameters take variables as their operands and cause the callee to have an alias to that variable. C# 7 adds the ability to do this to locals and returns as well.
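A sketch of what that enables, assuming C# 7.0 or later: a user-defined indexer that returns an alias, so element fields can be assigned through it just as with an array. FooBuffer is a hypothetical type written for this example.

```csharp
using System;

public struct Foo { public int A; public int B; }

// Hypothetical wrapper whose indexer returns an alias (ref return).
public class FooBuffer
{
    private readonly Foo[] items;

    public FooBuffer(int length) { items = new Foo[length]; }

    public ref Foo this[int index] => ref items[index];
}

public static class RefReturnDemo
{
    public static void Main()
    {
        var buffer = new FooBuffer(1);
        buffer[0].A = 123;              // writes through the alias
        ref Foo first = ref buffer[0];  // ref local aliasing the same element
        first.B = 456;
        Console.WriteLine(buffer[0].A); // 123
        Console.WriteLine(buffer[0].B); // 456
    }
}
```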
The entire struct is already in memory. When you access arr[0].A, you aren't copying the struct, and no new memory is needed. You're looking up a reference (which might be on the call stack, or in a field of an object on the heap) to find the location of arr[0], adjusting for the offset of the A field, and then accessing only that integer. There is no need to read the full struct just to get A.
Neither List<Foo> nor ArraySegment<Foo> really changes anything important here so far.
However, if you were to pass arr[0] to a function or assign it to a new variable, that would result in copying the Foo object. This is one difference between a struct (value type) and a class (reference type) in .Net; a class would only copy the reference. List<Foo> is a reference type; ArraySegment<Foo> is itself a small struct, but it only wraps a reference to the underlying array.
In .Net, especially as a newcomer to the platform, you should strongly prefer class over struct most of the time, and it's not just about copying the full object vs copying the reference. There are some other subtle semantic differences that even I admittedly don't fully understand. Just remember that class > struct until you have a good empirical reason to change your mind.
According to Wikipedia's article on value objects, C# value objects are both immutable and copied attribute-wise.
If they're immutable, why make copies? Even if it helps memory locality, is that general enough of an optimization to make it the default behavior?
Edit: Oh, I think I misunderstood immutability.
So does immutability mean you can't modify the attributes individually, but you can replace the entire internals from an existing object? But doesn't that violate "if two things are equal, they will always be equal"?
So does immutability mean you can't modify the attributes individually, but you can replace the entire internals from an existing object?
Yes.
But doesn't that violate "if two things are equal, they will always be equal"?
No. Why would it? If you replace the internals from an existing object, you get a new object with different internals.
I'm not agreeing with the given claims, but I'll attempt to explain what I believe they intend to say.
The fact that structure types are immutable means that
public struct S { public int i; }
public S f() { /* omitted */ }
public void g() { f().i = 3; }
is a compile-time error: it wouldn't make sense to modify f()'s result, because the modification would be immediately lost.
In contrast,
public struct S { public int i; }
public S f() { /* omitted */ }
public void g() { var s = f(); s.i = 3; }
is fine, but s.i = 3; can be interpreted as rewriting all of s: it can be interpreted as equivalent to (pseudo-code) s = { 3 };, where { 3 } constructs a whole new S value object.
But doesn't that violate "if two things are equal, they will always be equal"?
By their interpretation, this is still true. After s.i = 3;, s is a whole new value. Before the assignment to s.i, s was equal to f()'s result. After the assignment to s.i, s itself fundamentally changes. It's not just a modification of a property of that object; you've got a whole new object, which was never equal to any other object except perhaps by chance.
Their interpretation is consistent with how C# actually works, although their phrasing is not how I usually see it, or how I would put it. Beware that other documentation may make different claims that at first glance will seem to totally contradict these.
Everything is copied by value unless you use the ref keyword. The difference between value types and reference types is:
variables/fields whose type is a value-type are allocated where they are declared. This can be the current stack frame if they are local method variables. But it can also be the heap if they are part of an object already on the heap.
variables/fields whose type is a reference-type contain a reference to an object that is allocated on the heap.
Since value types are allocated "in-place", when you assign one variable to another you're actually copying the object's members. When you assign a reference-type variable to another, you're copying the reference to the same object on the heap. Either way, you're always copying the content of the variable.
The behavior of value types shows that whatever value we are holding cannot be changed through some other variable.
But I still have some confusion about what I mentioned in the title of this post. Can anyone clarify?
Value types can be either mutable or (modulo some weird edge cases) immutable, depending on how you write them.
Mutable:
public struct MutableValueType
{
public int MyInt { get; set; }
}
Immutable:
public struct ImmutableValueType
{
private readonly int myInt;
public ImmutableValueType(int i) { this.myInt = i; }
public int MyInt { get { return this.myInt; } }
}
The built-in value types (int, double and the like) are immutable, but you can very easily create your own mutable structs.
One piece of advice: don't. Mutable value types are a bad idea, and should be avoided. For example, what does this code do:
SomeType t = new SomeType();
t.X = 5;
SomeType u = t;
t.X = 10;
Console.WriteLine(u.X);
It depends. If SomeType is a value type, it prints 5, which is a pretty confusing result; if it is a reference type, it prints 10.
See this question for more info on why you should avoid mutable value types.
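Both outcomes of the snippet above can be reproduced side by side. In this sketch PointStruct and PointClass are hypothetical stand-ins for SomeType.

```csharp
using System;

public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopyDemo
{
    public static int WithStruct()
    {
        PointStruct t = new PointStruct();
        t.X = 5;
        PointStruct u = t; // copies the whole value
        t.X = 10;
        return u.X;        // u is independent of t
    }

    public static int WithClass()
    {
        PointClass t = new PointClass();
        t.X = 5;
        PointClass u = t;  // copies only the reference
        t.X = 10;
        return u.X;        // t and u refer to the same object
    }

    public static void Main()
    {
        Console.WriteLine(WithStruct()); // 5
        Console.WriteLine(WithClass());  // 10
    }
}
```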
All primitive value types like int, double, and float are immutable. But structs themselves can be mutable, so you have to take measures to make them immutable, as mutability can create a lot of confusion.
Any value-type instance which holds any information can be mutated by code which can write the storage location wherein it is contained, and no value-type instance can be mutated by code which cannot write the storage location wherein it is contained. These characteristics make privately-held storage locations of mutable value types ideal data containers in many scenarios, since they combine the updating convenience that stems from mutability, with the control that would come from immutability. Note that it is possible to write the code for a value type in such a way that it's impossible to mutate an existing instance without first having an instance (perhaps a newly created temporary instance) which contains the desired data, and overwriting the contents of the former instance with the contents of the latter, but that won't make the value type any more or less mutable than it would have been absent such ability. In many cases, it merely serves to make mutation awkward and to make it look as though a statement like:
MyKeyValuePair =
new KeyValuePair<long,long>(MyKeyValuePair.Key+1, MyKeyValuePair.Value+1);
will create a new instance but leave the existing instance unaffected. If KeyValuePair were an immutable class, and one thread was performing a MyKeyValuePair.ToString() while another thread was executing the above code, the ToString call would act upon either the old or new instance, and would thus yield either both old values or both new values. Because KeyValuePair is a struct, however, the above statement will create a new instance, but it won't make MyKeyValuePair refer to the new instance--it will merely use the new instance as a template whose fields will be copied to MyKeyValuePair. If KeyValuePair were a mutable struct, the most natural expression of the likely-intended meaning for the above code would be more like:
MyKeyValuePair.Key += 1;
MyKeyValuePair.Value += 1;
or perhaps:
var temp = MyKeyValuePair;
MyKeyValuePair.Key = temp.Key+1;
MyKeyValuePair.Value = temp.Value+1;
and the threading implications would be much clearer.
I am reading Eric Lippert's blog post about mutating readonly structs, and I see many references to it here on SO as an argument for why value types must be immutable.
But one thing is still not clear. It says that when you access a value type you always get a copy of it, and here is the example:
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
class Test
{
public readonly Mutable m = new Mutable();
static void Main(string[] args)
{
Test t = new Test();
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
}
}
And the question is this: why, when I change
public readonly Mutable m = new Mutable();
to
public Mutable m = new Mutable();
does everything start to work as expected?
Can you please explain more clearly why value types must be immutable?
I know that it is good for thread safety, but in that case the same can be applied to reference types.
Structs with mutating methods behave strangely in several situations.
The example you already discovered is a readonly field. A defensive copy is necessary because you don't want to mutate a readonly field.
The same happens when the struct is exposed as a property. Once again an implicit copy happens, and only the copy is mutated, even if the property has a setter.
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
Mutable property{get;set;}
void Main()
{
property=new Mutable();
property.Mutate().Dump();//returns 1
property.Mutate().Dump();//returns 1 :(
}
This shows that mutating methods are problematic on structs. But it doesn't show that a mutable struct with either public fields or properties that have a setter is problematic.
The thread-safety is a clear technical reason. It applies to value types as well as to reference types (see System.String).
The more general guideline "value types should be immutable" is different. It is about readability of code, and comes mainly from the confusion that mutable values can cause. This code snippet is just one example. Most people would not expect the 1,1,1 outcome.
I don't know C# so I'll try to answer the 2nd part of your question.
Why value types must be immutable?
There are two types of objects from Domain Driven Design's point of view:
value objects/types - their identity is determined by their value (e.g. numbers: 2 is always 2 - an identity of number two is always the same, so 2 == 2 is always true)
entities (reference types) - they can consist of other value types, and their identity is determined by who they are rather than by their values (e.g. people: even if there were a man looking exactly like you, he wouldn't be you)
If value types were mutable, imagine what could happen if it were possible to change the value of the number two: 2 == 1 + 1 wouldn't be guaranteed to be true.
See these links for more:
Value vs Entity objects (Domain Driven Design)
http://devlicio.us/blogs/casey/archive/2009/02/13/ddd-entities-and-value-objects.aspx
I think the tricky thing about that example is that one could argue it shouldn't be possible. You made an instance of Mutable read-only and yet you can change its value through the Mutate() function, therefore violating the concept of immutability, in a sense. Strictly speaking, however, it works because the private field x is not readonly. If you make one simple change in the Mutable struct, then immutability will actually be enforced:
private readonly int x;
Then the Mutate() function will produce a compiler error.
The example shows clearly how copy-by-value works in the context of readonly variables. Whenever you access m you are creating a copy of the instance, as opposed to a copy of a reference to the instance; the latter would occur if Mutable were a class instead of a struct.
Since every time you access m you get 1) a copy of the instance, and 2) a copy of an instance that is read-only, the value of x is always going to be 0 at the time the copying takes place. When you call Mutate() on the copy, it increments x to 1, which works because x itself is NOT readonly. But the next time you call Mutate(), you are again calling it on a copy of the original default value of 0. As he says in the article, "m is immutable, but the copy is not". Every copy of the original instance will have x as 0, because the object being copied never changes, whereas its copies can be changed.
Maybe that helps.
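The two behaviours from the question can be put next to each other in a short sketch. Holder and its field names are hypothetical; Mutable is the struct from the question.

```csharp
using System;

public struct Mutable
{
    private int x;
    public int Mutate() { x = x + 1; return x; }
}

public class Holder
{
    public readonly Mutable ReadOnlyField = new Mutable();
    public Mutable PlainField = new Mutable();
}

public static class DefensiveCopyDemo
{
    public static void Main()
    {
        var h = new Holder();
        // Each call runs on a fresh copy of the readonly field.
        Console.WriteLine(h.ReadOnlyField.Mutate()); // 1
        Console.WriteLine(h.ReadOnlyField.Mutate()); // 1
        // Each call runs on the field itself.
        Console.WriteLine(h.PlainField.Mutate());    // 1
        Console.WriteLine(h.PlainField.Mutate());    // 2
    }
}
```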