Why are value objects immutable and copy-by-value?

According to Wikipedia's article on value objects, C# value objects are both immutable and copied attribute-wise.
If they're immutable, why make copies? Even if it helps memory locality, is that general enough of an optimization to make it the default behavior?
Edit: Oh, I think I misunderstood immutability.
So does immutability mean you can't modify the attributes individually, but you can replace the entire internals from an existing object? But doesn't that violate "if two things are equal, they will always be equal"?

So does immutability mean you can't modify the attributes individually, but you can replace the entire internals from an existing object?
Yes.
But doesn't that violate "if two things are equal, they will always be equal"?
No. Why would it? If you replace the internals from an existing object, you get a new object with different internals.

I'm not agreeing with the given claims, but I'll attempt to explain what I believe they intend to say.
The fact that structure types are immutable means that
public struct S { public int i; }
public S f() { /* omitted */ }
public void g() { f().i = 3; }
is a compile-time error: it wouldn't make sense to modify f()'s result, because the modification would be immediately lost.
In contrast,
public struct S { public int i; }
public S f() { /* omitted */ }
public void g() { var s = f(); s.i = 3; }
is fine, but s.i = 3; can be interpreted as rewriting all of s: it can be interpreted as equivalent to (pseudo-code) s = { 3 };, where { 3 } constructs a whole new S value object.
But doesn't that violate "if two things are equal, they will always be equal"?
By their interpretation, this is still true. After s.i = 3;, s is a whole new value. Before the assignment to s.i, s was equal to f()'s result. After the assignment to s.i, s itself fundamentally changes: it's not just a modification of a property of that object; you've got a whole new object, which was never equal to any other object except perhaps by chance.
Their interpretation is consistent with how C# actually works, although their phrasing is not how I usually see it, or how I would put it. Beware that other documentation may make different claims that at first glance will seem to totally contradict these.

Everything is copied by value unless you use the ref keyword. The difference between value types and reference types is:
variables/fields whose type is a value-type are allocated where they are declared. This can be the current stack frame if they are local method variables. But it can also be the heap if they are part of an object already on the heap.
variables/fields whose type is a reference-type contain a reference to an object that is allocated on the heap.
Since value-types are allocated "in-place", when you assign one variable to another you're actually copying the object's members. When you assign a reference-type variable to another, you're copying the reference to the same object on the heap. Either way, you're always copying the content of the variable.
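As a minimal sketch of the difference described above (PointValue, PointRef and CopyDemo are hypothetical names introduced only for this illustration), the following compares assigning a struct variable with assigning a class variable:

```csharp
using System;

// Hypothetical types for illustration; not from the original answer.
struct PointValue { public int X; }
class PointRef { public int X; }

static class CopyDemo
{
    // Assigning a struct variable copies its members.
    public static int StructCopy()
    {
        var a = new PointValue { X = 1 };
        var b = a;      // b is an independent copy of a's fields
        b.X = 2;
        return a.X;     // a is unchanged
    }

    // Assigning a class variable copies only the reference.
    public static int ClassCopy()
    {
        var a = new PointRef { X = 1 };
        var b = a;      // b refers to the same heap object as a
        b.X = 2;
        return a.X;     // the change is visible through a
    }

    static void Main()
    {
        Console.WriteLine(StructCopy()); // 1
        Console.WriteLine(ClassCopy());  // 2
    }
}
```

In both cases the content of the variable is what gets copied; only what that content is (the fields themselves, or a reference) differs.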

Related

modify a value-type variable in a using statement

In C#, if I have the following struct:
internal struct myStruct : IDisposable
{
public int x;
public void Dispose()
{
x = 0;
}
}
then do this in Main:
using (myStruct myStruct = new myStruct())
{
myStruct.x = 5;
}
it fails saying that myStruct is readonly. That makes sense as myStruct is a value-type.
Now if I add the following function to the struct:
public void myfunc(int x)
{
this.x = x;
}
and change the Main code to this:
using (myStruct myStruct = new myStruct())
{
myStruct.myfunc(5);
Console.WriteLine(myStruct.x);
}
it works. Why?

The short answer is "because the C# specification says so". Which, I admit, may be a bit unsatisfying. But that's how it is.
The motivation is, I'm sure, as commenter Blogbeard suggests: while it's practical to enforce read-only on the field access, it's not practical to do so from within a type. After all, the type itself has no way to know how a variable containing a value of that type was declared.
The key part of the C# specification (from the v5.0 spec) is here, on page 258 (in the section on the using statement):
Local variables declared in a resource-acquisition are read-only, and must include an initializer. A compile-time error occurs if the embedded statement attempts to modify these local variables (via assignment or the ++ and -- operators), take the address of them, or pass them as ref or out parameters.
Since in the case of a value type, the variable itself contains the value of the object rather than a reference to an object, modifying any field of the object via that variable is the same as modifying the variable, and so is a "modification via assignment", which is specifically prohibited by the specification.
This is exactly the same as if you had declared the value type variable as a field in another object, with the readonly modifier.
But note that this is a compile-time rule, enforced by the C# compiler, and that there's no way for the compiler to similarly enforce the rule for a value type that modifies itself.
I will point out that this is one of many excellent reasons that one should never ever implement a mutable value type. Mutable value types frequently wind up being able to be modified when you don't want them to be, while at the same time find themselves failing to be modified when you do want them to be (in completely different scenarios from this one).
If you treat a value type as something that is truly a value, i.e. a single value that is itself never changing, they work much better and find themselves in the middle of many fewer bugs. :)

When would a value type contain a reference type?

I understand that the decision to use a value type over a reference type should be based on the semantics, not performance. I do not understand why value types can legally contain reference type members? This is for a couple reasons:
For one, we should not build a struct to require a constructor.
public struct MyStruct
{
public Person p;
// public Person p = new Person(); // error: cannot have instance field initializers in structs
MyStruct(Person p)
{
p = new Person();
}
}
Second, because of value type semantics:
MyStruct someVariable;
someVariable.p.Age = 2; // NullReferenceException
The compiler does not allow me to initialize Person at the declaration. I have to move this off to the constructor, rely on the caller, or expect a NullReferenceException. None of these situations are ideal.
Does the .NET Framework have any examples of reference types within value types? When should we do this (if ever)?

Instances of a value type never contain instances of a reference type. The reference-typed object is somewhere on the managed heap, and the value-typed object may contain a reference to the object. Such a reference has a fixed size. It is perfectly common to do this — for example every time you use a string inside a struct.
But yes, you cannot guarantee initialization of a reference-typed field in a struct. Before C# 10 you cannot define a parameterless constructor at all, and even where one exists, default(MyStruct) and new array elements bypass it (nor can you guarantee it ever gets called if you define it in a language other than C#).
You say you should "not build a struct to require a constructor". I say otherwise. Since value-types should almost always be immutable, you must use a constructor (quite possibly via a factory to a private constructor). Otherwise it will never have any interesting contents.
Use the constructor. The constructor is fine.
If you don't want to pass in an instance of Person to initialize p, you could use lazy initialization via a property. (Because obviously the public field p was just for demonstration, right? Right?)
public struct MyStruct
{
public MyStruct(Person p)
{
this.p = p;
}
private Person p;
public Person Person
{
get
{
if (p == null)
{
p = new Person(…); // see comment below about struct immutability
}
return p;
}
}
// ^ in most other cases, this would be a typical use case for Lazy<T>;
// but due to structs' default constructor, we *always* need the null check.
}

There are two primary useful scenarios for a struct holding a class-type field:
1. The struct holds a possibly-mutable reference to an immutable object (`String` being by far the most common). A reference to an immutable object will behave as a cross between a nullable value type and a normal value type; it doesn't have the "Value" and "HasValue" properties of the former, but it will have null as a possible (and default) value. Note that if the field is accessed through a property, that property may return a non-null default when the field is null, but should not modify the field itself.
2. The struct holds an "immutable" reference to a possibly-mutable object and serves to wrap the object or its contents. `List<T>.Enumerator` is probably the most common struct using this pattern. Having struct fields pretend to be immutable is something of a dodgy construct(*), but in some contexts it can work out pretty well. In most instances where this pattern is applied, the behavior of a struct will be essentially like that of a class, except that performance will be better(**).
(*) The statement structVar = new structType(whatever); will create a new instance of structType, pass it to the constructor, and then mutate structVar by copying all public and private fields from that new instance into structVar; once that is done, the new instance will be discarded. Consequently, all struct fields are mutable, even if they "pretend" to be otherwise; pretending they are immutable can be dodgy unless one knows that the way structVar = new structType(whatever); is actually implemented will never pose a problem.
(**) Structs will perform better in some circumstances; classes will perform better in others. Generally, so-called "immutable" structs are chosen over classes in situations where they are expected to perform better, and where the corner cases where their semantics would differ from those of classes are not expected to pose problems.
Some people like to pretend that structs are like classes, but more efficient, and dislike using structs in ways that take advantage of the fact that they're not classes. Such people would probably only be inclined toward using scenario 2 above. Scenario 1 can be very useful with mutable structs, especially with types like String which behave essentially as values.
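The first scenario can be sketched roughly as follows. Label and LabelDemo are hypothetical names, and the choice of an empty string as the non-null default is an assumption made for illustration:

```csharp
using System;

// Hypothetical wrapper type, named Label only for illustration.
struct Label
{
    private readonly string text;   // null when the struct is default-initialized

    public Label(string text) { this.text = text; }

    // Returns a non-null default without writing to the field.
    public string Text { get { return text ?? ""; } }
}

static class LabelDemo
{
    static void Main()
    {
        Label unset = default(Label);          // no constructor runs here
        Label named = new Label("width");
        Console.WriteLine(unset.Text.Length);  // 0
        Console.WriteLine(named.Text);         // width
    }
}
```

Because the getter only reads the field, a default-initialized Label stays safe to use and no copy of the struct is ever silently modified.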

I wanted to add to Marc's answer, but I had too much to say for a comment.
If you look at the C# specifications, it says of struct constructors:
Struct constructors are invoked with the new operator, but that does
not imply that memory is being allocated. Instead of dynamically
allocating an object and returning a reference to it, a struct
constructor simply returns the struct value itself (typically in a
temporary location on the stack), and this value is then copied as
necessary.
(You can find a copy of the spec under
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC#\Specifications\1033)
So, a struct constructor is inherently different than a class constructor.
In addition to this, structs are expected to be copied by value, and thus:
With structs, the variables each have their own copy of the data, and
it is not possible for operations on one to affect the other.
Any time I've seen a reference type in a struct, it has been a string. This works because strings are immutable. I am guessing your Person object is not immutable and can introduce very odd and severe bugs because of the divergence from the expected behavior of a struct.
That being said, the errors you're seeing with the constructor of your struct may be that you have a public field p with the same name as your parameter p and not referring to the struct's p as this.p, or that you're missing the keyword struct.

immutable value types

I am reading Eric Lippert's blog post about Mutating Readonly Structs, and I see many references to it here on SO as an argument for why value types must be immutable.
But one thing is still not clear. It says that when you access a value type you always get a copy of it, and here is the example:
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
class Test
{
public readonly Mutable m = new Mutable();
static void Main(string[] args)
{
Test t = new Test();
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
}
}
And the question is: why, when I change
public readonly Mutable m = new Mutable();
to
public Mutable m = new Mutable();
does everything start to work as expected?
Could you explain more clearly why value types must be immutable?
I know that it is good for thread safety, but in this case same can be applied to reference types.

Structs with mutating methods behave strangely in several situations.
The example you already discovered is a readonly field. A defensive copy is necessary because you don't want to mutate a readonly field.
But also when used as properties. Once again an implicit copy happens, and only the copy is mutated. Even if the property has a setter.
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
Mutable property{get;set;}
void Main()
{
property=new Mutable();
property.Mutate().Dump();//returns 1
property.Mutate().Dump();//returns 1 :(
}
This shows that mutating methods are problematic on structs. But it doesn't show that a mutable struct with either public fields or properties that have a setter is problematic.
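The three access paths discussed in this thread can be compared side by side. This is a sketch rather than the original poster's code: Holder and MutationDemo are hypothetical names, while Mutable is the struct from the question.

```csharp
using System;

struct Mutable
{
    private int x;
    public int Mutate() { x = x + 1; return x; }
}

// Hypothetical container exposing the same struct in three ways.
class Holder
{
    public readonly Mutable ReadonlyField = new Mutable();
    public Mutable PlainField = new Mutable();
    public Mutable Property { get; set; }
}

static class MutationDemo
{
    // Each call runs on a fresh defensive copy of the readonly field.
    public static (int, int) ViaReadonlyField()
    {
        var h = new Holder();
        return (h.ReadonlyField.Mutate(), h.ReadonlyField.Mutate());
    }

    // A plain field is a variable, so it is mutated in place.
    public static (int, int) ViaPlainField()
    {
        var h = new Holder();
        return (h.PlainField.Mutate(), h.PlainField.Mutate());
    }

    // The getter returns a copy, so only that copy is mutated.
    public static (int, int) ViaProperty()
    {
        var h = new Holder();
        return (h.Property.Mutate(), h.Property.Mutate());
    }

    static void Main()
    {
        Console.WriteLine(ViaReadonlyField()); // (1, 1)
        Console.WriteLine(ViaPlainField());    // (1, 2)
        Console.WriteLine(ViaProperty());      // (1, 1)
    }
}
```

Only the plain field behaves the way most readers would expect from a call named Mutate.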

The thread-safety is a clear technical reason. It applies to value types as well as to reference types (see System.String).
The more general guideline "value types should be immutable" is different. It is about readability of code, and comes mainly from the confusion that mutable values can cause. This code snippet is just one example. Most people would not expect the 1,1,1 outcome.

I don't know C# so I'll try to answer the 2nd part of your question.
Why value types must be immutable?
There are two types of objects from Domain Driven Design's point of view:
value objects/types - their identity is determined by their value (e.g. numbers: 2 is always 2 - an identity of number two is always the same, so 2 == 2 is always true)
entities (reference types) - they can consist of other value types and their identity is determined by their identity itself (e.g. people: even if there was a man looking exactly like you, it wouldn't be you)
If value types were mutable, then imagine what could happen if it were possible to change the value of the number two: 2 == 1 + 1 wouldn't be guaranteed to be true.
See these links for more:
Value vs Entity objects (Domain Driven Design)
http://devlicio.us/blogs/casey/archive/2009/02/13/ddd-entities-and-value-objects.aspx

I think the tricky thing about that example is that one could argue it shouldn't be possible. You made an instance of Mutable read-only and yet you can change its value through the Mutate() function, therefore violating the concept of immutability, in a sense. Strictly speaking, however, it works because the private field x is not readonly. If you make one simple change in the Mutable struct then immutability will actually be enforced:
private readonly int x;
Then the Mutate() function will produce a compiler error.
The example shows clearly how copy-by-value works in the context of readonly variables. Whenever you access m you are creating a copy of the instance, as opposed to a copy of a reference to the instance -- the latter would occur if Mutable were a class instead of a struct.
Since every time you access m you get 1) a copy of the instance, and 2) a copy of an instance that is read-only, the value of x is always going to be 0 at the time the copying takes place. When you call Mutate() on the copy it increments x to 1, which works because x itself is NOT readonly. But next time you call Mutate() you are still calling it on a copy of the original default value of 0. As he says in the article "m is immutable, but the copy is not". Every copy of the original instance will have x as 0 because the object being copied never changes whereas its copies can be changed.
Maybe that helps.

Why are C# number types immutable?

Why are ints and doubles immutable? What is the purpose of returning a new object each time you want to change the value?
The reason I ask is because I'm making a class: BoundedInt, which has a value and an upper and lower bound. So I was wondering: should I make this type immutable too? (Or should it be a struct?)

Firstly:
What is the purpose of returning a new object each time you want to change the value?
I think you might be mistaken about how value types work. This isn't some costly operation like you may be imagining; it's simply the overwriting of data (as opposed to, e.g., dynamic allocation of new memory).
Secondly: here's a very simple example of why numbers are immutable:
5.Increase(1);
Console.WriteLine(5); // What should happen here?
Granted, that is a contrived example. So let's consider a couple more involved ideas.
Mutable reference type
First, there's this one: what if Integer were a mutable reference type?
class Integer
{
public int Value;
}
Then we could have code like this:
class Something
{
public Integer Integer { get; set; }
}
And:
Integer x = new Integer { Value = 10 };
Something t1 = new Something();
t1.Integer = x;
Something t2 = new Something();
t2.Integer = t1.Integer;
t1.Integer.Value += 1;
Console.WriteLine(t2.Integer.Value); // Would output 11
This seems to defy intuition: that the line t2.Integer = t1.Integer would simply copy a value (actually, it does; but that "value" is in fact a reference) and thus that t2.Integer would remain independent of t1.Integer.
Mutable value type
This could be approached another way, of course, keeping Integer as a value type but maintaining its mutability:
struct Integer
{
public int Value;
// just for kicks
public static implicit operator Integer(int value)
{
return new Integer { Value = value };
}
}
But now let's say we do this:
Integer x = 10;
Something t = new Something();
t.Integer = x;
t.Integer.Value += 1; // This actually won't compile; but if it did,
// it would be modifying a copy of t.Integer, leaving
// the actual value at t.Integer unchanged.
Console.WriteLine(t.Integer.Value); // would still output 10
Basically, immutability of values is something that is highly intuitive. The opposite is highly unintuitive.
I guess that is subjective, though, in all fairness ;)
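To connect this back to the BoundedInt from the question, here is a minimal sketch of what an immutable version might look like. The member names (Value, Min, Max, WithValue) and the decision to throw on an out-of-range value are assumptions made for illustration, not something the question specifies; readonly struct requires C# 7.2 or later.

```csharp
using System;

// Hypothetical sketch of the asker's BoundedInt as an immutable struct.
readonly struct BoundedInt
{
    public int Value { get; }
    public int Min { get; }
    public int Max { get; }

    public BoundedInt(int value, int min, int max)
    {
        if (min > max)
            throw new ArgumentException("min must not exceed max");
        if (value < min || value > max)
            throw new ArgumentOutOfRangeException(nameof(value));
        Value = value;
        Min = min;
        Max = max;
    }

    // "Changing" the value returns a new BoundedInt; the original is untouched.
    public BoundedInt WithValue(int value)
    {
        return new BoundedInt(value, Min, Max);
    }
}

static class BoundedIntDemo
{
    static void Main()
    {
        var a = new BoundedInt(5, 0, 10);
        var b = a.WithValue(7);
        Console.WriteLine(a.Value); // 5
        Console.WriteLine(b.Value); // 7
    }
}
```

As with int, "returning a new object" here is just writing a few fields into another storage location; nothing is allocated on the heap.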

Integer variables are mutable. However, integer literals are constants, hence immutable.
int i = 0;
// Mutation coming!
i += 3;
// The following line will not compile.
3 += 7;
It's possible to make an integer field immutable, using readonly. Likewise, an integer property could be get-only.

As a mutable object, you have to lock an int variable before you change it (in any multi-threaded code that writes to your int from separate threads).
Why? Let's say you were incrementing an int, like this:
myInt++;
Under the hood, this is a 32-bit number. Theoretically, on a 32-bit computer you could add 1 to it, and this operation might be atomic; that is, it would be accomplished in one step, because it would be accomplished in a CPU register. Unfortunately, it's not; there is more going on than this.
What if another thread mutated this number while it was in the middle of being incremented? Your number would get corrupted.
However, if you make a thread-safe copy of your object before you increment it, operate on your thread-safe copy, and return a new object when your increment is complete, you guarantee that your increment is thread safe; it cannot be affected by any operations on the original object that take place on other threads, because you're no longer working with the original object. In effect, you have made your object immutable.
This is the basic principle behind functional programming; by making objects immutable, and returning new objects from functions, you get thread safety for free.
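For the specific case of incrementing a shared int, .NET also offers System.Threading.Interlocked.Increment, which performs the read-modify-write as one atomic step. That is a different technique from the copy-and-return approach described above, shown here only as a sketch for comparison; CounterDemo and its method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CounterDemo
{
    // Plain ++ is a read, an add, and a write; concurrent updates can be lost.
    public static int CountUnsafely(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { counter++; });
        return counter; // may be less than iterations; not deterministic
    }

    // Interlocked.Increment makes each increment atomic.
    public static int CountAtomically(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref counter); });
        return counter; // always equals iterations
    }

    static void Main()
    {
        Console.WriteLine(CountUnsafely(1000000));
        Console.WriteLine(CountAtomically(1000000)); // 1000000
    }
}
```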

It makes sense to have BoundedInt as a mutable type because it represents a variable that at any point in time has a specific value and that value can be changed but only within a certain range.
However integers themselves aren't variables so they should not be mutable.

Anything with value semantics should be immutable in C#.
Mutable classes can't have value semantics because you can't override the assignment operator.
MyClass o1=new MyClass();
MyClass o2=o1;
o1.Mutate();
//o2 got mutated too
//=> no value but reference semantics
Mutable structs are ugly because you can easily call a mutating method on a temporary variable. In particular properties return temporary variables.
MyStruct S1;
MyStruct S2{get;set;}
S1.Mutate(); //Changes S1
S2.Mutate();//Doesn't change S2
That's why I don't like that most Vector libraries use mutating methods like Normalize in their Vector struct.

I'm working on an academic project with Neural Networks. These networks do heavy computation with doubles. I run it on the Amazon cloud for days on 32-core servers. When profiling the application, the top performance problem is allocation of double!!
It would be fair to have a dedicated namespace with mutable types. "unsafe" keywords can be enforced for additional precaution.

Why can '=' not be overloaded in C#?

I was wondering, why can't I overload '=' in C#? Can I get a better explanation?

Memory managed languages usually work with references rather than objects. When you define a class and its members you are defining the object behavior, but when you create a variable you are working with references to those objects.
Now, the operator = is applied to references, not objects. When you assign a reference to another you are actually making the receiving reference point to the same object that the other reference is.
Type var1 = new Type();
Type var2 = new Type();
var2 = var1;
In the code above, two objects are created on the heap, one referred to by var1 and the other by var2. The last statement makes var2 refer to the same object as var1. After that line, nothing refers to the second object any more, so the garbage collector is free to reclaim it. In the whole process, no operation is applied to the objects themselves.
Going back to why = cannot be overloaded, the system implementation is the only sensible thing you can do with references. You can overload operations that are applied to the objects, but not to references.

If you overloaded '=' you would never be able to change an object reference after it's been created.
... think about it - any call to theObjectWithOverloadedOperator=something inside the overloaded operator would result in another call to the overloaded operator... so what would the overloaded operator really be doing ? Maybe setting some other properties - or setting the value to a new object (immutability) ?
Generally not what '=' implies..
You can, however, override the implicit & explicit cast operators:
http://www.blackwasp.co.uk/CSharpConversionOverload.aspx

Because it doesn't really make sense to do so.
In C# = assigns an object reference to a variable. So it operates on variables and object references, not objects themselves. There is no point in overloading it depending on object type.
In C++ defining operator= makes sense for classes whose instances can be created e.g. on the stack, because the objects themselves are stored in variables, not references to them. So it makes sense to define how to perform such an assignment. But even in C++, if you have a set of polymorphic classes which are typically used via pointers or references, you usually explicitly forbid copying them like this by declaring operator= and the copy constructor as private (or inheriting from boost::noncopyable), for exactly the same reasons that you don't redefine = in C#. Simply, if you have a reference or pointer to class A, you don't really know whether it points to an instance of class A or of class B, which is a subclass of A. So do you really know how to perform = in this situation?

Actually, overloading operator = would make sense if you could define classes with value semantics and allocate objects of these classes in the stack. But, in C#, you can't.

One possible explanation is that you can't do proper reference updates if you overload assignment operator. It would literally screw up semantics because when people would be expecting references to update, your = operator may as well be doing something else entirely. Not very programmer friendly.
You can use implicit and explicit to/from conversion operators to mitigate some of the seeming shortcomings of not being able to overload assignment.

I don't think there's any really particular single reason to point to. Generally, I think the idea goes like this:
If your object is a big, complicated object, doing something that isn't assignment with the = operator is probably misleading.
If your object is a small object, you may as well make it immutable and return new copies when performing operations on it, so that the assignment operator works the way you expect out of the box (as System.String does.)

You can overload assignment in C#. Just not on an entire object, only on members of it. You declare a property with a setter:
class Complex
{
public double Real
{
get { ... }
set { /* do something with value */ }
}
// more members
}
Now when you assign to Real, your own code runs.
The reason assignment to an object is not replaceable is because it is already defined by the language to mean something vitally important.
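A filled-in version of the idea might look like the sketch below. The backing field, the NaN check and the RealAssignments counter are hypothetical additions used only to make the setter's effect observable; the original answer leaves the getter and setter bodies open.

```csharp
using System;

class Complex
{
    private double real;

    // Hypothetical counter, present only to show that the setter ran.
    public int RealAssignments { get; private set; }

    public double Real
    {
        get { return real; }
        set
        {
            // Custom code runs on every assignment to Real.
            if (double.IsNaN(value))
                throw new ArgumentException("Real must be a number");
            real = value;
            RealAssignments++;
        }
    }
}

static class SetterDemo
{
    static void Main()
    {
        var c = new Complex();
        c.Real = 1.5;   // invokes the setter
        c.Real = 2.5;   // invokes it again
        Console.WriteLine(c.Real);            // 2.5
        Console.WriteLine(c.RealAssignments); // 2
    }
}
```

Assigning a whole Complex variable (c2 = c1) still just copies the reference and runs none of this code.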

It's allowed in C++ and, if you're not careful, it can result in a lot of confusion and bug hunting.
This article explains this in great detail.
http://www.relisoft.com/book/lang/project/14value.html

Because shooting oneself in the foot is frowned upon.
On a more serious note one can only hope you meant comparison rather than assignment. The framework makes elaborate provision for interfering with equality/equivalence evaluation, look for "compar" in help or online with msdn.

Being able to define special semantics for assignment operations would be useful, but only if such semantics could be applied to all situations where one storage location of a given type was copied to another. Although standard C++ implements such assignment rules, it has the luxury of requiring that all types be defined at compile time. Things get much more complicated when Reflection and generics are added to the list.
Presently, the rules in .net specify that a storage location may be set to the default value for its type--regardless of what that type is--by zeroing out all the bytes. They further specify that any storage location can be copied to another of the same type by copying all the bytes. These rules apply to all types, including generics. Given two variables of type KeyValuePair<t1,t2>, the system can copy one to another without having to know anything but the size and alignment requirements of that type. If it were possible for t1, t2, or the type of any field within either of those types, to implement a copy constructor, code which copied one struct instance to another would have to be much more complicated.
That's not to say that such an ability wouldn't offer some significant benefits--it's possible that, were a new framework being designed, the benefits of custom value assignment operators and default constructors would exceed the costs. The costs of implementation, however, would be substantial in a new framework, and likely insurmountable for an existing one.

This code is working for me:
public class Class1
{
...
public static implicit operator Class1(Class2 value)
{
Class1 result = new Class1();
result.property = value.prop;
return result;
}
}

Type of Overriding Assignment
There are two ways to override assignment:
1. When you feel that the user may miss something, and you want to force the user to use a cast,
like float to integer, where you lose the fractional part
int a = (int)5.4f;
2. When you want the user to be able to do it without even noticing that the object type is changing
float f = 5;
How to Override Assignment
For 1, use of explicit keyword:
public static explicit operator ToType(FromType from){
ToType to = new ToType();
to.FillFrom(from);
return to;
}
For 2, use of implicit keyword:
public static implicit operator ToType(FromType from){
ToType to = new ToType();
to.FillFrom(from);
return to;
}
Update:
Note that these operators can be declared in either the FromType or the ToType class, whichever suits your needs; one of your classes can hold all the conversions while the other declares none.
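Both kinds of conversion can be seen together in a small runnable sketch. Celsius and ConversionDemo are hypothetical names chosen for illustration; the operator syntax itself is standard C#.

```csharp
using System;

// Hypothetical type used only to demonstrate conversion operators.
struct Celsius
{
    public double Degrees;

    public Celsius(double degrees) { Degrees = degrees; }

    // Implicit: no information is lost, so no cast is required.
    public static implicit operator Celsius(int degrees)
    {
        return new Celsius(degrees);
    }

    // Explicit: the fractional part is dropped, so the caller must write a cast.
    public static explicit operator int(Celsius c)
    {
        return (int)c.Degrees;
    }
}

static class ConversionDemo
{
    static void Main()
    {
        Celsius c = 21;            // implicit conversion from int
        c.Degrees += 0.7;
        int whole = (int)c;        // explicit conversion; truncates 21.7
        Console.WriteLine(whole);  // 21
    }
}
```

Neither operator changes what = does; each only decides which value is produced before the ordinary assignment takes place.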
