Consider the following executable example:
namespace MyNamespace;
public record struct Record()
{
public bool DoSomething { get; set; } = false;
public void SetDoSomething(bool newValue)
{
DoSomething = newValue;
}
}
public static class Program
{
public static readonly Record MyObject = new();
public static void Main()
{
MyObject.SetDoSomething(true);
Console.WriteLine($"MyObject.DoSomething: {MyObject.DoSomething}");
/* Output:
* false - current version
* true - if MyObject is not readonly or Record is defined as record class
*/
}
}
I'm trying to understand why DoSomething is still false after calling the method that sets the property to true.
My guess is that a copy gets created when calling the method. It makes sense that this does not happen if Record is a reference type (record class). But why is MyObject not copied if I remove the readonly modifier?
This is called a defensive copy. The C# compiler makes it to enforce the semantics of value types. Marking a field readonly when its type is a non-readonly struct is generally not recommended, because copies like this will happen and can also hurt performance. There are some similar scenarios worth mentioning. More specifically:
x.Y causes a defensive copy of x if:
x is a readonly field and
the type of x is a non-readonly struct and
Y is not a field.
The same rules are applied when x is an in-parameter, ref readonly local variable or a result of a method invocation that returns a value by readonly reference.
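To make the in-parameter case concrete, here is a minimal sketch. Counter and InParameterDemo are made-up names for illustration, not types from the question, and the sketch assumes C# 7.2 or later for the in modifier.

```csharp
// Hypothetical mutable struct, used only for illustration.
public struct Counter
{
    public int Count;

    // Not marked readonly, so the compiler has to assume it may mutate 'this'.
    public void Increment() => Count++;
}

public static class InParameterDemo
{
    // 'c' is a readonly reference, so Increment() runs on a hidden copy
    // and the caller's variable is left untouched.
    public static int IncrementViaIn(in Counter c)
    {
        c.Increment();
        return c.Count; // plain field read, no copy involved
    }

    // 'c' is a writable reference, so Increment() mutates the caller's variable.
    public static int IncrementViaRef(ref Counter c)
    {
        c.Increment();
        return c.Count;
    }
}
```

Starting from a fresh Counter, IncrementViaIn returns 0 and leaves the caller's value at 0, while IncrementViaRef returns 1 and the caller sees the change.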
The record modifier doesn't matter here. Because you marked a field of a value type as readonly, the compiler has to preserve that guarantee: the value stored in the field must not change. When you invoke a method or access a property on that field, the compiler cannot know whether the member is free of side effects (unless the struct or the member is itself marked readonly), so it makes the conservative choice and runs the member on a defensive copy.
For more information, see "The 'in'-modifier and the readonly structs in C#" and "Avoiding struct and readonly reference performance pitfalls with ErrorProne.NET".
The behaviour you see is present not only in record structs but in non-record structs too. Try removing the keyword record and the () after the name Record, and you will see the same behaviour.
This is just how calling mutating methods on structs is supposed to work. When you call a mutating method on a struct variable, say x.F(), you actually pass a reference to x, and F can then mutate x through that reference.
For example, if Record is a non-record struct, and MyObject is not readonly, MyObject.SetDoSomething(true); is compiled to the following IL (Try it yourself with SharpLab):
ldsflda valuetype Record Program::MyObject
ldc.i4.1
call instance void Record::SetDoSomething(bool)
ldsflda means "load static field address". I've only found a small section of the spec that talks about this when it is talking about boxing of structs (emphasis mine):
Similarly, boxing never implicitly occurs when accessing a member on a constrained type parameter when the member is implemented within the value type. For example, suppose an interface ICounter contains a method Increment, which can be used to modify a value. If ICounter is used as a constraint, the implementation of the Increment method is called with a reference to the variable that Increment was called on, never a boxed copy.
Basically, if you don't box structs (you clearly don't here!), their methods are supposed to be called by reference. No copies are supposed to be made.
On the other hand, if you call x.F() but x is readonly, you obviously can't translate it to the same code above, since that would mutate the field. What the compiler does, according to SharpLab, is:
ldsfld valuetype Record Program::MyObject
stloc.0
ldloca.s 0
ldc.i4.1
call instance void Record::SetDoSomething(bool)
Basically, it loads the value of the struct into a temporary variable first, and then passes a reference to that variable to SetDoSomething. In C# terms, that is roughly:
var temp = MyObject;
temp.SetDoSomething(true);
Hence the "copy" behaviour that you see.
Related
In C#, if I have the following struct:
internal struct myStruct : IDisposable
{
public int x;
public void Dispose()
{
x = 0;
}
}
then do this in Main:
using (myStruct myStruct = new myStruct())
{
myStruct.x = 5;
}
it fails saying that myStruct is readonly. That makes sense as myStruct is a value-type.
Now if I add the following function to the struct:
public void myfunc(int x)
{
this.x = x;
}
and change the Main code to this:
using (myStruct myStruct = new myStruct())
{
myStruct.myfunc(5);
Console.WriteLine(myStruct.x);
}
it works. Why?
The short answer is "because the C# specification says so". Which, I admit, may be a bit unsatisfying. But that's how it is.
The motivation is, I'm sure, as commenter Blogbeard suggests: while it's practical to enforce read-only on the field access, it's not practical to do so from within a type. After all, the type itself has no way to know how a variable containing a value of that type was declared.
The key part of the C# specification (from the v5.0 spec) is here, on page 258 (in the section on the using statement):
Local variables declared in a resource-acquisition are read-only, and must include an initializer. A compile-time error occurs if the embedded statement attempts to modify these local variables (via assignment or the ++ and -- operators), take the address of them, or pass them as ref or out parameters.
Since in the case of a value type, the variable itself contains the value of the object rather than a reference to an object, modifying any field of the object via that variable is the same as modifying the variable, and so is a "modification via assignment", which is specifically prohibited by the specification.
This is exactly the same as if you had declared the value type variable as a field in another object, with the readonly modifier.
But note that this is a compile-time rule, enforced by the C# compiler, and that there's no way for the compiler to similarly enforce the rule for a value type that modifies itself.
I will point out that this is one of many excellent reasons that one should never ever implement a mutable value type. Mutable value types frequently wind up being able to be modified when you don't want them to be, while at the same time find themselves failing to be modified when you do want them to be (in completely different scenarios from this one).
If you treat a value type as something that is truly a value, i.e. a single value that is itself never changing, they work much better and find themselves in the middle of many fewer bugs. :)
Apparently you can change the this value from anywhere in your struct (but not in classes):
struct Point
{
public Point(int x, int y)
{
this = new Point();
X = x; Y = y;
}
int X; int Y;
}
I've neither seen this before nor ever needed it. Why would one ever want to do that? Eric Lippert reminds us that a feature must be justified to be implemented. What great use case could justify this? Are there any scenarios where this is invaluable? I couldn't find any documentation on it [1].
Also, for calling constructors there is already a better known alternative syntax, so this feature is sometimes redundant:
public Point(int x, int y)
: this()
{
X = x; Y = y;
}
I found this feature in an example in Jeffrey Richter's CLR via C# 4th edition.
1) Apparently it is in the C# specification.
Good question!
Value types are, by definition, copied by value. If this was not actually an alias to a storage location then the constructor would be initializing a copy rather than initializing the variable you intend to initialize. Which would make the constructor rather less useful! And similarly for methods; yes, mutable structs are evil but if you are going to make a mutable struct then again, this has to be the variable that is being mutated, not a copy of its value.
The behaviour you are describing is a logical consequence of that design decision: since this aliases a variable, you can assign to it, same as you can assign to any other variable.
It is somewhat odd to assign directly to this like that, rather than assigning to its fields. It is even more odd to assign directly to this and then overwrite 100% of that assignment!
An alternative design which would avoid making this an alias to the receiver's storage would be to allocate this off the short-term storage pool, initialize it in the ctor, and then return it by value. The down side of that approach is that it makes copy elision optimizations pretty much impossible, and it makes ctors and methods weirdly inconsistent.
Also, I couldn't find any documentation on it.
Did you try looking in the C# spec? Because I can find documentation on it (7.6.7):
When this is used in a primary-expression within an instance constructor of a struct, it is classified as a variable. The type of the variable is the instance type (§10.3.1) of the struct within which the usage occurs, and the variable represents the struct being constructed. The this variable of an instance constructor of a struct behaves exactly the same as an out parameter of the struct type—in particular, this means that the variable must be definitely assigned in every execution path of the instance constructor.
When this is used in a primary-expression within an instance method or instance accessor of a struct, it is classified as a variable. The type of the variable is the instance type (§10.3.1) of the struct within which the usage occurs.
If the method or accessor is not an iterator (§10.14), the this variable represents the struct for which the method or accessor was invoked, and behaves exactly the same as a ref parameter of the struct type.
If the method or accessor is an iterator, the this variable represents a copy of the struct for which the method or accessor was invoked, and behaves exactly the same as a value parameter of the struct type.
As to a use case for it, I can't immediately think of many - about the only thing I've got is if the values you want to assign in the constructor are expensive to compute, and you've got a cached value you want to copy into this, it might be convenient.
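To see the "behaves exactly the same as a ref parameter" wording from the spec in action, here is a small sketch. The Reset and MoveTo methods and the ThisAliasDemo class are invented for illustration and are not from the question or from Richter's book.

```csharp
public struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Overwrites the whole receiver in one assignment,
    // just as assigning to a ref parameter would.
    public void Reset() => this = default(Point);

    // Replaces the receiver with another value of the same type.
    public void MoveTo(Point other) => this = other;
}

public static class ThisAliasDemo
{
    public static int XAfterMove()
    {
        var p = new Point(1, 2);
        p.MoveTo(new Point(5, 6)); // p itself is replaced, not a copy of it
        return p.X;
    }

    public static int XAfterReset()
    {
        var p = new Point(1, 2);
        p.Reset();                 // p is now default(Point)
        return p.X;
    }
}
```

XAfterMove returns 5 and XAfterReset returns 0, because in both cases the assignment to this writes straight into the local variable p.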
A storage location of value type is an aggregation of storage locations comprising that type's public and private fields. Passing a value type as an ordinary (value) parameter will physically and semantically pass the contents of all its fields. Passing a value type as a ref parameter will semantically pass all of its fields by reference, though a single "byref" is used to pass all of them.
Calling a method on a struct is equivalent to passing the struct (and thus all its fields) as a ref parameter, except for one wrinkle: normally, neither C# nor vb.net will allow a read-only value to be passed as a ref parameter. Both, however, will allow struct methods to be invoked on read-only values or temporary values. They do this by making a copy of all the struct (and thus all of its fields), and then passing that copy as a ref parameter.
Because of this behavior, some people call mutable structs "evil", but the only thing that's evil is the fact that neither C# nor vb.net defines any attribute to indicate whether a struct member or property should be invokable on things that can't be directly passed by ref.
I understand that the decision to use a value type over a reference type should be based on the semantics, not performance. What I do not understand is why value types can legally contain reference type members. I ask for a couple of reasons:
For one, we should not build a struct to require a constructor.
public struct MyStruct
{
public Person p;
// public Person p = new Person(); // error: cannot have instance field initializers in structs
MyStruct(Person p)
{
p = new Person();
}
}
Second, because of value type semantics:
MyStruct someVariable;
someVariable.p.Age = 2; // NullReferenceException
The compiler does not allow me to initialize Person at the declaration. I have to move this off to the constructor, rely on the caller, or expect a NullReferenceException. None of these situations are ideal.
Does the .NET Framework have any examples of reference types within value types? When should we do this (if ever)?
Instances of a value type never contain instances of a reference type. The reference-typed object is somewhere on the managed heap, and the value-typed object may contain a reference to the object. Such a reference has a fixed size. It is perfectly common to do this — for example every time you use a string inside a struct.
But yes, you cannot guarantee initialization of a reference-typed field in a struct because you cannot define a parameter-less constructor (nor can you guarantee it ever gets called, if you define it in a language other than C#).
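A short sketch of both points, namely that the struct only stores a reference and that the reference is null in a default-initialized struct. Basket and ReferenceFieldDemo are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;

// Hypothetical struct holding a reference-typed field.
public struct Basket
{
    public List<string> Items; // only the reference lives inside the struct
}

public static class ReferenceFieldDemo
{
    // No constructor runs for default(Basket), so the field is null.
    public static bool DefaultHasNullField() => default(Basket).Items == null;

    public static int CopiesShareTheList()
    {
        var a = new Basket { Items = new List<string>() };
        var b = a;             // copies the reference, not the list itself
        b.Items.Add("apple");  // the same list is reachable through 'a'
        return a.Items.Count;
    }
}
```

DefaultHasNullField returns true, and CopiesShareTheList returns 1 because a and b are two struct values pointing at one list on the heap.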
You say you should "not build a struct to require a constructor". I say otherwise. Since value-types should almost always be immutable, you must use a constructor (quite possibly via a factory to a private constructor). Otherwise it will never have any interesting contents.
Use the constructor. The constructor is fine.
If you don't want to pass in an instance of Person to initialize p, you could use lazy initialization via a property. (Because obviously the public field p was just for demonstration, right? Right?)
public struct MyStruct
{
public MyStruct(Person p)
{
this.p = p;
}
private Person p;
public Person Person
{
get
{
if (p == null)
{
p = new Person(…); // see comment below about struct immutability
}
return p;
}
}
// ^ in most other cases, this would be a typical use case for Lazy<T>;
// but due to structs' default constructor, we *always* need the null check.
}
There are two primary useful scenarios for a struct holding a class-type field:
1. The struct holds a possibly-mutable reference to an immutable object (`String` being by far the most common). A reference to an immutable object will behave as a cross between a nullable value type and a normal value type; it doesn't have the "Value" and "HasValue" properties of the former, but it will have null as a possible (and default) value. Note that if the field is accessed through a property, that property may return a non-null default when the field is null, but should not modify the field itself.
2. The struct holds an "immutable" reference to a possibly-mutable object and serves to wrap the object or its contents. `List<T>.Enumerator` is probably the most common struct using this pattern. Having struct fields pretend to be immutable is something of a dodgy construct(*), but in some contexts it can work out pretty well. In most instances where this pattern is applied, the behavior of a struct will be essentially like that of a class, except that performance will be better(**).
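For the first scenario, a property that hands back a non-null default without touching the field might look like the sketch below. Label is a made-up type, and making the field readonly is my own choice here rather than something the scenario requires.

```csharp
// Hypothetical struct wrapping a reference to an immutable string.
public struct Label
{
    private readonly string _text; // null in default(Label)

    public Label(string text)
    {
        _text = text;
    }

    // Substitutes a non-null default on read; the field itself is never written.
    public string Text => _text ?? string.Empty;
}
```

With this, default(Label).Text is an empty string rather than null, and new Label("x").Text is "x".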
(*) The statement structVar = new structType(whatever); will create a new instance of structType, pass it to the constructor, and then mutate structVar by copying all public and private fields from that new instance into structVar; once that is done, the new instance will be discarded. Consequently, all struct fields are mutable, even if they "pretend" to be otherwise; pretending they are immutable can be dodgy unless one knows that the way structVar = new structType(whatever); is actually implemented will never pose a problem.
(**) Structs will perform better in some circumstances; classes will perform better in others. Generally, so-called "immutable" structs are chosen over classes in situations where they are expected to perform better, and where the corner cases where their semantics would differ from those of classes are not expected to pose problems.
Some people like to pretend that structs are like classes, but more efficient, and dislike using structs in ways that take advantage of the fact that they're not classes. Such people would probably only be inclined toward using scenario (2) above. Scenario #1 can be very useful with mutable structs, especially with types like String which behave essentially as values.
I wanted to add to Marc's answer, but I had too much to say for a comment.
If you look at the C# specifications, it says of struct constructors:
Struct constructors are invoked with the new operator, but that does
not imply that memory is being allocated. Instead of dynamically
allocating an object and returning a reference to it, a struct
constructor simply returns the struct value itself (typically in a
temporary location on the stack), and this value is then copied as
necessary.
(You can find a copy of the spec under
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC#\Specifications\1033)
So, a struct constructor is inherently different than a class constructor.
In addition to this, structs are expected to be copied by value, and thus:
With structs, the variables each have their own copy of the data, and
it is not possible for operations on one to affect the other.
Any time I've seen a reference type in a struct, it has been a string. This works because strings are immutable. I am guessing your Person object is not immutable and can introduce very odd and severe bugs because of the divergence from the expected behavior of a struct.
That being said, the errors you're seeing with the constructor of your struct may be that you have a public field p with the same name as your parameter p and not referring to the struct's p as this.p, or that you're missing the keyword struct.
I'm using Visual Studio 2010 + ReSharper and it shows a warning on the following code:
if (rect.Contains(point))
{
...
}
rect is a readonly Rectangle field, and ReSharper shows me this warning:
"Impure Method is called for readonly field of value type."
What are impure methods and why is this warning being shown to me?
First off, Jon, Michael and Jared's answers are essentially correct but I have a few more things I'd like to add to them.
What is meant by an "impure" method?
It is easier to characterize pure methods. A "pure" method has the following characteristics:
Its output is entirely determined by its input; its output does not depend on externalities like the time of day or the bits on your hard disk. Its output does not depend on its history; calling the method with a given argument twice should give the same result.
A pure method produces no observable mutations in the world around it. A pure method may choose to mutate private state for efficiency's sake, but a pure method does not, say, mutate a field of its argument.
For example, Math.Cos is a pure method. Its output depends only on its input, and the input is not changed by the call.
An impure method is a method which is not pure.
What are some of the dangers of passing readonly structs to impure methods?
There are two that come to mind. The first is the one pointed out by Jon, Michael and Jared, and this is the one that ReSharper is warning you about. When you call a method on a struct, we always pass a reference to the variable that is the receiver, in case the method wishes to mutate the variable.
So what if you call such a method on a value, rather than a variable? In that case, we make a temporary variable, copy the value into it, and pass a reference to the variable.
A readonly variable is considered a value, because it cannot be mutated outside the constructor. So we are copying the variable to another variable, and the impure method is possibly mutating the copy, when you intend it to mutate the variable.
That's the danger of passing a readonly struct as a receiver. There is also a danger of passing a struct that contains a readonly field. A struct that contains a readonly field is a common practice, but it is essentially writing a cheque that the type system does not have the funds to cash; the "read-only-ness" of a particular variable is determined by the owner of the storage. An instance of a reference type "owns" its own storage, but an instance of a value type does not!
struct S
{
private readonly int x;
public S(int x) { this.x = x; }
public void Badness(ref S s)
{
Console.WriteLine(this.x);
s = new S(this.x + 1);
// This should be the same, right?
Console.WriteLine(this.x);
}
}
One thinks that this.x is not going to change because x is a readonly field and Badness is not a constructor. But...
S s = new S(1);
s.Badness(ref s);
... clearly demonstrates the falsity of that. this and s refer to the same variable, and that variable is not readonly!
An impure method is one which isn't guaranteed to leave the value as it was.
In .NET 4, you can decorate methods and types with [Pure] to declare them to be pure, and R# will take notice of this. Unfortunately, you can't apply it to someone else's members, and you can't convince R# that a type/member is pure in a .NET 3.5 project as far as I'm aware. (This bites me in Noda Time all the time.)
The idea is that if you're calling a method which mutates a variable, but you call it on a read-only field, it's probably not doing what you want, so R# will warn you about this. For example:
public struct Nasty
{
public int value;
public void SetValue()
{
value = 10;
}
}
class Test
{
static readonly Nasty first;
static Nasty second;
static void Main()
{
first.SetValue();
second.SetValue();
Console.WriteLine(first.value); // 0
Console.WriteLine(second.value); // 10
}
}
This would be a really useful warning if every method which was actually pure was declared that way. Unfortunately they're not, so there are a lot of false positives :(
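For completeness, this is roughly what decorating your own struct member with [Pure] looks like. IntRange and PureDemo are hypothetical names, and whether a given ReSharper version honours the attribute in your project is something you would need to check yourself.

```csharp
using System.Diagnostics.Contracts;

// Hypothetical immutable struct with a method declared as pure.
public struct IntRange
{
    public readonly int Start;
    public readonly int End;

    public IntRange(int start, int end)
    {
        Start = start;
        End = end;
    }

    // Declares that the method does not change any visible state.
    [Pure]
    public bool Contains(int value) => value >= Start && value < End;
}

public static class PureDemo
{
    // Calling Contains on a readonly field still goes through a copy,
    // but since nothing is mutated the copy is harmless.
    private static readonly IntRange Digits = new IntRange(0, 10);

    public static bool IsDigit(int value) => Digits.Contains(value);
}
```

Here IsDigit(5) is true and IsDigit(10) is false, and the result is the same whether or not a defensive copy is made.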
The short answer is that this is a false positive, and you can safely ignore the warning.
The longer answer is that accessing a read-only value type creates a copy of it, so that any changes to the value made by a method would only affect the copy. ReSharper doesn't realize that Contains is a pure method (meaning it has no side effects). Eric Lippert talks about it here: Mutating Readonly Structs
It sounds like ReSharper believes that the method Contains can mutate the rect value. Because rect is a readonly field of a struct type, the C# compiler makes a defensive copy of the value to prevent the method from mutating the readonly field. Essentially, the final code looks like this:
Rectangle temp = rect;
if (temp.Contains(point)) {
...
}
ReSharper is warning you here that Contains may mutate rect in a way that would be immediately lost because it happened on a temporary.
An impure method is a method that could have side effects. In this case, ReSharper seems to think it could change rect. It probably doesn't, but the chain of evidence is broken.
I am reading Eric Lippert's blog post about Mutating Readonly Structs, and I see many references to it here on SO as an argument for why value types must be immutable.
But one thing is still not clear. It says that when you access a value type you always get a copy of it, and here is the example:
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
class Test
{
public readonly Mutable m = new Mutable();
static void Main(string[] args)
{
Test t = new Test();
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
System.Console.WriteLine(t.m.Mutate());
}
}
My question is: why is it that when I change
public readonly Mutable m = new Mutable();
to
public Mutable m = new Mutable();
everything starts to work as expected?
Can you please explain more clearly why value types must be immutable?
I know that it is good for thread safety, but in this case same can be applied to reference types.
Structs with mutating methods behave strangely in several situations.
The example you already discovered is a readonly field. A defensive copy is necessary because you don't want to mutate a readonly field.
The same thing happens when the struct is exposed through a property. Once again an implicit copy is made, and only the copy is mutated, even if the property has a setter.
struct Mutable
{
private int x;
public int Mutate()
{
this.x = this.x + 1;
return this.x;
}
}
Mutable property{get;set;}
void Main()
{
property=new Mutable();
property.Mutate().Dump();//returns 1
property.Mutate().Dump();//returns 1 :(
}
This shows that mutating methods are problematic on structs. But it doesn't show that a mutable struct with either public fields or properties that have a setter is problematic.
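If you do need to mutate a struct that sits behind a property, the usual workaround is to copy it out, mutate the copy, and assign it back. A minimal sketch, where Holder and its two methods are names I made up for the example:

```csharp
public struct Mutable
{
    private int x;

    public int Mutate()
    {
        x = x + 1;
        return x;
    }
}

// Hypothetical class exposing the struct through an auto-property.
public class Holder
{
    public Mutable Property { get; set; }

    // Mutates the temporary returned by the getter; the stored value never changes.
    public int MutateViaGetter() => Property.Mutate();

    // Copy out, mutate the copy, then write it back through the setter.
    public int MutateAndStore()
    {
        var copy = Property;
        int result = copy.Mutate();
        Property = copy;
        return result;
    }
}
```

On a fresh Holder, MutateViaGetter returns 1 every time, whereas successive calls to MutateAndStore return 1, 2, 3 and so on.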
The thread-safety is a clear technical reason. It applies to value types as well as to reference types (see System.String).
The more general guideline "value types should be immutable" is different. It is about readability of code, and comes mainly from the confusion that mutable values can cause. This code snippet is just one example. Most people would not expect the 1,1,1 outcome.
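One way to follow that guideline is to have operations return a new value instead of changing the receiver. The sketch below is only an illustration: ImmutableCounter and CounterOwner are hypothetical names, and readonly struct assumes C# 7.2 or later.

```csharp
// Hypothetical immutable counterpart to the Mutable struct above.
public readonly struct ImmutableCounter
{
    public int Value { get; }

    public ImmutableCounter(int value)
    {
        Value = value;
    }

    // Returns a new value; the receiver is never modified.
    public ImmutableCounter Next() => new ImmutableCounter(Value + 1);
}

public class CounterOwner
{
    public readonly ImmutableCounter M = new ImmutableCounter(0);

    // Chained calls build on each returned value, so the result is predictable.
    public int NextTwice() => M.Next().Next().Value;
}
```

NextTwice returns 2 while M.Value stays 0, and nobody reading the code would expect anything else, which is the readability argument in a nutshell.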
I don't know C# so I'll try to answer the 2nd part of your question.
Why value types must be immutable?
There are two types of objects from Domain Driven Design's point of view:
value objects/types - their identity is determined by their value (e.g. numbers: 2 is always 2 - an identity of number two is always the same, so 2 == 2 is always true)
entities (reference types) - they can consist of other value types and their identity is determined by their identity itself (e.g. people: even if there were a man looking exactly like you, it wouldn't be you)
If value types were mutable, then imagine what could happen if it were possible to change the value of the number two: 2 == 1 + 1 wouldn't be guaranteed to be true.
See these links for more:
Value vs Entity objects (Domain Driven Design)
http://devlicio.us/blogs/casey/archive/2009/02/13/ddd-entities-and-value-objects.aspx
I think the tricky thing about that example is that one could argue it shouldn't be possible. You made an instance of Mutable read-only and yet you can change its value through the Mutate() function, therefore violating the concept of immutability, in a sense. Strictly speaking, however, it works because the private field x is not readonly. If you make one simple change to the Mutable struct then immutability will actually be enforced:
private readonly int x;
Then the Mutate() function will produce a compiler error.
The example shows clearly how copy-by-value works in the context of readonly variables. Whenever you access m you are creating a copy of the instance, as opposed to a copy of a reference to the instance -- the latter would occur if Mutable were a class instead of a struct.
Every time you access m you get 1) a copy of the instance, and 2) a copy of an instance that is read-only, so the value of x is always 0 at the time the copy is made. When you call Mutate() on the copy it increments x to 1, which works because x itself is NOT readonly. But the next time you call Mutate(), you are again calling it on a fresh copy of the original, whose x is still 0. As he says in the article, "m is immutable, but the copy is not". Every copy of the original instance starts with x as 0, because the object being copied never changes, whereas its copies can be changed.
Maybe that helps.