Why does Microsoft advise against readonly fields with mutable values? - c#

In the Design Guidelines for Developing Class Libraries, Microsoft say:
Do not assign instances of mutable types to read-only fields.
The objects created using a mutable type can be modified after they are created. For example, arrays and most collections are mutable types while Int32, Uri, and String are immutable types. For fields that hold a mutable reference type, the read-only modifier prevents the field value from being overwritten but does not protect the mutable type from modification.
This simply restates the behaviour of readonly without explaining why it's bad to use readonly. The implication appears to be that many people do not understand what "readonly" does and will wrongly expect readonly fields to be deeply immutable. In effect it advises using "readonly" as code documentation indicating deep immutability - despite the fact that the compiler has no way to enforce this - and disallows its use for its normal function: to ensure that the value of the field doesn't change after the object has been constructed.
I feel uneasy with this recommendation to use "readonly" to indicate something other than its normal meaning understood by the compiler. I feel that it encourages people to misunderstand the meaning of "readonly", and furthermore to expect it to mean something that the author of the code might not intend. I feel that it precludes using it in places it could be useful - e.g. to show that some relationship between two mutable objects remains unchanged for the lifetime of one of those objects. The notion of assuming that readers do not understand the meaning of "readonly" also appears to be in contradiction to other advice from Microsoft, such as FxCop's "Do not initialize unnecessarily" rule, which assumes readers of your code to be experts in the language and should know that (for example) bool fields are automatically initialised to false, and stops you from providing the redundancy that shows "yes, this has been consciously set to false; I didn't just forget to initialize it".
So, first and foremost, why do Microsoft advise against use of readonly for references to mutable types? I'd also be interested to know:
Do you follow this Design Guideline in all your code?
What do you expect when you see "readonly" in a piece of code you didn't write?

It seems natural that if a field is readonly, you would expect to not be able to change the value or anything having to do with it. If I knew that Bar was a readonly field of Foo, I could obviously not say
Foo foo = new Foo();
foo.Bar = new Baz();
But I can get away with saying
foo.Bar.Name = "Blah";
If the object backing Bar is, in fact, mutable. Microsoft is simply recommending against that subtle, counterintuitive behavior by suggesting that readonly fields be backed by immutable objects.

I agree with you completely, and I do sometimes use readonly in my code for mutable reference types.
As an example: I might have some private or protected member -- say, a List<T> -- which I use within a class's methods in all its mutable glory (calling Add, Remove, etc.). I may simply want to put a safeguard in place to ensure that, no matter what, I am always dealing with the same object. This protects both me and other developers from doing something stupid: namely, assigning the member to a new object.
To me, this is often a preferable alternative to using a property with a private set method. Why? Because readonly means the value cannot be changed after instantiation, even by the base class.
In other words, if I had this:
protected List<T> InternalList { get; private set; }
Then I could still set InternalList = new List<T>(); at any arbitrary point in code in my base class. (This would require a very foolish error on my part, yes; but it would still be possible.)
On the other hand, this:
protected readonly List<T> _internalList;
Makes it unmistakably clear that _internalList can only ever refer to one particular object (the one to which _internalList is set in the constructor).
So I am on your side. The idea that one should refrain from using readonly on a mutable reference type is frustrating to me personally, as it basically presupposes a misunderstanding of the readonly keyword.

DO NOT assign instances of mutable types to readonly fields.
I had a quick look in the Framework Design Guidelines book (pages 161-162), and it basically states what you've already noticed yourself. There's an additional comment by Joe Duffy that explains the guideline's raison-d'être:
What this guideline is trying to protect you from is believing you've exposed a deeply immutable object graph when in fact it is shallow, and then writing code that assumes the whole graph is immutable. — Joe Duffy
I personally think that the keyword readonly was named badly. The fact that it only specifies the const-ness of the reference, and not of the const-ness of the referenced object, easily creates misleading situations.
I think it would have been preferable if readonly made referenced objects immutable, too, and not just the reference, because that is what the keyword implies.
To remedy this unfortunate situation, the guideline was made up. While I think that its advice is sound from the human point of view (it's not always obvious which types are mutable and which aren't without looking up their definition, and the word suggests deep immutability), I sometimes wish that, when it comes to declaring const-ness, C# would offer a freedom similar to that offered by C++, where you can define const either on the pointer, or on the pointed-to-object, or on both or nothing.

The syntax you are looking for is supported by the C++/CLI language:
const Example^ const obj;
The first const makes the referenced object immutable, the 2nd makes the reference immutable. The latter is equivalent to the readonly keyword in C#. Attempts to evade it produce a compile error:
Test^ t = gcnew Test();
t->obj = gcnew Example(); // Error C3892
t->obj->field = 42; // Error C3892
Example^ another = t->obj; // Error C2440
another->field = 42;
It is however smoke and mirrors. The immutability is verified by the compiler, not by the CLR. Another managed language could modify both. Which is the root of the problem, the CLR just doesn't have support for it.

Microsoft has a few such peculiar advices. The other one that immediately springs to mind is not to nest generic types in public members, like List<List<int>>. I try to avoid these constructs where easily possible, but ignore the newbie-friendly advice when I feel the use is justified.
As for readonly fields - I try to avoid public fields as such, instead going for properties. I think there was a piece of advice about that too, but more importantly there are cases now and then when a field doesn't work while a property does (mostly it has to do with databinding and/or visual designers). By making all public fields properties I avoid any potential problems.

In the end they are just guidelines. I know for a fact that the people at Microsoft often don't follow all of the guidelines.

This is legal in C# (simple Console App)
readonly static object[] x = new object[2] { true, false };
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
x[0] = false;
x[1] = true;
Console.WriteLine("{0} {1}", x[0], x[1]); //prints "false true"
Console.ReadLine();
}
that would work. but that wouldn't make sense. bear in mind the variable x is readonly, and has not changed (i.e. the ref of x has not changed indeed). but that's not what we meant when we said "readonly x", is it? so don't use readonly fields with mutable values. It's confusing and counter-intuitive.

Related

How to determine if .NET (BCL) type is immutable

From this Answer, I came to know that KeyValuePair are immutables.
I browsed through the docs, but could not find any information regarding immutable behavior.
I was wondering how to determine if a type is immutable or not?
I don't think there's a standard way to do this, since there is no official concept of immutability in C#. The only way I can think of is looking at certain things, indicating a higher probability:
1) All properties of the type have a private set
2) All fields are const/readonly or private
3) There are no methods with obvious/known side effects
4) Also, being a struct generally is a good indication (if it is BCL type or by someone with guidelines for this)
Something like an ImmutabeAttribute would be nice. There are some thoughts here (somewhere down in the comments), but I haven't seen one in "real life" yet.
The first indication would be that the documentation for the property in the overview says "Gets the key in the key/value pair."
The second more definite indication would be in the description of the property itself:
"This property is read/only."
I don't think you can find "proof" of immutability by just looking at the docs, but there are several strong indicators:
It's a struct (why does this matter?)
It has no settable public properties (both are read-only)
It has no obvious mutator methods
For definitive proof I recommend downloading the BCL's reference source from Microsoft or using an IL decompiler to show you how a type would look like in code.
A KeyValuePair<T1,T2> is a struct which, absent Reflection, can only be mutated outside its constructor by copying the contents of another KeyValuePair<T1,T2> which holds the desired values. Note that the statement:
MyKeyValuePair = new KeyValuePair(1,2);
like all similar constructor invocations on structures, actually works by creating a new temporary instance of KeyValuePair<int,int> (happens before the constructor itself executes), setting the field values of that instance (done by the constructor), copying all public and private fields of that new temporary instance to MyKeyValuePair, and then discarding the temporary instance.
Consider the following code:
static KeyValuePair MyKeyValuePair; // Field in some class
// Thread1
MyKeyValuePair = new KeyValuePair(1,1);
// ***
MyKeyValuePair = new KeyValuePair(2,2);
// Thread2
st = MyKeyValuePair.ToString();
Because MyKeyValuePair is precisely four bytes in length, the second statement in Thread1 will update both fields simultaneously. Despite that, if the second statement in Thread1 executes between Thread2's evaluation of MyKeyValuePair.Key.ToString() and MyKeyValuePair.Value.ToString(), the second ToString() will act upon the new mutated value of the structure, even though the first already-completed ToString()operated upon the value before the mutation.
All non-trivial structs, regardless of how they are declared, have the same immutability rules for their fields: code which can change a struct can change its fields; code which cannot change a struct cannot change its fields. Some structs may force one to go through hoops to change one of their fields, but designing struct types to be "immutable" is neither necessary nor sufficient to ensure the immutability of instances. There are a few reasonable uses of "immutable" struct types, but such use cases if anything require more care than is necessary for structs with exposed public fields.

const, readonly and mutable value types

I'm continuing my study of C# and the language specification and Here goes another behavior that I don't quite understand:
The C# Language Specification clearly states the following in section 10.4:
The type specified in a constant declaration must be sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, bool, string, an enum-type, or a reference-type.
It also states in section 4.1.4 the following:
Through const declarations it is possible to declare constants of the simple types (§10.4). It is not possible to have constants of other struct types, but a similar effect is provided by static readonly fields.
Ok, so a similar effect can be gained by using static readonly. Reading this I went and tried the following code:
static void Main()
{
OffsetPoints();
Console.Write("Hit a key to exit...");
Console.ReadKey();
}
static Point staticPoint = new Point(0, 0);
static readonly Point staticReadOnlyPoint = new Point(0, 0);
public static void OffsetPoints()
{
PrintOutPoints();
staticPoint.Offset(1, 1);
staticReadOnlyPoint.Offset(1, 1);
Console.WriteLine("Offsetting...");
Console.WriteLine();
PrintOutPoints();
}
static void PrintOutPoints()
{
Console.WriteLine("Static Point: X={0};Y={1}", staticPoint.X, staticPoint.Y);
Console.WriteLine("Static readonly Point: X={0};Y={1}", staticReadOnlyPoint.X, staticReadOnlyPoint.Y);
Console.WriteLine();
}
The output of this code is:
Static Point: X=0;Y=0
Static readonly Point: X=0;Y=0
Offsetting...
Static Point: X=1;Y=1
Static readonly Point: X=0;Y=0
Hit a key to exit...
I really expected the compiler to give me some kind of warning about mutating a static readonly field or failing that, to mutate the field as it would with a reference type.
I know mutable value types are evil (why did Microsoft ever implement Point as mutable is a mystery) but shouldn't the compiler warn you in some way that you are trying to mutate a static readonly value type? Or at least warn you that your Offset() method will not have the "desired" side effects?
Eric Lippert explains what's going on here:
...if the field is readonly and the reference occurs outside an
instance constructor of the class in which the field is declared, then
the result is a value, namely the value of the field I in the object
referenced by E.
The important word here is that the result is the value of the field,
not the variable associated with the field. Readonly fields are not
variables outside of the constructor. (The initializer here is
considered to be inside the constructor; see my earlier post on that
subject.)
Oh and just to stress on the evilness of mutable structs, here is his conclusion:
This is yet another reason why mutable value types are evil. Try to
always make value types immutable.
The point of the readonly is that you cannot reassign the reference or value.
In other words if you attempted this
staticReadOnlyPoint = new Point(1, 1);
you would get a compiler error because you are attempting to reassign staticReadOnlyPoint. The compiler will prevent you from doing this.
However, readonly doesn't enforce whether the value or referenced object itself is mutable - that is a behaviour that is designed into the class or struct by the person creating it.
[EDIT: to properly address the odd behaviour being described]
The reason you see the behaviour where staticReadOnlyPoint appears to be immutable is not because it is immutable itself, but because it is a readonly struct. This means that every time you access it, you are taking a full copy of it.
So your line
staticReadOnlyPoint.Offset(1, 1);
is accessing, and mutating, a copy of the field, not the actual value in the field. When you subsequently write out the value you are then writing out yet another copy of the original (not the mutated copy).
The copy you did mutate with the call to Offset is discarded, because it is never assigned to anything.
The compiler simply doesn't have enough information available about a method to know that the method mutates the struct. A method may well have a side-effect that's useful but doesn't otherwise change any members of the struct. If would technically be possible to add such analysis to the compiler. But that won't work for any types that live in another assembly.
The missing ingredient is a metadata token that indicates that a method doesn't mutate any members. Like the const keyword in C++. Not available. It would have be drastically non-CLS compliant if it was added in the original design. There are very few languages that support the notion. I can only think of C++ but I don't get out much.
Fwiw, the compiler does generate explicit code to ensure that the statement cannot accidentally modify the readonly. This statement
staticReadOnlyPoint.Offset(1, 1);
gets translated to
Point temp = staticReadOnlyPoint; // makes a copy
temp.Offset(1, 1);
Adding code that then compares the value and generates a runtime error is also only technically possible. It costs too much.
The observed behavior is an unfortunate consequence of the fact that neither the Framework nor C# provides any means by which member function declarations can specify whether this should be passed by ref, const-ref, or value. Instead, value types always pass this by (non-const-restricted) ref, and reference types always pass this by value.
The 'proper' behavior for a compiler would be to forbid passing immutable or temporary values by non-const-restricted ref. If such restriction could be imposed, ensuring proper semantics for mutable value types would mean following a simple rule: if you make an implicit copy of a struct, you're doing something wrong. Unfortunately, the fact that member functions can only accept this by non-const-restricted ref means a language designer must make one of three choices:
Guess that a member function won't modify `this`, and simply pass immutable or temporary variables by `ref`. This would be most efficient for functions which do not, in fact, modify `this`, but could dangerously expose to modification things that should be immutable.
Don't allow member functions to be used on immutable or temporary entities. This would avoid improper semantics, but would be a really annoying restriction, especially given that most member functions do not modify `this`.
Allow the use of member functions except those deemed most likely to modify `this` (e.g. property setters), but instead of passing immutable entities directly by ref, copy them to temporary locations and pass those.
Microsoft's choice protects constants from improper modification, but has the unfortunate consequences that code will run needlessly slowly when calling functions that don't modify this, while generally working incorrectly for those which do.
Given the way this is actually handled, one's best bet is to avoid making any changes to it in structure member functions other than property setters. Having property setters or mutable fields is fine, since the compiler will correctly forbid any attempt to use property setters on immutable or temporary objects, or to modify any fields thereof.
If you look at the IL, you will see that on usage of the readonly field, a copy is made before calling Offset:
IL_0014: ldsfld valuetype [System.Drawing]System.Drawing.Point
Program::staticReadOnlyPoint
IL_0019: stloc.0
IL_001a: ldloca.s CS$0$0000
Why this is happening, is beyond me.
It could be part of the spec, or a compiler bug (but it looks a bit too intentional for the latter).
The effect is due to several well-defined features coming together.
readonly means that the field in question cannot be changed, but not that the target of the field cannot be changed. This is more easily understood (and more often useful in practice) with readonly fields of a mutable reference type, where you can do x.SomeMutatingMethod() but not x = someNewObject.
So, first item is; you can mutate the target of a readonly field.
Second item is, that when you access a non-variable value type you obtain a copy of the value. The least confusing example of this is giveMeAPoint().Offset(1, 1) because there isn't a known location for us to later observe that the value-type returned by giveMeAPoint() may or may not have been mutated.
This is why value types are not evil, but are in some ways worse. Truly evil code doesn't have a well-defined behaviour, and all of this is well-defined. It's still confusing though (confusing enough for me to get this wrong on my first answer), and confusing is worse than evil when you're trying to code. Easily understood evil is so much more easily avoided.

Is there any benefit to making a C# field read-only if its appropriate?

I am working on a project using ReSharper. On occasion it prompts me that a field can be made readonly. Is there any performance or other benefit to this? I am presuming the benefits would be quite low-level, or would any benefits be purely semantic?
Thanks
With example below the field was initially just private, but resharper prompted to set it as readonly. I understand the reason why it can be set as readonly, ie. its being set in the constructor and not changed again, but just wondering if there are any benefits to this...
public class MarketsController : Controller
{
private readonly IMarketsRepository marketsRepository;
public AnalysisController(IMarketsRepository marketsRepository)
{
this.marketsRepository = marketsRepository;
}
}
Edit
What is the easiest way to look at the MSIL?
The benefit is purely semantic. It will help users of your code explicitly understand that this field can't be changed after object is created. Compiler will prevent unwanted changes of this field. I totally agree with following quote from Python Zen:
Explicit is better than implicit.
Some details:
The only difference between normal field and read-only field is flag initonly in IL. There is no optimization about it (as with constants) because actually it allows all operations (get and set, but only in ctor). It is just hint to compiler: don't let it be changed after construction.
.field public initonly int32 R
It's not so much low-level performance, but more high-level maintainability. Making things readonly is one of the possibilities you have to limit and control the number of places a certain value can be changed. This in turn means that you reduce interdependency between classes (a.k.a. "loose coupling"); the result is an application that has fewer internal dependencies and thus a lower complexity. In other words, readonly fields and properties make your application more maintainable.
It also might help spotting some bugs as well. The value is assigned in a construcotor only and this could be a problem if you forgot to change elsewhere or not. And if it is not supposed to be changed then you mark it as a read only.
My professor taught me back in the day that declaring something readonly is a way of admitting to your computer that you make mistakes.
You might be interested in this answer.
The readonly keyword is used to
declare a member variable a constant,
but allows the value to be calculated
at runtime. This differs from a
constant declared with the const
modifier, which must have its value
set at compile time. Using readonly
you can set the value of the field
either in the declaration, or in the
constructor of the object that the
field is a member of.

Do not declare read only mutable reference types - why not?

I have been reading this question and a few other answers and whilst I get the difference between changing the reference and changing the state of the current instance I'm not certain why this means that I shouldn't mark it readonly. Is this because marking something as readonly tells the compiler something special about the instance and so it is able to then treat it as thread safe when it actually might not be?
Presumably there are situations where I don't want the instance to be able to be changed, but don't mind if the state of the instance is changed (singleton maybe. /me prepares for flames) What are the consequences for marking the instance readonly if I want this?
There are no (runtime/environment-based) consequences. The compiler won't freak out, your runtime won't explode, and everything will generally be fine.
It's only FxCop (a static analysis tool that some people use) that has a warning about this. The reasons for the warning are explained in the thread you've linked to (semantically, it may not be clear that the object isn't actually "readonly", only that the variable can't be "reassigned").
Personally, I disagree with the rule, so I'd just disable it (if you're running FxCop and it is concerning you).
An interesting related point: mutable readonly structs are a bad idea because of situations like this one:
http://ericlippert.com/2008/05/14/mutating-readonly-structs/
Most of the people to whom I've shown this code are unable to correctly predict its output.
Mutable readonly structs have the problem that people think they're mutating them but really they are mutating a copy.
Mutable readonly ref types have the opposite problem. People think they're deeply immutable but in fact they are only shallowly immutable.
I think the point is that it could be misleading - an unwary reader might assume that a readonly variable was effectively a constant, which it wouldn't be.
I would treat it as a suggestion more than a rule - there are certainly cases where it could be reasonable, but should be used with care. (In particular, I'd be nervous if the mutable state were exposed to the outside world.)
These two type declarations are both valid and useful, although they mean different things:
a constant reference to a mutable object.
a mutable reference to a constant object.
If you really want a constant reference to a mutable object it seems objectionable to be forced to make do with a mutable reference for any reason, let alone a bad one. In multi-threaded programming the constant reference could save you having to synchronize on the owning object (you still have to synchronize on the mutable object via the reference, but that's a finer-grained lock).
Unwary readers are probably unwary writers too - can't do much about them short of:
const SourceRepository cvs = new ReadOnlyRepository(...);
C++ has a means of enforcing a constant pointer to a constant object (references are a bit different in C++):
Object const* const constantPointerToConstantObject = ...;
It's one of the few C++ features that I miss in C#/Java.
D has an even better keyword (immutable) that enforces transitive immutability. D's immutable does allow the compiler to make the kind of optimizations that you're worried about and guarantee safety. D also has const for non-transitive const-ness, which implies that both transitive and non-transitive forms are independently useful.

Overhead of using this on structs

When you have automatic properties, C# compiler asks you to call the this constructor on any constructor you have, to make sure everything is initialized before you access them.
If you don't use automatic properties, but simply declare the values, you can avoid using the this constructor.
What's the overhead of using this on constructors in structs? Is it the same as double initializing the values?
Would you recommend not using it, if performance was a top concern for this particular type?
I would recommend not using automatic properties at all for structs, as it means they'll be mutable - if only privately.
Use readonly fields, and public properties to provide access to them where appropriate. Mutable structures are almost always a bad idea, and have all kinds of nasty little niggles.
Do you definitely need to create your own value type in the first place though? In my experience it's very rare to find a good reason to create a struct rather than a class. It may be that you've got one, but it's worth checking.
Back to your original question: if you care about performance, measure it. Always. In this case it's really easy - you can write the struct using an automatic property and then reimplement it without. You could use a #if block to keep both options available. Then you can measure typical situations and see whether the difference is significant. Of course, I think the design implications are likely to be more important anyway :)
Yes, the values will be initialized twice and without profiling it is difficult to say whether or not this performance hit would be significant.
The default constructor of a struct initializes all members to their default values. After this happens your constructor will run in which you undoubtedly set the values of those properties again.
I would imagine this would be no different than the CLR's practice of initializing all fields of a reference type upon instantiation.
The reason the C# compiler requires you to chain to the default constructor (i.e. append : this() to your constructor declaration) when auto-implemented properties are used is because all variables need to be assigned before exiting the constructor. Now, auto-implemented properties mess this up a bit in that they don't allow you to directly access the variables that back the properties. The method the compiler uses to get around this is to automatically assign all the variables to their default values, and to insure this, you must chain to the default constructor. It's not a particularly clever method, but it does the job well enough.
So indeed, this will mean that some variables will end up getting initialised twice. However, I don't think this will be a big performance problem. I would be very surprised it the compiler (or at very least the JIT) didn't simply remove the first initialisation statement for any variable that is set twice in your constructor. A quick benchmark should confirm this for you, though I'm quite sure you will get the suspected results. (If you by chance don't, and you absolutely need the tiny performance boost that avoidance of duplicate initialisation offers, you can just define your properties the normal way, i.e. with backing variables.)
To be honest, my advice would be not even to bother with auto-implemented properties in structures. It's perfectly acceptable just to use public variables in lieu of them, and they offer no less functionality than auto-implemented properties. Classes are a different situation of course, but I really wouldn't hesitate to use public variables in structs. (Any complex properties can be defined normally, if you need them.)
Hope that helps.
Don't use automatic properties with structure types. Simply expose fields directly. If a struct has an exposed public field Foo of type Bar, the fact that Foo is an exposed field of type Bar (information readily available from Intellisense) tells one pretty much everything there is to know about it. By contrast, the fact that a struct Foo has an exposed read-write property of Boz does not say anything about whether writing to Boz will mutate a field in the struct, or whether it will mutate some object to which Boz holds a reference. Exposing fields directly will offer cleaner semantics, and often also result in faster-running code.

Categories

Resources