The true definition of immutability? - c#

I am wondering how immutability is defined? If the values aren't exposed as public, so can't be modified, then it's enough?
Can the values be modified inside the type, not by the customer of the type?
Or can one only set them inside a constructor? If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?
How can I guarantee that the type is 100% immutable?

If the values aren't exposed as public, so can't be modified, then it's enough?
No, because you need read access.
Can the values be modified inside the type, not by the customer of the type?
No, because that's still mutation.
Or can one only set them inside a constructor?
Ding ding ding! With the additional point that immutable types often have methods that construct and return new instances, and also often have extra constructors marked internal specifically for use by those methods.
How can I guarantee that the type is 100% immutable?
In .NET it's tricky to get a guarantee like this, because you can use reflection to modify (mutate) private members.
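As a rough sketch of the points in this answer: the Money type below is a made-up example (it is not from any library) whose state is set only in the constructor and whose "modifying" method returns a new instance. The ReflectionCaveat helper is also hypothetical, and shows why the guarantee is hard to make absolute, assuming the runtime permits reflection writes to instance readonly fields.

```csharp
using System;
using System.Reflection;

// Hypothetical example type: all state is assigned once, in the constructor.
public sealed class Money
{
    private readonly decimal amount;
    private readonly string currency;

    public Money(decimal amount, string currency)
    {
        this.amount = amount;
        this.currency = currency;
    }

    public decimal Amount { get { return amount; } }
    public string Currency { get { return currency; } }

    // "Modifying" operations build and return a new instance.
    public Money Add(decimal delta)
    {
        return new Money(amount + delta, currency);
    }
}

public static class ReflectionCaveat
{
    // Overwrites the private readonly field through reflection,
    // bypassing what the compiler enforces for ordinary code.
    public static void ForceAmount(Money target, decimal newAmount)
    {
        FieldInfo field = typeof(Money).GetField(
            "amount", BindingFlags.Instance | BindingFlags.NonPublic);
        field.SetValue(target, newAmount);
    }
}
```

Calling a.Add(5m) leaves a untouched and hands back a second Money, whereas ReflectionCaveat.ForceAmount(a, 99m) changes a itself.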

The previous posters have already stated that you should assign values to your fields in the constructor and then keep your hands off them. But that is sometimes easier said than done. Let's say that your immutable object exposes a property of the type List<string>. Is that list allowed to change? And if not, how will you control it?
Eric Lippert has written a series of posts in his blog about immutability in C# that you might find interesting: you find the first part here.
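One possible way to handle the List<string> concern above is to copy the incoming list and expose only a read-only wrapper. This is a minimal sketch; Playlist and Titles are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Playlist
{
    private readonly ReadOnlyCollection<string> titles;

    public Playlist(IEnumerable<string> titles)
    {
        // Copy first, so later changes to the caller's list are not seen here;
        // then wrap, so consumers cannot add or remove items.
        this.titles = new List<string>(titles).AsReadOnly();
    }

    public ReadOnlyCollection<string> Titles { get { return titles; } }
}
```

This works cleanly here because string is itself immutable; with a mutable element type the elements could still change even though the collection cannot.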

One thing that I think might be missed in all these answers is that I think that an object can be considered immutable even if its internal state changes - as long as those internal changes are not visible to the 'client' code.
For example, the System.String class is immutable, but I think it would be permitted to cache the hash code for an instance so the hash is only calculated on the first call to GetHashCode(). Note that as far as I know, the System.String class does not do this, but I think it could and still be considered immutable. Of course any of these changes would have to be handled in a thread-safe manner (in keeping with the non-observable aspect of the changes).
To be honest though, I can't think of many reasons one might want or need this type of 'invisible mutability'.
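For what it's worth, here is a sketch of that kind of "invisible mutability": a hypothetical PersonName class that computes its hash code lazily and caches it. The cachedHash field changes after construction, but callers cannot observe the difference. It relies on int writes being atomic, so two racing threads would at worst compute and store the same value twice.

```csharp
using System;

public sealed class PersonName
{
    private readonly string first;
    private readonly string last;
    private int cachedHash; // 0 means "not computed yet"; never exposed directly

    public PersonName(string first, string last)
    {
        this.first = first;
        this.last = last;
    }

    public string First { get { return first; } }
    public string Last { get { return last; } }

    public override int GetHashCode()
    {
        int h = cachedHash;
        if (h == 0)
        {
            unchecked { h = (first.GetHashCode() * 397) ^ last.GetHashCode(); }
            if (h == 0) h = 1; // keep 0 reserved as the "not computed" marker
            cachedHash = h;    // benign race: every thread stores the same value
        }
        return h;
    }

    public override bool Equals(object obj)
    {
        PersonName other = obj as PersonName;
        return other != null && first == other.first && last == other.last;
    }
}
```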

Here is the definition of immutability from Wikipedia:
"In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created."
Essentially, once the object is created, none of its properties can be changed. An example is the String class. Once a String object is created it cannot be changed. Any operation done to it actually creates a new String object.

Lots of questions there. I'll try to answer each of them individually:
"I am wondering how immutability is defined?" - Straight from the Wikipedia page (and a perfectly accurate/concise definition)
An immutable object is an object whose state cannot be modified after it is created
"If the values aren't exposed as public, so can't be modified, then it's enough?" - Not quite. It can't be modified in any way whatsoever, so you've got to ensure that methods/functions don't change the state of the object, and if performing operations, always return a new instance.
"Can the values be modified inside the type, not by the customer of the type?" - Technically, it can't be modified either inside or by a consumer of the type. In practice, types such as System.String (a reference type, for that matter) exist that can be considered immutable for almost all practical purposes, though not in theory.
"Or can one only set them inside a constructor?" - Yes, in theory that's the only place where state (variables) can be set.
"If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?" - Yes, that's still perfectly fine, because it's all part of the initialisation (creation) process, and the instance isn't returned until it has finished.
"How can I guarantee that the type is 100% immutable?" - The following conditions should ensure that. (Someone please point out if I'm missing one.)
Don't expose any variables. They should all be kept private (not even protected is acceptable, since derived classes can then modify state).
Don't allow any instance methods to modify state (variables). This should only be done in the constructor, while methods should create new instances using a particular constructor if they require to return a "modified" object.
All members that are exposed (as read-only) or objects returned by methods must themselves be immutable.
Note: you can't ensure the immutability of derived types, since they can define new variables. This is a reason for marking any type you want to be sure is immutable as sealed, so that no derived class can be treated as your base immutable type anywhere in code.
Hope that helps.

I've learned that immutability is when you set everything in the constructor and cannot modify it later on during the lifetime of the object.

The definition of immutability can be located on Google.
Example:
immutable - literally, not able to change.
www.filosofia.net/materiales/rec/glosaen.htm
In terms of immutable data structures, the typical definition is write-once-read-many, in other words, as you say, once created, it cannot be changed.
There are some cases which are slightly in the gray area. For instance, .NET strings are considered immutable, because they can't change, however, StringBuilder internally modifies a String object.

An immutable is essentially a class that forces itself to be final from within its own code. Once it is there, nothing can be changed. In my knowledge, things are set in the constructor and then that's it. I don't see how something could be immutable otherwise.

There are unfortunately no immutable keywords in C#/VB.NET, though it has been debated, but if there are no auto-properties, all fields are declared with the readonly modifier (readonly fields can only be assigned in the constructor), and all fields are of an immutable type, you will have assured yourself of immutability.

An immutable object is one whose observable state can never be changed by any plausible sequence of code execution. An immutable type is one which guarantees that any instances exposed to the outside world will be immutable (this requirement is often stated as requiring that the object's state may only be set in its constructor; this isn't strictly necessary in the case of objects with private constructors, nor is it sufficient in the case of objects which call outside methods on themselves during construction).
A point which other answers have neglected, however, is a definition of an object's state. If Foo is a class, the state of a List<Foo> consists of the sequence of object identities contained therein. If the only reference to a particular List<Foo> instance is held by code which will neither cause that sequence to be changed, nor expose it to code that might do so, then that instance will be immutable, regardless of whether the Foo objects referred to therein are mutable or immutable.
To use an analogy, if one has a list of automobile VINs (Vehicle Identification Numbers) printed on tamper-evident paper, the list itself would be immutable even though cars aren't. Even if the list contains ten red cars today, it might contain ten blue cars tomorrow; they would still, however, be the same ten cars.
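The VIN analogy can be sketched in code, under the assumption of a hypothetical Car class with a mutable Color. The list returned by Garage.MakeFixedList (also a made-up helper) always holds the same object identities, even though the objects themselves can change.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Car
{
    public Car(string vin, string color)
    {
        Vin = vin;
        Color = color;
    }

    public string Vin { get; private set; }
    public string Color { get; set; } // deliberately mutable
}

public static class Garage
{
    // Copies the references into a private list and exposes a read-only view,
    // so the sequence of identities can never change afterwards.
    public static ReadOnlyCollection<Car> MakeFixedList(params Car[] cars)
    {
        return new List<Car>(cars).AsReadOnly();
    }
}
```

Repainting one of the cars (setting its Color) is visible through the list, but the list still refers to exactly the same cars in the same order.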

Related

C# What is the purpose of a setter for an object when a getter is already implemented?

If a class has a getter for an object, then when the getter returns the object you can modify this object outside of its own class container, and these changes will be reflected when you read the object later with the getter again; so I can't see the point of adding a setter for the object when the getter lets me read and modify the object as well.
Example:
You have a class called CashRegister and this class has an object called queue, if you read queue by means of a CashRegister's getter you can modify queue from the MainClass and the next time you invoke the CashRegister's getter the modifications previously made in MainClass will be present. By the way CLI.PrintAndJump() prints the content of a queue.
class MainClass
{
    static void Main(string[] args)
    {
        Queue<int> tmpQueue, tmpQueue2;
        CashRegister aCashRegister = new CashRegister();
        tmpQueue = aCashRegister.GetCoinValues();
        CLI.PrintAndJump(tmpQueue);
        tmpQueue.Enqueue(10);
        tmpQueue2 = aCashRegister.GetCoinValues();
        CLI.PrintAndJump(tmpQueue2);
    }
}
class CashRegister
{
    // Start with one coin of value 1 (new Queue<int>(1) would only set the initial capacity).
    Queue<int> coinValues = new Queue<int>(new[] { 1 });
    public Queue<int> GetCoinValues()
    {
        return coinValues;
    }
}
Output:
1
1, 10
In a nutshell, if you need to modify the object queue, you don't need to set a setter for it (Is this method a good practice?), but what if I want the object to remain immutable?
Thanks.
In the context of C#, we typically talk about "setter" and "getter" methods in relation to a property, so it's kind of odd that your code example doesn't include any properties. But, let's ignore that for a moment, and assume that you might have a corresponding SetCoinValues() method.
The reason for such a method would be if you want to replace the entire Queue<int> object. There is a difference between modifying the Queue<int> object itself, which you can do with only a getter method, and replacing the Queue<int> object with a whole new one, which would require a setter method.
Why one might want to do this varies. It depends on the exact circumstance. And I think it's less likely one might want to replace a queue object, than say some other collection type (like an array or a list), or some other complex type other than a collection. But it could still happen.
Examples of complex types which are used as property values, or in terms of the non-property scenario, might have both a getter and setter method, include System.Windows.Media.Pen.DashStyle and System.Diagnostics.Process.StartInfo. The DashStyle object itself even has properties for setting and getting complex values, including the Dashes property, which is a collection of Double values.
I mention these to emphasize that this really has nothing at all to do with mutable vs. immutable. Both DashStyle and ProcessStartInfo are mutable types, but we still have properties which reference objects of those types and which have setter methods in addition to a getter.
The question of mutability (which seems to be the emphasis of the other answer) is a red herring, and will only distract you from what is really going on. The real point is that, even with mutable, complex types, there are times when you want to be able to replace the entire object, rather than modifying the object currently being held. In those cases, you need a setter method, so that you can change the actual reference for the property, instead of modifying the object the property refers to.
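To make the distinction concrete, here is a minimal sketch that recasts the question's CashRegister with a read-write property (the CoinValues property is my own addition, not part of the original code). The getter is enough to modify the queue, while the setter is what allows replacing it.

```csharp
using System;
using System.Collections.Generic;

public class CashRegister
{
    private Queue<int> coinValues = new Queue<int>();

    public Queue<int> CoinValues
    {
        get { return coinValues; }   // callers can mutate the returned queue
        set { coinValues = value; }  // callers can swap in a different queue
    }
}
```

With this, register.CoinValues.Enqueue(10) changes the queue the register already holds, whereas register.CoinValues = new Queue<int>(new[] { 1, 2, 3 }) makes the register point at a different queue and leaves the old one as it was.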
First of all, this is C#, not Java. In C#, we do not write getters and setters; we use properties instead. So, GetXyz() methods in C# are usually not getters, and they are very rarely paired with SetXyz() methods. For example, think of Array.GetLength(). Once we have established that, let's move on.
Read-write properties are primarily used for immutable values, an example of which are primitives. I hope you understand that you cannot get for example an int, change its value, and expect the value held by the containing object to also change. You have to put it back. And for this you need a writable property. But you have anticipated that, because in your question you are talking about objects.
Primitives are not the only entities that you cannot modify; there exist plenty of immutable structs and classes that behave the same way. If a property returns to you an object that simply does not offer any methods that you could use to alter its state, the only thing you can do is create a different instance of that object, and put it back in the containing object, for which of course you will need a writable property.
So, writable properties are necessary for writing back immutable entities.
When it comes to mutable objects, you are right, it does not make much sense to have a read-write property, because once you obtain a reference to the mutable object you can mutate it to your heart's content without ever having to "set" it back. For this reason, you will rarely see a setter for a mutable. But it happens sometimes. Peter Duniho covers an example of this in his answer. (It is mainly done for performance/convenience reasons, and the implied agreement in these cases is that the containing object does not have ownership of the object that is being set into it.)
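A small sketch of the "write it back" point, using a hypothetical Invoice class. Both string and DateTime are immutable, so methods like AddDays return a new value rather than changing the one held by the invoice.

```csharp
using System;

public class Invoice
{
    public string Reference { get; set; } // string is an immutable class
    public DateTime DueDate { get; set; } // DateTime is an immutable struct
}

public static class InvoiceDemo
{
    public static void Postpone(Invoice invoice, int days)
    {
        // This alone would change nothing; the result is simply discarded:
        // invoice.DueDate.AddDays(days);

        // The new value has to be put back through the writable property.
        invoice.DueDate = invoice.DueDate.AddDays(days);
    }
}
```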

Does wrapping orthogonal struct fields in value-agnostic read-write properties serve a purpose

Wrapping a struct or class field in a property forces all accesses to that field to go through "getter" and "setter" methods. This allows for the possibility of adding logic for validation, lazy initialization, etc. Further, in the case of class fields, it allows for the possibility that one might have logic which applies to some instances but not others; if the properties are not virtual, it may be difficult to implement such logic efficiently (e.g. one might have to define a static VerySpecialInstance and have the property getter say if (this == VerySpecialInstance) GetSpecialProperty(); else GetOrdinaryProperty();) but it could be done.
If, however, the semantics of a struct (e.g. System.Drawing.Point) dictate that a particular read-write property may be written with any value which is legal for its type, writing will have no side effect other than to change its value, it will always return the last value written (if any), and if not written it will read as the default value for its type; and if code which uses the type will likely rely upon such assumptions, I'm unclear on what possible benefit would be served by using a read-write property rather than a field to hold the value.
The fact that Microsoft uses properties rather than fields for things like Point.X etc. has historically caused confusion since MyList[3].X = 4; would be translated to MyList[3].Set_X(4), and without looking inside the definition of Set_X it's not possible to tell whether that method would achieve its desired effect without changing any fields of the struct in question; today's C# compiler will guess that it wouldn't work, and will forbid that construct even though there are some struct types where property setters would in fact work just fine. If X had been a field rather than a property, and if Microsoft had said that the two safe ways to mutate a struct are either to access the fields directly or to pass the struct as a ref parameter to a mutating method (which, if it's a static method of the struct type, could access public fields), such guesswork would not be necessary.
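To illustrate the situation described above, here is a sketch with a hypothetical FieldPoint struct that exposes public fields. A List<T> indexer returns a copy, so the element has to be read, changed, and written back; array elements and ref parameters are real variables and can be changed in place.

```csharp
using System.Collections.Generic;

public struct FieldPoint
{
    public int X;
    public int Y;
}

public static class StructCopyDemo
{
    public static int ViaList()
    {
        var list = new List<FieldPoint> { new FieldPoint() };
        // list[0].X = 4; // compile error CS1612: the indexer result is not a variable
        FieldPoint p = list[0]; // copy out
        p.X = 4;                // change the copy
        list[0] = p;            // copy back
        return list[0].X;
    }

    public static int ViaArray()
    {
        var array = new FieldPoint[1];
        array[0].X = 4; // array elements are variables, so this writes in place
        return array[0].X;
    }

    // Mutating through a ref parameter acts on the caller's own storage.
    public static void SetX(ref FieldPoint point, int x)
    {
        point.X = x;
    }
}
```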
Given that using exposed struct fields rather than read-write properties improves both performance and semantic clarity, what reasons exist to make struct fields private and wrap them in properties? Data binding requires properties, but I don't think it works with structure types anyway (if one makes a copy of a struct and then sets some property of the original to one value and the corresponding property of the duplicate to another, what value should be reported to the bound object?) Are there some benefits of struct properties of which I'm unaware?
Personally, I think the 'ideal' struct in many cases would simply be a list of exposed public fields, and a constructor whose parameters are simply the initial values of those fields, in order. Such a struct would offer optimal performance and predictable semantics (behaving identically to all other such structs, aside from the types and names of the fields). Is there any reason to favor read-write properties in cases where there isn't anything they could do other than simply read and write an underlying field?
I don't see any benefit in using read/write properties on an immutable struct, except the points you wrote about: wrapping logic inside the setter and/or getter of the property, and maintaining a general guideline across your code base (a benefit from a maintenance and readability point of view).
Personally, when I define a struct I almost always use raw public fields and no properties, for simplicity and easy consumption of my type (because of the problems with immutable types that you already described in the question).
Hope this helps.
Rico Mariani wrote a good MSDN blog article on this very topic.
Reasons to use public fields rather than getters and setters include:
There are no values the field cannot be allowed to have.
The client is expected to edit it.
To be able to write things such as object.X.Y = Z.
To make a strong promise that the value is just a value and there are no side-effects associated with it (and won't be in the future either).
Some people find this very controversial. I suspect this is because the cases listed rarely or never come up in the kind of software they write, but they don't realise that in other application areas they come up a great deal.
(This is a copy of an answer I provided here, but I thought the information is useful enough to be repeated here.)

How to determine if .NET (BCL) type is immutable

From this answer, I came to know that KeyValuePair instances are immutable.
I browsed through the docs, but could not find any information regarding immutable behavior.
I was wondering how to determine if a type is immutable or not?
I don't think there's a standard way to do this, since there is no official concept of immutability in C#. The only way I can think of is looking at certain things, indicating a higher probability:
1) All properties of the type have a private set
2) All fields are const/readonly or private
3) There are no methods with obvious/known side effects
4) Also, being a struct generally is a good indication (if it is BCL type or by someone with guidelines for this)
Something like an ImmutableAttribute would be nice. There are some thoughts here (somewhere down in the comments), but I haven't seen one in "real life" yet.
The first indication would be that the documentation for the property in the overview says "Gets the key in the key/value pair."
The second more definite indication would be in the description of the property itself:
"This property is read/only."
I don't think you can find "proof" of immutability by just looking at the docs, but there are several strong indicators:
It's a struct (why does this matter?)
It has no settable public properties (both are read-only)
It has no obvious mutator methods
For definitive proof I recommend downloading the BCL's reference source from Microsoft or using an IL decompiler to show you how a type would look like in code.
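The indicators listed in these answers can also be checked mechanically with reflection. The sketch below is only a heuristic, not proof: ImmutabilityHeuristics, Frozen and Thawed are hypothetical names, the checks do not see private fields inherited from base classes, and they say nothing about whether the field types are themselves immutable.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ImmutabilityHeuristics
{
    // True when every instance field visible on the type is readonly.
    public static bool HasOnlyReadOnlyFields(Type type)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        return type.GetFields(flags).All(f => f.IsInitOnly);
    }

    // True when no public instance property has a public setter.
    public static bool HasNoPublicSetters(Type type)
    {
        return type.GetProperties(BindingFlags.Instance | BindingFlags.Public)
                   .All(p => p.GetSetMethod() == null);
    }
}

// Two small types to try the checks on.
public sealed class Frozen
{
    private readonly int n;
    public Frozen(int n) { this.n = n; }
    public int N { get { return n; } }
}

public sealed class Thawed
{
    public int N { get; set; }
}
```

Both checks pass for Frozen and fail for Thawed. HasNoPublicSetters(typeof(KeyValuePair<int, int>)) also passes, which matches the "read-only properties" indicator mentioned above.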
A KeyValuePair<T1,T2> is a struct which, absent Reflection, can only be mutated outside its constructor by copying the contents of another KeyValuePair<T1,T2> which holds the desired values. Note that the statement:
MyKeyValuePair = new KeyValuePair<int,int>(1,2);
like all similar constructor invocations on structures, actually works by creating a new temporary instance of KeyValuePair<int,int> (happens before the constructor itself executes), setting the field values of that instance (done by the constructor), copying all public and private fields of that new temporary instance to MyKeyValuePair, and then discarding the temporary instance.
Consider the following code:
static KeyValuePair<int,int> MyKeyValuePair; // Field in some class
// Thread1
MyKeyValuePair = new KeyValuePair<int,int>(1,1);
// ***
MyKeyValuePair = new KeyValuePair<int,int>(2,2);
// Thread2
st = MyKeyValuePair.ToString();
Because a KeyValuePair<int,int> is only eight bytes in length, the second statement in Thread1 may well update both fields simultaneously. Despite that, if the second statement in Thread1 executes between Thread2's evaluation of MyKeyValuePair.Key.ToString() and MyKeyValuePair.Value.ToString(), the second ToString() will act upon the new mutated value of the structure, even though the first already-completed ToString() operated upon the value before the mutation.
All non-trivial structs, regardless of how they are declared, have the same immutability rules for their fields: code which can change a struct can change its fields; code which cannot change a struct cannot change its fields. Some structs may force one to go through hoops to change one of their fields, but designing struct types to be "immutable" is neither necessary nor sufficient to ensure the immutability of instances. There are a few reasonable uses of "immutable" struct types, but such use cases if anything require more care than is necessary for structs with exposed public fields.
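A short sketch of that rule, with hypothetical ImmutablePoint and Holder types. The struct's fields are readonly, yet any code that can assign to the variable holding the struct can still change what those fields contain by overwriting the whole struct.

```csharp
public struct ImmutablePoint
{
    public readonly int X;
    public readonly int Y;

    public ImmutablePoint(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public sealed class Holder
{
    public ImmutablePoint Location = new ImmutablePoint(1, 1);

    public void MoveTo(int x, int y)
    {
        // Location.X = x; // compile error: X is readonly
        Location = new ImmutablePoint(x, y); // but the whole struct can be replaced
    }
}
```

After holder.MoveTo(2, 3), holder.Location.X reads 2: the storage location was mutated even though the struct type is declared "immutable".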

Overhead of using this on structs

When you have automatic properties, C# compiler asks you to call the this constructor on any constructor you have, to make sure everything is initialized before you access them.
If you don't use automatic properties, but simply declare the values, you can avoid using the this constructor.
What's the overhead of using this on constructors in structs? Is it the same as double initializing the values?
Would you recommend not using it, if performance was a top concern for this particular type?
I would recommend not using automatic properties at all for structs, as it means they'll be mutable - if only privately.
Use readonly fields, and public properties to provide access to them where appropriate. Mutable structures are almost always a bad idea, and have all kinds of nasty little niggles.
Do you definitely need to create your own value type in the first place though? In my experience it's very rare to find a good reason to create a struct rather than a class. It may be that you've got one, but it's worth checking.
Back to your original question: if you care about performance, measure it. Always. In this case it's really easy - you can write the struct using an automatic property and then reimplement it without. You could use a #if block to keep both options available. Then you can measure typical situations and see whether the difference is significant. Of course, I think the design implications are likely to be more important anyway :)
Yes, the values will be initialized twice and without profiling it is difficult to say whether or not this performance hit would be significant.
The default constructor of a struct initializes all members to their default values. After this happens your constructor will run in which you undoubtedly set the values of those properties again.
I would imagine this would be no different than the CLR's practice of initializing all fields of a reference type upon instantiation.
The reason the C# compiler requires you to chain to the default constructor (i.e. append : this() to your constructor declaration) when auto-implemented properties are used is that all variables need to be assigned before exiting the constructor. Now, auto-implemented properties mess this up a bit in that they don't allow you to directly access the variables that back the properties. The method the compiler uses to get around this is to automatically assign all the variables to their default values, and to ensure this, you must chain to the default constructor. It's not a particularly clever method, but it does the job well enough.
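For reference, this is roughly what the chained constructor looks like. Size2D is a hypothetical struct; the : this() call zeroes every backing field before the constructor body assigns the real values. As far as I know, more recent compiler versions have relaxed this requirement, but the form below should compile either way.

```csharp
public struct Size2D
{
    public int Width { get; private set; }
    public int Height { get; private set; }

    // ": this()" initializes all backing fields to their defaults first,
    // which satisfies the definite-assignment rule; the body then sets
    // the same fields a second time through the property setters.
    public Size2D(int width, int height) : this()
    {
        Width = width;
        Height = height;
    }
}
```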
So indeed, this will mean that some variables will end up getting initialised twice. However, I don't think this will be a big performance problem. I would be very surprised if the compiler (or at very least the JIT) didn't simply remove the first initialisation statement for any variable that is set twice in your constructor. A quick benchmark should confirm this for you, though I'm quite sure you will get the suspected results. (If you by chance don't, and you absolutely need the tiny performance boost that avoidance of duplicate initialisation offers, you can just define your properties the normal way, i.e. with backing variables.)
To be honest, my advice would be not even to bother with auto-implemented properties in structures. It's perfectly acceptable just to use public variables in lieu of them, and they offer no less functionality than auto-implemented properties. Classes are a different situation of course, but I really wouldn't hesitate to use public variables in structs. (Any complex properties can be defined normally, if you need them.)
Hope that helps.
Don't use automatic properties with structure types. Simply expose fields directly. If a struct has an exposed public field Foo of type Bar, the fact that Foo is an exposed field of type Bar (information readily available from Intellisense) tells one pretty much everything there is to know about it. By contrast, the fact that a struct Foo has an exposed read-write property of Boz does not say anything about whether writing to Boz will mutate a field in the struct, or whether it will mutate some object to which Boz holds a reference. Exposing fields directly will offer cleaner semantics, and often also result in faster-running code.

What are the deficiencies of the Java/C# type system?

I often hear that Haskell (which I don't know) has a very interesting type system. I'm very familiar with Java and a little with C#, and sometimes I find myself fighting the type system so that some design fits or works better in a certain way.
That led me to wonder...
What are the problems that occur somehow because of deficiencies of Java/C# type system?
How do you deal with them?
Arrays are broken.
Object[] foo = new String[1];
foo[0] = new Integer(4);
Gives you java.lang.ArrayStoreException
You deal with them with caution.
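The same hole exists on the C# side, where arrays of reference types are also covariant. A minimal sketch (ArrayCovarianceDemo is just an illustrative name): the assignment compiles, and the mismatch is only caught at run time, as an ArrayTypeMismatchException.

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Returns true if storing an int into a string[] (viewed as object[])
    // is rejected at run time.
    public static bool StoreIntIntoStringArray()
    {
        object[] items = new string[1]; // allowed by the compiler
        try
        {
            items[0] = 4; // a boxed int is not a string
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```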
Nullability is another big issue. NullPointerExceptions jump at your face everywhere. You really can't do anything about them except switch language, or use conventions of avoiding them as much as possible (initialize fields properly, etc).
More generally, the Java's/C#'s type systems are not very expressive. The most important thing Haskell can give you is that with its types you can enforce that functions don't have side effects. Having a compile time proof that parts of programs are just expressions that are evaluated makes programs much more reliable, composable, and easier to reason about. (Ignore the fact, that implementations of Haskell give you ways to bypass that).
Compare that to Java, where calling a method can do almost anything!
Also Haskell has pattern matching, which gives you a different way of creating programs; you have data on which functions operate, often recursively. In pattern matching you destructure data to see what kind it is, and behave accordingly. E.g. you have a list, which is either empty, or a head and a tail. If you want to calculate the length, you define a function that says: if the list is empty, length = 0, otherwise length = 1 + length(tail).
If you'd really like to learn more, there are two excellent online sources:
Learn you a Haskell and Real World Haskell
I dislike the fact that there is a differentiation between primitive (native) types (int, boolean, double) and their corresponding class-wrappers (Integer, Boolean, Double) in Java.
This is often quite annoying especially when writing generic code. Native types can't be genericized, you must instantiate a wrapper instead. Generics should make your code more abstract and easier reusable, but in Java they bring restrictions with obviously no reasons.
private static <T> T First(T arg[]) {
    return arg[0];
}
public static void main(String[] args) {
    int x[] = {1, 2, 3};
    Integer y[] = {3, 4, 5};
    First(x); // Wrong
    First(y); // Fine
}
In .NET there are no such problems even though there are separate value and reference types, because they strictly realized "everything is an object".
this question about generics shows the deficiencies of the java type system's expressiveness
Higher-kinded generics in Java
I don't like the fact that classes are not first-class objects, and you can't do fancy things such as having a static method be part of an interface.
A fundamental weakness in the Java/.NET type system is that it has no declarative means of specifying how an object's state relates to the contents of its reference-type fields, nor of specifying whether a method is allowed to persist reference-type parameters. Although in some sense it's nice for the runtime to be able to use a field Foo of one type ICollection<integer> to mean many different things, it's not possible for the type system to provide real support for things like immutability, equivalency testing, cloning, or any other such features without knowing whether Foo represents:
1) A read-only reference to a collection which nothing will ever mutate; the class may freely share such reference with outside code, without affecting its semantics. The reference encapsulates only immutable state, and likely does not encapsulate identity.
2) A writable reference to a collection whose type is mutable, but which nothing will ever actually mutate; the class may only share such references with code that can be trusted not to mutate it. As above, the reference encapsulates only immutable state, and likely does not encapsulate identity.
3) The only reference anywhere in the universe to a collection which it mutates. The reference would encapsulate mutable state, but would not encapsulate identity (replacing the collection with another holding the same items would not change the state of the enclosing object).
4) A reference to a collection which it mutates, and whose contents it considers to be its own, but to which outside code holds references which it expects to be attached to Foo's current state. The reference would encapsulate both identity and mutable state.
5) A reference to a mutable collection owned by some other object, which it expects to be attached to that other object's state (e.g. if the object holding Foo is supposed to display the contents of some other collection). That reference would encapsulate identity, but would not encapsulate mutable state.
Suppose one wants to copy the state of the object that contains Foo to a new, detached, object. If Foo represents #1 or #2, one may store in the new object either a copy of the reference in Foo, or a reference to a new object holding the same data; copying the reference would be faster, but both operations would be correct. If Foo represents #3, a correct detached copy must hold a reference to a new detached object whose state is copied from the original. If Foo represents #5, a correct detached copy must hold a copy of the original reference--it must NOT hold reference to a new detached object. And if Foo represents #4, the state of the object containing it cannot be copied in isolation; it might be possible to copy a bunch of interconnected objects to yield a new bunch whose state is equivalent to the original, but it would not be possible to copy the state of objects individually.
While it won't be possible for a type system to specify declaratively all of the possible relationships that can exist among objects and what should be done about them, it should be possible for a type system and framework to correctly generate code to produce semantically-correct equivalence tests, cloning methods, smoothly inter-operable mutable, immutable, and "readable" types, etc. in most cases, if it knew which fields encapsulate identity, mutable state, both, or neither. Additionally, it should be possible for a framework to minimize defensive copying and wrapping in circumstances where it could ensure that the passed references would not be given to anything that would mutate them.
(Re: C# specifically.)
I would love tagged unions.
Ditto on first-class objects for classes, methods, properties, etc.
Although I've never used them, Python has metaclasses, which are basically the types that represent classes and how they behave.
Non-nullable reference types so null-checks are not needed. It was originally considered for C# but was discarded. (There is a Stack Overflow question on this.)
Covariance so I can cast a List<string> to a List<object>.
This is minor, but for the current versions of Java and C# declaring objects breaks the DRY principle:
Object foo = new Object;
Int x = new Int;
None of them have meta-programming facilities like say that old darn C++ dog has.
Duplication of "using" directives and the lack of typedef is one example that violates DRY and can even cause user-induced 'aliasing' errors and more. Java 'templates' aren't even worth mentioning.
