Does casting a ConcurrentDictionary to IDictionary cut off the thread-safe implementation, since IDictionary doesn't have the GetOrAdd and AddOrUpdate methods?
The resulting object is still a concurrent dictionary. Calls such as Add or Remove go through the underlying TryAdd and TryRemove implementations, which are thread-safe. Casting an object to a different type doesn't change the object itself.
Also, for clarification, you could use a tool like ILSpy to see how ConcurrentDictionary implements the IDictionary methods and whether they are still thread-safe.
IDictionary is just an interface. If you cast to it, the result is still a ConcurrentDictionary; you just can't call GetOrAdd and AddOrUpdate through the interface reference.
Presumably, you can still use the Item property and the Add and ContainsKey methods (in lieu of GetOrAdd and AddOrUpdate), and your cast object will still be thread-safe (since the underlying implementation is a ConcurrentDictionary).
It would be like looking at a big ConcurrentDictionary object through an IDictionary-shaped keyhole: you can only see the IDictionary shape, but it is still a ConcurrentDictionary.
The interface doesn't affect the implementation. It just doesn't expose some of ConcurrentDictionary's methods.
You may find this or this helpful in understanding interfaces.
When a ConcurrentDictionary<K,V> is exposed as an IDictionary<K,V>, it is thread-safe in a very limited sense: the consistency of its internal state is guaranteed. No undocumented exceptions will be thrown, and no values are going to emerge out of the blue that were never added to the dictionary. But in a broader sense of the term, taking into account how the dictionary interacts with the rest of the application and how meaningful the data it contains are, it's unlikely to be thread-safe. The reason is that the IDictionary<K,V> interface doesn't expose the specialized atomic APIs that make the ConcurrentDictionary<K,V> suitable for multithreaded usage. Without these APIs, many common operations become impossible to perform atomically.
For example, let's say that you have a ConcurrentDictionary<string, int>, and you want to increment the value of the key "A", or, if the key does not exist, add it with the value 1. The correct API to use is AddOrUpdate:
dictionary.AddOrUpdate("A", 1, (_, existing) => existing + 1);
Without this API, you are stuck. By using only the members of the IDictionary<string, int> interface, the best you can do is this:
if (dictionary.TryGetValue("A", out int existing))
    dictionary["A"] = existing + 1;
else
    dictionary.Add("A", 1);
...which is irreparably flawed because the operation is not atomic. Another thread might change the value of the key between the calls to TryGetValue and the set indexer, so the final value for the key "A" might not be equal to the number of times it was (supposedly) incremented. Likewise, if another thread adds the key between TryGetValue and Add, the Add call throws an ArgumentException. Using only the members of IDictionary<K,V>, there is no way to fix this race condition. The IDictionary<K,V> interface was simply not designed for multithreaded usage.
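To see the difference in practice, here is a minimal sketch (the class and method names are made up for illustration) that calls AddOrUpdate from many threads at once. Because each increment is applied atomically, the final count equals the number of increments. The TryGetValue/indexer version above has no such guarantee; its outcome depends on timing, so it can't be demonstrated reliably in a deterministic test.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical demo class.
public static class CounterDemo
{
    // Increments the value stored under "A" 'times' times from parallel threads.
    public static int IncrementConcurrently(int times)
    {
        var dictionary = new ConcurrentDictionary<string, int>();
        Parallel.For(0, times, i =>
            // The update delegate may run more than once under contention,
            // but AddOrUpdate retries until a read-compute-write cycle
            // commits without interference, so no increment is lost.
            dictionary.AddOrUpdate("A", 1, (key, existing) => existing + 1));
        return dictionary["A"];
    }
}
```

Calling CounterDemo.IncrementConcurrently(1000) yields 1000 regardless of how the threads interleave.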
Short answer: no.
You are manipulating an object through an interface and hence still using the concrete implementation. The object doesn't lose any functionality or methods; they are just not accessible through the interface reference.
On a side note: you need an explicit cast when downcasting (e.g. from IDictionary back to ConcurrentDictionary), but no explicit cast when upcasting, which is always safe.
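A small sketch of these points, with hypothetical names: the implicit upcast gives a narrower view of the same object, and an explicit downcast (here via a type pattern) brings AddOrUpdate back.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical demo class.
public static class CastDemo
{
    public static (bool sameObject, int value) Run()
    {
        var concurrent = new ConcurrentDictionary<string, int>();

        // Upcast: implicit, no cast syntax needed. Same object, narrower view.
        IDictionary<string, int> view = concurrent;
        view.Add("A", 1);   // explicit interface implementation, backed by TryAdd

        // Downcast: needs an explicit type check/cast to reach AddOrUpdate again.
        if (view is ConcurrentDictionary<string, int> back)
            back.AddOrUpdate("A", 1, (key, existing) => existing + 1);

        return (ReferenceEquals(view, concurrent), concurrent["A"]);
    }
}
```

Run() returns (true, 2): the interface reference and the concrete reference point to the same object.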
OK, as I understand it, immutable types are inherently thread-safe, or so I've read in various places, and I think I understand why. If the inner state of an instance cannot be modified once the object is created, there seem to be no problems with concurrent access to the instance itself.
Therefore, I could create the following List:
using System.Collections.Generic;
class ImmutableList<T> : IEnumerable<T>
{
    readonly List<T> innerList;
    public ImmutableList(IEnumerable<T> collection)
    {
        this.innerList = new List<T>(collection);
    }
    public ImmutableList()
    {
        this.innerList = new List<T>();
    }
    public ImmutableList<T> Add(T item)
    {
        var list = new ImmutableList<T>(this.innerList);
        list.innerList.Add(item);
        return list;
    }
    public ImmutableList<T> Remove(T item)
    {
        var list = new ImmutableList<T>(this.innerList);
        list.innerList.Remove(item);
        return list;
    } //and so on with relevant List methods...
    public T this[int index]
    {
        get
        {
            return this.innerList[index];
        }
    }
    public IEnumerator<T> GetEnumerator()
    {
        return innerList.GetEnumerator();
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.IEnumerable)this.innerList).GetEnumerator();
    }
}
So the question is: Is this really an immutable type? Is it really thread safe?
Obviously the type itself is immutable, but there is absolutely no guarantee that T is, and therefore you could have concurrent access and threading issues related directly to the generic type. Would that mean that ImmutableList should be considered mutable?
Should class ImmutableList<T>: IEnumerable<T> where T: struct be the only type truly considered immutable?
Thanks for any input on this issue.
UPDATE: A lot of answers/comments are concentrating on the particular implementation of ImmutableList I've posted, which is probably not a very good example. But the point of the question is not the implementation. What I'm asking is whether ImmutableList<MutableT> is really an immutable type, considering everything that an immutable type entails.
If the inner state of an instance cannot be modified once the object is created, there seem to be no problems with concurrent access to the instance itself.
That is generally the case, yes.
Is this really an immutable type?
To briefly sum up: you have a copy-on-write wrapper around a mutable list. Adding a new member to an immutable list does not mutate the list; instead it makes a copy of the underlying mutable list, adds to the copy, and returns a wrapper around the copy.
Provided that the underlying list object you are wrapping does not mutate its internal state when it is read from, you have met your original definition of "immutable", so, yes.
I note that this is not a very efficient way to implement an immutable list. You'd likely do better with an immutable balanced binary tree, for example. Your sketch is O(n) in both time and memory every time you make a new list; you can improve that to O(log n) without too much difficulty.
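For comparison, the BCL ships a persistent list, System.Collections.Immutable.ImmutableList<T> (part of .NET Core and .NET 5+, available as a NuGet package on .NET Framework). As far as we know it is built on a balanced (AVL) tree, so Add returns a new list in O(log n) while sharing most of its structure with the old one. The demo class below is hypothetical; only the ImmutableList APIs are real.

```csharp
using System.Collections.Immutable;

// Hypothetical demo class; ImmutableList<T> here is the BCL type from
// System.Collections.Immutable, not the class in the question.
public static class PersistentListDemo
{
    public static (ImmutableList<int> Before, ImmutableList<int> After) AddOne()
    {
        ImmutableList<int> before = ImmutableList.Create(1, 2, 3);

        // Add leaves 'before' untouched and returns a new list that shares
        // most of its internal tree nodes with 'before' instead of copying.
        ImmutableList<int> after = before.Add(4);

        return (before, after);
    }
}
```

The original list still has three elements afterwards; only the returned list has four.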
Is it really thread safe?
Provided that the underlying mutable list is threadsafe for multiple readers, yes.
This might be of interest to you:
http://blogs.msdn.com/b/ericlippert/archive/2011/05/23/read-only-and-threadsafe-are-different.aspx
Obviously the type itself is immutable, but there is absolutely no guarantee that T is, and therefore you could have concurrent access and threading issues related directly to the generic type. Would that mean that ImmutableList<T> should be considered mutable?
That's a philosophical question, not a technical one. If you have an immutable list of people's names, and the list never changes, but one of the people dies, was the list of names "mutable"? I would think not.
A list is immutable if any question about the list always has the same answer. In our list of people's names, "how many names are on the list?" is a question about the list. "How many of those people are alive?" is not a question about the list, it is a question about the people referred to by the list. The answer to that question changes over time; the answer to the first question does not.
Should class ImmutableList<T>: IEnumerable<T> where T: struct be the only type truly considered immutable?
I'm not following you. How does restricting T to be a struct change anything? OK, T is restricted to struct. I make an immutable struct:
struct S
{
    public int[] MutableArray { get; private set; }
    ...
}
And now I make an ImmutableList<S>. What stops me from modifying the mutable array stored in instances of S? Just because the list is immutable and the struct is immutable doesn't make the array immutable.
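A sketch of that point, assuming a readonly struct S with a constructor (the answer elides S's body, so the constructor is our own addition) and using the BCL's System.Collections.Immutable.ImmutableList<T> as the immutable list. Neither the list nor the struct can be modified, yet the array is mutated through them.

```csharp
using System.Collections.Immutable;

// Stand-in for the answer's struct S: the struct is immutable (get-only
// property, readonly struct), but the array it points to is not.
public readonly struct S
{
    public S(int[] array) { MutableArray = array; }
    public int[] MutableArray { get; }
}

// Hypothetical demo class.
public static class ShallowImmutabilityDemo
{
    public static int MutateThroughImmutableList()
    {
        ImmutableList<S> list = ImmutableList.Create(new S(new[] { 1, 2, 3 }));

        // list[0] hands back a copy of the struct, but the copy holds a
        // reference to the same array, so this write is visible through the list.
        list[0].MutableArray[0] = 42;

        return list[0].MutableArray[0];
    }
}
```

The method returns 42: immutability of the list and of the struct is shallow.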
Immutability is sometimes defined in different ways. So is thread-safety.
In creating an immutable list whose purpose is to be immutable, you should document exactly what guarantees you are making. In this case, for example, you guarantee that the list itself is immutable and has no hidden mutability (some apparently immutable objects are actually mutable behind the scenes, with e.g. memoisation or internal re-sorting as an optimisation), which would remove the thread-safety that comes from immutability (though such internal mutations can also be performed in a manner that guarantees thread-safety in a different way). You are not guaranteeing that the objects stored can be used in a thread-safe manner.
The thread-safety that you document should reflect this. You cannot guarantee that another object won't hold a reference to the same object (you could if you were creating new objects on each call). You can guarantee that operations will not corrupt the list itself.
Insisting upon T : struct could help, as it would mean that you could ensure that each time you return an item, it's a new copy of the struct. (T : struct alone wouldn't do that, as you could have operations that didn't mutate the list but did mutate its members, so obviously you also have to ensure that.)
This, though, limits you: it doesn't support immutable reference types (e.g. string, which tends to be a member of collections in lots of real-world cases), and it doesn't allow a user to make use of it while providing their own means of ensuring that the mutability of the contained items doesn't cause problems. Since no thread-safe object can guarantee that all the code it is used in is thread-safe, there's little point trying to ensure that (help as much as you can by all means, but don't try to ensure what you can't ensure).
It also doesn't protect mutable members of immutable structs in your immutable list!
Using your code, let's say I do this:
ImmutableList<int> mylist = new ImmutableList<int>();
mylist.Add(1);
... your code, posted on Stack Overflow, causes a StackOverflowException. There are quite a few sensible ways to create thread-safe collections; copying collections a lot (or at least trying to) and calling them immutable doesn't quite do the trick.
Eric Lippert posted a link that is well worth reading.
A prime example of a data type that behaves as an immutable list of mutable objects: MulticastDelegate. A MulticastDelegate may be modeled pretty accurately as an immutable list of (object, method) pairs. The set of methods, and the identities of the objects upon which they act are immutable, but in the vast majority of cases the objects themselves will be mutable. Indeed, in many if not most cases, the very purpose of the delegate will be to mutate the objects to which it holds references.
It is not the responsibility of a delegate to know whether the methods it's going to invoke upon its target objects might mutate them in a thread-unsafe fashion. The delegate is responsible merely for ensuring that its lists of functions and object identities are immutable, and I don't think anyone would expect it to do more.
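A small sketch of the delegate case (Counter and DelegateDemo are hypothetical): combining delegates produces a new delegate and leaves the original invocation list untouched, while invoking them freely mutates the target object.

```csharp
using System;

// Hypothetical mutable target object.
public class Counter
{
    public int Value;
    public void Increment() => Value++;
}

public static class DelegateDemo
{
    public static (int single, int combined, int counterValue) Run()
    {
        var counter = new Counter();
        Action single = counter.Increment;

        // '+' (Delegate.Combine) builds a NEW delegate; 'single' is unchanged.
        Action combined = single + single;

        combined();   // invokes Increment twice, mutating the target

        return (single.GetInvocationList().Length,
                combined.GetInvocationList().Length,
                counter.Value);
    }
}
```

Run() returns (1, 2, 2): the invocation lists are fixed, the target is not.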
An ImmutableList<T> should likewise always hold the same set of instances of type T. The properties of those instances might change, but not their identity. If a List<Car> was created holding two Fords, serial #1234 and #4422, both of which happened to be red, and one of the Fords was painted blue, what started out as a list of two red cars would have changed to a list holding a blue car and a red car, but it would still hold #1234 and #4422.
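The car example can be sketched as follows, assuming a hypothetical Car class and the BCL's System.Collections.Immutable.ImmutableList<T>. Repainting a car changes what the cars look like, but the list still holds the same two object identities in the same order.

```csharp
using System.Collections.Immutable;
using System.Linq;

// Hypothetical mutable element type: the serial is fixed, the color is not.
public sealed class Car
{
    public Car(int serial, string color) { Serial = serial; Color = color; }
    public int Serial { get; }
    public string Color { get; set; }
}

public static class IdentityListDemo
{
    public static (bool sameCars, string colors) Repaint()
    {
        var ford1 = new Car(1234, "red");
        var ford2 = new Car(4422, "red");
        ImmutableList<Car> cars = ImmutableList.Create(ford1, ford2);
        Car[] before = cars.ToArray();

        ford1.Color = "blue";   // mutates a car, not the list

        // Car doesn't override Equals, so this compares references (identities).
        bool sameCars = cars.SequenceEqual(before);
        string colors = string.Join(",", cars.Select(c => c.Color));
        return (sameCars, colors);
    }
}
```

Repaint() returns (true, "blue,red").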
I would say that a generic list is not immutable if the element is mutable, because it does not represent the full snapshot of the data state.
To achieve immutability in your example you would have to create deep copies of list elements, which is not efficient to do every time.
You can see my solution to this problem in the IOG library.
I believe that one thing adding to the rather lengthy discussion on this topic is that immutability/mutability should be considered along with scope.
As an example:
Say that I am working on a C# project and I define a static class with static data structures, and I define no way to modify those structures in my project. It is in effect a read-only cache of data that I can use, and for the purposes of my program, it is immutable from one run of my program to the next. I have total control over the data, and neither I nor the user can modify it at run time.
Now I modify my data in my source code and re-run the program. The data has changed, so in the highest sense of the word, the data is no longer immutable. From the highest level perspective, the data is actually mutable.
I would therefore posit that what we need to be discussing is not the black and white question blanketing something as immutable or not, but rather we should consider the degree of mutability of a particular implementation (as there are few things that actually never change, and are therefore truly immutable).
I've stumbled across this code in production and I think it may be causing us problems.
internal static readonly MyObject Instance = new MyObject();
Accessing the Instance field twice returns two objects with the same hash code. Is it possible that these objects are different?
My knowledge of the CLI says that they are the same because the hash codes are the same.
Can anyone clarify please?
The field will only be initialized once, so you'll always get the same object. It's perfectly safe.
Of course, you have to be careful when using static objects from multiple threads. If the object is not thread-safe, you should lock it before accessing it from different threads.
Yes, it is safe: this is the simplest thread-safe singleton implementation.
As a further point on comparing the hash-code to infer "they're the same object"; since we're talking about reference-types here (singleton being meaningless for value-types), the best way to check if two references point to the same object is:
bool isSame = object.ReferenceEquals(first, second);
which isn't dependent on the GetHashCode()/Equals/== implementations (it looks at the reference itself).
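A quick sketch of why hash codes are the wrong tool here, using a hypothetical version of MyObject whose GetHashCode is deliberately constant: two distinct objects share a hash code, while ReferenceEquals still tells them apart and confirms that the singleton is a single object.

```csharp
// Hypothetical MyObject whose hash code collides for every instance.
public sealed class MyObject
{
    internal static readonly MyObject Instance = new MyObject();

    public override int GetHashCode() => 42;                      // deliberately constant
    public override bool Equals(object obj) => obj is MyObject;   // "all equal"
}

public static class IdentityDemo
{
    public static bool SingletonIsSameReference() =>
        ReferenceEquals(MyObject.Instance, MyObject.Instance);

    public static bool DistinctObjectsShareHash(out bool sameReference)
    {
        var a = new MyObject();
        var b = new MyObject();
        sameReference = ReferenceEquals(a, b);        // false: two separate allocations
        return a.GetHashCode() == b.GetHashCode();    // true: a hash says nothing about identity
    }
}
```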
It is a guarantee provided by the CLR that this will work properly, even when the class is used by multiple threads. This is specified in Ecma 335, Partition II, section 10.5.3.3:
There are similar, but more complex, problems when type initialization takes place in a multi-threaded system. In these cases, for example, two separate threads might start attempting to access static variables of separate types (A and B) and then each would have to wait for the other to complete initialization.
A rough outline of an algorithm to ensure points 1 and 2 above is as follows:
1. At class load-time (hence prior to initialization time) store zero or null into all static fields of the type.
2. If the type is initialized, you are done.
2.1. If the type is not yet initialized, try to take an initialization lock.
2.2. If successful, record this thread as responsible for initializing the type and proceed to step 2.3.
2.2.1. If not successful, see whether this thread or any thread waiting for this thread to complete already holds the lock.
2.2.2. If so, return since blocking would create a deadlock. This thread will now see an incompletely initialized state for the type, but no deadlock will arise.
2.2.3. If not, block until the type is initialized then return.
2.3. Initialize the base class type and then all interfaces implemented by this type.
2.4. Execute the type initialization code for this type.
2.5. Mark the type as initialized, release the initialization lock, awaken any threads waiting for this type to be initialized, and return.
To be clear, that's the algorithm they propose for a CLR implementation, not your code.
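To observe that guarantee from user code, here is a hypothetical singleton that counts its constructor calls while many threads race to touch it for the first time. This is a sketch, not a proof: a single run can only show that the constructor ran once in that run.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical singleton that records how often its constructor runs.
public sealed class CountedSingleton
{
    public static int ConstructorCalls;   // no initializer: starts at 0
    internal static readonly CountedSingleton Instance = new CountedSingleton();

    private CountedSingleton() => Interlocked.Increment(ref ConstructorCalls);
}

public static class TypeInitDemo
{
    public static (int calls, bool allSame) Run()
    {
        var seen = new CountedSingleton[64];

        // Many threads race to trigger type initialization.
        Parallel.For(0, seen.Length, i => seen[i] = CountedSingleton.Instance);

        bool allSame = seen.All(s => ReferenceEquals(s, seen[0]));
        return (CountedSingleton.ConstructorCalls, allSame);
    }
}
```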
Other answers have commented on the thread safety. Here's some more on your reference to hash codes:
The hash codes being the same implies that the two objects might be considered "equal", a different concept from "the same". All a hash code really tells you is that, if two objects have different hash codes, they are definitely not "equal", and therefore by implication definitely not "the same". Equality is defined by overriding the .Equals() method, and the contract imposed is that if two objects are considered equal by this method, then they must return the same value from their .GetHashCode() methods. Two variables are "the same" if their references are equal, i.e. they point to the same object in memory.
It's static, meaning it belongs to the class, and it's readonly, so it cannot be reassigned after initialization. So yes, you will get the same object.
It will work just fine in this case, but if you mark the field with the [ThreadStatic] attribute, inline initialization won't work and you'll have to use something else, such as lazy initialization. In that case you don't need to worry about whether the operations using the singleton are thread-safe, because each thread gets its own instance.
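A minimal sketch of the [ThreadStatic] variant described above (all names are hypothetical). The field has no inline initializer; instead, the Instance property creates the object lazily on whichever thread asks for it. ThreadLocal<T> is an alternative worth considering on .NET 4 and later.

```csharp
using System;
using System.Threading;

// Hypothetical per-thread "singleton".
public sealed class PerThreadSingleton
{
    // No inline initializer: it would run only on the thread that happens to
    // trigger type initialization, leaving the field null on every other thread.
    [ThreadStatic]
    private static PerThreadSingleton _instance;

    private PerThreadSingleton() { }

    // Lazy, per-thread initialization; no locking is needed because no other
    // thread can see this thread's field.
    public static PerThreadSingleton Instance =>
        _instance ?? (_instance = new PerThreadSingleton());
}

public static class ThreadStaticDemo
{
    public static (bool sameOnOneThread, bool differsAcrossThreads) Run()
    {
        PerThreadSingleton mine = PerThreadSingleton.Instance;
        bool sameOnOneThread = ReferenceEquals(mine, PerThreadSingleton.Instance);

        PerThreadSingleton other = null;
        var t = new Thread(() => other = PerThreadSingleton.Instance);
        t.Start();
        t.Join();

        return (sameOnOneThread, !ReferenceEquals(mine, other));
    }
}
```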
You might be interested in the fact that the laziness of the initialization can vary. Jon Skeet suggests adding an empty static constructor if you care about when the instance is actually initialized. Rather than risk misrepresenting his reasoning, here is the link to his article on the Singleton pattern.
Your question refers to the Fourth (and suggested) singleton pattern implementation discussed in his article.
Singleton: singleton implementation
Inside the article you find a link to a discussion on beforefieldinit and laziness of the initialization.
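For reference, the fourth version as we recall it from the article looks roughly like this; check the article itself for the exact code and Skeet's explanation of beforefieldinit. It also exposes the instance through a get-only property backed by a private static field.

```csharp
// Sketch of the "fourth version" (static readonly field plus an empty static
// constructor), reconstructed from memory of the linked article.
public sealed class Singleton
{
    private static readonly Singleton instance = new Singleton();

    // Explicit (empty) static constructor: the C# compiler then does not mark
    // the type beforefieldinit, so initialization happens no earlier than the
    // first access to a static member.
    static Singleton() { }

    private Singleton() { }

    // Get-only property over a private static backing field.
    public static Singleton Instance => instance;
}
```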
Your assumption that they are the same because the hash codes are the same is incorrect. GetHashCode() doesn't compare anything: it returns a number (which an override may compute from the object's fields), and two different objects can return the same number.
Assuming MyObject doesn't overload the == operator, you can do a simple == comparison, which for reference types is by default a comparison by reference:
MyObject a = MyObject.Instance;
MyObject b = MyObject.Instance;
Console.WriteLine(a == b);
This will output True, by the way, because your singleton implementation is essentially correct. A static readonly field is guaranteed to be assigned only once. However, semantically it would be more correct to implement a property with only a get-accessor and use a private static field as a backing store.
I have been reading about the SyncRoot property, but I can't find it on the List<T> type.
So how should the multithreading synchronization be done with the System.Collections.Generic.List<> type?
The reason you can't find it is because it was explicitly removed. If it is really what you want to do, use a SynchronizedCollection<T> or create a dedicated synchronization object. The best approach (in general) is to create a dedicated synchronization object, as Winston illustrates.
The essential problem with the SyncRoot property is that it provides a false sense of security -- it only handles a very narrow set of circumstances. Developers often neglect synchronization for an entire logical operation, assuming that locking on SyncRoot is good enough.
You generally want to avoid locking on a type (List<T> in this case). If, for example, you have two instances of your type, or another type were to also use a lock on List<T>, they would all be competing for a single global lock. Really, what you are trying to achieve is proper synchronization for a single object.
Why do you want to lock on List<T> as opposed to your specific instance of a list?
It is often suggested that the best method of locking is to lock on a private object created solely for that purpose.
private readonly object myListLock = new object();
// Everywhere you access myList
lock (myListLock)
{
    // do stuff with myList
}
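A slightly fuller sketch of that advice, with a hypothetical NameRegistry class: the list and its lock are both private, and the whole check-then-add sequence runs under one lock, which is exactly the kind of logical operation that locking on SyncRoot around individual calls would not cover.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical wrapper: every access to the list goes through one private lock.
public sealed class NameRegistry
{
    private readonly object myListLock = new object();
    private readonly List<string> myList = new List<string>();

    // Check-then-act done under a single lock, so it is atomic as a whole.
    public bool AddIfMissing(string name)
    {
        lock (myListLock)
        {
            if (myList.Contains(name)) return false;
            myList.Add(name);
            return true;
        }
    }

    public int Count
    {
        get { lock (myListLock) { return myList.Count; } }
    }
}

public static class RegistryDemo
{
    public static int Run()
    {
        var registry = new NameRegistry();
        // 1000 concurrent attempts, but only 100 distinct names.
        Parallel.For(0, 1000, i => registry.AddIfMissing("n" + (i % 100)));
        return registry.Count;
    }
}
```

RegistryDemo.Run() returns 100: no name is added twice, even under contention.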
For a great guide to threading in C#, see this Free E-Book (Threading in C#) by Joe Albahari.
You have to cast the generic list to an ICollection like this:
using System.Collections;          // required for ICollection
using System.Collections.Generic;
List<int> myIntList = new List<int>();
lock (((ICollection)myIntList).SyncRoot)
{
    // do your synchronized stuff here...
}
Keep in mind that this only synchronizes access to the elements of the generic list. It does not synchronize access to the generic list variable, e.g., myIntList. If you replace myIntList with a new list at some point, using SyncRoot will not be sufficient. In that case, I would recommend creating a specific synchronization object that can be used for both synchronization scenarios.
The answer is yes: you can use an instance of the list as a synchronizing object:
private readonly List<string> list = new List<string>();
lock (list)
{
    // ...
}
So you don't have to use the SyncRoot property. Moreover, the documentation states that
For collections whose underlying store is not publicly available, the expected implementation is to return the current instance.
i.e. in some cases SyncRoot property returns the collection object itself.
Also read this answer about SyncRoot.
FYI: I've never seen SyncRoot used in production code. So I suggest using the collection instance itself as the synchronizing object instead of its SyncRoot property (as long as the collection is private).
I am wondering how immutability is defined. If the values aren't exposed publicly, and so can't be modified, is that enough?
Can the values be modified inside the type, not by the customer of the type?
Or can one only set them inside a constructor? If so, in the case of double initialization (using the this keyword on structs, etc.), is that still OK for immutable types?
How can I guarantee that the type is 100% immutable?
If the values aren't exposed as public, so can't be modified, then it's enough?
No, because you need read access.
Can the values be modified inside the type, not by the customer of the type?
No, because that's still mutation.
Or can one only set them inside a constructor?
Ding ding ding! With the additional point that immutable types often have methods that construct and return new instances, and also often have extra constructors marked internal specifically for use by those methods.
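A minimal sketch of the "set it in the constructor, return new instances" pattern, using a hypothetical Point class:

```csharp
// Hypothetical immutable type: state is set only in the constructor.
public sealed class Point
{
    public int X { get; }   // get-only auto-properties are backed by readonly fields
    public int Y { get; }

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    // A "modifying" operation returns a new instance; 'this' never changes.
    public Point WithX(int x) => new Point(x, Y);
}
```

Sealing the class also prevents a subclass from adding mutable state that code holding a Point reference could observe.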
How can I guarantee that the type is 100% immutable?
In .NET it's tricky to get a guarantee like this, because you can use reflection to modify (mutate) private members.
The previous posters have already stated that you should assign values to your fields in the constructor and then keep your hands off them. But that is sometimes easier said than done. Let's say that your immutable object exposes a property of the type List<string>. Is that list allowed to change? And if not, how will you control it?
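One common answer to the List<string> question, sketched with a hypothetical Team class: copy the incoming sequence in the constructor and expose it only through a read-only wrapper, so neither the caller's original list nor consumers of the property can change it. (The elements must themselves be immutable for the whole object graph to be immutable; strings are.)

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical immutable type that exposes a list of names.
public sealed class Team
{
    public IReadOnlyList<string> Members { get; }

    public Team(IEnumerable<string> members)
    {
        // Copy first so later changes to the caller's list can't leak in,
        // then wrap so consumers can't cast back to a mutable List<string>.
        Members = new ReadOnlyCollection<string>(new List<string>(members));
    }
}
```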
Eric Lippert has written a series of posts in his blog about immutability in C# that you might find interesting: you find the first part here.
One thing that might be missed in all these answers is that an object can be considered immutable even if its internal state changes, as long as those internal changes are not visible to the 'client' code.
For example, the System.String class is immutable, but I think it would be permitted to cache the hash code for an instance so the hash is only calculated on the first call to GetHashCode(). Note that as far as I know, the System.String class does not do this, but I think it could and still be considered immutable. Of course any of these changes would have to be handled in a thread-safe manner (in keeping with the non-observable aspect of the changes).
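A sketch of that kind of invisible mutability, using a hypothetical Name wrapper rather than System.String. The cached field is written after construction, but because int reads and writes are atomic and every thread computes the same value, callers can never observe the difference.

```csharp
using System;

// Hypothetical immutable wrapper that lazily caches its hash code.
public sealed class Name : IEquatable<Name>
{
    private readonly string _value;
    private int _hash;   // 0 means "not computed yet"; int reads/writes are atomic

    public Name(string value) => _value = value ?? throw new ArgumentNullException(nameof(value));

    public override int GetHashCode()
    {
        int h = _hash;
        if (h == 0)
        {
            // Racing threads may both compute this, but they compute the same
            // value, so the extra write is harmless and unobservable.
            // (If the real hash is 0, it is simply recomputed on every call.)
            h = StringComparer.Ordinal.GetHashCode(_value);
            _hash = h;
        }
        return h;
    }

    public bool Equals(Name other) =>
        other != null && string.Equals(_value, other._value, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Name);
}
```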
To be honest though, I can't think of many reasons one might want or need this type of 'invisible mutability'.
Here is the definition of immutability from Wikipedia:
"In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created."
Essentially, once the object is created, none of its properties can be changed. An example is the String class. Once a String object is created it cannot be changed. Any operation done to it actually creates a new String object.
Lots of questions there. I'll try to answer each of them individually:
"I am wondering how immutability is defined?" - Straight from the Wikipedia page (and a perfectly accurate/concise definition)
An immutable object is an object whose state cannot be modified after it is created
"If the values aren't exposed as public, so can't be modified, then it's enough?" - Not quite. It can't be modified in any way whatsoever, so you've got to ensure that methods/functions don't change the state of the object and, when performing operations, always return a new instance.
"Can the values be modified inside the type, not by the customer of the type?" - Technically, it can't be modified either inside the type or by a consumer of it. In practice, types such as System.String (a reference type, for that matter) exist that can be considered immutable for almost all practical purposes, though not in theory.
"Or can one only set them inside a constructor?" - Yes, in theory that's the only place where state (variables) can be set.
"If so, in the cases of double initialization (using the this keyword on structs, etc) is still ok for immutable types?" - Yes, that's still perfectly fine, because it's all part of the initialisation (creation) process, and the instance isn't returned until it has finished.
"How can I guarantee that the type is 100% immutable?" - The following conditions should ensure that. (Someone please point out if I'm missing one.)
Don't expose any variables. They should all be kept private (not even protected is acceptable, since derived classes can then modify state).
Don't allow any instance methods to modify state (variables). This should only be done in the constructor, while methods should create new instances using a particular constructor if they need to return a "modified" object.
All members that are exposed (as read-only) or objects returned by methods must themselves be immutable.
Note: you can't ensure the immutability of derived types, since they can define new variables. This is a reason for marking any type you want to guarantee is immutable as sealed, so that no derived class can be treated as your base immutable type anywhere in code.
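To illustrate the point about sealing, here is a sketch with hypothetical Money and SneakyMoney classes. Code that holds a Money reference sees its Describe() result change, even though Money itself never mutates, because the object is really an unsealed subclass that added mutable state.

```csharp
// Hypothetical type that looks immutable but is NOT sealed.
public class Money
{
    public decimal Amount { get; }
    public Money(decimal amount) { Amount = amount; }
    public virtual string Describe() => Amount.ToString();
}

// Hypothetical subclass that adds mutable state the base type can't prevent.
public class SneakyMoney : Money
{
    public string Note;   // public mutable field
    public SneakyMoney(decimal amount) : base(amount) { }
    public override string Describe() => Amount + " " + Note;
}
```

Marking Money as sealed would turn SneakyMoney into a compile-time error.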
Hope that helps.
I've learned that immutability is when you set everything in the constructor and cannot modify it later on during the lifetime of the object.
The definition of immutability can be found on Google.
Example:
immutable - literally, not able to change.
www.filosofia.net/materiales/rec/glosaen.htm
In terms of immutable data structures, the typical definition is write-once-read-many, in other words, as you say, once created, it cannot be changed.
There are some cases which are slightly in the gray area. For instance, .NET strings are considered immutable because they can't change; however, StringBuilder (at least in older .NET Framework versions) internally modifies a String object.
An immutable type is essentially a class that forces itself to be final from within its own code. Once it is there, nothing can be changed. To my knowledge, things are set in the constructor and then that's it. I don't see how something could be immutable otherwise.
There is unfortunately no immutable keyword in C#/VB.NET, though one has been debated. But if there are no settable auto-properties, all fields are declared with the readonly modifier (readonly fields can only be assigned in a field initializer or the constructor), and all fields are of immutable types, you will have ensured immutability.
An immutable object is one whose observable state can never be changed by any plausible sequence of code execution. An immutable type is one which guarantees that any instances exposed to the outside world will be immutable (this requirement is often stated as requiring that the object's state may only be set in its constructor; this isn't strictly necessary in the case of objects with private constructors, nor is it sufficient in the case of objects which call outside methods on themselves during construction).
A point which other answers have neglected, however, is a definition of an object's state. If Foo is a class, the state of a List<Foo> consists of the sequence of object identities contained therein. If the only reference to a particular List<Foo> instance is held by code which will neither cause that sequence to be changed, nor expose it to code that might do so, then that instance will be immutable, regardless of whether the Foo objects referred to therein are mutable or immutable.
To use an analogy, if one has a list of automobile VINs (Vehicle Identification Numbers) printed on tamper-evident paper, the list itself would be immutable even though cars aren't. Even if the list contains ten red cars today, it might contain ten blue cars tomorrow; they would still, however, be the same ten cars.