Does a cast from ConcurrentDictionary to IDictionary cut off the thread-safe implementation, since IDictionary doesn't have GetOrAdd and AddOrUpdate methods?
The resulting object is still a concurrent dictionary. Calls like Add or Remove go through the underlying TryAdd and TryRemove implementations (which are thread-safe). Casting an object to a different type doesn't change the object itself.
To confirm this, you could use a decompiler such as ILSpy to see how ConcurrentDictionary implements the IDictionary methods and whether they are still thread-safe.
IDictionary is just an interface. If you cast to it, the result is still the ConcurrentDictionary instance; you just can't see the GetOrAdd and AddOrUpdate methods through it.
Presumably, you can still use the Item property and the Add and ContainsKey methods (in lieu of GetOrAdd and AddOrUpdate), and the cast object will still be thread-safe (since the underlying implementation is a ConcurrentDictionary).
It would be like looking at a big ConcurrentDictionary object through an IDictionary-shaped keyhole: you can only see the IDictionary shape, but it is still a ConcurrentDictionary.
The interface doesn't affect the implementation. It just doesn't expose some of ConcurrentDictionary's methods.
You may find this or this helpful in understanding interfaces.
When a ConcurrentDictionary<K,V> is exposed as an IDictionary<K,V>, it is thread-safe only in a very limited sense: the consistency of its internal state is guaranteed. No undocumented exceptions will be thrown, and no values will emerge out of the blue that were never added to the dictionary. But in the broader sense of the term, taking into account how the dictionary interacts with the rest of the application and how meaningful the data it contains are, it's unlikely to be thread-safe. The reason is that the IDictionary<K,V> interface doesn't expose the specialized atomic APIs that make ConcurrentDictionary<K,V> suitable for multithreaded usage. Without these APIs, many common operations become impossible to perform atomically.
For example, let's say that you have a ConcurrentDictionary<string, int>, and you want to increment the value of the key "A" or, if the key does not exist, add it with the value 1. The correct API to use is AddOrUpdate:
```csharp
dictionary.AddOrUpdate("A", 1, (_, existing) => existing + 1);
```
Without this API, you are stuck. By using only the members of the IDictionary<string, int> interface, the best you can do is this:
```csharp
if (dictionary.TryGetValue("A", out int existing))
    dictionary["A"] = existing + 1;
else
    dictionary.Add("A", 1);
```
...which is irreparably flawed because the operation is not atomic. Another thread might change the value of the key between the call to TryGetValue and the call to the indexer setter, so the final value for the key "A" might not equal the number of times it was (supposedly) incremented. Likewise, if two threads both find the key missing, the second Add throws an ArgumentException. There is no way to fix these race conditions using only the members of the interface. The IDictionary<K,V> interface was simply not designed for multithreaded usage.
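To see the difference in practice, here is a minimal console sketch (C# top-level statements; assumes .NET 6 or later). The constant Increments and the variable names are illustrative, not from the answer above. The AddOrUpdate total is deterministic; how many updates the interface-only version loses varies from run to run and can occasionally be zero, so treat its output as a demonstration rather than a guarantee.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

const int Increments = 100_000;

// Atomic: AddOrUpdate re-runs the update delegate until its write wins,
// so no increment is lost.
var atomic = new ConcurrentDictionary<string, int>();
Parallel.For(0, Increments, i => atomic.AddOrUpdate("A", 1, (key, existing) => existing + 1));

// Non-atomic: the same kind of object, but used only through IDictionary<string, int>.
IDictionary<string, int> viaInterface = new ConcurrentDictionary<string, int>();
int failedAdds = 0;
Parallel.For(0, Increments, i =>
{
    if (viaInterface.TryGetValue("A", out int existing))
    {
        // Another thread may write between the read above and this write.
        viaInterface["A"] = existing + 1;
    }
    else
    {
        try { viaInterface.Add("A", 1); }
        catch (ArgumentException)
        {
            // Another thread added the key after our TryGetValue returned false.
            Interlocked.Increment(ref failedAdds);
        }
    }
});

Console.WriteLine($"AddOrUpdate:           {atomic["A"]}");
Console.WriteLine($"TryGetValue + indexer: {viaInterface["A"]}");
Console.WriteLine($"Add calls that threw:  {failedAdds}");
```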
Short answer: no.
You are manipulating the object through an interface, so you are still using the concrete implementation. The object doesn't lose any functionality or methods; they just aren't accessible through the interface.
As a side note, you need an explicit cast when downcasting (for example from IDictionary back to ConcurrentDictionary), but no explicit cast when upcasting, which is always safe.
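A short sketch of both casts, assuming .NET 6+ top-level statements (the variable names are made up for illustration). It shows that upcasting creates no new object and that the concurrent-only API is reachable again after a downcast.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var concurrent = new ConcurrentDictionary<string, int>();

// Upcast: implicit and always safe; no new object is created.
IDictionary<string, int> asInterface = concurrent;

// Downcast: needs an explicit cast, because the compiler cannot prove
// that an arbitrary IDictionary<string, int> is a ConcurrentDictionary.
var back = (ConcurrentDictionary<string, int>)asInterface;

// All three variables refer to the very same object.
Console.WriteLine(ReferenceEquals(concurrent, asInterface)); // True
Console.WriteLine(ReferenceEquals(concurrent, back));        // True

// The concurrent-only API is still there when viewed through the concrete type.
if (asInterface is ConcurrentDictionary<string, int> cd)
{
    cd.GetOrAdd("A", 1);
}
Console.WriteLine(asInterface["A"]); // 1
```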
In C#, what is the key difference (in terms of features or use cases) between these two containers? There doesn't appear to be any information comparing these on Google.
System.Collections.ObjectModel.ReadOnlyDictionary
System.Collections.Immutable.ImmutableDictionary
I understand that an ImmutableDictionary is thread-safe. Is the same true of a ReadOnlyDictionary?
This is not a duplicate of "How to properly use IReadOnlyDictionary?". That question is about how to use IReadOnlyDictionary. This question is about the difference between the two (which, as someone commented on that thread back in 2015, would be a different question, i.e. this one).
A ReadOnlyDictionary is initialized once via its constructor; after that you can't add or remove items through it (attempts throw NotSupportedException). It's useful if you want to ensure that it won't be modified while it's passed across multiple layers of your application.
An ImmutableDictionary has methods that look like modifications, such as Add or Remove, but they create and return a new dictionary; the original one remains unchanged.
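A minimal sketch contrasting the two behaviours (assumes .NET 6+, where System.Collections.Immutable ships with the runtime; on .NET Framework it comes from the System.Collections.Immutable NuGet package). Note that ReadOnlyDictionary only has Add through its explicit IDictionary implementation, which is why the cast is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Collections.ObjectModel;

var source = new Dictionary<string, int> { ["a"] = 1 };

// ReadOnlyDictionary: the mutating members exist only via the IDictionary
// interface, and they throw NotSupportedException.
var readOnly = new ReadOnlyDictionary<string, int>(source);
bool threw = false;
try
{
    ((IDictionary<string, int>)readOnly).Add("b", 2);
}
catch (NotSupportedException)
{
    threw = true;
}
Console.WriteLine(threw); // True

// ImmutableDictionary: Add is an ordinary public method that returns a new instance.
ImmutableDictionary<string, int> original = source.ToImmutableDictionary();
ImmutableDictionary<string, int> added = original.Add("b", 2);
Console.WriteLine(original.Count); // 1 (unchanged)
Console.WriteLine(added.Count);    // 2
```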
Note that:
You initialize the ReadOnlyDictionary by passing another dictionary instance to the constructor. That explains why a ReadOnlyDictionary is mutable (if the underlying dictionary is modified). It's just a wrapper that is protected from direct changes.
ImmutableDictionary has no public constructor (see "How can I create a new instance of ImmutableDictionary?"); you create instances through factory methods instead.
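For reference, a sketch of the usual ways to obtain an instance without a constructor (assumes .NET 6+; the keys and values are placeholders): start from the Empty singleton, or call one of the Create/CreateRange factory methods on the static ImmutableDictionary class. The ToImmutableDictionary extension method, not shown here, converts an existing sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// 1) Start from the shared empty instance; each Add returns a new dictionary.
ImmutableDictionary<string, int> fromEmpty = ImmutableDictionary<string, int>.Empty
    .Add("a", 1)
    .Add("b", 2);

// 2) Factory methods on the static, non-generic ImmutableDictionary class.
ImmutableDictionary<string, int> created = ImmutableDictionary.Create<string, int>().Add("a", 1);

var pairs = new[]
{
    new KeyValuePair<string, int>("a", 1),
    new KeyValuePair<string, int>("b", 2),
};
ImmutableDictionary<string, int> fromRange = ImmutableDictionary.CreateRange(pairs);

Console.WriteLine(fromEmpty.Count); // 2
Console.WriteLine(created.Count);   // 1
Console.WriteLine(fromRange.Count); // 2
```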
That also explains why the ReadOnlyDictionary is not thread-safe (better: it's as thread-safe as the underlying dictionary). The ImmutableDictionary is thread-safe because you can't modify the original instance (neither directly nor indirectly). All methods that "modify" it actually return a new instance.
But if you need a thread-safe dictionary and it's not necessary that it's immutable, use a ConcurrentDictionary instead.
A ReadOnlyDictionary<TKey,TValue> is a wrapper around another existing IDictionary<TKey,TValue> implementing object.
Importantly, whilst "you" (the code with access to the ReadOnlyDictionary) cannot make any changes to the dictionary via the wrapper, this does not mean that other code is unable to modify the underlying dictionary.
So unlike what other answers may suggest, you cannot assume that the ReadOnlyDictionary isn't subject to modification - just that "you" aren't allowed to. So for example, you cannot be sure that two attempts to access a particular key will produce the same result.
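A minimal sketch of that point (assumes .NET 6+; inner and view are illustrative names): code that still holds the wrapped Dictionary changes it, and the ReadOnlyDictionary sees the change because it wraps rather than copies.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var inner = new Dictionary<string, int> { ["a"] = 1 };
var view = new ReadOnlyDictionary<string, int>(inner);

int before = view["a"];

// Other code that still holds the inner dictionary changes it...
inner["a"] = 42;
inner["b"] = 2;

// ...and the read-only wrapper reflects the change, because it does not copy.
int after = view["a"];
Console.WriteLine($"{before} -> {after}, count = {view.Count}"); // 1 -> 42, count = 2
```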
In addition to the current answers, I would add that ImmutableDictionary is slower and usually will use more memory.
Why slower? Behind the scenes, ImmutableDictionary isn't a hash table. It uses an AVL tree, which is a self-balancing tree, so its lookup complexity is O(log n). The other dictionaries, on the other hand, use a hash table behind the scenes, and their lookup complexity is O(1) on average.
Why more memory allocation? Every change produces a new dictionary, because it is immutable. The unchanged parts of the tree are shared between versions rather than copied, but each change still allocates new nodes along the modified path.
Instead of describing what these two classes do, it is more useful to describe what it actually means for something to be read-only or immutable, as that distinction leaves little room for choice in how the two classes are implemented.
Read-only is part of an "interface" of a class, its set of public methods and properties. Being read-only means that there is no possible sequence of actions an external consumer of the class could do in order to affect its visible state. Compare with a read-only file for example; no application can write to such a file using the same API that made it possible to make it read-only in the first place.
Does read-only imply thread-safe? Not necessarily: a read-only class could still use things like caching or optimization of its internal data structures, and those may be (poorly) implemented in a way that breaks when invoked concurrently.
Does read-only imply never-changing? Also no; look at the system clock, for example. You cannot really affect it (with the default permissions), you can only read it (making it read-only by definition), but its value changes based on the time.
Never-changing means immutable. It is a much stronger concept, and, like thread-safety, is a part of the contract of the whole class. The class must actively ensure that no part of its instance ever changes during its lifetime, with respect to what can be observed externally.
Strings are immutable in .NET: as long as the integrity of the runtime is not compromised (by memory hacking), a particular string instance will never differ from its initially observed value. Read-only files, on the other hand, are not really immutable, as one could always turn read-only off and change the file.
Immutable also does not imply thread-safe, as such an object could still employ techniques that modify its internal state and are not thread-safe (but it's generally easier to ensure).
The question whether immutable implies read-only depends on how you look at it. You can usually "mutate" an immutable object in a way that doesn't affect external code that may be using it, thus exposing an immutable object is at least as strong as exposing a read-only one. Taking a substring of a string is like deleting a part of it, but in a safe manner.
This brings us back to the original question about the two classes. All ReadOnlyDictionary has to do is to be read-only. You still have to provide the data in some way, with an internally wrapped dictionary, and you and only you can still write to it through the internal dictionary. The wrapper provides "strong" read-only access (compared to a "weak" read-only access that you get just by casting to IReadOnlyDictionary). It is also thread-safe, but only when the underlying dictionary is thread-safe as well.
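The "weak" versus "strong" distinction can be sketched like this (assumes .NET 6+; the names are illustrative). A plain IReadOnlyDictionary reference can be cast back to the mutable Dictionary, while the ReadOnlyDictionary wrapper is a separate object that cannot.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var data = new Dictionary<string, int> { ["a"] = 1 };

// "Weak" read-only: just an interface reference to the same Dictionary,
// so a consumer can cast it back and mutate the original.
IReadOnlyDictionary<string, int> weak = data;
if (weak is Dictionary<string, int> writable)
{
    writable["a"] = 100;
}
Console.WriteLine(data["a"]); // 100

// "Strong" read-only: the wrapper is a different object, so the same cast
// fails; only code holding `data` itself can still write.
IReadOnlyDictionary<string, int> strong = new ReadOnlyDictionary<string, int>(data);
Console.WriteLine(strong is Dictionary<string, int>); // False
```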
ImmutableDictionary can do much more with the strong guarantee that the data it holds cannot be changed. Essentially you can "patch" parts of it with new data and obtain a modified "copy" of the structure, but without actually copying the complete object. It is also thread-safe by virtue of its implementation. Similarly to a StringBuilder, you use a builder to make changes to an instance and then bake them to make the final instance of an immutable dictionary.
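A sketch of the builder workflow described above (assumes .NET 6+; the keys are placeholders). ToBuilder gives a mutable builder, and ToImmutable bakes the batched changes into a new instance while the original stays untouched.

```csharp
using System;
using System.Collections.Immutable;

ImmutableDictionary<string, int> v1 = ImmutableDictionary<string, int>.Empty.Add("a", 1);

// Builder: a mutable, StringBuilder-like view used to batch many changes.
ImmutableDictionary<string, int>.Builder builder = v1.ToBuilder();
for (int i = 0; i < 5; i++)
{
    builder[$"k{i}"] = i;
}
builder.Remove("a");

// "Bake" the changes into a new immutable instance; v1 is untouched.
ImmutableDictionary<string, int> v2 = builder.ToImmutable();

Console.WriteLine(v1.Count); // 1
Console.WriteLine(v2.Count); // 5
```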
ReadOnlyDictionary: is read-only; you cannot add or remove items.
ImmutableDictionary: you can add or remove items, but it is immutable like string: each add or remove returns a new object.
I am running some tests comparing ArrayList and List<T>.
Speed is very important in my app.
I have tested creating 10000 records in each, finding an item by index and then updating that object for example:
```csharp
List[i] = newX;
```
Using the arraylist seems much faster. Is that correct?
UPDATE:
Using the List[i] approach, for my List<T> I am using FindIndex with a lambda to find the index, e.g.:
```csharp
// ...
int index = base.FindIndex(x => x.AlpaNumericString == "PopItem");
base[index] = UpdatedItem;
```
It is definitely slower than
```csharp
int index = ArrayList.IndexOf("PopItem");
base[index] = UpdatedItem;
```
A generic List (List<T>) should always be quicker than an ArrayList.
Firstly, an ArrayList is not strongly-typed and accepts types of object, so if you're storing value types in the ArrayList, they are going to be boxed and unboxed every time they are added or accessed.
A Generic List can be defined to accept only (say) ints, so no boxing or unboxing needs to occur when adding or accessing elements of the list.
If you're dealing with reference types, you're probably still better off with a Generic List over an ArrayList: although there's no boxing/unboxing going on, your Generic List is type-safe, and no explicit casts are required when retrieving your strongly-typed objects, unlike retrieving them from the ArrayList's "collection" of object types.
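To make the boxing and type-safety points concrete, here is a small sketch (assumes .NET 6+ top-level statements; the values are arbitrary). The ArrayList accepts anything and needs a cast on every read, whereas the List<int> needs neither.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// ArrayList stores object: the int is boxed on Add and must be cast (unboxed) on read.
var arrayList = new ArrayList();
arrayList.Add(42);
int fromArrayList = (int)arrayList[0];

// Nothing stops a value of the wrong type from going in; the mistake only
// shows up at run time, when the cast fails.
arrayList.Add("not a number");
bool castFailed = false;
try
{
    _ = (int)arrayList[1];
}
catch (InvalidCastException)
{
    castFailed = true;
}

// List<int> stores the ints directly: no boxing, no cast, and the compiler
// would reject list.Add("not a number").
var list = new List<int> { 42 };
int fromList = list[0];

Console.WriteLine(fromArrayList + fromList); // 84
Console.WriteLine(castFailed);               // True
```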
There may be some edge-cases where an ArrayList is faster performing than a Generic List, however, I (personally) have not yet come across one. Even the MSDN documentation states:
Performance Considerations

In deciding whether to use the List<T> or ArrayList class, both of which have similar functionality, remember that the List<T> class performs better in most cases and is type safe. If a reference type is used for type T of the List<T> class, the behavior of the two classes is identical. However, if a value type is used for type T, you need to consider implementation and boxing issues.

If a value type is used for type T, the compiler generates an implementation of the List<T> class specifically for that value type. That means a list element of a List<T> object does not have to be boxed before the element can be used, and after about 500 list elements are created the memory saved not boxing list elements is greater than the memory used to generate the class implementation.

Make certain the value type used for type T implements the IEquatable<T> generic interface. If not, methods such as Contains must call the Object.Equals(Object) method, which boxes the affected list element. If the value type implements the IComparable interface and you own the source code, also implement the IComparable<T> generic interface to prevent the BinarySearch and Sort methods from boxing list elements. If you do not own the source code, pass an IComparer<T> object to the BinarySearch and Sort methods.
Moreover, I particularly like the very last section of that paragraph, which states:
It is to your advantage to use the type-specific implementation of the List<(Of <(T>)>) class instead of using the ArrayList class or writing a strongly typed wrapper collection yourself. The reason is your implementation must do what the .NET Framework does for you already, and the common language runtime can share Microsoft intermediate language code and metadata, which your implementation cannot.
Touché! :)
Based on your recent edit it seems as though you're not performing a 1:1 comparison here. In the List you have a class object and you're looking for the index based on a property, whereas in the ArrayList you just store the values of that property. If so, this is a severely flawed comparison.
To make it a 1:1 comparison you would add the values to the list only, not the class. Or, you would add the class items to the ArrayList. The former would allow you to use IndexOf on both collections. The latter would entail looping through your entire ArrayList and comparing each item till a match was found (and you could do the same for the List), or overriding object.Equals since ArrayList uses that for comparison.
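A sketch of the first option, the like-for-like comparison (assumes .NET 6+; it stores plain strings instead of the question's class, whose definition isn't shown). Both collections hold the same data and both use IndexOf, so any timing difference reflects the collections rather than the lookup strategy. For real measurements, repeat each operation many times under a Stopwatch, or use a benchmarking tool such as BenchmarkDotNet.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

const int Count = 10_000;

// Same data in both collections: plain strings, as a 1:1 comparison requires.
List<string> list = Enumerable.Range(0, Count).Select(i => $"Item{i}").ToList();
var arrayList = new ArrayList(list);

// Like-for-like lookups: both use IndexOf, which calls Equals on each element.
int listIndex = list.IndexOf("Item9999");
int arrayListIndex = arrayList.IndexOf("Item9999");

// Like-for-like updates by index.
list[listIndex] = "Updated";
arrayList[arrayListIndex] = "Updated";

Console.WriteLine($"{listIndex} {arrayListIndex}"); // 9999 9999
```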
For an interesting read, I suggest taking a look at Rico Mariani's post: Performance Quiz #7 -- Generics Improvements and Costs -- Solution. Even in that post Rico also emphasizes the need to benchmark different scenarios. No blanket statement is issued about ArrayLists, although the general consensus is to use generic lists for performance, type safety, and having a strongly typed collection.
Another related article is: Why should I use List and not ArrayList.
ArrayList seems faster? According to the documentation ( http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx ) List should be faster when using a value type, and the same speed when using a reference type. ArrayList is slower with value types because it needs to box/unbox the values when you're accessing them.
I would expect them to be about the same if they are value-types. There is an extra cast/type-check for ArrayList, but nothing huge. Of course, List<T> should be preferred. If speed is the primary concern (which it almost always isn't, at least not in this way), then you might also want to profile an array (T[]) - harder (=more expensive) to add/remove, of course - but if you are just querying/assigning by index, it should be the fastest. I have had to resort to arrays for some very localised performance critical work, but 99.95% of the time this is overkill and should be avoided.
For example, for any of the 3 approaches (List<T>/ArrayList/T[]) I would expect the assignment cost to be insignificant to the cost of newing up the new instance to put into the storage.
Marc Gravell touched on this in his answer; I think it needs to be stressed.
It is usually a waste of time to prematurely optimize your code!
A better approach is to do a simple, well designed first implementation, and test it with anticipated real world data loads.
Often, you will find that it's "fast enough". (It helps to start out with a clear definition of "fast enough" - e.g. "Must be able to find a single CD in a 10,000 CD collection in 3 seconds or less")
If it's not, put a profiler on it. Almost invariably, the bottleneck will NOT be where you expect.
(I learned this the hard way when I brought a whole app to its knees with a single badly chosen string concatenation.)
Ok, I'm hoping the community at large will aid us in solving a workplace debate that has been ongoing for a while. This has to do with defining interfaces that either accept or return lists of some type. There are several ways of doing this:
```csharp
public interface Foo
{
    // Alternatives under debate (pick one; they can't coexist under the same name):
    Bar[] Bars { get; }
    IEnumerable<Bar> Bars { get; }
    ICollection<Bar> Bars { get; }
    IList<Bar> Bars { get; }
}
```
My own preference is to use IEnumerable for arguments and arrays for return values:
```csharp
public interface Foo
{
    void Do(IEnumerable<Bar> bars);
    Bar[] Bars { get; }
}
```
My argument for this approach is that the implementation class can create a List directly from the IEnumerable and simply return it with List.ToArray(). However, some believe that IList should be returned instead of an array. The problem I have here is that you're then required to copy it into a ReadOnlyCollection before returning it. The option of returning IEnumerable seems troublesome for client code?
What do you use/prefer? (especially with regards to libraries that will be used by other developers outside your organization)
My preference is IEnumerable<T>. Any other of the suggested interfaces gives the appearance of allowing the consumer to modify the underlying collection. This is almost certainly not what you want to do as it's allowing consumers to silently modify an internal collection.
Another good one IMHO, is ReadOnlyCollection<T>. It allows for all of the fun .Count and Indexer properties and unambiguously says to the consumer "you cannot modify my data".
I don't return arrays - they really are a terrible return type to use when creating an API - if you truly need a mutable sequence use the IList<T> or ICollection<T> interface or return a concrete Collection<T> instead.
Also I would suggest that you read Arrays considered somewhat harmful by Eric Lippert:
I got a moral question from an author of programming language textbooks the other day requesting my opinions on whether or not beginner programmers should be taught how to use arrays. Rather than actually answer that question, I gave him a long list of my opinions about arrays, how I use arrays, how we expect arrays to be used in the future, and so on. This gets a bit long, but like Pascal, I didn't have time to make it shorter.

Let me start by saying when you definitely should not use arrays, and then wax more philosophical about the future of modern programming and the role of the array in the coming world.
For property collections that are indexed (and the indices have necessary semantic meaning), you should use ReadOnlyCollection<T> (read only) or IList<T> (read/write). It's the most flexible and expressive. For non-indexed collections, use IEnumerable<T> (read only) or ICollection<T> (read/write).
Method parameters should use IEnumerable<T> unless they 1) need to add or remove items in the collection (ICollection<T>) or 2) require indexes for necessary semantic purposes (IList<T>). If the method can merely benefit from indexing (such as a sorting routine), it can always try an as IList<T> cast and fall back to .ToList() when that fails.
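A sketch of that fallback (assumes .NET 6+; Middle and Generate are hypothetical helpers, not from the answer). The method accepts any IEnumerable<T>, uses indexing directly when the argument already implements IList<T> (arrays and List<T> do), and copies the sequence with ToList() only when it doesn't.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: accepts the weakest useful type, but uses indexing
// directly when the argument already supports it.
static T Middle<T>(IEnumerable<T> items)
{
    IList<T> indexed = items as IList<T> ?? items.ToList();
    if (indexed.Count == 0)
        throw new ArgumentException("Sequence is empty.", nameof(items));
    return indexed[indexed.Count / 2];
}

// An iterator block: not an IList<int>, so Middle falls back to ToList().
static IEnumerable<int> Generate()
{
    yield return 7;
    yield return 8;
    yield return 9;
}

int[] array = { 1, 2, 3 };
var list = new List<int> { 4, 5, 6 };

Console.WriteLine(Middle(array));      // 2
Console.WriteLine(Middle(list));       // 5
Console.WriteLine(Middle(Generate())); // 8
```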
I think about this in terms of writing the most useful code possible: code that can do more.
Put in those terms, it means I like to accept the weakest interface possible as method arguments, because that makes my code useful from more places. In this case, that's an IEnumerable<T>. Have an array? You can call my method. Have a List? You can call my method. Have an iterator block? You get the idea.
It also means I like my methods to return the strongest interface that is convenient, so that code that relies on the method can easily do more. In this case, that would be IList<T>. Note that this doesn't mean I will construct a list just so I can return it. It just means that if I already have some that implements IList<T>, I may as well use it.
Note that I'm a little unconventional with regards to return types. A more typical approach is to also return weaker types from methods to avoid locking yourself into a specific implementation.
I would prefer IEnumerable, as it is the most high-level of the interfaces, giving the end user the opportunity to convert it as they wish. Even though this provides the user with minimal functionality to begin with (basically only enumeration), it is still enough to cover virtually any need, especially with extension methods such as ToArray(), ToList() etc.
IEnumerable<T> is very useful for lazy-evaluated iteration, especially in scenarios that use method chaining.
But as a return type for a typical data access tier, a Count property is often useful, and I would prefer to return an ICollection<T> with a Count property or possibly IList<T> if I think typical consumers will want to use an indexer.
This is also an indication to the caller that the collection has actually been materialized. And thus the caller can iterate through the returned collection without getting exceptions from the data access tier. This can be important. For example, a service may generate a stream (e.g. SOAP) from the returned collection. It can be awkward if an exception is thrown from the data access layer while generating the stream due to lazy-evaluated iteration, as the output stream is already partially written when the exception is thrown.
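A sketch of the failure mode described above (assumes .NET 6+; LoadLazily, LoadEagerly and the exception message are made up for illustration, and the written list stands in for an output stream). With the lazy iterator, the consumer has already processed part of the data when the error surfaces; with the materialized list, the error surfaces before anything is written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data-access methods; both fail on the third row.
static IEnumerable<int> LoadLazily()
{
    yield return 1;
    yield return 2;
    throw new InvalidOperationException("Database connection lost");
}

// Materializes immediately, so the failure happens inside this call.
static IList<int> LoadEagerly() => LoadLazily().ToList();

// Lazy: two items are already "written" before the failure surfaces.
var written = new List<int>();
string lazyError = "";
try
{
    foreach (int row in LoadLazily())
        written.Add(row);
}
catch (InvalidOperationException ex)
{
    lazyError = ex.Message;
}

// Materialized: the failure surfaces at the call, before anything is written.
var writtenEagerly = new List<int>();
try
{
    IList<int> rows = LoadEagerly();
    foreach (int row in rows)
        writtenEagerly.Add(row);
}
catch (InvalidOperationException)
{
}

Console.WriteLine($"lazy: wrote {written.Count} items before '{lazyError}'");
Console.WriteLine($"eager: wrote {writtenEagerly.Count} items");
```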
Since the LINQ extension methods were added to IEnumerable<T>, I've found that my use of the other interfaces has declined considerably; probably around 80%. I used to use List<T> religiously, as it had methods that accepted delegates, like Find, FindAll, ForEach and the like. Since that's available through System.Linq's extensions, I've replaced all those references with IEnumerable<T> references.
I wouldn't go with an array; it's a type that allows modification yet doesn't have add/remove, kind of the worst of the pack. If I want to allow modifications, I would use a type that supports add/remove.
When you want to prevent modifications, you are already wrapping or copying it, so I don't see what's wrong with an IEnumerable or a ReadOnlyCollection. I would go with the latter. Something I don't like about IEnumerable is that it's lazy by nature, yet when you use it with pre-loaded data only to wrap it, calling code that works with it tends to assume pre-loaded data or have extra "unnecessary" lines :( ... which can produce ugly results when things change.
Is there any difference between:
```csharp
lock (((IDictionary)_collection).SyncRoot)
```
or
```csharp
lock (_collection)
```
Yes, the monitor is taken out on a different object in each case, so the two are not functionally equivalent. The SyncRoot is exposed so you can lock on it for collection member access (and the collection internally uses the same object for locking). This allows code in the collection and code external to it to agree upon a specific lock object.
I believe there are two questions.
Is it the same?
The answer to this is "it depends." It is only the same if SyncRoot is implemented by returning "this". IDictionary is an interface, and there is no contract detailing what object should be returned for the SyncRoot property. The implementor is free to return "this" or a completely different object (the .NET Framework's Dictionary<TKey,TValue> does the latter).
Is it a good idea?
No. IDictionary implementors tend to return a different object (all of the standard collection classes do). So the odds are against you in creating a valid locking construct.
You should use a native thread-safe dictionary implementation instead of locking the entire Dictionary.
http://devplanet.com/blogs/brianr/archive/2008/09/26/thread-safe-dictionary-in-net.aspx
That depends on the implementation of the IDictionary. If the dictionary returns "this" as the SyncRoot, the statements are equivalent.
It's really easy to see how it is not the same:
```csharp
Monitor.Enter(((IDictionary)_collection).SyncRoot);
Monitor.Exit(_collection);
```
Unless SyncRoot returns the collection itself, this throws a SynchronizationLockException, because the thread never acquired the lock on _collection.
I would recommend using the SyncRoot object. The IDictionary may simply return this and it will effectively be the same, but that's not guaranteed.
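A minimal sketch of consistent locking on SyncRoot (assumes .NET 6+; the key scheme is arbitrary). Whether SyncRoot is the dictionary itself is an implementation detail, so the sketch prints it rather than assuming it; what matters is that every piece of code uses the same lock object.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var dictionary = new Dictionary<int, int>();

// Get the lock object once, through the non-generic ICollection interface.
object gate = ((ICollection)dictionary).SyncRoot;

// May print True or False depending on the implementation and runtime.
Console.WriteLine(ReferenceEquals(gate, dictionary));

// Every access, from any code, takes the same lock.
Parallel.For(0, 1_000, i =>
{
    int key = i % 10;
    lock (gate)
    {
        dictionary[key] = dictionary.TryGetValue(key, out int n) ? n + 1 : 1;
    }
});

Console.WriteLine(dictionary.Values.Sum()); // 1000
```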
I ran into this today when unit testing a generic dictionary.
```csharp
System.Collections.Generic.Dictionary<int, string> actual, expected;
actual = new System.Collections.Generic.Dictionary<int, string> { { 1, "foo" }, { 2, "bar" } };
expected = new System.Collections.Generic.Dictionary<int, string> { { 1, "foo" }, { 2, "bar" } };
Assert.AreEqual(expected, actual); // fails
```
This fails except when actual == expected (i.e. the object references are the same). Obviously, actual.Equals(expected) returns false as well.
Fine, but if the implementation of System.Collections.Generic.Dictionary<int, string>.Equals only does reference equality, what's the point of IEquatable? In other words, why is there no baked-in way to do value equality for generic collections?
Edit: Thanks for the responses so far. Obviously my example is using value types, but I think my complaint holds for all objects. Why can't generic collection equality just be a union of the equalities of its elements? Unexpected behavior doesn't really cut it, since there are separate provisions for checking reference equality. I suppose this would introduce the constraint of collections only holding objects that implement IEquatable, as Konrad Rudolph points out. However, for an object like Dictionary, this doesn't seem too much to ask.
In other words, why is there no baked-in way to do value equality for generic collections?
Probably because it's hard to formulate in generic terms, since this would only be possible if the value type (and key type) of the dictionary also implemented IEquatable. However, requiring this would be too strong, making Dictionary unusable with a lot of types that don't implement this interface.
This is an inherent problem with constrained generics. Haskell provides a solution for this problem but this requires a much more powerful and complicated generics mechanism.
Notice that something similar is true for IComparable in comparison with containers, yet there is support for this, using Comparer<T>.Default if necessary.
Dictionary<TKey,TValue>.Equals is inherited from Object.Equals and thus does a simple comparison of the object references.
Why do the generic collections not do value equality semantic? Because that may not be what you want. Sometimes you want to check if they're the same instance. What would the alternative do? Call Equals on each of the keys and values? What if these inherit Equals from Object? It wouldn't be a full deep comparison.
So it's up to you to provide some other semantic if and when necessary.
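If value semantics is what you want, one option is a small helper like the following sketch (DictionaryEquals is a hypothetical name, not a framework API; assumes .NET 6+). It compares counts, then looks up every key of one dictionary in the other and compares the values with a caller-supplied IEqualityComparer<TValue>, here EqualityComparer<string>.Default. It assumes both dictionaries use equivalent key comparers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: same keys, and values equal under the given comparer.
static bool DictionaryEquals<TKey, TValue>(
    IReadOnlyDictionary<TKey, TValue> left,
    IReadOnlyDictionary<TKey, TValue> right,
    IEqualityComparer<TValue> valueComparer)
{
    if (ReferenceEquals(left, right)) return true;
    if (left.Count != right.Count) return false;
    foreach (KeyValuePair<TKey, TValue> pair in left)
    {
        // Key lookup uses right's own key comparer.
        if (!right.TryGetValue(pair.Key, out TValue other) ||
            !valueComparer.Equals(pair.Value, other))
        {
            return false;
        }
    }
    return true;
}

var actual = new Dictionary<int, string> { { 1, "foo" }, { 2, "bar" } };
var expected = new Dictionary<int, string> { { 1, "foo" }, { 2, "bar" } };

Console.WriteLine(actual.Equals(expected)); // False (reference equality)
Console.WriteLine(DictionaryEquals(actual, expected, EqualityComparer<string>.Default)); // True
```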
Dictionary<TKey, TValue> does not implement IEquatable, as such there is no formal way to determine/know what comparing one such dictionary to another with the same contents would actually produce.
In fact, Dictionary<TKey, TValue> does not implement any of the comparison interfaces at all.
In my opinion, to have two objects compare themselves is a rather special thing, so it would've made much more sense to put that into an interface than just to put a default typically-unwanted implementation in the base Object class. It should've been a more advertised feature of a class than something every object can do, albeit not quite the way you expect it to.
But there you have it. It's there, and you need to know when it's going to be used.
Like in this case.