I'm always confused about which of these to pick. As I see it, I use a Dictionary over a List when I want a key and a value of two data types, so that I can easily find a value by its key. But I'm never sure whether I should use a ConcurrentDictionary or a Dictionary.
Before you go off at me for not putting much research into this: I have tried, but it seems Google hasn't really got anything on Dictionary vs ConcurrentDictionary, only something on each one individually.
I have asked a friend this before, but all they said was: "use ConcurrentDictionary if you use your dictionary a lot in code", and I didn't really want to pester them into explaining it in more detail. Could anyone expand on this?
"Use ConcurrentDictionary if you use your dictionary in a lot in code" is kind of vague advice. I don't blame you for the confusion.
ConcurrentDictionary is primarily for use in an environment where you're updating the dictionary from multiple threads (or async tasks). You can use a standard Dictionary from as much code as you like if it's from a single thread ;)
If you look at the methods on a ConcurrentDictionary, you'll spot some interesting methods like TryAdd, TryGetValue, TryUpdate, and TryRemove.
For example, consider a typical pattern you might see for working with a normal Dictionary class.
// There are better ways to do this... but we need an example ;)
if (!dictionary.ContainsKey(id))
    dictionary.Add(id, value);
The problem is that between the ContainsKey check and the call to Add, a different thread could call Add with that same id. When this thread then calls Add, it'll throw an exception. The method TryAdd handles that for you and returns true/false telling you whether it added the item (false means that key was already in the dictionary).
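To make the TryAdd behaviour described above concrete, here is a minimal sketch. The key 42, the worker count, and the variable names are made up for illustration; the only point is that the check and the insert happen as one atomic step.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shared map; the key and values are invented for this sketch.
var dictionary = new ConcurrentDictionary<int, string>();
int successfulAdds = 0;

// Eight workers race to insert the same key.
Parallel.For(0, 8, worker =>
{
    // TryAdd checks and inserts atomically and never throws on a duplicate key.
    if (dictionary.TryAdd(42, $"from worker {worker}"))
    {
        Interlocked.Increment(ref successfulAdds);
    }
});

Console.WriteLine($"Adds that succeeded: {successfulAdds}"); // exactly one worker wins
Console.WriteLine($"Entries stored: {dictionary.Count}");
```

Which worker wins is not predictable, but only one of them ever does.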
So unless you're working in a multi-threaded section of code, you probably can just use the standard Dictionary class. That being said, you could theoretically have locks to prevent concurrent access to a dictionary; that question is already addressed in "Dictionary locking vs. ConcurrentDictionary".
The biggest reason to use ConcurrentDictionary over the normal Dictionary is thread safety. If your application will have multiple threads using the same dictionary at the same time, you need the thread-safe ConcurrentDictionary. This is particularly true when these threads are writing to or building the dictionary.
The downside to using ConcurrentDictionary without the multi-threading is overhead. All those functions that allow it to be thread-safe will still be there, all the locks and checks will still happen, taking processing time and using extra memory.
ConcurrentDictionary is useful when you need to access a dictionary across multiple threads (i.e. multithreading). Vanilla Dictionary objects do not possess this capability and therefore should only be used in a single-threaded manner.
A ConcurrentDictionary is useful when you want a high-performance dictionary that can be safely accessed by multiple threads concurrently. Compared to a standard Dictionary protected with a lock, it is more efficient under heavy usage because of its granular locking implementation. Instead of all threads competing for a single lock, the ConcurrentDictionary maintains multiple locks internally, which minimizes contention and makes it less likely to become a bottleneck.
Despite these nice characteristics, the number of scenarios where using a ConcurrentDictionary is the best option is actually quite small. There are two reasons for that:
The thread-safety guarantees offered by the ConcurrentDictionary are limited to the protection of its internal state. That's it. If you want to do anything slightly non-trivial, like for example updating the dictionary and another variable as an atomic operation, you are out of luck. This is not a supported scenario for a ConcurrentDictionary. Even protecting the elements it contains (in case they are mutable objects) is not supported. If you try to update one of its values using the AddOrUpdate method, the dictionary will be protected but the value will not. The Update in this context means replace the existing value with another one, not modify the existing value.
Whenever you find it tempting to use a ConcurrentDictionary, there are usually better alternatives available. Alternatives that do not involve shared state, which is what a ConcurrentDictionary essentially is. No matter how efficient its locking scheme is, it will have a hard time beating an architecture where there is no shared state at all, and each thread does its own thing without interfering with the other threads. Commonly used libraries that follow this principle are PLINQ and the TPL Dataflow library. Below is a PLINQ example:
Dictionary<string, Product> dictionary = productIDs
    .AsParallel()
    .Select(id => GetProduct(id))
    .ToDictionary(product => product.Barcode);
Instead of creating a dictionary beforehand, and then having multiple threads filling it concurrently with values, you can trust PLINQ to produce a dictionary utilizing more efficient strategies, involving partitioning of the initial workload, and assigning each partition to a different worker thread. A single thread will eventually aggregate the partial results, and fill the dictionary.
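For anyone who wants to run the PLINQ idea as-is, here is a self-contained sketch. The integer IDs, the tuple standing in for the Product class, and the barcode format are all assumptions made up for the example; only the AsParallel/Select/ToDictionary shape comes from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: 100 product IDs.
IEnumerable<int> productIDs = Enumerable.Range(1, 100);

// Stand-in for the real GetProduct lookup; returns a tuple instead of a Product.
(int Id, string Barcode) GetProduct(int id) => (id, $"BC-{id:D3}");

// No shared dictionary is mutated by the workers; PLINQ builds the result itself.
Dictionary<string, (int Id, string Barcode)> dictionary = productIDs
    .AsParallel()
    .Select(id => GetProduct(id))
    .ToDictionary(product => product.Barcode);

Console.WriteLine(dictionary.Count);        // 100
Console.WriteLine(dictionary["BC-007"].Id); // 7
```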
The accepted answer above is correct. However, it is worth mentioning explicitly that if a dictionary is not being modified, i.e. it is only ever read from, then regardless of the number of threads Dictionary<TKey,TValue> is preferred, because no synchronization is required.
e.g. caching config in a Dictionary<TKey,TValue>, that is populated only once at startup and used throughout the application for the life of the application.
When to use a thread-safe collection : ConcurrentDictionary vs. Dictionary
If you are only reading keys or values, the Dictionary<TKey,TValue> is faster because no synchronization is required if the dictionary is not being modified by any threads.
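A small sketch of that read-only case, assuming a config dictionary that is filled once before any worker starts. The keys and values are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Populated once at "startup", before any other thread touches it.
var config = new Dictionary<string, int>
{
    ["MaxConnections"] = 100,
    ["TimeoutSeconds"] = 30,
};

long total = 0;
Parallel.For(0, 1000, _ =>
{
    // Many threads read, nobody writes, so a plain Dictionary is enough here.
    Interlocked.Add(ref total, config["TimeoutSeconds"]);
});

Console.WriteLine(total); // 30000
```

The moment any thread starts writing to config after startup, this stops being safe.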
Premise
I am working on a multiplayer game and I have a number of network-controlled actors, the inputs of which I decided to store in a volatile dictionary:
public static volatile Dictionary<string, QuickMessage> quickPlayerMessages;
And each frame the actors fetch their values from that dictionary, but independently of that I also have a receiver thread that constantly updates that dictionary, so the values constantly get changed and keys may be deleted or added to it. I did a few tests and this works but...
My concerns
I do not fully understand the volatile modifier. I know that it is supposed to eliminate some optimizations that may result in reading partially written information. I also heard that it doesn't really work the same way as a lock mechanism. I picked this solution because it cut down a lot of complexity for me and it looks fairly responsive (speed-wise), but I do have a feeling of unease about it.
Is there any issue that may arise or that I should be aware of from this approach?
volatile is probably not the correct approach for this. What volatile does is force the value to be read from memory rather than from cache. Reading or writing a reference, or a value no larger than an IntPtr, is atomic in C#. But without some kind of memory barrier the value could be cached, and changes on one thread might not be visible to other threads. Volatile inserts some memory barriers to solve some of the visibility issues.
In this scenario you do not seem to replace the dictionary itself, so volatile would do nothing. You do mention that you are updating the dictionary, and that is not thread-safe without a lock. You might consider a ConcurrentDictionary<TKey, TValue> instead. A concurrent dictionary only guarantees that the dictionary itself is thread-safe, though; you will still need to consider the overall thread safety of your program.
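Here is a rough sketch of the receiver-plus-readers setup using a ConcurrentDictionary. It assumes plain strings in place of the question's QuickMessage type, a task instead of a dedicated receiver thread, and made-up player names; it is only meant to show which calls are safe to make from both sides.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// string stands in for QuickMessage (assumption for this sketch).
var quickPlayerMessages = new ConcurrentDictionary<string, string>();

// "Receiver": constantly overwrites, adds and removes entries.
var receiver = Task.Run(() =>
{
    for (int i = 0; i < 10_000; i++)
    {
        quickPlayerMessages["player1"] = $"input {i}"; // add or overwrite, thread-safe
        quickPlayerMessages["player2"] = $"input {i}";
        quickPlayerMessages.TryRemove("player2", out _); // key may come and go
    }
});

// "Game loop": each frame reads whatever is there right now.
int framesWithInput = 0;
while (!receiver.IsCompleted)
{
    // TryGetValue never throws if the key was just removed; it returns false.
    if (quickPlayerMessages.TryGetValue("player1", out var message))
    {
        framesWithInput++;
    }
}
receiver.Wait();

Console.WriteLine(quickPlayerMessages["player1"]);             // input 9999
Console.WriteLine(quickPlayerMessages.ContainsKey("player2")); // False
```

Note that this only keeps the dictionary consistent. If a message object is mutated in place after it was stored, that still needs its own synchronization.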
Also, your primary concern should be that the program is correct, speed is secondary.
I'm maintaining a legacy application that uses strings to lock values in a cache. It does something like this:
object Cache(string key, Func<object> createObjToCache)
{
    object result = Get(key);
    if (result == null)
    {
        string internKey = string.Intern(key);
        lock (internKey) {
            result = Get(key);
            if (result == null)
            {
                result = createObjToCache();
                Add(key, result);
            }
        }
    }
    return result;
}
I have two questions about this code. First, is string.Intern() thread safe? Is it possible that two threads on two separate CPUs with two identical strings would get different references back? If not, is that a possible bottleneck; does string.Intern block?
Secondly I'm concerned that this application might be using a huge number of strings as keys. I'd like to be able to monitor the amount of memory that the intern pool uses to store all these strings, but I can't find a performance counter for this on .Net Memory. Is there one somewhere else?
NOTE:
I'm aware that this implementation sucks. However I need to make the case to management before re-writing what they see as a critical bit of code. Hence I could use facts and stats on exactly how bad it sucks rather than alternative solutions.
Also Get() and Add() are not in the original code. I've replaced the original code to keep this question simple. We can assume that Add() will not fail if it is called twice with the same or different keys.
MSDN does not make any mention of thread-safety on string.Intern, so you're right in that it is very undefined what would happen if two threads called Intern for a new key at exactly the same time. I want to say "it'll probably work OK", but that isn't a guarantee. There is no guarantee AFAIK. The implementation is extern, so peeking at the implementation means looking at the runtime itself.
Frankly, there are so many reasons not to do this that it is hard to get excited about answering these specific questions. I'd be tempted to look at some kind of Dictionary<string,object> or ThreadSafeDictionary<string,object> (where the object here is simply a new object() that I can use for the lock) - without all the issues related to string.Intern. Then I can a: query the size, b: discard it at whim, c: have parallel isolated containers, etc.
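The lock-object dictionary idea could look roughly like this. It is a sketch under assumptions: ConcurrentDictionary is used for both the cache and the lock table (the ThreadSafeDictionary mentioned above is not a framework class), and the "report" key and the counter are invented for the demo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, object>();
var keyLocks = new ConcurrentDictionary<string, object>(); // one plain lock object per key
int createCalls = 0;

object Cache(string key, Func<object> createObjToCache)
{
    if (cache.TryGetValue(key, out var existing)) return existing;

    // Every caller gets the same lock object for the same key, no interning needed.
    object keyLock = keyLocks.GetOrAdd(key, _ => new object());
    lock (keyLock)
    {
        if (cache.TryGetValue(key, out existing)) return existing;
        object created = createObjToCache();
        cache[key] = created;
        return created;
    }
}

Parallel.For(0, 16, _ =>
{
    Cache("report", () =>
    {
        Interlocked.Increment(ref createCalls);
        return new object();
    });
});

Console.WriteLine(createCalls);    // 1
Console.WriteLine(keyLocks.Count); // 1, and this table can be measured or cleared at will
```

Unlike the intern pool, keyLocks is an ordinary object you own, so its size can be queried and it can be discarded.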
First is string.Intern() thread safe?
Unless something has changed (my info on this is quite old, and I'm not curious enough to take a look at the current implementation), yes. This however is about the only good thing with this idea.
Indeed, it's not fully a good thing. string.Intern() locks globally which is one of the things that can make it slow.
Secondly I'm concerned that this application might be using a huge number of strings as keys.
If that cache lives forever then that's an issue (or not, if the memory use is sufficiently low) whether you intern or not. In which case you have the wrong approach to the right potential issue to investigate:
I'd like to be able to monitor the amount of memory that the intern pool uses to store all these strings,
If they weren't interned but still lived forever in that cache, you'd still be using the same amount of memory for the strings themselves, and the extra memory overhead of the interning wouldn't really be the issue.
There are a few reasons why one might want to intern a key, and not all of them are even bad (if the strings being interned are going to all appear regularly throughout the lifetime of the application then interning could even reduce memory use), but it seems here that the reason is to make sure that the key locked on is the same instance that another attempt to use the same string would use.
This might be thread safety at the wrong place, if Add() isn't thread-safe enough to guarantee that two simultaneous insertions of different keys can't put it into an invalid state (if Add() isn't explicitly thread-safe, then it does not make this guarantee).
If the cache is threadsafe, then this is likely extra thread safety for no good reason. Since objToCache has already been created and races will result in one being thrown away, it might be fine to let them race and have a brief period of two objToCache existing before one is collected. If not then MemoryCache.AddOrGetExisting or ConcurrentDictionary.GetOrAdd deal with this issue much better than this.
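As a sketch of the ConcurrentDictionary.GetOrAdd route: a plain GetOrAdd may invoke the value factory more than once when threads race (only one result is kept), which matches the "let them race" case above. If creation must happen only once, wrapping the value in Lazy<T> is one common pattern, shown here. The key name and the counter are assumptions for the demo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, Lazy<object>>();
int createCalls = 0;

// Several Lazy wrappers may be created in a race, but only the stored one
// is returned to callers, so only its factory ever runs.
object Cache(string key, Func<object> createObjToCache) =>
    cache.GetOrAdd(key, _ => new Lazy<object>(createObjToCache)).Value;

var results = new object[16];
Parallel.For(0, results.Length, i =>
{
    results[i] = Cache("report", () =>
    {
        Interlocked.Increment(ref createCalls);
        return new object();
    });
});

Console.WriteLine(createCalls);                          // 1
Console.WriteLine(ReferenceEquals(results[0], results[15])); // True
```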
So I have Dictionary<string, SomeClass> which will be accessed heavily by multiple concurrent threads - some will write, most will read. No locks, no synchronization - worker thread will make no checks - simply read or write. The only guarantee is that no two threads will write value with same key. The question is could this data structure become corrupted this way? By corrupted I mean not working anymore even with one thread.
could this data structure become corrupted this way?
Yes, most likely you would get an IndexOutOfRangeException or a similar exception.
And even when you catch and ignore the exceptions, you would not get reliable data any more. Both duplicates and missing values are possible.
So just don't do this.
The worst-case scenarios include:
NullReferenceException or IndexOutOfRangeException thrown out of a Dictionary<,> method.
An arbitrary amount of data is lost. If two threads attempt to resize the Dictionary<,> table at the same time, they can stomp on each other, screw up, and lose data.
Wrong answer is returned by a read from the Dictionary<,>.
Basically, the Dictionary<,> can do just about anything bad you can think of, within the limits imposed by the CLR. Presumably, you still won't break type safety or corrupt the heap as you could in a native programming language. Probably, anyways :-)
If you're accessing a collection from multiple threads, it would be safest to use one of the thread-safe varieties in .NET 4, such as ConcurrentDictionary.
Basically, if I want to do the following:
public class SomeClass
{
    private static ConcurrentDictionary<..., ...> Cache { get; set; }
}
Does this let me avoid using locks all over the place?
Yes, it is thread safe and yes it avoids you using locks all over the place (whatever that means). Of course that will only provide you a thread safe access to the data stored in this dictionary, but if the data itself is not thread safe then you need to synchronize access to it of course. Imagine for example that you have stored in this cache a List<T>. Now thread1 fetches this list (in a thread safe manner as the concurrent dictionary guarantees you this) and then starts enumerating over this list. At exactly the same time thread2 fetches this very same list from the cache (in a thread safe manner as the concurrent dictionary guarantees you this) and writes to the list (for example it adds a value). Conclusion: if you haven't synchronized thread1 it will get into trouble.
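The List<T> scenario can be sketched like this. The "scores" key is made up, and locking on the list instance itself is just one simple choice for the demo; a dedicated lock object would work equally well.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, List<int>>();
cache.TryAdd("scores", new List<int>());

Parallel.For(0, 1000, i =>
{
    // Fetching the list from the dictionary is thread-safe...
    List<int> scores = cache["scores"];

    // ...but List<T> is not, so every read and write of the list needs its own guard.
    lock (scores)
    {
        scores.Add(i);
    }
});

int count;
lock (cache["scores"])
{
    count = cache["scores"].Count;
}
Console.WriteLine(count); // 1000
```

Without the lock, the Add calls could lose items or throw, even though the dictionary itself would stay intact.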
As far as using it as a cache is concerned, well, that's probably not a good idea. For caching I would recommend what is already built into the framework, classes such as MemoryCache for example. The reason for this is that what is built into the System.Runtime.Caching assembly is, well, explicitly built for caching => it handles things like automatic expiration of data if you start running low on memory, callbacks for cache expiration items, and you would even be able to distribute your cache over multiple servers using things like memcached, AppFabric, ..., all things that you can't dream of with a concurrent dictionary.
You still may need to use locking in the same way that you might need a transaction in a database. The "concurrent" part means that the dictionary will continue to function correctly across multiple threads.
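To illustrate the transaction analogy: when one logical action touches two entries, no single dictionary call covers it, so you take your own lock around the whole step. The account names and amounts below are invented, and a plain Dictionary is used on purpose since the lock already serializes all access.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var balances = new Dictionary<string, int> { ["a"] = 1000, ["b"] = 0 };
var gate = new object(); // guards every access to balances

Parallel.For(0, 1000, _ =>
{
    // Both updates must be seen together or not at all, like a transaction.
    lock (gate)
    {
        balances["a"] -= 1;
        balances["b"] += 1;
    }
});

int a, b;
lock (gate)
{
    a = balances["a"];
    b = balances["b"];
}
Console.WriteLine($"{a} {b}"); // 0 1000
```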
Built into the concurrent collection are TryGetValue and TryRemove which acknowledge that someone might delete an item first. Locking at a granular level is built in, but you still need to think about what to do in these situations. For Caching, it often doesn't matter -- i.e. it's an idempotent operation.
re: caching. I feel that it depends on what you are storing in the cache + what you are doing with it. There are casting costs associated with using an object. Probably for most web based things MemCache is better suited as suggested above.
As MSDN says
ConcurrentDictionary<TKey, TValue> Class Represents a thread-safe collection of key-value pairs that can be accessed by multiple threads concurrently.
But as far as I know, the System.Collections.Concurrent classes are designed for PLINQ.
I have a Dictionary<Key,Value> which keeps track of the online clients on the server, and I make it thread safe by locking on an object whenever I access it.
Can I safely replace Dictionary<TKey,TValue> with ConcurrentDictionary<TKey,TValue> in my case? Will performance improve after the replacement?
Here, in Part 5, Joseph Albahari mentions that they are designed for parallel programming:
The concurrent collections are tuned for parallel programming. The conventional collections outperform them in all but highly concurrent scenarios.
A thread-safe collection doesn’t guarantee that the code using it will be thread-safe.
If you enumerate over a concurrent collection while another thread is modifying it, no exception is thrown. Instead, you get a mixture of old and new content.
There’s no concurrent version of List.
The concurrent stack, queue, and bag classes are implemented internally with linked lists. This makes them less memory-efficient than the nonconcurrent Stack and Queue classes, but better for concurrent access because linked lists are conducive to lock-free or low-lock implementations. (This is because inserting a node into a linked list requires updating just a couple of references, while inserting an element into a List-like structure may require moving thousands of existing elements.)
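The quoted point about enumeration can be seen in a small experiment. To keep it deterministic, the modification here happens on the enumerating thread itself rather than on another thread, which is an assumption of this sketch; the keys and values are made up.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Plain Dictionary: adding a key while enumerating invalidates the enumerator.
var plain = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3 };
bool plainThrew = false;
try
{
    foreach (var pair in plain)
    {
        plain["new-" + pair.Key] = 0;
    }
}
catch (InvalidOperationException)
{
    plainThrew = true;
}

// ConcurrentDictionary: modifying while enumerating is allowed, no exception.
var concurrent = new ConcurrentDictionary<string, int>();
concurrent["a"] = 1;
concurrent["b"] = 2;
concurrent["c"] = 3;
bool concurrentThrew = false;
try
{
    foreach (var pair in concurrent)
    {
        concurrent[pair.Key] = pair.Value + 10; // overwrite during enumeration
    }
}
catch (InvalidOperationException)
{
    concurrentThrew = true;
}

Console.WriteLine(plainThrew);      // True
Console.WriteLine(concurrentThrew); // False
```

The price is that an enumeration is not a snapshot: with real concurrent writers you may see a mix of old and new entries.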
Without knowing more about what you're doing within the lock, it's impossible to say.
For instance, if all of your dictionary access looks like this:
lock(lockObject)
{
    foo = dict[key];
}
... // elsewhere
lock(lockObject)
{
    dict[key] = foo;
}
Then you'll be fine switching it out (though you likely won't see any difference in performance, so if it ain't broke, don't fix it). However, if you're doing anything fancy within the lock block where you interact with the dictionary, then you'll have to make sure that the dictionary provides a single function that can accomplish what you're doing within the lock block, otherwise you'll end up with code that is functionally different from what you had before. The biggest thing to remember is that the dictionary only guarantees that concurrent calls to the dictionary are executed in a serial fashion; it can't handle cases where you have a single action in your code that interacts with the dictionary multiple times. Cases like that, when not accounted for by the ConcurrentDictionary, require your own concurrency control.
Thankfully, the ConcurrentDictionary provides some helper functions for more common multi-step operations like AddOrUpdate or GetOrAdd, but they can't cover every circumstance. If you find yourself having to work to shoehorn your logic into these functions, it may be better to handle your own concurrency.
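As an example of those helpers, here is a hit counter built on AddOrUpdate, plus a GetOrAdd call. The "home" and "about" keys are invented. One thing worth knowing is that the update delegate can be invoked more than once under contention, so it should be free of side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var hits = new ConcurrentDictionary<string, int>();

Parallel.For(0, 1000, _ =>
{
    // "Add 1 if missing, otherwise replace the current value with current + 1",
    // done as one atomic dictionary operation instead of a read followed by a write.
    hits.AddOrUpdate("home", 1, (key, current) => current + 1);
});

// GetOrAdd returns the existing value, or stores and returns the supplied one.
int about = hits.GetOrAdd("about", 0);

Console.WriteLine(hits["home"]); // 1000
Console.WriteLine(about);        // 0
```

This works because the value is an immutable int that gets replaced. A mutable object stored as the value would still need its own protection.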
It's not as simple as replacing Dictionary with ConcurrentDictionary. You'll need to adapt your code, as these classes have new methods that behave differently in order to guarantee thread-safety.
E.g., instead of calling Add or Remove, you have TryAdd and TryRemove. It's important you use these methods that behave atomically, because if you make two calls where the second is reliant on the outcome of the first, you'll still have race conditions and need a lock.
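A quick single-threaded walk through the Try* methods shows what their return values mean. The client name and numbers are made up; in real code the false results are what you get when another thread got there first.

```csharp
using System;
using System.Collections.Concurrent;

var clients = new ConcurrentDictionary<string, int>();

bool added = clients.TryAdd("alice", 1);      // true
bool addedAgain = clients.TryAdd("alice", 2); // false, the value stays 1

// TryUpdate only replaces the value if it still equals the expected old value.
bool updated = clients.TryUpdate("alice", 5, 1);     // true, 1 -> 5
bool staleUpdate = clients.TryUpdate("alice", 9, 1); // false, the value is 5 now

bool removed = clients.TryRemove("alice", out int removedValue); // true, removedValue is 5
bool removedAgain = clients.TryRemove("alice", out _);           // false, already gone

Console.WriteLine($"{added} {addedAgain} {updated} {staleUpdate} {removed} {removedAgain}");
// True False True False True False
```

Each call is atomic on its own, so the code reacts to the boolean instead of checking first and acting second.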
You can replace Dictionary<TKey, TValue> with ConcurrentDictionary<TKey, TValue>.
The effect on performance may not be what you want though (if there is a lot of locking/synchronization, performance may suffer... but at least your collection is thread-safe).
I'm unsure about replacement difficulties, but if there is anywhere you need to access multiple elements in the dictionary in the same "lock session", then you'll need to modify your code.
It could give improved performance if Microsoft has given separate locks for read and write, since read operations shouldn't block other read operations.
Yes, you can safely replace it. However, a dictionary designed for PLINQ may have some extra code for added functionality that you may not use, but the performance overhead will be very small.