I need to implement a concurrent Dictionary because .NET does not yet include concurrent collections (.NET 4 will include them). Can I use the "Power Threading Library" from Jeffrey Richter for this, are there existing implementations I could use, or do you have any advice on implementing one?
Thanks ...
I wrote a thread-safe wrapper for the normal Dictionary class that uses Interlocked to protect the internal dictionary. Interlocked is by far the fastest locking mechanism available and will give much better performance than ReaderWriterLockSlim, Monitor or any of the other available locks.
The code was used to implement a Cache class for Fasterflect, which is a library to speed up reflection. As such we tried a number of different approaches in order to find the fastest possible solution. Interestingly, the new concurrent collections in .NET 4 are noticeably faster than my implementation, although both are pretty darn fast compared to solutions using a less performant locking mechanism. The implementation for .NET 3.5 is located inside a conditional region in the bottom half of the file.
You can use Reflector to view the source code of the concurrent implementation of .NET 4.0 RC and copy it to your own code. This way you will have the least problems when migrating to .NET 4.0.
I wrote a concurrent dictionary myself (prior to .NET 4.0's System.Collections.Concurrent namespace); there's not much to it. You basically just want to make sure certain methods are not getting called at the same time, e.g., Contains and Remove or something like that.
What I did was to use a ReaderWriterLock (in .NET 3.5 and above, you could go with ReaderWriterLockSlim) and call AcquireReaderLock for all "read" operations (like this[TKey], ContainsKey, etc.) and AcquireWriterLock for all "write" operations (like this[TKey] = value, Add, Remove, etc.). Be sure to wrap any calls of this sort in a try/finally block, releasing the lock in the finally.
It's also a good idea to modify the behavior of GetEnumerator slightly: rather than enumerate over the existing collection, make a copy of it and allow enumeration over that. Otherwise you'll face potential deadlocks.
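As a rough sketch of the approach described above, assuming .NET 3.5 or later so that ReaderWriterLockSlim is available: the class name SynchronizedDictionary and the Set and Snapshot members are made up for illustration, and only a handful of operations are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper; only the members needed for the illustration are shown.
public sealed class SynchronizedDictionary<TKey, TValue> : IDisposable
{
    private readonly Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // "Read" operation: several readers may hold the read lock at the same time.
    public bool TryGetValue(TKey key, out TValue value)
    {
        _lock.EnterReadLock();
        try { return _inner.TryGetValue(key, out value); }
        finally { _lock.ExitReadLock(); }
    }

    // "Write" operation: the write lock is exclusive.
    public void Set(TKey key, TValue value)
    {
        _lock.EnterWriteLock();
        try { _inner[key] = value; }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Remove(TKey key)
    {
        _lock.EnterWriteLock();
        try { return _inner.Remove(key); }
        finally { _lock.ExitWriteLock(); }
    }

    // Copy the contents under the read lock and let callers enumerate the copy,
    // so no lock is held while the caller iterates.
    public List<KeyValuePair<TKey, TValue>> Snapshot()
    {
        _lock.EnterReadLock();
        try { return new List<KeyValuePair<TKey, TValue>>(_inner); }
        finally { _lock.ExitReadLock(); }
    }

    public void Dispose() { _lock.Dispose(); }
}
```

A snapshot taken this way is unaffected by later writes, which is the trade-off: the caller may be looking at slightly stale data.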
Here's a simple implementation that uses sane locking (though Interlocked would likely be faster):
http://www.tech.windowsapplication1.com/content/the-synchronized-dictionarytkey-tvalue
Essentially, just create a Dictionary wrapper/decorator and synchronize access to any read/write actions.
When you switch to .NET 4.0, just replace all of your overloads with delegated calls to the underlying ConcurrentDictionary.
I'm always confused about which of these to pick. As I see it, I use a Dictionary over a List when I want a key and a value of two data types, so I can easily find a value by its key. But I am never sure whether I should use a ConcurrentDictionary or a Dictionary.
Before you go off at me for not putting much research into this: I have tried, but Google doesn't seem to have anything on Dictionary vs. ConcurrentDictionary, only something on each one individually.
I have asked a friend this before, but all they said was: "use ConcurrentDictionary if you use your dictionary a lot in code", and I didn't really want to pester them into explaining it in more detail. Could anyone expand on this?
"Use ConcurrentDictionary if you use your dictionary a lot in code" is kind of vague advice. I don't blame you for the confusion.
ConcurrentDictionary is primarily for use in an environment where you're updating the dictionary from multiple threads (or async tasks). You can use a standard Dictionary from as much code as you like if it's from a single thread ;)
If you look at the methods on a ConcurrentDictionary, you'll spot some interesting methods like TryAdd, TryGetValue, TryUpdate, and TryRemove.
For example, consider a typical pattern you might see for working with a normal Dictionary class.
// There are better ways to do this... but we need an example ;)
if (!dictionary.ContainsKey(id))
    dictionary.Add(id, value);
This has an issue in that between the check for whether it contains a key and calling Add a different thread could call Add with that same id. When this thread calls Add, it'll throw an exception. The method TryAdd handles that for you and will return a true/false telling you whether it added it (or whether that key was already in the dictionary).
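A small sketch of that difference, assuming a ConcurrentDictionary<int, string> shared by several workers. The class and method names (TryAddExample, RaceToAdd) are invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class TryAddExample
{
    // Several workers race to register the same id.
    // Returns how many of the TryAdd calls actually added the entry.
    public static int RaceToAdd(ConcurrentDictionary<int, string> dictionary, int id, int workers)
    {
        int successes = 0;
        Parallel.For(0, workers, i =>
        {
            // The existence check and the insert happen as one atomic step.
            // A worker that loses the race gets false back instead of an exception.
            if (dictionary.TryAdd(id, "worker " + i))
                Interlocked.Increment(ref successes);
        });
        return successes;
    }
}
```

However many workers take part, only one TryAdd call for a given key returns true; the rest return false and leave the stored value alone.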
So unless you're working in a multi-threaded section of code, you probably can just use the standard Dictionary class. That being said, you could theoretically have locks to prevent concurrent access to a dictionary; that question is already addressed in "Dictionary locking vs. ConcurrentDictionary".
The biggest reason to use ConcurrentDictionary over the normal Dictionary is thread safety. If your application will have multiple threads using the same dictionary at the same time, you need the thread-safe ConcurrentDictionary; this is particularly true when these threads are writing to or building the dictionary.
The downside to using ConcurrentDictionary without the multi-threading is overhead. All those functions that allow it to be thread-safe will still be there, all the locks and checks will still happen, taking processing time and using extra memory.
ConcurrentDictionary is useful when you need to access a dictionary across multiple threads (i.e. multithreading). Vanilla Dictionary objects do not possess this capability and therefore should only be used in a single-threaded manner.
A ConcurrentDictionary is useful when you want a high-performance dictionary that can be safely accessed by multiple threads concurrently. Compared to a standard Dictionary protected with a lock, it is more efficient under heavy usage because of its granular locking implementation. Instead of all threads competing for a single lock, the ConcurrentDictionary maintains multiple locks internally, thereby minimizing contention and limiting the possibility of becoming a bottleneck.
Despite these nice characteristics, the number of scenarios where using a ConcurrentDictionary is the best option is actually quite small. There are two reasons for that:
The thread-safety guarantees offered by the ConcurrentDictionary are limited to the protection of its internal state. That's it. If you want to do anything slightly non-trivial, like for example updating the dictionary and another variable as an atomic operation, you are out of luck. This is not a supported scenario for a ConcurrentDictionary. Even protecting the elements it contains (in case they are mutable objects) is not supported. If you try to update one of its values using the AddOrUpdate method, the dictionary will be protected but the value will not. The Update in this context means replace the existing value with another one, not modify the existing value.
Whenever you find it tempting to use a ConcurrentDictionary, there are usually better alternatives available: alternatives that do not involve shared state, which is what a ConcurrentDictionary essentially is. No matter how efficient its locking scheme is, it will have a hard time beating an architecture where there is no shared state at all, and each thread does its own thing without interfering with the other threads. Commonly used libraries that follow this principle are PLINQ and the TPL Dataflow library. Below is a PLINQ example:
Dictionary<string, Product> dictionary = productIDs
    .AsParallel()
    .Select(id => GetProduct(id))
    .ToDictionary(product => product.Barcode);
Instead of creating a dictionary beforehand, and then having multiple threads filling it concurrently with values, you can trust PLINQ to produce a dictionary utilizing more efficient strategies, involving partitioning of the initial workload, and assigning each partition to a different worker thread. A single thread will eventually aggregate the partial results, and fill the dictionary.
The accepted answer above is correct. However, it is worth mentioning explicitly that if a dictionary is not being modified, i.e. it is only ever read from, then regardless of the number of threads, Dictionary<TKey,TValue> is preferred because no synchronization is required.
For example, caching config in a Dictionary<TKey,TValue> that is populated only once at startup and used throughout the application for the life of the application.
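Something along these lines, as a sketch only. ConfigCache, Load and the setting names are placeholders, and the values stand in for whatever configuration source you really use. The important property is that nothing writes to the dictionary after the static initializer has run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ConfigCache
{
    // Populated exactly once by the type initializer, never modified afterwards.
    private static readonly Dictionary<string, string> Settings = Load();

    private static Dictionary<string, string> Load()
    {
        // Hypothetical values standing in for a real configuration source.
        return new Dictionary<string, string>
        {
            { "Region", "eu-west" },
            { "Mode", "production" }
        };
    }

    public static string Get(string key)
    {
        return Settings[key];
    }

    // Reads the same key from many threads at once.
    // This is safe only because no thread ever writes to Settings.
    public static int CountMatches(string key, string expected, int readers)
    {
        int matches = 0;
        Parallel.For(0, readers, i =>
        {
            if (Get(key) == expected)
                Interlocked.Increment(ref matches);
        });
        return matches;
    }
}
```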
When to use a thread-safe collection : ConcurrentDictionary vs. Dictionary
If you are only reading keys or values, the Dictionary<TKey,TValue> is faster because no synchronization is required if the dictionary is not being modified by any threads.
As MSDN says:
ConcurrentDictionary<TKey, TValue> Class Represents a thread-safe collection of key-value pairs that can be accessed by multiple threads concurrently.
But as far as I know, the System.Collections.Concurrent classes are designed for PLINQ.
I have a Dictionary<Key,Value> which keeps track of online clients on the server, and I make it thread-safe by locking an object whenever I access it.
Can I safely replace Dictionary<TKey,TValue> with ConcurrentDictionary<TKey,TValue> in my case? Will performance improve after the replacement?
Here, in Part 5, Joseph Albahari mentions that they are designed for parallel programming:
The concurrent collections are tuned for parallel programming. The conventional collections outperform them in all but highly concurrent scenarios.
A thread-safe collection doesn’t guarantee that the code using it will be thread-safe.
If you enumerate over a concurrent collection while another thread is modifying it, no exception is thrown. Instead, you get a mixture of old and new content.
There’s no concurrent version of List.
The concurrent stack, queue, and bag classes are implemented internally with linked lists. This makes them less memory-efficient than the nonconcurrent Stack and Queue classes, but better for concurrent access because linked lists are conducive to lock-free or low-lock implementations. (This is because inserting a node into a linked list requires updating just a couple of references, while inserting an element into a List-like structure may require moving thousands of existing elements.)
Without knowing more about what you're doing within the lock, it's impossible to say.
For instance, if all of your dictionary access looks like this:
lock (lockObject)
{
    foo = dict[key];
}
... // elsewhere
lock (lockObject)
{
    dict[key] = foo;
}
Then you'll be fine switching it out (though you likely won't see any difference in performance, so if it ain't broke, don't fix it). However, if you're doing anything fancy within the lock block where you interact with the dictionary, then you'll have to make sure that the dictionary provides a single function that can accomplish what you're doing within the lock block, otherwise you'll end up with code that is functionally different from what you had before. The biggest thing to remember is that the dictionary only guarantees that concurrent calls to the dictionary are executed in a serial fashion; it can't handle cases where you have a single action in your code that interacts with the dictionary multiple times. Cases like that, when not accounted for by the ConcurrentDictionary, require your own concurrency control.
Thankfully, the ConcurrentDictionary provides some helper functions for more common multi-step operations like AddOrUpdate or GetOrAdd, but they can't cover every circumstance. If you find yourself having to work to shoehorn your logic into these functions, it may be better to handle your own concurrency.
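To make those two helpers concrete, here is a minimal sketch. The word-count and greeting scenarios, and the names HelperMethodsExample, CountWords and GetGreeting, are invented for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class HelperMethodsExample
{
    // Counts occurrences of each word, with many threads updating the same dictionary.
    public static ConcurrentDictionary<string, int> CountWords(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // Insert 1 if the key is missing, otherwise replace the old count with old + 1.
            // The update delegate may be invoked more than once under contention,
            // so it should have no side effects.
            counts.AddOrUpdate(word, 1, (key, old) => old + 1);
        });
        return counts;
    }

    // Returns the existing entry for the key, or adds the value built by the factory and returns it.
    public static string GetGreeting(ConcurrentDictionary<string, string> cache, string name)
    {
        return cache.GetOrAdd(name, n => "Hello, " + n);
    }
}
```

Both calls replace what would otherwise be a check followed by a write inside your own lock block. Note that the delegates run outside the dictionary's internal locks, so they should not depend on running exactly once.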
It's not as simple as replacing Dictionary with ConcurrentDictionary. You'll need to adapt your code, because these classes have new methods that behave differently in order to guarantee thread safety.
E.g., instead of calling Add or Remove, you have TryAdd and TryRemove. It's important that you use these methods, which behave atomically: if you make two calls where the second relies on the outcome of the first, you'll still have race conditions and need a lock.
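One way to handle a "read, decide, then write" sequence without a lock is a retry loop around TryUpdate, which only writes if the stored value is still the one you read. This is just a sketch: the account/balance scenario and the names AtomicUpdateExample and TryWithdraw are made up.

```csharp
using System;
using System.Collections.Concurrent;

public static class AtomicUpdateExample
{
    // Subtracts 'amount' from the stored balance only if enough is available.
    public static bool TryWithdraw(ConcurrentDictionary<string, int> balances, string account, int amount)
    {
        while (true)
        {
            int current;
            if (!balances.TryGetValue(account, out current) || current < amount)
                return false;

            // Writes only if the stored value still equals 'current'.
            // If another thread changed it in the meantime, read again and retry.
            if (balances.TryUpdate(account, current - amount, current))
                return true;
        }
    }
}
```

Calling TryGetValue and then assigning balances[account] = current - amount would be the racy version: two threads could both read the same balance and both write, losing one of the withdrawals.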
You can replace Dictionary<TKey, TValue> with ConcurrentDictionary<TKey, TValue>.
The effect on performance may not be what you want though (if there is a lot of locking/synchronization, performance may suffer...but at least your collection is thread-safe).
I'm unsure about replacement difficulties, but if there is anywhere you need to access multiple elements in the dictionary in the same "lock session", then you'll need to modify your code.
It could give improved performance if Microsoft has given separate locks for read and write, since read operations shouldn't block other read operations.
Yes, you can safely replace it. However, a dictionary designed for PLINQ may carry extra code for functionality that you do not use, though the performance overhead should be very small.
Does .NET have an equivalent of PostThreadMessage?
We presently use a List (for keeping the items), a lock (protecting the list) and an event (to notify the consumer thread that an item has been added to the list) for the same functionality.
Is there any optimized way for implementing this?
There are some concurrent collections in .NET 4.0 (System.Collections.Concurrent) that perhaps you could use instead of rolling your own thread-safe data structure? I'm not sure what your requirements are, and I'm not sure how your wanting to optimize your container has anything to do with making it equivalent to PostThreadMessage.
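For instance, BlockingCollection<T> from that namespace bundles the list, the lock and the event into one type. A rough sketch, assuming a single producer and a single consumer; MessageQueueExample and Run are illustrative names, and int stands in for whatever message type you post.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class MessageQueueExample
{
    // Posts 'count' messages to a consumer task and returns them in the order the consumer saw them.
    public static List<int> Run(int count)
    {
        var received = new List<int>();
        using (var queue = new BlockingCollection<int>())
        {
            // Consumer: blocks while the queue is empty and finishes once
            // adding has been completed and the queue has been drained.
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (int message in queue.GetConsumingEnumerable())
                    received.Add(message);
            });

            // Producer: Add plays the role of posting a message to the other thread.
            for (int i = 0; i < count; i++)
                queue.Add(i);
            queue.CompleteAdding();

            consumer.Wait();
        }
        return received;
    }
}
```

By default the collection is backed by a ConcurrentQueue<T>, so with one producer and one consumer the messages arrive in the order they were posted.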
If you want, you can always use Managed C++ to expose PostThreadMessage to your .NET application. Or you can use P/Invoke to call it from your app as well.
Doesn't look like there's much room for optimization here, unless you want to design it as single-threaded to avoid the context switch. This may anyway be preferred, from program logic point of view.
Is List<T>, HashSet<T>, or any other built-in collection thread-safe for addition only?
My question is similar to "Threadsafe and generic arraylist?", but I'm only looking for safety when adding to the list from multiple threads, not when removing or reading from it.
System.Collections.Concurrent.BlockingCollection<T>
In .NET 4.0 you could use BlockingCollection<T>, but that is still designed to be thread-safe for all operations, not just addition.
In general, it's uncommon to design a data structure that guarantees certain operations to be safe for concurrency and others not. If you're concerned that there is an overhead when accessing a collection for reading, you should do some benchmarking before you go out of your way to look for specialized collections to deal with that.
Asking this question with C# tag, but if it is possible, it should be possible in any language.
Is it possible to implement a doubly linked list using Interlocked operations to provide no-wait locking? I would want to insert, add and remove, and clear without waiting.
Yes it's possible, here's my implementation of an STL-like Lock-Free Doubly-Linked List in C++.
Sample code that spawns threads to randomly perform ops on a list
It requires a 64-bit compare-and-swap to operate without ABA issues. This list is only possible because of a lock-free memory manager.
Check out the benchmarks on page 12. Performance of the list scales linearly with the number of threads as contention increases. The algorithm supports parallelism for disjoint accesses, so as the list size increases contention can decrease.
A simple Google search will reveal many lock-free doubly linked list papers.
However, they are based on atomic CAS (compare and swap).
I don't know how atomic the operations in C# are, but according to this website
http://www.albahari.com/threading/part4.aspx
C# operations are only guaranteed to be atomic for reading and writing a 32-bit field. No mention of CAS.
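For what it's worth, .NET does expose compare-and-swap through Interlocked.CompareExchange, which has overloads for int, long and object references. The sketch below uses it for a much simpler structure than a doubly linked list, a singly linked lock-free stack, just to show the retry pattern. LockFreeStack is an illustrative name, not a framework type, and Volatile.Read assumes .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Illustrative lock-free stack built on compare-and-swap.
public sealed class LockFreeStack<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node _head;

    public void Push(T value)
    {
        var node = new Node(value);
        while (true)
        {
            Node oldHead = Volatile.Read(ref _head);
            node.Next = oldHead;
            // Swap in the new node only if the head is still the one we read; otherwise retry.
            if (Interlocked.CompareExchange(ref _head, node, oldHead) == oldHead)
                return;
        }
    }

    public bool TryPop(out T value)
    {
        while (true)
        {
            Node oldHead = Volatile.Read(ref _head);
            if (oldHead == null)
            {
                value = default(T);
                return false;
            }
            // Unlink the head only if no other thread has changed it since we read it.
            if (Interlocked.CompareExchange(ref _head, oldHead.Next, oldHead) == oldHead)
            {
                value = oldHead.Value;
                return true;
            }
        }
    }
}
```

Push always allocates a fresh node and the garbage collector keeps a node alive while any thread still holds a reference to it, so a node is never recycled under a thread that is still comparing against it. A doubly linked list is considerably harder because two links have to stay consistent, which is what the papers address.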
Here is a paper which describes a lock-free doubly linked list.
We present an efficient and practical lock-free implementation of a concurrent deque that is disjoint-parallel accessible and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of deques are either based on non-available atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm is based on a doubly linked list, and only requires single-word compare-and-swap...
Ross Bencina has some really good links I just found, with numerous papers and source code examples, under "Some notes on lock-free and wait-free algorithms".
I don't believe this is possible, since you're having to set multiple references in one shot, and the interlocked operations are limited in their power.
For example, take the add operation - if you're inserting node B between A and C, you need to set B->next, B->prev, A->next, and C->prev in one atomic operation. Interlocked can't handle that. Presetting B's elements doesn't even help, because another thread could decide to do an insert while you're preparing "B".
I'd focus more on getting the locking as fine-grained as possible in this case, not trying to eliminate it.
Read the footnote - they plan to pull ConcurrentLinkedList from 4.0 prior to the final release of VS2010
Well, you haven't actually asked how to do it. But, provided you can do an atomic CAS in C#, it's entirely possible.
In fact I'm just working through an implementation of a doubly linked wait-free list in C++ right now.
Here is a paper describing it.
http://www.cse.chalmers.se/~tsigas/papers/Haakan-Thesis.pdf
And a presentation that may also provide you some clues.
http://www.ida.liu.se/~chrke/courses/MULTI/slides/Lock-Free_DoublyLinkedList.pdf
It is possible to write lock free algorithms for all copyable data structures on most architectures [1]. But it is hard to write efficient ones.
I wrote an implementation of the lock-free doubly linked list by Håkan Sundell and Philippas Tsigas for .NET. Note that it does not support atomic PopLeft due to the concept.
[1]: Maurice Herlihy: Impossibility and universality results for wait-free synchronization (1988)
FWIW, .NET 4.0 is adding a ConcurrentLinkedList, a threadsafe doubly linked list in the System.Collections.Concurrent namespace. You can read the documentation or the blog post describing it.
I would say that the answer is a very deeply qualified "yes, it is possible, but hard". To implement what you're asking for, you'd basically need something that would compile the operations together to ensure no collisions; as such, it would be very hard to create a general implementation for that purpose, and it would still have some significant limitations. It would probably be simpler to create a specific implementation tailored to the precise needs, and even then, it wouldn't be "simple" by any means.