I like the lock-free operation of the ConcurrentDictionary and use it in two objects:
ConcurrentDictionary<DateTime, myObj> myIndexByDate
ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>> myObjectSummaryIndex
These two objects need to stay in sync. Is the only way to do this to use a lock, thus losing all the benefits of the ConcurrentDictionary?
I would create a custom class wrapping the two dictionaries and use a lock only in the methods that modify them (Add, Delete).
You don't lose the benefits of the concurrent dictionary, as this approach requires much less code than what you would need with a normal dictionary.
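For example, a minimal sketch of such a wrapper using the question's types (the myObj members Date, Summary and Id are assumptions made purely for illustration):

// using System; using System.Collections.Concurrent;
public class MyObjIndexes
{
    private readonly object _sync = new object();

    // Reads can go straight to these; only the mutating methods below take the lock.
    public ConcurrentDictionary<DateTime, myObj> ByDate { get; } =
        new ConcurrentDictionary<DateTime, myObj>();
    public ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>> BySummary { get; } =
        new ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>>();

    public void Add(myObj item)
    {
        lock (_sync) // keeps the two indexes consistent with each other
        {
            ByDate[item.Date] = item;
            var inner = BySummary.GetOrAdd(item.Summary, _ => new ConcurrentDictionary<int, myObj>());
            inner[item.Id] = item;
        }
    }

    public void Delete(myObj item)
    {
        lock (_sync)
        {
            ByDate.TryRemove(item.Date, out _);
            if (BySummary.TryGetValue(item.Summary, out var inner))
                inner.TryRemove(item.Id, out _);
        }
    }
}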
ConcurrentDictionary is only "thread-safe" on the operations on a particular instance. e.g. while a ConcurrentDictionary.TryAdd() call is being invoked, no other threads can be modifying the collection...
This doesn't mean that, between reading a value from one dictionary and adding it to another, the value is guaranteed to still exist in the original dictionary.
You probably have an invariant that says that while you're moving a value from one dictionary to the other, no values in the original dictionary can be removed (or at least not that value, which is a little more difficult to guarantee with ConcurrentDictionary alone).
I am using a ConcurrentDictionary to store log-lines, and when I need to display them to the user I call ToList() to generate a list. But the weird thing is that some users receive the most recent lines first in the list, while they should logically be last.
Is this because ConcurrentDictionary doesn't guarantee a persistent order when enumerated via IEnumerable, or what could the reason be?
No, ConcurrentDictionary (and Dictionary<TKey, TValue> for that matter) does not guarantee the ordering of the keys in the resulting list. You'll have to use a different data type or perform the sorting yourself. For non-concurrent code you would use SortedDictionary<TKey, TValue>, but I don't believe there is an analogue in the concurrent collections.
No. The enumeration order of a ConcurrentDictionary is NOT guaranteed; lines can come out in any order.
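If the key carries the ordering (for example a timestamp or a sequence number), one option is to sort the snapshot yourself before display. A minimal sketch, assuming a hypothetical logDictionary of type ConcurrentDictionary<long, LogLine> and System.Linq:

// Enumeration order is unspecified, so impose the order yourself by
// sorting the snapshot on the key before displaying it.
List<LogLine> lines = logDictionary
    .OrderBy(pair => pair.Key)
    .Select(pair => pair.Value)
    .ToList();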
We're on a high traffic website where we want to do some very simple caching for a calculation using a ConcurrentDictionary, to prevent it from being done for every request. The number of possible inputs is limited enough and the calculation relatively heavy (splitting and rejoining a string).
I'm considering code like:
string result;
if (!MyConcurrentDictionary.TryGetValue(someKey, out result))
{
    result = DoCalculation(someKey);

    // alternative 1: use the Item property (indexer)
    MyConcurrentDictionary[someKey] = result;

    // alternative 2: use TryAdd
    MyConcurrentDictionary.TryAdd(someKey, result);
}
My question is: which alternative is the best choice from a performance perspective?
Your code is completely broken; you're assuming that nothing will change between those two lines.
You need to use .GetOrAdd().
In general, when dealing with a mutable concurrent object, you must never make multiple calls to the object for the same thing, since its state can change at any time.
As @SLaks indicated, your code has a race condition. ConcurrentDictionary was built to prevent such scenarios by providing methods that perform complex operations atomically, such as GetOrAdd and AddOrUpdate.
Here's a sequence that describes how your code could break:
Thread 1 executes TryGetValue which returns false
Thread 2 does the same
Both threads do the calculation
Thread 2 adds the value to the dictionary and returns the result
Thread 1 then either:
(using the indexer setter) adds its own value, overwriting the one Thread 2 just stored, and returns it
(using TryAdd) fails to add its value, and the method returns a result that's potentially different from the one in the dictionary
In the indexer case, Thread 2 now holds a result that's potentially different from the one in the dictionary
So you can see how not using GetOrAdd makes things a lot more complex and could have potentially disastrous results. It's possible that in your case if you're deterministically calculating a string from the key it won't matter - but there's really no reason not to use this method. It also simplifies the code, and may (marginally) improve performance since it calculates the hash-code just once.
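For the caching code above, a minimal sketch of the GetOrAdd approach, reusing the question's names (MyConcurrentDictionary, someKey, DoCalculation):

// GetOrAdd performs the lookup-or-insert atomically; DoCalculation may still run
// more than once under contention, but every caller gets the same stored value.
string result = MyConcurrentDictionary.GetOrAdd(someKey, key => DoCalculation(key));

// If the calculation must run at most once per key, a common pattern is to store
// Lazy<string> values instead (lazyCache is assumed to be a
// ConcurrentDictionary<string, Lazy<string>>):
string result2 = lazyCache
    .GetOrAdd(someKey, key => new Lazy<string>(() => DoCalculation(key)))
    .Value;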
Another conclusion is that choosing between the indexer and TryAdd is much more a question of correctness than of performance.
That said, the performance of both the indexer [] and TryAdd methods is identical, since both share the same implementation (TryAddInternal), which works by taking the lock belonging to the key's bucket, then checking if the key exists in the dictionary, and either updating the dictionary or not.
Lastly, here's an example that shows how to build methods like GetOrAdd correctly.
I have a dictionary with a fixed collection of keys, which I create at the beginning of the program. Later, I have some threads updating the dictionary with values.
No pairs are added or removed once the threads have started.
Each thread has its own key, meaning only one thread will ever access a given key.
The thread might update the value.
The question is, should I lock the dictionary?
UPDATE:
Thanks all for the answers,
I tried to simplify the situation when I asked this question, just to understand the behaviour of the dictionary.
To make myself clear, here is the full version:
I have a dictionary with ~3000 entries (fixed keys), and I have more than one thread accessing the keys (a shared resource), but I know for a fact that only one thread is accessing a given key entry at a time.
So, should I lock the dictionary? And, now that you have the full version, is a dictionary the right choice at all?
Thanks!
FROM MSDN
A Dictionary can support multiple readers concurrently, as long as the collection is not modified.
To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For a thread-safe alternative, see ConcurrentDictionary<TKey, TValue>.
Let's deal with your question one interpretation at a time.
First interpretation: Given how Dictionary<TKey, TValue> is implemented, with the context I've given, do I need to lock the dictionary?
No, you don't.
Second interpretation: Given how Dictionary<TKey, TValue> is documented, with the context I've given, do I need to lock the dictionary?
Yes, you definitely should.
There is no guarantee that access which happens to be OK today will still be OK tomorrow in a multithreaded world, since the type is documented as not thread-safe. That documentation allows the programmers of the type to make certain assumptions about its state and integrity that they would otherwise have to build explicit guarantees around.
A hotfix or update to .NET, or a whole new version, might change the implementation and break your code, and that would be your fault for relying on undocumented behavior.
Third interpretation: Given the context I've given, is a dictionary the right choice?
No it isn't. Either switch to a threadsafe type, or simply don't use a dictionary at all. Why not just use a variable per thread instead?
Conclusion: If you intend to use the dictionary, lock the dictionary. If it's OK to switch to something else, do it.
Use a ConcurrentDictionary, don't reinvent the wheel.
Better still, refactor your code to avoid this unnecessary contention.
If there is no communication between the threads you could just do something like this:
Assuming a function that changes a value:
private static KeyValuePair<TKey, TValue> ValueChanger<TKey, TValue>(
    KeyValuePair<TKey, TValue> initial)
{
    // I don't know what you do, so I'll just return the value.
    return initial;
}
Let's say you have some starting data:
var start = Enumerable.Range(1, 3000)
    .Select(i => new KeyValuePair<int, object>(i, new object()));
You could process them all at once like this:
var results = start.AsParallel().Select(ValueChanger);
When results is evaluated, the 3000 ValueChanger calls will run concurrently, yielding an IEnumerable<KeyValuePair<int, object>>.
There will be no interaction between the threads, thus no possible concurrency problems.
If you want to turn the results into a Dictionary, you could:
var resultsDictionary = results.ToDictionary(p => p.Key, p => p.Value);
This may or may not be useful in your situation, but without more detail it's hard to say.
If each thread accesses only one value and you don't care about the others, I'd say you don't need a Dictionary at all. You can use ThreadLocal or ThreadStatic variables, as sketched below.
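A minimal sketch of the ThreadLocal<T> alternative (the value type and ComputeValue are hypothetical placeholders):

// System.Threading.ThreadLocal<T>: each thread gets its own slot, so there is
// no shared dictionary and no locking when threads never read each other's values.
var perThreadValue = new ThreadLocal<int>(() => 0);

// inside each worker thread (ComputeValue is a hypothetical calculation):
perThreadValue.Value = ComputeValue();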
If you do need a Dictionary, you definitely need a lock.
If you're on .NET 4.0 or above, I strongly suggest you use ConcurrentDictionary; you don't need to synchronize access when using ConcurrentDictionary because it is already thread-safe.
The Dictionary is not thread-safe, but in your code you don't have to worry about that; you said only one thread updates a given value, so you don't have a multithreading problem.
I don't have the code, so I'm not 100% sure.
Also check this: Making dictionary access thread-safe?
If you're not adding keys, but simply modifying values, why not completely remove the need for writing directly to the Dictionary by storing complex objects as values and modifying a value within the complex type? That way, you respect the thread-safety constraints of the dictionary.
So:
class ValueWrapper<T>
{
    public T Value { get; set; }
}
// ...
var myDic = new Dictionary<KeyType, ValueWrapper<ValueType>>();
// ...
myDic[someKey].Value = newValue;
You're now not writing directly to the dictionary but you can modify values.
Don't try to do the same with keys; they necessarily must remain immutable.
Given the constraint "I know for a fact that only one thread is accessing a key entry at a time", I don't think you have any problem.
Possible modifications of a Dictionary are: add, update and remove.
If the Dictionary is modified or allowed to be modified, you must use a synchronization mechanism of choice to eliminate the potential race condition, in which one thread reads the old dirty value while a second thread is currently replacing the value/updating the key.
To save you some work, use the ConcurrentDictionary in this scenario.
If the Dictionary is never modified after creation, there won't be any race conditions. A synchronization is therefore not required.
This is a special scenario in which you can replace the table with a read-only table. To add robustness, for example to guard against bugs where the table is accidentally manipulated, you should make the Dictionary immutable (or read-only). Since the compiler cannot enforce this by itself, such an immutable implementation must throw an exception on any manipulation attempt.
To save you some work, you can use the ReadOnlyDictionary in this scenario. Note that the underlying Dictionary of the ReadOnlyDictionary is still mutable and that its changes are propagated to the ReadOnlyDictionary facade. The ReadOnlyDictionary only helps to ensure that the table is not accidentally modified by its consumers.
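For illustration, a sketch of exposing a fully populated table through the read-only facade (the key and value types are just examples):

// ReadOnlyDictionary lives in System.Collections.ObjectModel.
// Build the table once, then hand out only the wrapper so consumers cannot modify it.
var source = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
IReadOnlyDictionary<string, int> table = new ReadOnlyDictionary<string, int>(source);

// table["a"] = 3;                                 // does not compile: the read-only view has no setter
// ((IDictionary<string, int>)table).Add("c", 3);  // compiles, but throws NotSupportedException at run time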
This means: Dictionary is never an option in a multithreaded context.
Rather use the ConcurrentDictionary or a synchronization mechanism in general (or use the ReadOnlyDictionary if you can guarantee that the original source collection never changes).
Since you allow and expect manipulations of the table ("[...] the thread might update the value"), you must use a synchronization mechanism of choice or the ConcurrentDictionary.
I use a ConcurrentDictionary<string, HashSet<string>> to access some data across many threads.
I read in this article (scroll down) that the method AddOrUpdate is not executed in the lock, so it could endanger thread-safety.
My code is as follows:
// keys and bar are not the concern here
ConcurrentDictionary<string, HashSet<string>> foo = new ...;

foreach (var key in keys)
{
    foo.AddOrUpdate(key, new HashSet<string> { bar }, (k, val) =>
    {
        val.Add(bar);
        return val;
    });
}
Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?
Locking during AddOrUpdate on its own wouldn't help - you'd still have to lock every time you read from the set.
If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T> which used a ConcurrentDictionary<T, int> (or whatever TValue you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.
You don't need to implement everything within ISet<T> - just the bits you actually need.
You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>> in your application code, and you're away - no need for locking.
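A minimal sketch of what such a ConcurrentSet<T> could look like (only the members needed here; the byte value is a throwaway placeholder):

public sealed class ConcurrentSet<T> : IEnumerable<T>
{
    // Only the presence of the key matters; the byte value is ignored.
    private readonly ConcurrentDictionary<T, byte> _items = new ConcurrentDictionary<T, byte>();

    public bool Add(T item) => _items.TryAdd(item, 0);
    public bool Contains(T item) => _items.ContainsKey(item);
    public bool Remove(T item) => _items.TryRemove(item, out _);

    public IEnumerator<T> GetEnumerator() => _items.Keys.GetEnumerator();
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();
}

With the field declared as ConcurrentDictionary<string, ConcurrentSet<string>>, the question's loop becomes foo.GetOrAdd(key, _ => new ConcurrentSet<string>()).Add(bar), with no locking needed.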
You'll need to fix this code; it creates a lot of garbage. You create a new HashSet even if none is required. Use the other overload, the one that accepts a valueFactory delegate, so the HashSet is only created when the key isn't yet present in the dictionary.
The valueFactory might be called multiple times if multiple threads concurrently try to add the same key while it is not present. The odds are very low, but not zero. Only one of these hash sets will be used. That's not a problem: creating a HashSet has no side effects that could cause threading trouble, and the extra copies just get garbage collected.
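Applied to the question's loop, the valueFactory overload might look like this sketch (note the HashSet values themselves are still not thread-safe, which is the concern addressed in the other answer):

foreach (var key in keys)
{
    foo.AddOrUpdate(
        key,
        k => new HashSet<string> { bar },           // addValueFactory: only runs when the key is absent
        (k, val) => { val.Add(bar); return val; }); // updateValueFactory
}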
The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).
It seems the better answer would be to use Lazy<T>, per this article on the methods that pass in a delegate.
There's also another good article here on making the value created by the add delegate lazy.
I have four level data structure defined like this:
Dictionary<Type1, Dictionary<Type2, Dictionary<Type3, List<Type4>>>>
The whole thing is encapsulated in a class which also maintains thread-safety. Currently it just locks whole collection while it reads/manipulates the data (reading is by orders of magnitude more common than writing).
I was thinking of replacing the Dictionary with ConcurrentDictionary and List with ConcurrentBag (its items don't have to be ordered).
If I do so, can I just eliminate the locks and be sure the concurrent collections will do their job correctly?
I'm nearly a year late to the question, but just in case anyone finds themselves in a similar position to Matěj Zábský, ask yourself:
Can you use a Dictionary<Tuple<Type1, Type2, Type3>, List<Type4>> instead?
Considerably easier to work with, and considering that hash tables (i.e. Dictionaries) are O(1) data structures with a somewhat hefty constant component (even more so if you move to a ConcurrentDictionary), it'd likely perform faster too. It'd also use less memory, and be pretty trivial to convert to a ConcurrentDictionary.
Of course, if you need to enumerate all of a given Type2 for a given Type1 key, the nested dictionaries are possibly the way to go. But is that a requirement? A sketch of the flattened form follows below.
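A sketch of the flattened, tuple-keyed form (Type1..Type4 are the question's placeholders; key1, key2 and key3 are hypothetical lookup values):

// One flat, thread-safe map instead of three nested dictionary levels.
var map = new ConcurrentDictionary<Tuple<Type1, Type2, Type3>, List<Type4>>();

// Atomic lookup-or-insert of the bucket for a composite key.
List<Type4> bucket = map.GetOrAdd(Tuple.Create(key1, key2, key3), _ => new List<Type4>());

// Note: the List<Type4> itself is still not thread-safe; mutate it under a lock
// or swap it for a concurrent collection such as ConcurrentBag<Type4>.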
The concurrent collections will prevent data corruption and crashes, but the code won't be semantically equivalent to your current one. For example, if you iterate one of the concurrent dictionaries, some of the items may belong to different updates:
The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.
If you want to maintain the exact behavior you have now, yet save on the cost of locking, you may want to lock with ReaderWriterLockSlim which is especially suited for cases with more reads than writes.
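A minimal sketch of guarding the existing nested structure with ReaderWriterLockSlim (the class and method shapes here are illustrative, not the question's actual API):

public class FourLevelIndex
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<Type1, Dictionary<Type2, Dictionary<Type3, List<Type4>>>> _data =
        new Dictionary<Type1, Dictionary<Type2, Dictionary<Type3, List<Type4>>>>();

    public List<Type4> GetItems(Type1 a, Type2 b, Type3 c)
    {
        _lock.EnterReadLock();   // many readers may hold the read lock at once
        try
        {
            if (_data.TryGetValue(a, out var level2)
                && level2.TryGetValue(b, out var level3)
                && level3.TryGetValue(c, out var items))
                return new List<Type4>(items);   // copy so callers can read it outside the lock
            return new List<Type4>();
        }
        finally { _lock.ExitReadLock(); }
    }

    public void Add(Type1 a, Type2 b, Type3 c, Type4 item)
    {
        _lock.EnterWriteLock();  // writers get exclusive access
        try
        {
            if (!_data.TryGetValue(a, out var level2))
                _data[a] = level2 = new Dictionary<Type2, Dictionary<Type3, List<Type4>>>();
            if (!level2.TryGetValue(b, out var level3))
                level2[b] = level3 = new Dictionary<Type3, List<Type4>>();
            if (!level3.TryGetValue(c, out var items))
                level3[c] = items = new List<Type4>();
            items.Add(item);
        }
        finally { _lock.ExitWriteLock(); }
    }
}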