Suppose that I have a Dictionary<string, string>, declared as public static in my console program.
I'm working with threads: one thread runs a foreach over this Dictionary while, at the same time, another thread wants to add an item to it. This causes a bug, because we can't modify the Dictionary while another thread is enumerating it with a foreach loop.
To work around this problem, I wrapped every operation on the dictionary in a lock statement on the same static object.
Is this the best way to solve the problem? My Dictionary can be very big, and many threads may want to foreach over it. As it is currently, things can be very slow.
Try using a ConcurrentDictionary<TKey, TValue>, which is designed for this kind of scenario.
There's a nice tutorial here on how to use it.
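As a minimal sketch of what that looks like (the SharedState class and its member names are made up for illustration, they are not from the question): enumerating a ConcurrentDictionary while another thread adds to it does not throw, although the loop may or may not observe the concurrent additions.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class SharedState
{
    // Hypothetical shared dictionary standing in for the one in the question.
    public static readonly ConcurrentDictionary<string, string> Items =
        new ConcurrentDictionary<string, string>();

    // Enumerates the dictionary while a second task keeps adding entries.
    // Returns the number of entries the foreach loop happened to observe.
    public static int EnumerateWhileAdding(int initialCount, int extraCount)
    {
        for (int i = 0; i < initialCount; i++)
            Items["initial" + i] = "value";

        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < extraCount; i++)
                Items.TryAdd("extra" + i, "value");
        });

        int seen = 0;
        foreach (var pair in Items)   // no lock and no InvalidOperationException
            seen++;

        writer.Wait();
        return seen;
    }
}
```

The count returned by the loop is not deterministic; only the final contents of the dictionary are.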
The big question is: Do you need the foreach to be a snapshot?
If the answer is "no", then use a ConcurrentDictionary and you will probably be fine. (The one remaining question is whether the nature of your inserts and reads hit the striped locks in a bad way, but if that was the case you'd be finding normal reads and writes to the dictionary even worse).
However, because its GetEnumerator doesn't provide a snapshot, the dictionary you are enumerating at the end is not necessarily in the same state it was in at the beginning. The enumeration could miss items, or return duplicates. The question is whether that's a disaster to you or not.
If it would be a disaster if you had duplicates, but not otherwise, then you can filter out duplicates with Distinct() (whether keyed on the keys or both the key and value, as required).
If you really need it to be a hard snapshot, then take the following approach.
Have a ConcurrentDictionary (dict) and a ReaderWriterLockSlim (rwls). On both reads and writes obtain a reader lock (yes even though you're writing):
public static void AddToDict(string key, string value)
{
    rwls.EnterReadLock();
    try
    {
        dict[key] = value;
    }
    finally
    {
        rwls.ExitReadLock();
    }
}
public static bool ReadFromDict(string key, out string value)
{
    rwls.EnterReadLock();
    try
    {
        return dict.TryGetValue(key, out value);
    }
    finally
    {
        rwls.ExitReadLock();
    }
}
Now, when we want to enumerate the dictionary, we acquire the write lock (even though we're reading):
public IEnumerable<KeyValuePair<string, string>> EnumerateDict()
{
    rwls.EnterWriteLock();
    try
    {
        return dict.ToList();
    }
    finally
    {
        rwls.ExitWriteLock();
    }
}
This way we obtain the shared lock for reading and writing, because ConcurrentDictionary deals with the conflicts involved in that for us. We obtain the exclusive lock for enumerating, but just for long enough to obtain a snapshot of the dictionary in a list, which is then used only in that thread and not shared with any other.
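If a snapshot is all you need, it may also be worth knowing that ConcurrentDictionary has its own ToArray() instance method, which (in the implementations I have looked at) copies the contents while holding the dictionary's internal locks. A minimal sketch, assuming a static dictionary named dict as above; the SnapshotExample class and Snapshot method names are hypothetical:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotExample
{
    private static readonly ConcurrentDictionary<string, string> dict =
        new ConcurrentDictionary<string, string>();

    public static void AddToDict(string key, string value)
    {
        dict[key] = value;
    }

    // Returns a copy of the current contents; later writes to the
    // dictionary do not change the array that was handed out.
    public static KeyValuePair<string, string>[] Snapshot()
    {
        // This binds to the instance method on ConcurrentDictionary,
        // not to the LINQ ToArray() extension method.
        return dict.ToArray();
    }
}
```

Whether this is cheaper than the ReaderWriterLockSlim approach depends on how often you snapshot, since each call copies the whole dictionary.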
With .NET 4 you get a fancy new ConcurrentDictionary. I think there are some .NET 3.5-based implementations floating around.
Yes, you will have a problem updating the global dictionary while an enumeration is running in another thread.
Solutions:
Require all users of the dictionary to acquire a mutex lock before accessing the object, and release the lock afterwards.
Use .NET 4.0's ConcurrentDictionary class.
I am currently reading code delivered by an external developer, and I don't understand this part of it:
private ConcurrentDictionary<Int64, Person> users = new ConcurrentDictionary<Int64, Person>();
private Dictionary<String, Int64> connectionIndex = new Dictionary<String, Int64>();
public Boolean AddNewUser(Int64 userId, Person user) {
    Boolean added = false;
    lock (users) {
        if (users.Select(X=>X.Key==userId).Count()>0)
        {
            Person usrtmp = new Person();
            users.TryRemove(userId, out usrtmp);
        }
        added = users.TryAdd(userId, user);
        if (added)
        {
            connectionIndex.Add(user.ConnectionId, userId);
        }
    }
    return added;
}
Why is the "users" concurrent dictionary locked before any operation on it is done? Is it necessary? From my point of view the lock statement is unnecessary, because ConcurrentDictionary provides thread-safe operations. Am I right?
Also, I know about the "performance issue" when you use .Count() or the Keys and Values operations on a ConcurrentDictionary. Is the LINQ statement OK from this point of view?
Thanks for the answers.
Why is the "users" concurrent dictionary locked before any operation on it is done? Is it necessary?
The lock is certainly necessary given how they're using it. They're performing multiple operations on multiple different dictionaries, and it's important that no other thread interact with either dictionary in any way while that is happening. Doing that requires a lock.
The only way you could remove the lock is if you get to a point where the only usage of a shared data structure is one single method call on a single concurrent dictionary. Without knowing the exact requirements we can't know if that's possible, but if both dictionaries are needed, then it certainly isn't.
Now, given that you have put yourself in a position where you always need to lock over all access to the concurrent dictionary, there's no reason to use a concurrent dictionary and not a regular dictionary; you've already made the decision to use your own synchronization.
From my point of view the lock statement is unnecessary, because ConcurrentDictionary provides thread-safe operations. Am I right?
No. For starters, the non-concurrent dictionary cannot be accessed without a lock.
Also, I know about the "performance issue" when you use .Count() or the Keys and Values operations on a ConcurrentDictionary. Is the LINQ statement OK from this point of view?
It's a horrible idea for lots of reasons. It's trying to do a linear search through a dictionary to see if a key exists; you should never do that, and should use ContainsKey instead. (As written it doesn't even do that: Select projects every entry to a bool, so Count() > 0 is true whenever the dictionary is non-empty.) Additionally, there's no point in checking whether the key is there before trying to remove it; you can just try to remove it and see what happens. It would also be completely unsafe if you weren't already locking: someone else could change the dictionary while you're iterating it, or right after you've searched it, which makes checking before acting pointless, since you can't assume that what you just checked is still true.
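To make the "one lock, regular dictionaries" point concrete, here is a rough sketch. The Person class is a stand-in for the type in the question, and UserRegistry, AddOrReplaceUser and TryGetUserByConnection are names I made up. I'm also assuming that overwriting an existing entry is acceptable, which is not quite what the original code does (its connectionIndex.Add throws on a duplicate ConnectionId).

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the Person type in the question.
public class Person
{
    public string ConnectionId { get; set; }
}

public class UserRegistry
{
    private readonly object sync = new object();
    private readonly Dictionary<long, Person> users = new Dictionary<long, Person>();
    private readonly Dictionary<string, long> connectionIndex = new Dictionary<string, long>();

    // Replaces any existing user with the same id and records the connection,
    // all inside one lock so no other thread sees a half-finished update.
    public void AddOrReplaceUser(long userId, Person user)
    {
        lock (sync)
        {
            users[userId] = user;                        // add or overwrite
            connectionIndex[user.ConnectionId] = userId; // add or overwrite
        }
    }

    // Reads must take the same lock, because both dictionaries are plain ones.
    public bool TryGetUserByConnection(string connectionId, out Person user)
    {
        lock (sync)
        {
            user = null;
            long userId;
            return connectionIndex.TryGetValue(connectionId, out userId)
                && users.TryGetValue(userId, out user);
        }
    }
}
```

Note that this sketch leaves a replaced user's old ConnectionId in the index; whether that needs cleaning up depends on requirements the question doesn't state.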
I have a key-to-task mapping, and I need to run the task only if the task for the given key is not already running. Pseudo code follows. I believe there is a lot of scope for improvement. I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? When I'm trying to lock a key k1, there is no point in making a cache freshener call for key k2 wait for the lock.
class CacheFreshener
{
    private ConcurrentDictionary<string,bool> lockMap;
    public RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        lock(lockMap)
        {
            if (lockMap.ContainsKey(key))
            {
                // no-op
                return;
            }
            else
            {
                lockMap.Add(key, true);
            }
        }
        // if you are here means task is not already present
        cacheMissAction(key);
        lock(lockMap) // Do we need to lock here??
        {
            lockMap.Remove(key);
        }
    }
}
As requested, here is an elaborated explanation of what I was getting at relative to my comments…
The basic issue here seems to be the question of concurrency, i.e. two or more threads accessing the same object at a time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods of ContainsKey() and Add() separately, then you would need explicit synchronization (but only for that operation…in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure these are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need, and includes the TryAdd() method to accomplish the same, without the explicit synchronization.
<aside>
It is not entirely clear to me the intent behind the code example as given. The code appears to be meant to only store an object in the "cache" for the duration of the invocation of the cacheMissAction delegate. The key is removed immediately after. So it does seem like it's not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will fail to invoke it, but also cannot count on it having completed by the time their call to the RefreshData() method has completed).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
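private readonly ConcurrentDictionary<string, bool> lockMap = new ConcurrentDictionary<string, bool>();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    // TryAdd atomically checks for the key and adds it if it is absent
    if (!lockMap.TryAdd(key, true))
    {
        return;
    }
    // if you are here means task was not already present
    cacheMissAction(key);
    // ConcurrentDictionary exposes TryRemove rather than a public Remove
    lockMap.TryRemove(key, out _);
}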
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is likely not to be large, it's no big deal, but I do find it odd that Microsoft chose to make the same mistake they made originally when in the pre-generics days you had to use the non-generic dictionary object Hashtable to store a set, before HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> in there. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private readonly HashSet<string> lockSet = new HashSet<string>();
private readonly object _lock = new object();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    lock (_lock)
    {
        // Add returns false if the key was already in the set
        if (!lockSet.Add(key))
        {
            return;
        }
    }
    // if you are here means task was not already present
    cacheMissAction(key);
    lock (_lock)
    {
        lockSet.Remove(key);
    }
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.
Consider the following code:
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
    if (list.Contains(a))
        return;
    lock (lockObj) {
        list.Add(a,"someothervalue");
    }
}
Assuming I'm calling MyMethod("mystring") from different threads concurrently.
Would it be possible for more than one thread (we'll just take it as two) to enter the if (list.Contains(a)) check at the same time (a few CPU cycles apart), with both evaluating it as false? One thread would then enter the critical region while the other waits outside the lock, and after the first thread exits, the second would enter and add "mystring" again, so the dictionary would be asked to add a duplicate key.
No, it's not thread-safe. You need the lock around the list.Contains too, as it is possible for a thread to be switched out and back in again between the if test and adding the data. Another thread may have added data in the meantime.
You need to lock the entire operation (check and add) or multiple threads may attempt to add the same value.
I would recommend using ConcurrentDictionary<TKey, TValue>, since it is designed to be thread-safe.
private readonly ConcurrentDictionary<string, string> _items
    = new ConcurrentDictionary<string, string>();
public void MyMethod(string item, string value)
{
    _items.AddOrUpdate(item, value, (i, v) => value);
}
You need to lock around the whole statement. It's possible for you to run into issues on the .Contains portion (the way your code is now).
You should check the list again after taking the lock, e.g.
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        if (list.ContainsKey(a))
            return;
        list.Add(a, "someothervalue");
    }
}
private Dictionary<string, string> list = new Dictionary<string, string>();
public void MyMethod(string a) {
    lock (list) {
        if (list.ContainsKey(a))
            return;
        list.Add(a, "someothervalue");
    }
}
Check out this guide to locking, it's good
A few guidelines to bear in mind
Generally lock around a private static object when locking on multiple writeable values
Do not lock on things with scope outside the class or local method such as lock(this), which could lead to deadlocks!
You may lock on the object being changed if it is the only concurrently accessed object
Ensure the object you lock is not null!
You can only lock on reference types
I am going to assume that you meant to write ContainsKey instead of Contains. Contains on a Dictionary is explicitly implemented, so it is not accessible via the type you declared.[1]
Your code is not safe. The reason is that there is nothing preventing ContainsKey and Add from executing at the same time. There are actually some quite remarkable failure scenarios that this would introduce. Because I looked at how the Dictionary is implemented, I can see that your code could cause a situation where the data structure contains duplicates. And I mean it literally contains duplicates. The exception will not necessarily be thrown. The other failure scenarios just keep getting stranger and stranger, but I will not go into those here.
One trivial modification to your code might involve a variation of the double-checked locking pattern.
public void MyMethod(string a)
{
    if (!dictionary.ContainsKey(a))
    {
        lock (dictionary)
        {
            if (!dictionary.ContainsKey(a))
            {
                dictionary.Add(a, "someothervalue");
            }
        }
    }
}
This, of course, is not any safer for the reason I already stated. Actually, the double-checked locking pattern is notoriously difficult to get right in all but the simplest cases (like the canonical implementation of a singleton). There are many variations on this theme. You can try it with TryGetValue or the default indexer, but ultimately all of these variations are just dead wrong.
So how could this be done correctly without taking a lock? You could try ConcurrentDictionary. It has the method GetOrAdd which is really useful in these scenarios. Your code would look like this.
public void MyMethod(string a)
{
    // The variable 'dictionary' is a ConcurrentDictionary.
    dictionary.GetOrAdd(a, "someothervalue");
}
That is all there is to it. The GetOrAdd function will check to see if the item exists. If it does not then it will be added. Otherwise, it will leave the data structure alone. This is all done in a thread-safe manner. In most cases the ConcurrentDictionary does this without waiting on a lock.[2]
[1] By the way, your variable name is obnoxious too. If it were not for Servy's comment I may have missed the fact that we were talking about a Dictionary as opposed to a List. In fact, based on the Contains call I first thought we were talking about a List.
[2] On the ConcurrentDictionary, readers are completely lock free. Writers (adds, updates, and removes) do take a lock, and that includes GetOrAdd when it actually has to add the item. The difference is that the data structure maintains several locks, each guarding only part of the table, so in most cases there is little or no lock contention. That is why this data structure is said to be "low lock" or "concurrent" as opposed to "lock free".
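To see the GetOrAdd behaviour described above under contention, something like the following sketch can be used. The GetOrAddExample class and the RunConcurrently helper are invented for the demonstration; only MyMethod mirrors the code in the answer.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class GetOrAddExample
{
    private static readonly ConcurrentDictionary<string, string> dictionary =
        new ConcurrentDictionary<string, string>();

    public static void MyMethod(string a)
    {
        // Adds the key only if it is absent; never throws on a duplicate.
        dictionary.GetOrAdd(a, "someothervalue");
    }

    // Calls MyMethod with the same key from many threads and reports
    // how many entries ended up in the dictionary.
    public static int RunConcurrently(string key, int calls)
    {
        Parallel.For(0, calls, _ => MyMethod(key));
        return dictionary.Count;
    }
}
```

However many threads race on the same key, the dictionary ends up with a single entry for it and no exception is thrown.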
You can first do a non-locking check, but if you want to be thread-safe you need to repeat the check again within the lock. This way you don't lock unless you have to and ensure thread safety.
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        if (!list.ContainsKey(a)) {
            list.Add(a, "someothervalue");
        }
    }
}
I have written the following code:
static readonly object failedTestLock = new object();
public static Dictionary<string, Exception> FailedTests
{
    get
    {
        lock (failedTestLock)
        {
            return _failedTests;
        }
    }
    set
    {
        lock (failedTestLock)
        {
            _failedTests = value;
        }
    }
}
public void RunTest(string testName)
{
    try
    {
        //Run a test
    }
    catch (Exception exception)
    {
        // ?? Is this correct / threadsafe?
        FailedTests.Add(testName, exception);
    }
}
QUESTION:
Is this a correct manner to safely add the failed test to the Dictionary?
Is this threadsafe?
Is FailedTests.Add called INSIDE the lock or OUTSIDE the lock?
Can you explain why this is correct/threadsafe or why not?
Thanks in advance
The fundamental problem with the code above is that it only locks access to _failedTests when a thread is getting the dictionary or setting it. Only one thread can get a reference to the dictionary at a time, but once a thread has a reference to the dictionary, it can read and manipulate it without being constrained by locks.
Is this a correct manner to safely add the failed test to the Dictionary?
No, not if two threads are trying to add to the dictionary at the same time. Nor if you expect reads and writes to happen in a particular order.
Is this threadsafe?
It depends what you mean by threadsafe, but no, not by any reasonable definition.
Is FailedTests.Add called INSIDE the lock or OUTSIDE the lock?
The dictionary retrieval (the get accessor) happens inside a lock. This code calls Add after releasing the lock.
Can you explain why this is correct/threadsafe or why not?
If multiple threads operate on your dictionary at the same time, you can't predict the order in which those threads will change its contents and you can't control when reads will occur.
This is not thread-safe access to a dictionary, because only the property access that returns the dictionary object is thread-safe, but you are not synchronizing the call to the Add method. Consider using ConcurrentDictionary<string,Exception> in this case, or synchronize calls to Add manually.
I don't think this is thread-safe, because the lock is held only for the very brief moment in which the reference to the collection is returned. When you Add to the collection there is no lock, so if two threads try to add at the same time you'll get a nasty error.
So you should lock around the FailedTests.Add call.
You may also want to look into concurrent collections, they might provide what you need.
Regards GJ
The call to Add() is outside the locks.
You can solve it by writing your own Add() method to replace the property.
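A sketch of that idea, keeping the lock object from the question. The TestResults class, AddFailedTest and GetFailedTests are names I chose for illustration, and overwriting the entry when a test name repeats is my assumption rather than something the question specifies.

```csharp
using System;
using System.Collections.Generic;

public static class TestResults
{
    private static readonly object failedTestLock = new object();
    private static readonly Dictionary<string, Exception> failedTests =
        new Dictionary<string, Exception>();

    // The lock is held for the duration of the write itself,
    // not just while a reference to the dictionary is fetched.
    public static void AddFailedTest(string testName, Exception exception)
    {
        lock (failedTestLock)
        {
            failedTests[testName] = exception; // overwrites on a repeated name (assumption)
        }
    }

    // Hands out a copy so callers never touch the shared dictionary directly.
    public static Dictionary<string, Exception> GetFailedTests()
    {
        lock (failedTestLock)
        {
            return new Dictionary<string, Exception>(failedTests);
        }
    }
}
```

In RunTest, the catch block would then call TestResults.AddFailedTest(testName, exception) instead of going through the property.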
In the following code:
public class StringCache
{
    private readonly object lockobj = new object();
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();
    public string GetMemberInfo(int key)
    {
        if (cache.ContainsKey(key))
            return cache[key];
        lock (lockobj)
        {
            if (!cache.ContainsKey(key))
                cache[key] = GetString(key);
        }
        return cache[key];
    }
    private static string GetString(int key)
    {
        return "Not Important";
    }
}
1) Is ContainsKey thread safe? IOW, what happens if that method is executing when another thread is adding something to the dictionary?
2) For the first return cache[key], is there any chance that it could return a garbled value?
TIA,
MB
The inherent thread safety of ContainsKey doesn't matter, since there is no synchronization between ContainsKey & cache[key].
For example:
if (cache.ContainsKey(key))
    // Switch to another thread, which deletes the key.
    return cache[key];
MSDN is pretty clear on this point:
To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
For more info, JaredPar posted a great blog entry at http://blogs.msdn.com/jaredpar/archive/2009/02/11/why-are-thread-safe-collections-so-hard.aspx on thread-safe collections.
No, ContainsKey is not thread-safe if you're writing values while you're trying to read.
Yes, there is a chance you could get back invalid results -- but you'll probably start seeing exceptions first.
Take a look at the ReaderWriterLockSlim for locking in situations like this -- it's built to do this kind of stuff.
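For what it's worth, here is roughly how the cache from the question could look with a ReaderWriterLockSlim. This is only a sketch: LockedStringCache is a made-up name, and GetString is the placeholder from the question.

```csharp
using System.Collections.Generic;
using System.Threading;

public class LockedStringCache
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();

    public string GetMemberInfo(int key)
    {
        // Fast path: many readers may hold the read lock at once.
        rwLock.EnterReadLock();
        try
        {
            string cached;
            if (cache.TryGetValue(key, out cached))
                return cached;
        }
        finally
        {
            rwLock.ExitReadLock();
        }

        // Slow path: exclusive access while the dictionary is modified.
        rwLock.EnterWriteLock();
        try
        {
            string value;
            // Re-check: another thread may have added the key in between.
            if (!cache.TryGetValue(key, out value))
            {
                value = GetString(key);
                cache[key] = value;
            }
            return value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }

    private static string GetString(int key)
    {
        return "Not Important";
    }
}
```

Unlike the original, every read of the dictionary happens under a lock, and TryGetValue replaces the separate ContainsKey and indexer calls so there is no gap between the check and the read.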
Here's what it says in the MSDN documentation:
Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
If I'm reading that correctly, I don't believe that it is thread safe.
Dictionary is not thread-safe.
Since you ask "what happens if that method is executing when another thread is adding something to the dictionary?", I assume other functions access the cache as well. You need to synchronize all accesses (reading and writing) to the cache. Use your lock object in all of these operations.
I believe it's not thread-safe.
I would suggest going through the link below, which shows an implementation of a thread-safe dictionary; alternatively, implement your own synchronization.
http://lysaghtn.weebly.com/synchronised-dictionary.html