I have a key to task mapping and I need to run the task only if the task for the given is not already running. Pseudo code follows. I believe there is lot of scope for improvement. I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? We know that when I'm trying to lock a key k1, there is no point in cache freshener call for key k2 waiting for lock.
class CacheFreshener
{
private ConcurrentDictionary<string,bool> lockMap;
public RefreshData(string key, Func<string, bool> cacheMissAction)
{
lock(lockMap)
{
if (lockMap.ContainsKey(key))
{
// no-op
return;
}
else
{
lockMap.Add(key, true);
}
}
// if you are here means task is not already present
cacheMissAction(key);
lock(lockMap) // Do we need to lock here??
{
lockMap.Remove(key);
}
}
}
As requested, here is an elaborated explanation of what I was getting at relative to my comments…
The basic issue here seems to be the question of concurrency, i.e. two or more threads accessing the same object at a time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods of ContainsKey() and Add() separately, then you would need explicit synchronization (but only for that operation…in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure these are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need, and includes the TryAdd() method to accomplish the same, without the explicit synchronization.
<aside>
It is not entirely clear to me the intent behind the code example as given. The code appears to be meant to only store an object in the "cache" for the duration of the invocation of the cacheMissAction delegate. The key is removed immediately after. So it does seem like it's not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will fail to invoke it, but also cannot count on it having completed by the time their call to the RefreshData() method has completed).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
private ConcurrentDictionary<string,bool> lockMap;
public RefreshData(string key, Func<string, bool> cacheMissAction)
{
if (!lockMap.TryAdd(key, true))
{
return;
}
// if you are here means task was not already present
cacheMissAction(key);
lockMap.Remove(key);
}
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is likely not to be large, it's no big deal, but I do find it odd that Microsoft chose to make the same mistake they made originally when in the pre-generics days you had to use the non-generic dictionary object Hashtable to store a set, before HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> in there. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private HashSet<string> lockSet;
private readonly object _lock = new object();
public RefreshData(string key, Func<string, bool> cacheMissAction)
{
lock (_lock)
{
if (!lockSet.Add(key))
{
return;
}
}
// if you are here means task was not already present
cacheMissAction(key);
lock (_lock)
{
lockSet.Remove(key);
}
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.
Related
Consider the following code:
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
if (list.Contains(a))
return;
lock (lockObj) {
list.Add(a,"someothervalue");
}
}
Assuming I'm calling MyMethod("mystring") from different threads concurrently.
Would it be possible for more than one thread (we'll just take it as two) enter the if (!list.Contains(a)) statement in the same time (with a few CPU cycles differences), both threads get evaluated as false and one thread enters the critical region while another gets locked outside, so the second thread enters and add "mystring" to the list again after the first thread exits, resulting in the dictionary trying to add a duplicate key?
No, it's not thread-safe. You need the lock around the list.Contains too as it is possible for a thread to be switched out and back in again between the the if test and adding the data. Another thread may have added data in the meantime.
You need to lock the entire operation (check and add) or multiple threads may attempt to add the same value.
I would recommend using the ConcurrentDictionary(TKey, TValue) since it is designed to be thread safe.
private readonly ConcurrentDictionary<string, string> _items
= new ConcurrentDictionary<string, string>();
public void MyMethod(string item, string value)
{
_items.AddOrUpdate(item, value, (i, v) => value);
}
You need to lock around the whole statement. It's possible for you to run into issues on the .Contains portion (the way your code is now)
You should check the list after locking. e.g.
if (list.Contains(a))
return;
lock (lockObj) {
if (list.Contains(a))
return;
list.Add(a);
}
}
private Dictionary<string, string> list = new Dictionary<string, string>();
public void MyMethod(string a) {
lock (list) {
if (list.Contains(a))
return;
list.Add(a,"someothervalue");
}
}
Check out this guide to locking, it's good
A few guidelines to bear in mind
Generally lock around a private static object when locking on multiple writeable values
Do not lock on things with scope outside the class or local method such as lock(this), which could lead to deadlocks!
You may lock on the object being changed if it is the only concurrently accessed object
Ensure the object you lock is not null!
You can only lock on reference types
I am going to assume that you meant write ContainsKey instead of Contains. Contains on a Dictionary is explicitly implemented so it is not accessible via the type you declared.1
Your code is not safe. The reason is because there is nothing preventing ContainsKey and Add from executing at the same time. There are actually some quite remarkable failure scenarios that this would introduce. Because I looked at how the Dictionary is implemented I can see that your code could cause a situation where data structure contains duplicates. And I mean it literally contains duplicates. The exception will not necessarily be thrown. The other failure scenarios just keep getting stranger and stranger, but I will not go into those here.
One trivial modification to your code might involve a variation of the double-checked locking pattern.
public void MyMethod(string a)
{
if (!dictionary.ContainsKey(a))
{
lock (dictionary)
{
if (!dictionary.ContainsKey(a))
{
dictionary.Add(a, "someothervalue");
}
}
}
}
This, of course, is not any safer for the reason I already stated. Actually, the double-checked locking pattern is notoriously difficult to get right in all but the simplest cases (like the canonical implementation of a singleton). There are many variations on this theme. You can try it with TryGetValue or the default indexer, but ultimately all of these variations are just dead wrong.
So how could this be done correctly without taking a lock? You could try ConcurrentDictionary. It has the method GetOrAdd which is really useful in these scenarios. Your code would look like this.
public void MyMethod(string a)
{
// The variable 'dictionary' is a ConcurrentDictionary.
dictionary.GetOrAdd(a, "someothervalue");
}
That is all there is to it. The GetOrAdd function will check to see if the item exists. If it does not then it will be added. Otherwise, it will leave the data structure alone. This is all done in a thread-safe manner. In most cases the ConcurrentDictionary does this without waiting on a lock.2
1By the way, your variable name is obnoxious too. If it were not for Servy's comment I may have missed the fact that we were talking about a Dictionary as opposed to a List. In fact, based on the Contains call I first thought we were talking about a List.
2On the ConcurrentDictionary readers are completely lock free. However, writers always take a lock (adds and updates that is; the remove operation is still lock free). This includes the GetOrAdd function. The difference is that the data structure maintains several possible lock options so in most cases there is little or no lock contention. That is why this data structure is said to be "low lock" or "concurrent" as opposed to "lock free".
You can first do a non-locking check, but if you want to be thread-safe you need to repeat the check again within the lock. This way you don't lock unless you have to and ensure thread safety.
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
if (list.Contains(a))
return;
lock (lockObj) {
if (!list.Contains(a)){
list.Add(a,"someothervalue");
}
}
}
I use a ConcurrentDictioanry<string, HashSet<string>> to access some data across many threads.
I read in this article (scroll down) that the method AddOrUpdate is not executed in the lock, so it could endanger thread-safety.
My code is as follows:
//keys and bar are not the concern here
ConcurrentDictioanry<string, HashSet<string>> foo = new ...;
foreach(var key in keys) {
foo.AddOrUpdate(key, new HashSet<string> { bar }, (key, val) => {
val.Add(bar);
return val;
});
}
Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?
Locking during AddOrUpdate on its own wouldn't help - you'd still have to lock every time you read from the set.
If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T> which used a ConcurrentDictionary<T, int> (or whatever TValue you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.
You don't need to implement everything within ISet<T> - just the bits you actually need.
You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>> in your application code, and you're away - no need for locking.
You'll need to fix this code, it creates a lot of garbage. You create a new HashSet even if none is required. Use the other overload, the one that accepts the valueFactory delegate. So the HashSet is only created when the key isn't yet present in the dictionary.
The valueFactory might be called multiple times if multiple threads concurrently try to add the same value of key and it is not present. Very low odds but not zero. Only one of these hashsets will be used. Not a problem, creating the HashSet has no side effects that could cause threading trouble, the extra copies just get garbage collected.
The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).
Seems the better answer would be to use Lazy, per this article on the methods that pass in a delegate.
Also another good article Here on Lazy loading the add delegate.
I need to implement the class that should perform locking mechanism in our framework.
We have several threads and they are numbered 0,1,2,3.... We have a static class called ResourceHandler, that should lock these threads on given objects. The requirement is that n Lock() invokes should be realeased by m Release() invokes, where n = [0..] and m = [0..]. So no matter how many locks was performed on single object, only one Release() call is enough to unlock all. Even further if o object is not locked, Release() call should perform nothing. Also we need to know what objects are locked on what threads.
I have this implementation:
public class ResourceHandler
{
private readonly Dictionary<int, List<object>> _locks = new Dictionary<int, List<object>>();
public static ResourceHandler Instance {/* Singleton */}
public virtual void Lock(int threadNumber, object obj)
{
Monitor.Enter(obj);
if (!_locks.ContainsKey(threadNumber)) {_locks.Add(new List<object>());}
_locks[threadNumber].Add(obj);
}
public virtual void Release(int threadNumber, object obj)
{
// Check whether we have threadN in _lock and skip if not
var count = _locks[threadNumber].Count(x => x == obj);
_locks[threadNumber].RemoveAll(x => x == obj);
for (int i=0; i<count; i++)
{
Monitor.Exit(obj);
}
}
// .....
}
Actually what I am worried here about is thread-safety. I'm actually not sure, is it thread-safe or not, and it's a real pain to fix that. Am I doing the task correctly and how can I ensure that this is thread-safe?
Your Lock method locks on the target objects but the _locks dictionary can be accessed by any thread at any time. You may want to add a private lock object for accessing the dictionary (in both the Lock and Release methods).
Also keep in mind that by using such a ResourceHandler it is the responsibility of the rest of the code (the consuming threads) to release all used objects (a regular lock () block for instance covers that problem since whenever you leave the lock's scope, the object is released).
You also may want to use ReferenceEquals when counting the number of times an object is locked instead of ==.
You can ensure this class is thread safe by using a ConcurrentDictionary but, it won't help you with all the problems you will get from trying to develop your own locking mechanism.
There are a number locking mechansims that are already part of the .Net Framework, you should use those.
It sounds like you are going to need to use a combination of these, including Wait Handles to achieve what you want.
EDIT
After reading more carefully, I think you might need an EventWaitHandle
What you have got conceptually looks dangerous; this is bacause calls to Monitor.Enter and Monitor.Exit for them to work as a Lock statement, are reccomended to be encapsulated in a try/finally block, that is to ensure they are executed sequetally. Calling Monitor.Exit before Monitor.Enter will throw an exception.
To avoid these problems (if an exception is thrown, the lock for a given thread may-or-may-not be taken, and if a lock is taken it will not be released, resulting in a leaked lock. I would recomend using one of the options provided in the other answers above. However, if you do want to progress with this mechanism, CLR 4.0 added the following overload to the Monitor.Enter method
public static void Enter (object, ref bool lockTaken);
lockTaken is false if and only if the Enter method throws an exception and the lock was not taken. So, using your two methods using a global bool lockTaken you can create something like (here the example is for a single locker - you will need a Dictionary of List<bool> corresponding to your threads - or event better a Tuple). So in your method Lock you would have something like
bool lockTaken = false;
Monitor.Enter(locker, ref lockTaken);
in the other method Release
if (lockTaken)
Monitor.Exit(locker);
I hope this helps.
Edit: I don't think I fully appreciate your problem, but from what I can gather I would be using a Concurrent Collection. These are fully thead safe. Check out IProducerConsumerCollection<T> and ConcurrentBag<T>. These should facilitate what you want with all thread safter taken care of by the framework (note. a thread safe collection doesn't mean the code it executes is thread safe!). However, using a collection like this, is likely to be far slower than using locks.
IMO you need to use atomic set of functions to make it safe.
http://msdn.microsoft.com/en-us/library/system.threading.mutex.aspx
Mutexes I guess will help u.
Suppose that I have a Dictionary<string, string>. The dictionary is declared as public static in my console program.
If I'm working with threads and I want to do foreach on this Dictionary from one thread but at the same time another thread want to add item to the dictionary. This would cause a bug here because we can't modify our Dictionary while we are running on it with a foreach loop in another thread.
To bypass this problem I created a lock statement on the same static object on each operation on the dictionary.
Is this the best way to bypass this problem? My Dictionary can be very big and I can have many threads that want to foreach on it. As it is currently, things can be very slow.
Try using a ConcurrentDictionary<TKey, TValue>, which is designed for this kind of scenario.
There's a nice tutorial here on how to use it.
The big question is: Do you need the foreach to be a snapshot?
If the answer is "no", then use a ConcurrentDictionary and you will probably be fine. (The one remaining question is whether the nature of your inserts and reads hit the striped locks in a bad way, but if that was the case you'd be finding normal reads and writes to the dictionary even worse).
However, because it's GetEnumerator doesn't provide a snapshot, it will not be enumerating the same start at the beginning as it is at the end. It could miss items, or duplicate items. The question is whether that's a disaster to you or not.
If it would be a disaster if you had duplicates, but not otherwise, then you can filter out duplicates with Distinct() (whether keyed on the keys or both the key and value, as required).
If you really need it to be a hard snapshot, then take the following approach.
Have a ConcurrentDictionary (dict) and a ReaderWriterLockSlim (rwls). On both reads and writes obtain a reader lock (yes even though you're writing):
public static void AddToDict(string key, string value)
{
rwls.EnterReadLock();
try
{
dict[key] = value;
}
finally
{
rwls.ExitReadLock();
}
}
public static bool ReadFromDict(string key, out string value)
{
rwls.EnterReadLock();
try
{
return dict.TryGetValue(key, out value);
}
finally
{
rwls.ExitReadLock();
}
}
Now, when we want to enumerate the dictionary, we acquire the write lock (even though we're reading):
public IEnumerable<KeyValuePair<string, string>> EnumerateDict()
{
rwls.EnterWriteLock();
try
{
return dict.ToList();
}
finally
{
rwls.ExitWriteLock();
}
}
This way we obtain the shared lock for reading and writing, because ConcurrentDictionary deals with the conflicts involved in that for us. We obtain the exclusive lock for enumerating, but just for long enough to obtain a snapshot of the dictionary in a list, which is then used only in that thread and not shared with any other.
With .NET 4 you get a fancy new ConcurrentDictionary. I think there are some .NET 3.5-based implementations floating around.
Yes, you will have a problem updating the global dictionary while an enumeration is running in another thread.
Solutions:
Require all users of the dictionary to acquire a mutex lock before accessing the object, and release the lock afterwards.
Use .NET 4.0's ConcurrentDictionary class.
I have a situation in C# where I have a list of simple types. This list can be accessed by multiple threads: entries can be added or removed, and the existence of an entry can be checked. I have encapsulated the list in an object exposing just those three operations so far.
I have a few cases to handle (not exactly the same as the methods I just mentioned).
A thread can just check for the existence of an entry. (simple)
A thread can check for the existence of an entry, and if it doesn't exist, add it.
A thread needs to check whether an entry exists, and if it does, wait until it is removed.
A combination of 2 and 3, where a thread checks for the existence of an entry, if it does exist, it must wait until it is removed before it can then add it itself.
The whole idea is that the existence of an entry signifies a lock. If an entry exists, the object it identifies cannot be changed and code cannot proceed because it is being modified elsewhere.
These may seem like simple novice situations but I'm refreshing myself on concurrency issues and it's making me a bit paranoid, and I'm also not as familiar with C#'s concurrency mechanisms.
What would be the best way to handle this? Am I totally off? Should check and add (test and set?) be combined into a fourth atomic operation? Would I simply be adding lock blocks to my methods where the list is accessed?
Also, is it possible to unit test this kind of thing (not the simple operations, the concurrency situations)?
Unit testing will certainly be hard.
This can all be done reasonably simply with the "native" concurrency mechanisms in .NET: lock statements and Monitor.Wait/Monitor.PulseAll. Unless you have a separate monitor per item though, you're going to need to wake all the threads up whenever anything is removed - otherwise you won't be able to tell the "right" thread to wake up.
If it really is just a set of items, you might want to use HashSet<T> instead of List<T> to represent the collection, by the way - nothing you've mentioned is to do with ordering.
Sample code, assuming that a set is okay for you:
using System;
using System.Collections.Generic;
using System.Threading;
public class LockCollection<T>
{
private readonly HashSet<T> items = new HashSet<T>();
private readonly object padlock = new object();
public bool Contains(T item)
{
lock (padlock)
{
return items.Contains(item);
}
}
public bool Add(T item)
{
lock (padlock)
{
// HashSet<T>.Add does what you want already :)
// Note that it will return true if the item
// *was* added (i.e. !Contains(item))
return items.Add(item);
}
}
public void WaitForNonExistence(T item)
{
lock (padlock)
{
while (items.Contains(item))
{
Monitor.Wait(padlock);
}
}
}
public void WaitForAndAdd(T item)
{
lock (padlock)
{
WaitForNonExistence(item);
items.Add(item);
}
}
public void Remove(T item)
{
lock (padlock)
{
if (items.Remove(item))
{
Monitor.PulseAll(padlock);
}
}
}
}
(Completely untested, admittedly. You might also want to specify timeouts for the waiting code...)
While #1 may be the simplest to write, it's essentially a useless method. Unless you are holding onto the same lock after finishing a query for "existence of an entry", you are actually returning "existence of an entry at some point in the past". It doesn't give you any information about the current existence of the entry.
In between the discovery of a value in the list then doing any operation to retrieve, remove the value, another thread could come and remove it for you.
Contains operations on a concurrent list should be combined with the operation you plan on doing in the case of true/false existence of that check. For instance TestAdd() or TestRemove() is much safer than Contains + Add or Contains + Remove
here is a proper, concurrent, thread-safe, parallelisable concurrent list implementation
http://www.deanchalk.me.uk/post/Task-Parallel-Concurrent-List-Implementation.aspx
There is a product for finding race conditions and suchlike in unit tests. It's called TypeMock Racer. I can't say anything for or against its effectiveness, though. :)