I have a situation in C# where I have a list of simple types. This list can be accessed by multiple threads: entries can be added or removed, and the existence of an entry can be checked. I have encapsulated the list in an object exposing just those three operations so far.
I have a few cases to handle (not exactly the same as the methods I just mentioned).
1. A thread can just check for the existence of an entry. (simple)
2. A thread can check for the existence of an entry, and if it doesn't exist, add it.
3. A thread needs to check whether an entry exists, and if it does, wait until it is removed.
4. A combination of 2 and 3: a thread checks for the existence of an entry and, if it exists, must wait until it is removed before adding it itself.
The whole idea is that the existence of an entry signifies a lock. If an entry exists, the object it identifies cannot be changed and code cannot proceed because it is being modified elsewhere.
These may seem like simple novice situations but I'm refreshing myself on concurrency issues and it's making me a bit paranoid, and I'm also not as familiar with C#'s concurrency mechanisms.
What would be the best way to handle this? Am I totally off? Should check and add (test and set?) be combined into a fourth atomic operation? Would I simply be adding lock blocks to my methods where the list is accessed?
Also, is it possible to unit test this kind of thing (not the simple operations, the concurrency situations)?
Unit testing will certainly be hard.
This can all be done reasonably simply with the "native" concurrency mechanisms in .NET: lock statements and Monitor.Wait/Monitor.PulseAll. Unless you have a separate monitor per item though, you're going to need to wake all the threads up whenever anything is removed - otherwise you won't be able to tell the "right" thread to wake up.
If it really is just a set of items, you might want to use HashSet<T> instead of List<T> to represent the collection, by the way - nothing you've mentioned is to do with ordering.
Sample code, assuming that a set is okay for you:
using System;
using System.Collections.Generic;
using System.Threading;
public class LockCollection<T>
{
private readonly HashSet<T> items = new HashSet<T>();
private readonly object padlock = new object();
public bool Contains(T item)
{
lock (padlock)
{
return items.Contains(item);
}
}
public bool Add(T item)
{
lock (padlock)
{
// HashSet<T>.Add does what you want already :)
// Note that it will return true if the item
// *was* added (i.e. !Contains(item))
return items.Add(item);
}
}
public void WaitForNonExistence(T item)
{
lock (padlock)
{
while (items.Contains(item))
{
Monitor.Wait(padlock);
}
}
}
public void WaitForAndAdd(T item)
{
lock (padlock)
{
WaitForNonExistence(item);
items.Add(item);
}
}
public void Remove(T item)
{
lock (padlock)
{
if (items.Remove(item))
{
Monitor.PulseAll(padlock);
}
}
}
}
(Completely untested, admittedly. You might also want to specify timeouts for the waiting code...)
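The timeout idea mentioned above could look roughly like this. It is only a sketch, not part of the original sample: the class name TimedLockCollection and the method TryWaitForAndAdd are made up for illustration, and it relies on the Monitor.Wait(object, TimeSpan) overload. The loop re-checks the condition after every wake-up and gives up once the overall deadline has passed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical variant of LockCollection<T> with a bounded wait.
public class TimedLockCollection<T>
{
    private readonly HashSet<T> items = new HashSet<T>();
    private readonly object padlock = new object();

    public void Remove(T item)
    {
        lock (padlock)
        {
            // Wake every waiter so each can re-check its own item.
            if (items.Remove(item))
            {
                Monitor.PulseAll(padlock);
            }
        }
    }

    // Waits until the item is absent, then adds it.
    // Returns false if the timeout expires while the item is still present.
    public bool TryWaitForAndAdd(T item, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        lock (padlock)
        {
            while (items.Contains(item))
            {
                TimeSpan remaining = timeout - elapsed.Elapsed;
                if (remaining <= TimeSpan.Zero)
                {
                    return false;
                }
                // Releases padlock while waiting and reacquires it before returning.
                Monitor.Wait(padlock, remaining);
            }
            items.Add(item);
            return true;
        }
    }
}
```

A caller that gets false back knows the entry is still held elsewhere and can decide whether to retry or give up.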
While #1 may be the simplest to write, it's essentially a useless method. Unless you are holding onto the same lock after finishing a query for "existence of an entry", you are actually returning "existence of an entry at some point in the past". It doesn't give you any information about the current existence of the entry.
Between discovering a value in the list and then doing any operation to retrieve or remove it, another thread could come along and remove it first.
Contains operations on a concurrent list should be combined with the operation you plan on doing depending on whether the check is true or false. For instance, TestAdd() or TestRemove() is much safer than Contains + Add or Contains + Remove.
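To make the combined check-and-act idea concrete, here is a small sketch. The names TestAndSetDemo, TryAdd, TryRemove and CountWinners are invented for this example. Each operation does its check and its modification under one lock, and the helper at the end is the kind of crude stress test that can catch a missing lock: no matter how many threads race, only one of them should see TryAdd return true.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class TestAndSetDemo
{
    private static readonly HashSet<int> items = new HashSet<int>();
    private static readonly object padlock = new object();

    // Atomic "add if absent": true only for the caller that actually added it.
    public static bool TryAdd(int item)
    {
        lock (padlock)
        {
            return items.Add(item);
        }
    }

    // Atomic "remove if present": true only for the caller that actually removed it.
    public static bool TryRemove(int item)
    {
        lock (padlock)
        {
            return items.Remove(item);
        }
    }

    // Races many callers on the same item and counts how many "won" the add.
    public static int CountWinners(int item, int attempts)
    {
        int winners = 0;
        Parallel.For(0, attempts, i =>
        {
            if (TryAdd(item))
            {
                Interlocked.Increment(ref winners);
            }
        });
        return winners;
    }
}
```

A passing run does not prove the code is correct, since races are timing dependent, but a result other than one winner is a sure sign of a bug.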
Here is a proper, thread-safe, parallelisable concurrent list implementation:
http://www.deanchalk.me.uk/post/Task-Parallel-Concurrent-List-Implementation.aspx
There is a product for finding race conditions and suchlike in unit tests. It's called TypeMock Racer. I can't say anything for or against its effectiveness, though. :)
Related
I have a key-to-task mapping, and I need to run the task only if the task for the given key is not already running. Pseudocode follows. I believe there is a lot of scope for improvement. I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? When I'm trying to lock a key k1, there is no point in a cache freshener call for key k2 waiting for the lock.
class CacheFreshener
{
private ConcurrentDictionary<string,bool> lockMap;
public RefreshData(string key, Func<string, bool> cacheMissAction)
{
lock(lockMap)
{
if (lockMap.ContainsKey(key))
{
// no-op
return;
}
else
{
lockMap.Add(key, true);
}
}
// if you are here means task is not already present
cacheMissAction(key);
lock(lockMap) // Do we need to lock here??
{
lockMap.Remove(key);
}
}
}
As requested, here is an elaborated explanation of what I was getting at relative to my comments…
The basic issue here seems to be the question of concurrency, i.e. two or more threads accessing the same object at a time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods of ContainsKey() and Add() separately, then you would need explicit synchronization (but only for that operation…in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure these are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need, and includes the TryAdd() method to accomplish the same, without the explicit synchronization.
<aside>
It is not entirely clear to me the intent behind the code example as given. The code appears to be meant to only store an object in the "cache" for the duration of the invocation of the cacheMissAction delegate. The key is removed immediately after. So it does seem like it's not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will fail to invoke it, but also cannot count on it having completed by the time their call to the RefreshData() method has completed).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
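private readonly ConcurrentDictionary<string, bool> lockMap =
    new ConcurrentDictionary<string, bool>();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    // TryAdd checks for the key and adds it if absent, as one atomic step
    if (!lockMap.TryAdd(key, true))
    {
        return;
    }
    // if you are here means task was not already present
    cacheMissAction(key);
    // ConcurrentDictionary exposes TryRemove rather than a public Remove
    bool ignored;
    lockMap.TryRemove(key, out ignored);
}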
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
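One thing the rewritten method does not cover is what happens if cacheMissAction throws: the key would stay in the dictionary and block every later refresh of it. The sketch below is one way to handle that, under a few assumptions of my own: the class is made self-contained, RefreshData returns a bool so callers can tell whether they ran the action, and the removal sits in a finally block.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class CacheFreshener
{
    private readonly ConcurrentDictionary<string, bool> lockMap =
        new ConcurrentDictionary<string, bool>();

    // Returns true if this call ran the action,
    // false if a refresh for the same key was already in progress.
    public bool RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        if (!lockMap.TryAdd(key, true))
        {
            return false;
        }
        try
        {
            cacheMissAction(key);
        }
        finally
        {
            // Always release the key, even if the action throws.
            lockMap.TryRemove(key, out _);
        }
        return true;
    }
}
```

Refreshes for different keys never wait on each other here, which was the concern in the question; only a second refresh of the same key is turned away.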
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is likely not to be large, it's no big deal, but I do find it odd that Microsoft chose to make the same mistake they made originally when in the pre-generics days you had to use the non-generic dictionary object Hashtable to store a set, before HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> in there. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private readonly HashSet<string> lockSet = new HashSet<string>();
private readonly object _lock = new object();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
lock (_lock)
{
if (!lockSet.Add(key))
{
return;
}
}
// if you are here means task was not already present
cacheMissAction(key);
lock (_lock)
{
lockSet.Remove(key);
}
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.
If a class has an array, it doesn't really matter what of. Now one thread is adding data to said array, while another thread needs to process the data that is already in it. With my limited knowledge of multithreading, how could this work? The first problem I can think of is if an item is added while the other thread is processing what's still there. At first I thought that wouldn't be a problem, the processor thread would get it next time it processed, but then I realized that while the processor thread removes items it's already processed, the adding thread would not receive this change, possibly (?) wreaking havoc. Is there any good way to implement this behavior?
What you've described is basically the readers-writers problem. To make this safe across threads, you need either a concurrent collection or a lock. The simplest lock implementation just locks on a private object:
private Object myLock = new Object();
public MyClass ReadFromSharedArray()
{
lock(myLock)
{
//do whatever here
}
}
public void WriteToSharedArray(MyClass data)
{
lock(myLock)
{
//Do whatever here
}
}
There are better locks, such as ReaderWriterLockSlim, but this sort of basic implementation should be a good starting point.
Also, you mentioned adding to and removing from arrays; I'm assuming you meant lists (or better yet, a queue). There's ConcurrentQueue<T>, which could be a good replacement.
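As a rough illustration of the ConcurrentQueue<T> suggestion, here is a sketch with one thread adding items and another processing them. QueueDemo and ProduceAndConsume are names invented for the example, and the spin-wait polling is only there to keep it short; a real consumer would more likely block or be signalled.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueueDemo
{
    // One task enqueues 1..count while the calling thread dequeues and sums them.
    public static int ProduceAndConsume(int count)
    {
        var queue = new ConcurrentQueue<int>();

        Task producer = Task.Run(() =>
        {
            for (int i = 1; i <= count; i++)
            {
                queue.Enqueue(i);
            }
        });

        int sum = 0;
        int seen = 0;
        var spin = new SpinWait();
        while (seen < count)
        {
            // TryDequeue removes and returns an item in one atomic step,
            // so there is no separate "check, then remove" race.
            if (queue.TryDequeue(out int item))
            {
                sum += item;
                seen++;
            }
            else
            {
                spin.SpinOnce();
            }
        }

        producer.Wait();
        return sum;
    }
}
```

Neither side takes an explicit lock: the queue handles the synchronization, and the adding thread never has to know which items have already been processed.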
Consider the following code:
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
if (list.Contains(a))
return;
lock (lockObj) {
list.Add(a,"someothervalue");
}
}
Assuming I'm calling MyMethod("mystring") from different threads concurrently.
Would it be possible for more than one thread (we'll just take it as two) to reach the if (list.Contains(a)) check at the same time (give or take a few CPU cycles), with both evaluating it as false? One thread would then enter the critical region while the other waits outside the lock, and once the first thread exits, the second would enter and add "mystring" again, so the dictionary would be asked to add a duplicate key?
No, it's not thread-safe. You need the lock around the list.Contains too, as it is possible for a thread to be switched out and back in again between the if test and adding the data. Another thread may have added data in the meantime.
You need to lock the entire operation (check and add) or multiple threads may attempt to add the same value.
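I would recommend using ConcurrentDictionary<TKey, TValue>, since it is designed to be thread-safe. Note that AddOrUpdate, used below, replaces the value when the key already exists; TryAdd or GetOrAdd would keep the existing value instead.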
private readonly ConcurrentDictionary<string, string> _items
= new ConcurrentDictionary<string, string>();
public void MyMethod(string item, string value)
{
_items.AddOrUpdate(item, value, (i, v) => value);
}
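The difference between AddOrUpdate and an add-if-absent call matters for the original question, which never overwrote an existing entry. A short sketch of the two behaviours follows; AddSemanticsDemo and its method names are invented for illustration.

```csharp
using System.Collections.Concurrent;

public static class AddSemanticsDemo
{
    // AddOrUpdate: the second call replaces the stored value.
    public static string AfterAddOrUpdate()
    {
        var items = new ConcurrentDictionary<string, string>();
        items.AddOrUpdate("mystring", "first", (key, existing) => "first");
        items.AddOrUpdate("mystring", "second", (key, existing) => "second");
        return items["mystring"];
    }

    // TryAdd: the second call returns false and leaves the value alone.
    public static string AfterTryAdd()
    {
        var items = new ConcurrentDictionary<string, string>();
        items.TryAdd("mystring", "first");
        items.TryAdd("mystring", "second");
        return items["mystring"];
    }
}
```

If the goal is "add only when missing", TryAdd matches the original Contains-then-Add intent more closely.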
You need to lock around the whole statement. It's possible for you to run into issues on the .Contains portion (the way your code is now)
You should check the list after locking. e.g.
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        // Check again now that we hold the lock
        if (list.ContainsKey(a))
            return;
        list.Add(a, "someothervalue");
    }
}
private Dictionary<string, string> list = new Dictionary<string, string>();
public void MyMethod(string a) {
lock (list) {
if (list.ContainsKey(a))
return;
list.Add(a,"someothervalue");
}
}
Check out this guide to locking, it's good
A few guidelines to bear in mind
Generally lock around a private static object when locking on multiple writeable values
Do not lock on things with scope outside the class or local method such as lock(this), which could lead to deadlocks!
You may lock on the object being changed if it is the only concurrently accessed object
Ensure the object you lock is not null!
You can only lock on reference types
I am going to assume that you meant to write ContainsKey instead of Contains. Contains on a Dictionary is explicitly implemented, so it is not accessible via the type you declared.[1]
Your code is not safe. The reason is that nothing prevents ContainsKey and Add from executing at the same time. There are actually some quite remarkable failure scenarios that this would introduce. Because I looked at how the Dictionary is implemented, I can see that your code could cause a situation where the data structure contains duplicates. And I mean it literally contains duplicates. The exception will not necessarily be thrown. The other failure scenarios just keep getting stranger and stranger, but I will not go into those here.
One trivial modification to your code might involve a variation of the double-checked locking pattern.
public void MyMethod(string a)
{
if (!dictionary.ContainsKey(a))
{
lock (dictionary)
{
if (!dictionary.ContainsKey(a))
{
dictionary.Add(a, "someothervalue");
}
}
}
}
This, of course, is not any safer for the reason I already stated. Actually, the double-checked locking pattern is notoriously difficult to get right in all but the simplest cases (like the canonical implementation of a singleton). There are many variations on this theme. You can try it with TryGetValue or the default indexer, but ultimately all of these variations are just dead wrong.
So how could this be done correctly without taking a lock? You could try ConcurrentDictionary. It has the method GetOrAdd which is really useful in these scenarios. Your code would look like this.
public void MyMethod(string a)
{
// The variable 'dictionary' is a ConcurrentDictionary.
dictionary.GetOrAdd(a, "someothervalue");
}
That is all there is to it. The GetOrAdd function will check to see if the item exists. If it does not then it will be added. Otherwise, it will leave the data structure alone. This is all done in a thread-safe manner. In most cases the ConcurrentDictionary does this without waiting on a lock.[2]
[1] By the way, your variable name is obnoxious too. If it were not for Servy's comment I may have missed the fact that we were talking about a Dictionary as opposed to a List. In fact, based on the Contains call I first thought we were talking about a List.
[2] On the ConcurrentDictionary, readers are completely lock-free. However, writers take a lock (adds and updates, and, as far as I can tell from the implementation, removes as well). This includes the GetOrAdd function when it has to add. The difference is that the data structure maintains several locks, each covering part of the table, so in most cases there is little or no lock contention. That is why this data structure is said to be "low lock" or "concurrent" as opposed to "lock free".
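To see the GetOrAdd behaviour described above under contention, something like the following can be run. It is a sketch with invented names (GetOrAddDemo, Run): many parallel calls try to add the same key with different values, and the dictionary should end up with exactly one entry.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class GetOrAddDemo
{
    // Races many GetOrAdd calls on one key and reports how many entries exist afterwards.
    public static int Run(int attempts)
    {
        var dictionary = new ConcurrentDictionary<string, string>();

        Parallel.For(0, attempts, i =>
        {
            // Only the first call to get there stores its value;
            // every other call gets the stored value back.
            dictionary.GetOrAdd("mystring", "value from call " + i);
        });

        return dictionary.Count;
    }
}
```

Which call wins is not predictable, but no duplicate key exception is thrown and no explicit lock is needed in the calling code.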
You can first do a non-locking check, but if you want to be thread-safe you need to repeat the check again within the lock. This way you don't lock unless you have to and ensure thread safety.
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        // Repeat the check now that we hold the lock
        if (!list.ContainsKey(a)) {
            list.Add(a, "someothervalue");
        }
    }
}
Suppose that I have a Dictionary<string, string>. The dictionary is declared as public static in my console program.
If I'm working with threads, I may want to foreach over this Dictionary from one thread while another thread wants to add an item to it. This would cause a bug, because we can't modify the Dictionary while a foreach loop is running over it in another thread.
To bypass this problem I created a lock statement on the same static object on each operation on the dictionary.
Is this the best way to bypass this problem? My Dictionary can be very big and I can have many threads that want to foreach on it. As it is currently, things can be very slow.
Try using a ConcurrentDictionary<TKey, TValue>, which is designed for this kind of scenario.
There's a nice tutorial here on how to use it.
The big question is: Do you need the foreach to be a snapshot?
If the answer is "no", then use a ConcurrentDictionary and you will probably be fine. (The one remaining question is whether the nature of your inserts and reads hit the striped locks in a bad way, but if that was the case you'd be finding normal reads and writes to the dictionary even worse).
However, because its GetEnumerator doesn't provide a snapshot, the dictionary may not be in the same state at the end of the enumeration as it was at the beginning. The enumeration could miss items, or duplicate items. The question is whether that's a disaster to you or not.
If it would be a disaster if you had duplicates, but not otherwise, then you can filter out duplicates with Distinct() (whether keyed on the keys or both the key and value, as required).
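A sketch of the duplicate-filtering idea, keyed on the keys only, might look like this. LiveEnumerationDemo and ReadDistinctByKey are names made up for the example. Calling Distinct() directly on the pairs would compare key and value together, so grouping by key is used here to keep one pair per key.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class LiveEnumerationDemo
{
    // Enumerates the live dictionary (no snapshot) and keeps one pair per key.
    public static List<KeyValuePair<string, string>> ReadDistinctByKey(
        ConcurrentDictionary<string, string> dict)
    {
        return dict
            .GroupBy(pair => pair.Key)
            .Select(group => group.First())
            .ToList();
    }
}
```

This only removes repeats from the result; items added or removed while the enumeration is running may still be missed.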
If you really need it to be a hard snapshot, then take the following approach.
Have a ConcurrentDictionary (dict) and a ReaderWriterLockSlim (rwls). On both reads and writes obtain a reader lock (yes even though you're writing):
public static void AddToDict(string key, string value)
{
rwls.EnterReadLock();
try
{
dict[key] = value;
}
finally
{
rwls.ExitReadLock();
}
}
public static bool ReadFromDict(string key, out string value)
{
rwls.EnterReadLock();
try
{
return dict.TryGetValue(key, out value);
}
finally
{
rwls.ExitReadLock();
}
}
Now, when we want to enumerate the dictionary, we acquire the write lock (even though we're reading):
public IEnumerable<KeyValuePair<string, string>> EnumerateDict()
{
rwls.EnterWriteLock();
try
{
return dict.ToList(); // ToList() needs "using System.Linq;"
}
finally
{
rwls.ExitWriteLock();
}
}
This way we obtain the shared lock for reading and writing, because ConcurrentDictionary deals with the conflicts involved in that for us. We obtain the exclusive lock for enumerating, but just for long enough to obtain a snapshot of the dictionary in a list, which is then used only in that thread and not shared with any other.
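Put together with the declarations the snippets above leave out, the approach could be sketched as follows. The class name SnapshotDict and the field initializers are assumptions added to make the example self-contained; the method bodies follow the ones shown above.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class SnapshotDict
{
    private static readonly ConcurrentDictionary<string, string> dict =
        new ConcurrentDictionary<string, string>();
    private static readonly ReaderWriterLockSlim rwls = new ReaderWriterLockSlim();

    public static void AddToDict(string key, string value)
    {
        // Shared lock: many threads may read and write at once,
        // because ConcurrentDictionary handles those conflicts itself.
        rwls.EnterReadLock();
        try
        {
            dict[key] = value;
        }
        finally
        {
            rwls.ExitReadLock();
        }
    }

    public static List<KeyValuePair<string, string>> EnumerateDict()
    {
        // Exclusive lock: held only long enough to copy the contents.
        rwls.EnterWriteLock();
        try
        {
            return dict.ToList();
        }
        finally
        {
            rwls.ExitWriteLock();
        }
    }
}
```

The returned list belongs to the caller, so a foreach over it cannot be disturbed by later AddToDict calls. As far as I know, ConcurrentDictionary's own ToArray() also copies the contents while holding its internal locks, which may be enough if the extra ReaderWriterLockSlim feels heavy.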
With .NET 4 you get a fancy new ConcurrentDictionary. I think there are some .NET 3.5-based implementations floating around.
Yes, you will have a problem updating the global dictionary while an enumeration is running in another thread.
Solutions:
Require all users of the dictionary to acquire a mutex lock before accessing the object, and release the lock afterwards.
Use .NET 4.0's ConcurrentDictionary class.
I need to create a thread-safe list of items to be added to a Lucene index.
Is the following thread safe?
public sealed class IndexQueue
{
static readonly IndexQueue instance = new IndexQueue();
private List<string> items = new List<string>();
private IndexQueue() { }
public static IndexQueue Instance {
get { return instance; }
}
private object padlock = new object();
public void AddItem(string item) {
lock (padlock) {
items.Add(item);
}
}
}
Is it necessary to lock even when getting items from the internal list?
The idea is that we will then have a separate task running to grab the items from indexqueue and add them to the lucene index.
Thanks
Ben
Your implementation seems thread-safe, although you will need to lock when reading from items as well: you cannot safely read if there is a concurrent Add operation. If you ever enumerate, you will need locking around that as well, and the lock will need to be held for as long as the enumerator is in use.
If you can use .NET 4, I'd strongly suggest looking at the System.Collections.Concurrent namespace. It has some well-tested and pretty performant collections that are thread-safe and in fact optimized around multiple-thread access.
Is it necessary to lock even when getting items from the internal list?
The List class is not thread-safe when you make modifications. It's necessary to lock if:
You use a single instance of the class from multiple threads.
The contents of the list can change while you are modifying or reading from the list.
Presumably the first is true otherwise you wouldn't be asking the question. The second is clearly true because the Add method modifies the list. So, yes, you need it.
When you add a method to your class that allows you to read back the items it is also necessary to lock, and importantly you must use the same lock object as you did in AddItem.
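As an illustration of reading under the same lock, here is one possible shape for the read side. It is a sketch: TakeAll is a hypothetical method that is not in the question's code, and the singleton plumbing is left out to keep the example short. It copies the pending items and clears the list in one locked step, so the indexing task works on its own private batch.

```csharp
using System.Collections.Generic;

public sealed class IndexQueue
{
    private readonly List<string> items = new List<string>();
    private readonly object padlock = new object();

    public void AddItem(string item)
    {
        lock (padlock)
        {
            items.Add(item);
        }
    }

    // Hypothetical read method: returns everything queued so far and empties the queue.
    // It takes the same padlock as AddItem, so a batch never overlaps with an Add.
    public List<string> TakeAll()
    {
        lock (padlock)
        {
            var batch = new List<string>(items);
            items.Clear();
            return batch;
        }
    }
}
```

Because the copy is made inside the lock, the caller can enumerate the returned batch for as long as it likes without holding the lock.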
Yes; while retrieval is not an intrinsically unsafe operation, if you're also writing to the list, then you run the risk of retrieving in the middle of a write.
This is especially true if this will operate like a traditional queue, where a retrieval will actually remove the retrieved value from the list.