Compare performance of ConcurrentDictionary with Dictionary+locks - C#

I wanted to measure the performance of ConcurrentDictionary vs. Dictionary+locks in a multithreaded environment, so I created my own SyncDict class holding a Dictionary<int, int[]>. Whenever there is a key match, it adds the incoming int[] array element-wise to the stored one, locking the whole dictionary with a ReaderWriterLockSlim while updating the value.
I replicated the same logic with ConcurrentDictionary, mainly using its AddOrUpdate() method.
The whole console app code can be found here: https://dotnetfiddle.net/1kFbGy Just copy-paste the code into a console app to run it; it will not run on the fiddle itself.
After running both versions with the same inputs I see a considerable difference in running time. For example, in one particular run on my machine the ConcurrentDictionary took 4.5 seconds vs. less than 1 second for SyncDict.
I would like to know any thoughts or suggestions explaining the running times above. Is there anything wrong I am doing here?
class SyncDict<TKey>
{
private ReaderWriterLockSlim cacheLock;
private Dictionary<TKey, int[]> dictionary;
public SyncDict()
{
cacheLock = new ReaderWriterLockSlim();
dictionary = new Dictionary<TKey, int[]>();
}
public Dictionary<TKey, int[]> Dictionary
{
get { return dictionary; }
}
public int[] Read(TKey key)
{
cacheLock.EnterReadLock();
try
{
return dictionary[key];
}
finally
{
cacheLock.ExitReadLock();
}
}
public void Add(TKey key, int[] value)
{
cacheLock.EnterWriteLock();
try
{
dictionary.Add(key, value);
}
finally
{
cacheLock.ExitWriteLock();
}
}
public AddOrUpdateStatus AddOrUpdate(TKey key, int[] value)
{
cacheLock.EnterUpgradeableReadLock();
try
{
int[] result = null;
if (dictionary.TryGetValue(key, out result))
{
if (result == value)
return AddOrUpdateStatus.Unchanged;
else
{
cacheLock.EnterWriteLock();
try
{
Parallel.For(0, value.Length,
(i, state) =>
{
result[i] = result[i] + value[i];
});
}
finally
{
cacheLock.ExitWriteLock();
}
return AddOrUpdateStatus.Updated;
}
}
else
{
Add(key, value);
return AddOrUpdateStatus.Added;
}
}
finally
{
cacheLock.ExitUpgradeableReadLock();
}
}
public void Delete(TKey key)
{
cacheLock.EnterWriteLock();
try
{
dictionary.Remove(key);
}
finally
{
cacheLock.ExitWriteLock();
}
}
public enum AddOrUpdateStatus
{
Added,
Updated,
Unchanged
};
}

There are multiple problems with your test.
1) You are populating a dictionary with ~150,000 different keys, all with the same value.
2) The shared value of all entries is an array of 30,000 integers, and you are updating every element of it on half of the calls to AddOrUpdate. But this only happens when you test the ConcurrentDictionary. In the SyncDict test there is a condition, if (result == value) return AddOrUpdateStatus.Unchanged, that skips all updates (because the value is shared).
3) You are feeding the two tests with different random inputs.
4) You are updating the array using a Parallel.For loop while already being inside an outer Parallel.For loop, over-parallelizing your workload.
5) When calling the method AddOrUpdate you ignore the documented fact that the updateValueFactory function is called in a thread-unsafe manner, and since multiple AddOrUpdate calls are executed concurrently and the value is shared, you are corrupting the state of the value.
The updateValueFactory delegate is called outside the locks to avoid the problems that can arise from executing unknown code under a lock.
(from the ConcurrentDictionary.AddOrUpdate Method documentation)
I suggest that you modify your test to reflect the intended use of the ConcurrentDictionary class.
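For illustration, here is a minimal sketch of that intended use (the key range and array sizes below are arbitrary): the factories never mutate shared state, and updateValueFactory builds and returns a new array, because it may be invoked concurrently and even more than once per call.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var dict = new ConcurrentDictionary<int, int[]>();
        int[] delta = Enumerable.Repeat(1, 1000).ToArray(); // shared input, never stored

        Parallel.For(0, 100000, i =>
        {
            dict.AddOrUpdate(
                i % 500,
                // addValueFactory: store a private copy, never the shared input
                key => (int[])delta.Clone(),
                // updateValueFactory: may run concurrently and more than once,
                // so build a new array instead of mutating the existing value
                (key, existing) =>
                {
                    var merged = new int[existing.Length];
                    for (int j = 0; j < existing.Length; j++)
                        merged[j] = existing[j] + delta[j];
                    return merged;
                });
        });

        Console.WriteLine(dict.Count); // 500
    }
}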

Related

Is my C# implementation of a dictionary with item expiration safe?

I'm in my early days of working with C# and threading, and I need a data structure that can store key-value pairs like a dictionary, but where values "expire", i.e. get removed from the dictionary, a certain time after being added.
I have come up with an implementation that seems to work, but I was wondering if there might be any problems to it when used in a multi threaded environment.
Are there any direct issues with my current implementation?
As all usage of the dictionary is protected by the semaphore, is this implementation to be considered thread-safe out of the box?
public class TokenCache<TKey, TValue>
where TKey : IEquatable<TKey>
where TValue : class
{
private readonly TimeSpan expirationTime;
private readonly SemaphoreSlim semaphore;
private readonly Dictionary<TKey, TValue> cache;
public TokenCache(TimeSpan expirationTime)
{
this.expirationTime = expirationTime;
this.semaphore = new SemaphoreSlim(1, 1);
this.cache = new Dictionary<TKey, TValue>();
}
public void Add(TKey key, TValue value)
{
this.semaphore.Wait();
try
{
this.cache[key] = value;
Task
.Run(() => Thread.Sleep(this.expirationTime))
.ContinueWith(t =>
{
this.Remove(key);
});
}
finally
{
this.semaphore.Release();
}
}
public void Remove(TKey key)
{
this.semaphore.Wait();
try
{
this.cache.Remove(key);
}
finally
{
this.semaphore.Release();
}
}
public TValue GetValueOrDefault(TKey key)
{
this.semaphore.Wait();
try
{
if (this.cache.TryGetValue(key, out var value))
{
return value;
}
return null;
}
finally
{
this.semaphore.Release();
}
}
}
Problems I see in no particular order (there might be more though):
Task.Run
Task.Run(() => Thread.Sleep(...)) is a very bad idea. The ThreadPool, which contains the threads that run Tasks, has a limited number of threads (it can grow, but it grows very slowly!!), and by calling Task.Run(() => Thread.Sleep(...)) instead of Task.Delay(...) you're wasting a ThreadPool thread for the duration of the wait. Task.Delay(...) doesn't use up a thread.
Wrong Item Expiration
Another problem is your Add method. Think about the following:
You create an instance of TokenCache, for example via new TokenCache<int, string>(TimeSpan.FromMinutes(1))
You call Add(10, "ten")
After 50 seconds, you call Remove(10)
After 5 more seconds, you call Add(10, "The Number 10");
Because of the way you've written your Add method, you're now going to remove the value "The Number 10" after only 5 seconds.
Pointless Semaphore
Also, I don't really see the point of using a SemaphoreSlim. In this scenario you could just use the lock keyword on something like private readonly object locker = new();
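Putting the three points together, here is one possible sketch of the class. The generations map is an illustrative addition (not part of the original code) that prevents a stale expiration callback from removing a value that was re-added in the meantime:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class TokenCache<TKey, TValue>
    where TKey : IEquatable<TKey>
    where TValue : class
{
    private readonly TimeSpan expirationTime;
    private readonly object locker = new object();
    private readonly Dictionary<TKey, TValue> cache = new Dictionary<TKey, TValue>();
    // tracks the "generation" of each key so a stale expiration
    // callback cannot remove a value that was re-added later
    private readonly Dictionary<TKey, object> generations = new Dictionary<TKey, object>();

    public TokenCache(TimeSpan expirationTime)
    {
        this.expirationTime = expirationTime;
    }

    public void Add(TKey key, TValue value)
    {
        var generation = new object();
        lock (this.locker)
        {
            this.cache[key] = value;
            this.generations[key] = generation;
        }
        // Task.Delay does not occupy a thread while waiting
        Task.Delay(this.expirationTime).ContinueWith(_ =>
        {
            lock (this.locker)
            {
                // only remove if the entry was not replaced in the meantime
                if (this.generations.TryGetValue(key, out var g) && ReferenceEquals(g, generation))
                {
                    this.cache.Remove(key);
                    this.generations.Remove(key);
                }
            }
        });
    }

    public TValue GetValueOrDefault(TKey key)
    {
        lock (this.locker)
        {
            return this.cache.TryGetValue(key, out var value) ? value : null;
        }
    }
}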

Caching Objects with Expensive Build & Allowing Updates

I am working on a caching manager for a MVC web application. For this app, I have some very large objects that are costly to build. During the application lifetime, I may need to create several of these objects, based upon user requests. When built, the user will be working with the data in the objects, resulting in many read actions. On occasion, I will need to update some minor data points in the cached object (create & replace would take too much time).
Below is a cache manager class that I have created to help me in this. Beyond basic thread safety, my goals were to:
Allow multiple reads against an object, but lock all reads to that object upon an update request.
Ensure that the object is only ever created one time if it does not already exist (keep in mind that it's a long build action).
Allow the cache to store many objects, and maintain a lock per object (rather than one lock for all objects).
public class CacheManager
{
private static readonly ObjectCache Cache = MemoryCache.Default;
private static readonly ConcurrentDictionary<string, ReaderWriterLockSlim>
Locks = new ConcurrentDictionary<string, ReaderWriterLockSlim>();
private const int CacheLengthInHours = 1;
public object AddOrGetExisting(string key, Func<object> factoryMethod)
{
Locks.GetOrAdd(key, new ReaderWriterLockSlim());
var policy = new CacheItemPolicy
{
AbsoluteExpiration = DateTimeOffset.Now.AddHours(CacheLengthInHours)
};
return Cache.AddOrGetExisting
(key, new Lazy<object>(factoryMethod), policy);
}
public object Get(string key)
{
var targetLock = AcquireLockObject(key);
if (targetLock != null)
{
targetLock.EnterReadLock();
try
{
var cacheItem = Cache.GetCacheItem(key);
if(cacheItem!= null)
return cacheItem.Value;
}
finally
{
targetLock.ExitReadLock();
}
}
return null;
}
public void Update<T>(string key, Func<T, object> updateMethod)
{
var targetLock = AcquireLockObject(key);
var targetItem = (Lazy<object>) Get(key);
if (targetLock == null || key == null) return;
targetLock.EnterWriteLock();
try
{
updateMethod((T)targetItem.Value);
}
finally
{
targetLock.ExitWriteLock();
}
}
private ReaderWriterLockSlim AcquireLockObject(string key)
{
return Locks.ContainsKey(key) ? Locks[key] : null;
}
}
Am I accomplishing my goals while remaining thread safe? Do you all see a better way to achieve my goals?
Thanks!
UPDATE: So the bottom line here was that I was really trying to do too much in one area. For some reason, I was convinced that managing the Get / Update operations in the same class that managed the cache was a good idea. After looking at Groo's solution and rethinking the issue, I was able to do a good amount of refactoring which resolved the issue I was facing.
Well, I don't think this class does what you need.
Allow multiple reads against the object, but lock all reads upon an update request
You may lock all reads to the cache manager, but you are not locking reads (nor updates) to the actual cached instance.
Ensure that the object is only ever created 1 time if it does not already exist (keep in mind that its a long build action).
I don't think you ensured that. You are not locking anything while adding the object to the dictionary (and, furthermore, you are adding a lazy constructor, so you don't even know when the object is going to be instantiated).
Edit: This part holds; the only thing I would change is to make Get return a Lazy<object>. While writing my program, I forgot to cast it, and calling ToString on the return value returned "Value not created".
Allow the cache to store many objects, and maintain a lock per object (rather than one lock for all objects).
That's the same as point 1: you are locking the dictionary, not the access to the object. And your update delegate has a strange signature (it accepts a typed generic parameter, and returns an object which is never used). This means you are really modifying the object's properties, and these changes are immediately visible to any part of your program holding a reference to that object.
How to resolve this
If your object is mutable (and I presume it is), there is no way to ensure transactional consistency unless each of your properties also acquires a lock on each read access. A way to simplify this is to make the object immutable (that's why immutable objects are so popular for multithreading).
Alternatively, you may consider breaking this large object into smaller pieces and caching each piece separately, making them immutable if needed.
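For illustration, a minimal sketch of the immutable approach, with a hypothetical Report type standing in for the large cached object. Readers never need a lock, because an instance can never be half-updated:
using System;

// hypothetical immutable version of the large cached object:
// every field is fixed at construction, so a reference to an
// instance is always internally consistent
public sealed class Report
{
    public string First { get; }
    public string Second { get; }
    public Report(string first, string second) { First = first; Second = second; }

    // "updates" return a new instance instead of mutating this one
    public Report WithFirst(string first) => new Report(first, Second);
}

public static class ReportCache
{
    private static volatile Report _current = new Report("Initial", "Initial");

    public static Report Current => _current; // readers never need a lock

    public static void Update(Func<Report, Report> update)
    {
        // assuming a single writer; the reference swap is atomic,
        // so readers see either the old or the new complete instance
        _current = update(_current);
    }
}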
[Edit] Added a race condition example:
class Program
{
static void Main(string[] args)
{
CacheManager cache = new CacheManager();
cache.AddOrGetExisting("item", () => new Test());
// let one thread modify the item
ThreadPool.QueueUserWorkItem(s =>
{
Thread.Sleep(250);
cache.Update<Test>("item", i =>
{
i.First = "CHANGED";
Thread.Sleep(500);
i.Second = "CHANGED";
return i;
});
});
// let one thread just read the item and print it
ThreadPool.QueueUserWorkItem(s =>
{
var item = ((Lazy<object>)cache.Get("item")).Value;
Log(item.ToString());
Thread.Sleep(500);
Log(item.ToString());
});
Console.Read();
}
class Test
{
private string _first = "Initial value";
public string First
{
get { return _first; }
set { _first = value; Log("First", value); }
}
private string _second = "Initial value";
public string Second
{
get { return _second; }
set { _second = value; Log("Second", value); }
}
public override string ToString()
{
return string.Format("--> PRINTING: First: [{0}], Second: [{1}]", First, Second);
}
}
private static void Log(string message)
{
Console.WriteLine("Thread {0}: {1}", Thread.CurrentThread.ManagedThreadId, message);
}
private static void Log(string property, string value)
{
Console.WriteLine("Thread {0}: {1} property was changed to [{2}]", Thread.CurrentThread.ManagedThreadId, property, value);
}
}
Something like this should happen:
t = 0ms : thread A gets the item and prints the initial value
t = 250ms: thread B modifies the first property
t = 500ms: thread A prints the INCONSISTENT value (only the first prop. changed)
t = 750ms: thread B modifies the second property

Thread safety for high-performance in-memory cache

I have a static in-memory cache that is written to only once an hour (or longer), and read by many threads at an extremely high rate. Conventional wisdom suggests I follow a pattern such as the following:
public static class MyCache
{
private static IDictionary<int, string> _cache;
private static ReaderWriterLockSlim _sharedLock;
static MyCache()
{
_cache = new Dictionary<int, string>();
_sharedLock = new ReaderWriterLockSlim();
}
public static string GetData(int key)
{
_sharedLock.EnterReadLock();
try
{
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
finally
{
_sharedLock.ExitReadLock();
}
}
public static void AddData(int key, string data)
{
_sharedLock.EnterWriteLock();
try
{
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
}
finally
{
_sharedLock.ExitWriteLock();
}
}
}
As an exercise in micro-optimization, how can I shave off even more ticks from the relative expense of shared read locks? Time to write can be expensive, since it rarely happens. I need to make reads as fast as possible. Can I just drop the read locks (below) and remain thread-safe in this scenario? Or is there a lock-free version I can use? I'm familiar with memory fencing but don't know how to safely apply it in this instance.
Note: I'm not tied to either pattern, so any suggestions are welcome as long as the end result is faster and in C# 4.x.
public static class MyCache2
{
private static IDictionary<int, string> _cache;
private static object _fullLock;
static MyCache2()
{
_cache = new Dictionary<int, string>();
_fullLock = new object();
}
public static string GetData(int key)
{
//Note: There is no locking here... Is that ok?
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
public static void AddData(int key, string data)
{
lock (_fullLock)
{
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
}
}
}
You don't need a lock when there are threads only ever reading from the data structure. So, since writes are so rare (and, I assume, not concurrent), an option might be to make a full copy of the dictionary, make the modifications to the copy, and then atomically exchange the old dictionary with the new one:
public static class MyCache2
{
private static IDictionary<int, string> _cache;
static MyCache2()
{
_cache = new Dictionary<int, string>();
}
public static string GetData(int key)
{
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
public static void AddData(int key, string data)
{
IDictionary<int, string> clone = new Dictionary<int, string>(_cache);
if (!clone.ContainsKey(key))
clone.Add(key, data);
Interlocked.Exchange(ref _cache, clone);
}
}
I would be looking to go lock-free here, and achieve thread safety by simply not changing any published dictionary. What I mean is: when you need to add data, create a complete copy of the dictionary, and append/update/etc. the copy. Since this happens once an hour, this shouldn't be a problem even for large data. Then, when you have made the changes, simply swap the reference from the old dictionary to the new dictionary (reference reads/writes are guaranteed to be atomic).
One caveat: any code that needs consistent state between multiple operations should capture the dictionary into a variable first, i.e.
var snapshot = someField;
// multiple reads on snapshot
This ensures that any related logic is all made using the same version of the data, to avoid confusion when the reference swaps during the operation.
I would also take a lock when writing (not when reading) to ensure no squabbling over the data. There are lock-free multi-writer approaches too (primarily Interlocked.CompareExchange and reapply if it fails), but I would use the simplest approach first, and a single writer is exactly that.
Alternative option: the .net 1.x Hashtable (essentially Dictionary, minus the generics) has an interesting threading story; the reads are thread safe without locks - you only need to use locks to ensure at most one writer.
So: you might consider using a non-generic Hashtable, no locking on reads, and then take a lock during writes.
This is the main reason I still find myself using Hashtable sometimes, even in .net 4.x applications.
One problem though - it'll cause the int key to be boxed for both storage and query.
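A minimal sketch of that Hashtable pattern, assuming a single writer at a time (note the boxing of the int key):
using System.Collections;

public static class MyCache3
{
    // Hashtable is documented as safe for many readers plus a single writer
    private static readonly Hashtable _cache = new Hashtable();
    private static readonly object _writeLock = new object();

    public static string GetData(int key)
    {
        return (string)_cache[key]; // no lock needed on the read path
    }

    public static void AddData(int key, string data)
    {
        lock (_writeLock) // serialize writers
        {
            if (!_cache.ContainsKey(key))
                _cache.Add(key, data);
        }
    }
}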
This makes a copy of the dictionary only when data is being added. A lock is used for adding, but you can take that out if you don't intend to add from more than one thread. If there's no copy, data is pulled from the original dictionary; otherwise the copy is used while adding.
Just in case the copy gets nulled out after it's checked and seen as not null, but before the value can be retrieved, I added a try/catch: in that rare event the data is pulled from the original under the lock. Again, this should happen very rarely, if at all.
public static class MyCache2
{
private static IDictionary<int, string> _cache;
private static IDictionary<int, string> _cacheClone;
private static Object _lock;
static MyCache2()
{
_cache = new Dictionary<int, string>();
_lock = new Object();
}
public static string GetData(int key)
{
string returnValue;
if (_cacheClone == null)
{
_cache.TryGetValue(key, out returnValue);
}
else
{
try
{
_cacheClone.TryGetValue(key, out returnValue);
}
catch
{
lock (_lock)
{
_cache.TryGetValue(key, out returnValue);
}
}
}
return returnValue;
}
public static void AddData(int key, string data)
{
lock (_lock)
{
_cacheClone = new Dictionary<int, string>(_cache);
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
_cacheClone = null;
}
}
}
You might also look at lock free data structures. http://www.boyet.com/Articles/LockfreeStack.html is a good example

No ConcurrentList<T> in .Net 4.0?

I was thrilled to see the new System.Collections.Concurrent namespace in .Net 4.0, quite nice! I've seen ConcurrentDictionary, ConcurrentQueue, ConcurrentStack, ConcurrentBag and BlockingCollection.
One thing that seems to be mysteriously missing is a ConcurrentList<T>. Do I have to write that myself (or get it off the web :) )?
Am I missing something obvious here?
I gave it a try a while back (also: on GitHub). My implementation had some problems, which I won't get into here. Let me tell you, more importantly, what I learned.
Firstly, there's no way you're going to get a full implementation of IList<T> that is lockless and thread-safe. In particular, random insertions and removals are not going to work, unless you also forget about O(1) random access (i.e., unless you "cheat" and just use some sort of linked list and let the indexing suck).
What I thought might be worthwhile was a thread-safe, limited subset of IList<T>: in particular, one that would allow an Add and provide random read-only access by index (but no Insert, RemoveAt, etc., and also no random write access).
This was the goal of my ConcurrentList<T> implementation. But when I tested its performance in multithreaded scenarios, I found that simply synchronizing adds to a List<T> was faster. Basically, adding to a List<T> is lightning fast already; the complexity of the computational steps involved is minuscule (increment an index and assign to an element in an array; that's really it). You would need a ton of concurrent writes to see any sort of lock contention on this; and even then, the average performance of each write would still beat out the more expensive, albeit lockless, implementation in ConcurrentList<T>.
In the relatively rare event that the list's internal array needs to resize itself, you do pay a small cost. So ultimately I concluded that this was the one niche scenario where an add-only ConcurrentList<T> collection type would make sense: when you want guaranteed low overhead of adding an element on every single call (so, as opposed to an amortized performance goal).
It's simply not nearly as useful a class as you would think.
What would you use a ConcurrentList for?
The concept of a Random Access container in a threaded world isn't as useful as it may appear. The statement
if (i < MyConcurrentList.Count)
x = MyConcurrentList[i];
as a whole would still not be thread-safe.
Instead of creating a ConcurrentList, try to build solutions with what's there. The most common classes are the ConcurrentBag and especially the BlockingCollection.
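For example, a minimal producer/consumer sketch built on BlockingCollection (Task.Run is used here for brevity):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerDemo
{
    static void Main()
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 100))
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 10; i++)
                    queue.Add(i);          // blocks while the collection is full
                queue.CompleteAdding();    // tells consumers no more items are coming
            });

            var consumer = Task.Run(() =>
            {
                // blocks until items arrive; completes once CompleteAdding is called
                foreach (var item in queue.GetConsumingEnumerable())
                    Console.WriteLine(item);
            });

            Task.WaitAll(producer, consumer);
        }
    }
}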
With all due respect to the great answers provided already, there are times when I simply want a thread-safe IList. Nothing advanced or fancy. Performance is important in many cases, but at times that just isn't a concern. Yes, there are always going to be challenges without methods like "TryGetValue" etc., but in most cases I just want something that I can enumerate without needing to worry about putting locks around everything. And yes, somebody can probably find some "bug" in my implementation that might lead to a deadlock or something (I suppose), but let's be honest: when it comes to multithreading, if you don't write your code correctly, it is going to deadlock anyway. With that in mind, I decided to make a simple ConcurrentList implementation that provides these basic needs.
And for what it's worth: I did a basic test of adding 10,000,000 items to a regular List and to ConcurrentList, and the results were:
List finished in: 7793 milliseconds.
Concurrent finished in: 8064 milliseconds.
public class ConcurrentList<T> : IList<T>, IDisposable
{
#region Fields
private readonly List<T> _list;
private readonly ReaderWriterLockSlim _lock;
#endregion
#region Constructors
public ConcurrentList()
{
this._lock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
this._list = new List<T>();
}
public ConcurrentList(int capacity)
{
this._lock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
this._list = new List<T>(capacity);
}
public ConcurrentList(IEnumerable<T> items)
{
this._lock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
this._list = new List<T>(items);
}
#endregion
#region Methods
public void Add(T item)
{
try
{
this._lock.EnterWriteLock();
this._list.Add(item);
}
finally
{
this._lock.ExitWriteLock();
}
}
public void Insert(int index, T item)
{
try
{
this._lock.EnterWriteLock();
this._list.Insert(index, item);
}
finally
{
this._lock.ExitWriteLock();
}
}
public bool Remove(T item)
{
try
{
this._lock.EnterWriteLock();
return this._list.Remove(item);
}
finally
{
this._lock.ExitWriteLock();
}
}
public void RemoveAt(int index)
{
try
{
this._lock.EnterWriteLock();
this._list.RemoveAt(index);
}
finally
{
this._lock.ExitWriteLock();
}
}
public int IndexOf(T item)
{
try
{
this._lock.EnterReadLock();
return this._list.IndexOf(item);
}
finally
{
this._lock.ExitReadLock();
}
}
public void Clear()
{
try
{
this._lock.EnterWriteLock();
this._list.Clear();
}
finally
{
this._lock.ExitWriteLock();
}
}
public bool Contains(T item)
{
try
{
this._lock.EnterReadLock();
return this._list.Contains(item);
}
finally
{
this._lock.ExitReadLock();
}
}
public void CopyTo(T[] array, int arrayIndex)
{
try
{
this._lock.EnterReadLock();
this._list.CopyTo(array, arrayIndex);
}
finally
{
this._lock.ExitReadLock();
}
}
public IEnumerator<T> GetEnumerator()
{
return new ConcurrentEnumerator<T>(this._list, this._lock);
}
IEnumerator IEnumerable.GetEnumerator()
{
return new ConcurrentEnumerator<T>(this._list, this._lock);
}
~ConcurrentList()
{
this.Dispose(false);
}
public void Dispose()
{
this.Dispose(true);
}
private void Dispose(bool disposing)
{
if (disposing)
GC.SuppressFinalize(this);
this._lock.Dispose();
}
#endregion
#region Properties
public T this[int index]
{
get
{
try
{
this._lock.EnterReadLock();
return this._list[index];
}
finally
{
this._lock.ExitReadLock();
}
}
set
{
try
{
this._lock.EnterWriteLock();
this._list[index] = value;
}
finally
{
this._lock.ExitWriteLock();
}
}
}
public int Count
{
get
{
try
{
this._lock.EnterReadLock();
return this._list.Count;
}
finally
{
this._lock.ExitReadLock();
}
}
}
public bool IsReadOnly
{
get { return false; }
}
#endregion
}
public class ConcurrentEnumerator<T> : IEnumerator<T>
{
#region Fields
private readonly IEnumerator<T> _inner;
private readonly ReaderWriterLockSlim _lock;
#endregion
#region Constructor
public ConcurrentEnumerator(IEnumerable<T> inner, ReaderWriterLockSlim @lock)
{
this._lock = @lock;
this._lock.EnterReadLock();
this._inner = inner.GetEnumerator();
}
#endregion
#region Methods
public bool MoveNext()
{
return _inner.MoveNext();
}
public void Reset()
{
_inner.Reset();
}
public void Dispose()
{
this._lock.ExitReadLock();
}
#endregion
#region Properties
public T Current
{
get { return _inner.Current; }
}
object IEnumerator.Current
{
get { return _inner.Current; }
}
#endregion
}
The reason there is no ConcurrentList is that it fundamentally cannot be written. Several important operations in IList rely on indices, and that just plain won't work. For example:
int catIndex = list.IndexOf("cat");
list.Insert(catIndex, "dog");
The effect that the author is going after is to insert "dog" before "cat", but in a multithreaded environment, anything can happen to the list between those two lines of code. For example, another thread might do list.RemoveAt(0), shifting the entire list to the left, but crucially, catIndex will not change. The impact here is that the Insert operation will actually put the "dog" after the cat, not before it.
The several implementations that you see offered as "answers" to this question are well-meaning, but as the above shows, they don't offer reliable results. If you really want list-like semantics in a multithreaded environment, you can't get there by putting locks inside the list implementation methods. You have to ensure that any index you use lives entirely inside the context of the lock. The upshot is that you can use a List in a multithreaded environment with the right locking, but the list itself cannot be made to exist in that world.
If you think you need a concurrent list, there are really just two possibilities:
What you really need is a ConcurrentBag
You need to create your own collection, perhaps implemented with a List and your own concurrency control.
If you have a ConcurrentBag and are in a position where you need to pass it as an IList, then you have a problem, because the method you're calling has specified that it might try to do something like I did above with the cat & dog. In most worlds, what that means is that the method you're calling is simply not built to work in a multi-threaded environment. That means you either refactor it so that it is or, if you can't, you're going to have to handle it very carefully. You'll almost certainly be required to create your own collection with its own locks, and call the offending method within a lock, as in the sketch below.
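For illustration, here is the cat & dog example done safely: the index is computed and consumed inside one lock, so it cannot go stale in between.
using System.Collections.Generic;

public static class AnimalList
{
    private static readonly object _lock = new object();
    private static readonly List<string> _list = new List<string> { "cat" };

    public static void InsertDogBeforeCat()
    {
        lock (_lock)
        {
            // no other thread can shift the list between these two lines
            int catIndex = _list.IndexOf("cat");
            if (catIndex >= 0)
                _list.Insert(catIndex, "dog");
        }
    }
}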
ConcurrentList (as a resizeable array, not a linked list) is not easy to write with nonblocking operations. Its API doesn't translate well to a "concurrent" version.
In cases where reads greatly outnumber writes, or (however frequent) writes are non-concurrent, a copy-on-write approach may be appropriate.
The implementation shown below is
lockless
blazingly fast for concurrent reads, even while concurrent modifications are ongoing - no matter how long they take
because "snapshots" are immutable, lockless atomicity is possible, i.e. var snap = _list; snap[snap.Count - 1]; will never (well, except for an empty list of course) throw, and you also get thread-safe enumeration with snapshot semantics for free.. how I LOVE immutability!
implemented generically, applicable to any data structure and any type of modification
dead simple, i.e. easy to test, debug, verify by reading the code
usable in .Net 3.5
For copy-on-write to work, you have to keep your data structures effectively immutable, i.e. no one is allowed to change them after you made them available to other threads. When you want to modify, you
clone the structure
make modifications on the clone
atomically swap in the reference to the modified clone
Code
static class CopyOnWriteSwapper
{
public static void Swap<T>(ref T obj, Func<T, T> cloner, Action<T> op)
where T : class
{
while (true)
{
var objBefore = Volatile.Read(ref obj);
var newObj = cloner(objBefore);
op(newObj);
if (Interlocked.CompareExchange(ref obj, newObj, objBefore) == objBefore)
return;
}
}
}
Usage
CopyOnWriteSwapper.Swap(ref _myList,
orig => new List<string>(orig),
clone => clone.Add("asdf"));
If you need more performance, it will help to ungenerify the method, e.g. create one method for every type of modification (Add, Remove, ...) you want, and hard code the function pointers cloner and op.
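As a sketch of what such an ungenerified specialization might look like for Add on a List<T> (same retry loop as Swap, no delegate parameters; the class name is illustrative):
using System.Collections.Generic;
using System.Threading;

static class CopyOnWriteListOps
{
    public static void Add<T>(ref List<T> list, T item)
    {
        while (true)
        {
            var before = Volatile.Read(ref list);
            var clone = new List<T>(before) { item }; // copy, then modify the copy
            // publish atomically; retry if another thread swapped first
            if (Interlocked.CompareExchange(ref list, clone, before) == before)
                return;
        }
    }
}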
N.B. #1 It is your responsibility to make sure that no one modifies the (supposedly) immutable data structure. There's nothing we can do in a generic implementation to prevent that, but when specializing to List<T>, you could guard against modification using List.AsReadOnly().
N.B. #2 Be careful about the values in the list. The copy-on-write approach above guards their list membership only; if you put not strings but some other mutable objects in there, you have to take care of thread safety yourself (e.g. locking). But that is orthogonal to this solution, and e.g. locking of the mutable values can be used without issues. You just need to be aware of it.
N.B. #3 If your data structure is huge and you modify it frequently, the copy-all-on-write approach might be prohibitive both in terms of memory consumption and the CPU cost of copying involved. In that case, you might want to use MS's Immutable Collections instead.
System.Collections.Generic.List<T> is already thread-safe for multiple readers. Trying to make it thread-safe for multiple writers wouldn't make sense. (For reasons Henk and Stephen already mentioned.)
Some people have highlighted good points (including some of my own thoughts):
It might look insane to disable the random accessor (indexer), but to me it seems fine. You only have to accept that there are many methods on multithreaded collections that can fail, like the indexer and Delete. You could also define a failure (fallback) action for the write accessor, like "fail" or simply "add at the end".
Just because it is a multithreaded collection does not mean it will always be used in a multithreaded context; it could also be used by only one writer and one reader.
Another way to use the indexer safely would be to wrap actions in a lock on the collection's root (if it is made public).
For many people, making a root lock visible goes against "good practice". I'm not 100% sure about this point, because if it is hidden you remove a lot of flexibility from the user. We always have to remember that multithreaded programming is not for everybody; we can't prevent every kind of wrong usage.
Microsoft will have to do some work and define new standards to introduce proper usage of multithreaded collections. First, IEnumerator should not have a MoveNext but rather a GetNext that returns true or false and takes an out parameter of type T (that way iteration would no longer be blocking). Also, Microsoft already uses "using" internally in foreach, but sometimes uses the IEnumerator directly without wrapping it in "using" (a bug in the collection view and probably in more places), even though wrapping the usage of IEnumerator is a practice Microsoft itself recommends. This bug removes good potential for a safe iterator: one that locks the collection in its constructor and unlocks it in its Dispose method, giving a blocking foreach.
This is not an answer; these are only comments that do not really fit in a specific place.
My conclusion: Microsoft has to make some deep changes to foreach to make multithreaded collections easier to use, and it has to follow its own rules of IEnumerator usage. Until then, we could easily write a MultiThreadList that uses a blocking iterator, but it would not follow IList. Instead, you would have to define your own "IListPersonnal" interface that can fail on insert, remove and the random accessor (indexer) without throwing an exception. But who will want to use it if it is not standard?
I implemented one similar to Brian's. Mine is different:
I manage the array directly.
I don't enter the locks within the try block.
I use yield return for producing an enumerator.
I support lock recursion. This allows reads from list during iteration.
I use upgradable read locks where possible.
DoSync and GetSync methods allowing sequential interactions that require exclusive access to the list.
The code:
public class ConcurrentList<T> : IList<T>, IDisposable
{
private ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private int _count = 0;
public int Count
{
get
{
_lock.EnterReadLock();
try
{
return _count;
}
finally
{
_lock.ExitReadLock();
}
}
}
public int InternalArrayLength
{
get
{
_lock.EnterReadLock();
try
{
return _arr.Length;
}
finally
{
_lock.ExitReadLock();
}
}
}
private T[] _arr;
public ConcurrentList(int initialCapacity)
{
_arr = new T[initialCapacity];
}
public ConcurrentList():this(4)
{ }
public ConcurrentList(IEnumerable<T> items)
{
_arr = items.ToArray();
_count = _arr.Length;
}
public void Add(T item)
{
_lock.EnterWriteLock();
try
{
var newCount = _count + 1;
EnsureCapacity(newCount);
_arr[_count] = item;
_count = newCount;
}
finally
{
_lock.ExitWriteLock();
}
}
public void AddRange(IEnumerable<T> items)
{
if (items == null)
throw new ArgumentNullException("items");
_lock.EnterWriteLock();
try
{
var arr = items as T[] ?? items.ToArray();
var newCount = _count + arr.Length;
EnsureCapacity(newCount);
Array.Copy(arr, 0, _arr, _count, arr.Length);
_count = newCount;
}
finally
{
_lock.ExitWriteLock();
}
}
private void EnsureCapacity(int capacity)
{
if (_arr.Length >= capacity)
return;
int doubled;
checked
{
try
{
doubled = _arr.Length * 2;
}
catch (OverflowException)
{
doubled = int.MaxValue;
}
}
var newLength = Math.Max(doubled, capacity);
Array.Resize(ref _arr, newLength);
}
public bool Remove(T item)
{
_lock.EnterUpgradeableReadLock();
try
{
var i = IndexOfInternal(item);
if (i == -1)
return false;
_lock.EnterWriteLock();
try
{
RemoveAtInternal(i);
return true;
}
finally
{
_lock.ExitWriteLock();
}
}
finally
{
_lock.ExitUpgradeableReadLock();
}
}
public IEnumerator<T> GetEnumerator()
{
_lock.EnterReadLock();
try
{
for (int i = 0; i < _count; i++)
// deadlocking potential mitigated by lock recursion enforcement
yield return _arr[i];
}
finally
{
_lock.ExitReadLock();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
public int IndexOf(T item)
{
_lock.EnterReadLock();
try
{
return IndexOfInternal(item);
}
finally
{
_lock.ExitReadLock();
}
}
private int IndexOfInternal(T item)
{
return Array.FindIndex(_arr, 0, _count, x => x.Equals(item));
}
public void Insert(int index, T item)
{
_lock.EnterUpgradeableReadLock();
try
{
if (index > _count)
throw new ArgumentOutOfRangeException("index");
_lock.EnterWriteLock();
try
{
var newCount = _count + 1;
EnsureCapacity(newCount);
// shift everything right by one, starting at index
Array.Copy(_arr, index, _arr, index + 1, _count - index);
// insert
_arr[index] = item;
_count = newCount;
}
finally
{
_lock.ExitWriteLock();
}
}
finally
{
_lock.ExitUpgradeableReadLock();
}
}
public void RemoveAt(int index)
{
_lock.EnterUpgradeableReadLock();
try
{
if (index >= _count)
throw new ArgumentOutOfRangeException("index");
_lock.EnterWriteLock();
try
{
RemoveAtInternal(index);
}
finally
{
_lock.ExitWriteLock();
}
}
finally
{
_lock.ExitUpgradeableReadLock();
}
}
private void RemoveAtInternal(int index)
{
Array.Copy(_arr, index + 1, _arr, index, _count - index-1);
_count--;
// release last element
Array.Clear(_arr, _count, 1);
}
public void Clear()
{
_lock.EnterWriteLock();
try
{
Array.Clear(_arr, 0, _count);
_count = 0;
}
finally
{
_lock.ExitWriteLock();
}
}
public bool Contains(T item)
{
_lock.EnterReadLock();
try
{
return IndexOfInternal(item) != -1;
}
finally
{
_lock.ExitReadLock();
}
}
public void CopyTo(T[] array, int arrayIndex)
{
_lock.EnterReadLock();
try
{
if(_count > array.Length - arrayIndex)
throw new ArgumentException("Destination array was not long enough.");
Array.Copy(_arr, 0, array, arrayIndex, _count);
}
finally
{
_lock.ExitReadLock();
}
}
public bool IsReadOnly
{
get { return false; }
}
public T this[int index]
{
get
{
_lock.EnterReadLock();
try
{
if (index >= _count)
throw new ArgumentOutOfRangeException("index");
return _arr[index];
}
finally
{
_lock.ExitReadLock();
}
}
set
{
_lock.EnterUpgradeableReadLock();
try
{
if (index >= _count)
throw new ArgumentOutOfRangeException("index");
_lock.EnterWriteLock();
try
{
_arr[index] = value;
}
finally
{
_lock.ExitWriteLock();
}
}
finally
{
_lock.ExitUpgradeableReadLock();
}
}
}
public void DoSync(Action<ConcurrentList<T>> action)
{
GetSync(l =>
{
action(l);
return 0;
});
}
public TResult GetSync<TResult>(Func<ConcurrentList<T>,TResult> func)
{
_lock.EnterWriteLock();
try
{
return func(this);
}
finally
{
_lock.ExitWriteLock();
}
}
public void Dispose()
{
_lock.Dispose();
}
}
In sequentially executing code the data structures used are different from (well written) concurrently executing code. The reason is that sequential code implies implicit order. Concurrent code however does not imply any order; better yet it implies the lack of any defined order!
Due to this, data structures with implied order (like List) are not very useful for solving concurrent problems. A list implies order, but it does not clearly define what that order is. Because of this the execution order of the code manipulating the list will determine (to some degree) the implicit order of the list, which is in direct conflict with an efficient concurrent solution.
Remember, concurrency is a data problem, not a code problem! You cannot implement the code first (or rewrite existing sequential code) and get a well-designed concurrent solution. You need to design the data structures first, keeping in mind that implicit ordering doesn't exist in a concurrent system.
A lockless copy-and-write approach works great if you're not dealing with too many items.
Here's a class I wrote:
public static class CopyAndWriteList
{
// each method copies the list, modifies the copy, and returns it;
// the type parameter is on the methods (not the class) so calls like
// CopyAndWriteList.Clear(orders_BUY) compile without specifying <T>
public static List<T> Clear<T>(List<T> list)
{
return new List<T>();
}
public static List<T> Add<T>(List<T> list, T item)
{
var a = new List<T>(list);
a.Add(item);
return a;
}
public static List<T> RemoveAt<T>(List<T> list, int index)
{
var a = new List<T>(list);
a.RemoveAt(index);
return a;
}
public static List<T> Remove<T>(List<T> list, T item)
{
var a = new List<T>(list);
a.Remove(item);
return a;
}
}
example usage:
orders_BUY = CopyAndWriteList.Clear(orders_BUY);
I'm surprised no one has mentioned using LinkedList as a base for writing a specialised class.
Often we don't need the full APIs of the various collection classes, and if you write mostly functional, side-effect-free code, using immutable classes as far as possible, then you'll actually NOT want to mutate the collection, favouring various snapshot implementations.
LinkedList solves some difficult problems of creating snapshot copies/clones of large collections. I also use it to create "threadsafe" enumerators to enumerate over the collection. I can cheat, because I know that I'm not changing the collection in any way other than appending, I can keep track of the list size, and only lock on changes to list size. Then my enumerator code simply enumerates from 0 to n for any thread that wants a "snapshot" of the append only collection, that will be guaranteed to represent a "snapshot" of the collection at any moment in time, regardless of what other threads are appending to the head of the collection.
I'm pretty certain that most requirements are often extremely simple, and you need 2 or 3 methods only. Writing a truly generic library is awfully difficult, but solving your own codes needs can sometimes be easy with a trick or two.
Long live LinkedList and good functional programming.
Cheers, ... love ya all!
Al
p.s. sample hack AppendOnly class here : https://github.com/goblinfactory/AppendOnly

Thread-safe List<T> property

I want an implementation of List<T> as a property which can be used thread-safely without any doubt.
Something like this:
private List<T> _list;
private List<T> MyT
{
get { // return a copy of _list; }
set { _list = value; }
}
It seems I still need to return a copy (clone) of the collection, so that if we are iterating the collection somewhere while it is being set at the same time, no exception is raised.
How to implement a thread-safe collection property?
If you are targeting .NET 4, there are a few options in the System.Collections.Concurrent namespace.
You could use ConcurrentBag<T> in this case instead of List<T>
Even though it got the most votes, one usually can't take System.Collections.Concurrent.ConcurrentBag<T> as a thread-safe replacement for System.Collections.Generic.List<T>, as it is (as Radek Stromský already pointed out) not ordered.
But there is a class called System.Collections.Generic.SynchronizedCollection<T> that has been part of the framework since .NET 3.0. It is so well hidden, in a location where one does not expect it, that it is little known and you have probably never stumbled over it (at least I never did).
SynchronizedCollection<T> is compiled into assembly System.ServiceModel.dll (which is part of the client profile but not of the portable class library).
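A short usage sketch (assuming a reference to System.ServiceModel.dll); note that compound operations still need an explicit lock on SyncRoot:
using System.Collections.Generic; // SynchronizedCollection<T> lives here

class Demo
{
    static void Main()
    {
        var list = new SynchronizedCollection<string>();
        list.Add("a");      // each individual operation locks internally
        list.Insert(0, "b");

        // compound operations must hold SyncRoot across the whole sequence
        lock (list.SyncRoot)
        {
            if (list.Count > 0)
                list.RemoveAt(list.Count - 1);
        }
    }
}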
I would think making a sample ThreadSafeList class would be easy:
public class ThreadSafeList<T> : IList<T>
{
protected List<T> _internalList = new List<T>();
// Other Elements of IList implementation
public IEnumerator<T> GetEnumerator()
{
return Clone().GetEnumerator();
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return Clone().GetEnumerator();
}
protected static object _lock = new object();
public List<T> Clone()
{
List<T> newList = new List<T>();
lock (_lock)
{
_internalList.ForEach(x => newList.Add(x));
}
return newList;
}
}
You simply clone the list before requesting an enumerator, and thus any enumeration is working off a copy that can't be modified while running.
Even though the accepted answer is ConcurrentBag, I don't think it's a real replacement for a list in all cases; as Radek's comment on the answer says: "ConcurrentBag is unordered collection, so unlike List it does not guarantee ordering. Also you cannot access items by index".
So if you use .NET 4.0 or higher, a workaround could be to use ConcurrentDictionary with an integer TKey as the array index and TValue as the array value. This is the recommended way of replacing a list in Pluralsight's C# Concurrent Collections course. ConcurrentDictionary solves both problems mentioned above: index access and ordering (we cannot strictly rely on ordering, since it's a hash table under the hood, but the current .NET implementation happens to preserve the order in which elements are added).
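A minimal sketch of that workaround (the wrapper name is illustrative); Interlocked.Increment keeps the indices unique and contiguous even under concurrent adds:
using System.Collections.Concurrent;
using System.Threading;

public class IndexedStore<TValue>
{
    private readonly ConcurrentDictionary<int, TValue> _items =
        new ConcurrentDictionary<int, TValue>();
    private int _nextIndex = -1;

    public int Add(TValue value)
    {
        int index = Interlocked.Increment(ref _nextIndex); // unique per caller
        _items[index] = value;
        return index;
    }

    public bool TryGet(int index, out TValue value)
    {
        return _items.TryGetValue(index, out value);
    }

    public int Count
    {
        get { return _items.Count; }
    }
}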
C#'s ArrayList class has a Synchronized method.
var threadSafeArrayList = ArrayList.Synchronized(new ArrayList());
This returns a thread-safe wrapper around any instance of IList. All operations need to be performed through the wrapper to ensure thread safety.
In .NET Core (any version), you can use ImmutableList, which has all the functionality of List<T>.
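A short sketch of holding one behind a field (assuming the System.Collections.Immutable package); ImmutableInterlocked retries the swap if two writers race:
using System.Collections.Immutable;

public class Holder
{
    private ImmutableList<string> _items = ImmutableList<string>.Empty;

    public void Add(string item)
    {
        // builds a new list and publishes it atomically, retrying on contention
        ImmutableInterlocked.Update(ref _items, list => list.Add(item));
    }

    public ImmutableList<string> Snapshot
    {
        get { return _items; } // readers never need a lock
    }
}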
If you look at the source code for List<T> (https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,c66df6f36c131877) you will notice there is a class there (which is of course internal - why, Microsoft, why?!?!) called SynchronizedList of T. It is nested inside List<T>, which is where its T type parameter comes from. I am copy-pasting the code here:
[Serializable()]
internal class SynchronizedList : IList<T> {
private List<T> _list;
private Object _root;
internal SynchronizedList(List<T> list) {
_list = list;
_root = ((System.Collections.ICollection)list).SyncRoot;
}
public int Count {
get {
lock (_root) {
return _list.Count;
}
}
}
public bool IsReadOnly {
get {
return ((ICollection<T>)_list).IsReadOnly;
}
}
public void Add(T item) {
lock (_root) {
_list.Add(item);
}
}
public void Clear() {
lock (_root) {
_list.Clear();
}
}
public bool Contains(T item) {
lock (_root) {
return _list.Contains(item);
}
}
public void CopyTo(T[] array, int arrayIndex) {
lock (_root) {
_list.CopyTo(array, arrayIndex);
}
}
public bool Remove(T item) {
lock (_root) {
return _list.Remove(item);
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
lock (_root) {
return _list.GetEnumerator();
}
}
IEnumerator<T> IEnumerable<T>.GetEnumerator() {
lock (_root) {
return ((IEnumerable<T>)_list).GetEnumerator();
}
}
public T this[int index] {
get {
lock(_root) {
return _list[index];
}
}
set {
lock(_root) {
_list[index] = value;
}
}
}
public int IndexOf(T item) {
lock (_root) {
return _list.IndexOf(item);
}
}
public void Insert(int index, T item) {
lock (_root) {
_list.Insert(index, item);
}
}
public void RemoveAt(int index) {
lock (_root) {
_list.RemoveAt(index);
}
}
}
Personally I think they knew a better implementation using SemaphoreSlim could be created, but didn't get to it.
I would suggest anyone dealing with a List<T> in multithreading scenarios take a look at Immutable Collections, in particular the ImmutableArray.
I've found it very useful when you have:
Relatively few items in the list
Not so many read/write operations
A LOT of concurrent access (i.e. many threads that access the list in reading mode)
It can also be useful when you need to implement some sort of transaction-like behavior (i.e. reverting an insert/update/delete operation in case of failure).
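A minimal sketch of that read-heavy pattern with ImmutableArray (again assuming the System.Collections.Immutable package):
using System.Collections.Immutable;

public static class ReadMostlyCache
{
    private static ImmutableArray<string> _items = ImmutableArray<string>.Empty;

    // hot path: lock-free; a captured value is always a complete snapshot
    public static bool Contains(string item)
    {
        return _items.Contains(item);
    }

    // cold path: build a new array and publish it with an atomic swap
    public static void Add(string item)
    {
        ImmutableInterlocked.Update(ref _items, arr => arr.Add(item));
    }
}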
It seems like many of the people finding this want a thread-safe, indexed, dynamically sized collection. The closest and easiest thing I know of would be:
System.Collections.Concurrent.ConcurrentDictionary<int, YourDataType>
This requires you to ensure your key is properly incremented if you want normal indexing behavior. If you are careful, .Count could suffice as the key for any new key-value pairs you add.
You can also use the more primitive
Monitor.Enter(lockObject);
Monitor.Exit(lockObject);
which is what the lock statement uses under the hood (see this post: C# Locking an object that is reassigned in lock block).
If you are expecting exceptions in the code this is not safe but it allows you to do something like the following:
using System;
using System.Collections.Generic;
using System.Threading;
using System.Linq;
public class Something
{
private readonly object _lock;
private readonly List<string> _contents;
public Something()
{
_lock = new object();
_contents = new List<string>();
}
public Modifier StartModifying()
{
return new Modifier(this);
}
public class Modifier : IDisposable
{
private readonly Something _thing;
public Modifier(Something thing)
{
_thing = thing;
Monitor.Enter(Lock);
}
public void OneOfLotsOfDifferentOperations(string input)
{
DoSomethingWith(input);
}
private void DoSomethingWith(string input)
{
Contents.Add(input);
}
private List<string> Contents
{
get { return _thing._contents; }
}
private object Lock
{
get { return _thing._lock; }
}
public void Dispose()
{
Monitor.Exit(Lock);
}
}
}
public class Caller
{
public void Use(Something thing)
{
using (var modifier = thing.StartModifying())
{
modifier.OneOfLotsOfDifferentOperations("A");
modifier.OneOfLotsOfDifferentOperations("B");
modifier.OneOfLotsOfDifferentOperations("A");
modifier.OneOfLotsOfDifferentOperations("A");
modifier.OneOfLotsOfDifferentOperations("A");
}
}
}
One of the nice things about this is that you'll hold the lock for the duration of the series of operations (rather than locking in each operation), which means the output should come out in the right chunks (my usage of this was getting some output onto the screen from an external process).
I really like the simplicity and transparency of the ThreadSafeList, which does the important bit in stopping crashes.
I believe _list.ToList() will make you a copy. You can also query it if you need to, such as:
_list.Select("query here").ToList();
Anyway, MSDN says this is indeed a copy and not simply a reference. Oh, and yes, you will need to lock in the set method, as the others have pointed out.
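Putting that together, a minimal sketch of such a copy-on-read property:
using System.Collections.Generic;
using System.Linq;

public class Holder<T>
{
    private readonly object _lock = new object();
    private List<T> _list = new List<T>();

    private List<T> MyT
    {
        get { lock (_lock) { return _list.ToList(); } } // hand out a copy
        set { lock (_lock) { _list = value; } }
    }
}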
Looking at the original sample, one may guess that the intention was to be able to simply replace the list with a new one. The setter on the property tells us about it.
Microsoft's thread-safe collections are for safely adding and removing items from a collection. But if in the application logic you intend to replace the collection with a new one, one may guess, again, that the adding and deleting functionality of the List is not required.
If this is the case then, the simple answer would be to use IReadOnlyList interface:
private IReadOnlyList<T> _readOnlyList = new List<T>();
private IReadOnlyList<T> MyT
{
get { return _readOnlyList; }
set { _readOnlyList = value; }
}
One doesn't need to use any locking in this situation because there is no way to modify the collection. If the "_readOnlyList = value;" in the setter were replaced by something more complicated, then a lock could be required.
Basically, if you want to enumerate safely, you need to use a lock.
Please refer to MSDN on this. http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
Here is part of MSDN that you might be interested:
Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
A List can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
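For illustration, this is what "lock the collection during the entire enumeration" looks like in practice:
using System.Collections.Generic;

public class Totals
{
    private readonly object _sync = new object();
    private readonly List<int> _values = new List<int>();

    public void Add(int value)
    {
        lock (_sync) { _values.Add(value); }
    }

    public int Sum()
    {
        lock (_sync) // held across the whole enumeration
        {
            int total = 0;
            foreach (int v in _values)
                total += v;
            return total;
        }
    }
}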
Here is a class for a thread-safe list without locks:
public class ConcurrentList<T>
{
private long _i = 0;
private ConcurrentDictionary<long, T> dict = new ConcurrentDictionary<long, T>();
public int Count()
{
return dict.Count;
}
public List<T> ToList()
{
return dict.Values.ToList();
}
public T this[int i]
{
get
{
long ii = dict.Keys.ToArray()[i];
return dict[ii];
}
}
public void Remove(T item)
{
T ov;
var dicItem = dict.Where(c => c.Value.Equals(item)).FirstOrDefault();
if (dicItem.Key > 0)
{
dict.TryRemove(dicItem.Key, out ov);
}
this.CheckReset();
}
public void RemoveAt(int i)
{
long v = dict.Keys.ToArray()[i];
T ov;
dict.TryRemove(v, out ov);
this.CheckReset();
}
public void Add(T item)
{
// Interlocked.Increment hands out a unique key even under concurrent adds
dict.TryAdd(System.Threading.Interlocked.Increment(ref _i), item);
}
public IEnumerable<T> Where(Func<T, bool> p)
{
return dict.Values.Where(p);
}
public T FirstOrDefault(Func<T, bool> p)
{
return dict.Values.Where(p).FirstOrDefault();
}
public bool Any(Func<T, bool> p)
{
return dict.Values.Any(p);
}
public void Clear()
{
dict.Clear();
}
private void CheckReset()
{
if (dict.Count == 0)
{
this.Reset();
}
}
private void Reset()
{
_i = 0;
}
}
Use the lock statement to do this. (Read here for more information.)
private List<T> _list;
private readonly object _listLock = new object();
private List<T> MyT
{
get { return _list; }
set
{
// Lock on a dedicated object so only one thread can change the value at
// any given time. Never lock on _list itself: it is reassigned inside the
// lock, so two threads could end up locking on different instances.
lock (_listLock)
{
_list = value;
}
}
}
FYI, this probably isn't exactly what you're asking - you likely want to lock farther out in your code, but I can't assume that. Have a look at the lock keyword and tailor its use to your specific situation.
If you need to, you could lock in both the get and set blocks using the same lock object, which would make it so a read and a write cannot occur at the same time.
