I'm new to C# and threading, and I need a data structure that stores key-value pairs like a dictionary, except that each value should "expire" (be removed from the dictionary) a fixed time after it was added.
I have come up with an implementation that seems to work, but I was wondering if there might be any problems with it when used in a multithreaded environment.
Are there any direct issues with my current implementation?
As all usage of the dictionary is protected by the semaphore, is this implementation to be considered thread-safe out of the box?
public class TokenCache<TKey, TValue>
where TKey : IEquatable<TKey>
where TValue : class
{
private readonly TimeSpan expirationTime;
private readonly SemaphoreSlim semaphore;
private readonly Dictionary<TKey, TValue> cache;
public TokenCache(TimeSpan expirationTime)
{
this.expirationTime = expirationTime;
this.semaphore = new SemaphoreSlim(1, 1);
this.cache = new Dictionary<TKey, TValue>();
}
public void Add(TKey key, TValue value)
{
this.semaphore.Wait();
try
{
this.cache[key] = value;
Task
.Run(() => Thread.Sleep(this.expirationTime))
.ContinueWith(t =>
{
this.Remove(key);
});
}
finally
{
this.semaphore.Release();
}
}
public void Remove(TKey key)
{
this.semaphore.Wait();
try
{
this.cache.Remove(key);
}
finally
{
this.semaphore.Release();
}
}
public TValue GetValueOrDefault(TKey key)
{
this.semaphore.Wait();
try
{
if (this.cache.TryGetValue(key, out var value))
{
return value;
}
return null;
}
finally
{
this.semaphore.Release();
}
}
}
Problems I see in no particular order (there might be more though):
Task.Run
Task.Run(() => Thread.Sleep(...)) is a very bad idea. The ThreadPool, which supplies the threads that run Tasks, has a limited number of threads (it can grow, but only slowly). By calling Task.Run(() => Thread.Sleep(...)) instead of Task.Delay(...), you tie up a ThreadPool thread for the entire duration of the wait. Task.Delay(...) is timer-based and doesn't occupy a thread while waiting.
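As a minimal sketch of the difference, the delayed work can be written as an async method that awaits Task.Delay. The names DelayedAction and RunAfterAsync are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Threading.Tasks;

public static class DelayedAction
{
    // Hypothetical helper: runs `action` once `delay` has elapsed.
    // Task.Delay completes via a timer, so no thread-pool thread is
    // blocked while the delay is pending.
    public static async Task RunAfterAsync(TimeSpan delay, Action action)
    {
        await Task.Delay(delay).ConfigureAwait(false);
        action();
    }
}
```

The continuation after the await only needs a thread for the brief moment in which the action actually runs.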
Wrong Item Expiration
Another problem is your Add method. Think about the following:
You create an instance of TokenCache, for example via new TokenCache<int, string>(TimeSpan.FromMinutes(1))
You call Add(10, "ten")
After 50 seconds, you call Remove(10)
After 5 more seconds, you call Add(10, "The Number 10");
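Because of the way you've written your Add method, the delayed removal scheduled by the first Add is still pending. It fires 5 seconds later, so the value "The Number 10" is removed after only 5 seconds instead of after 1 minute.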
Pointless Semaphore
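Also, I don't really see the point of using a SemaphoreSlim. It is mainly useful when you need to wait asynchronously (WaitAsync); for short synchronous critical sections like these you could just use the lock keyword on something like private readonly object locker = new();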
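Putting the three points together, one possible shape for the cache is sketched below. This is not the only way to do it: it assumes a per-entry stamp (a made-up counter named nextStamp) so that a timer only removes the entry it was started for, and the class name ExpiringCache is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class ExpiringCache<TKey, TValue> where TValue : class
{
    private readonly object locker = new object();
    private readonly TimeSpan expirationTime;
    // Each entry carries a stamp so that a timer can tell whether the entry
    // stored under the key is still the one it was started for.
    private readonly Dictionary<TKey, (TValue Value, long Stamp)> cache =
        new Dictionary<TKey, (TValue Value, long Stamp)>();
    private long nextStamp;

    public ExpiringCache(TimeSpan expirationTime)
    {
        this.expirationTime = expirationTime;
    }

    public void Add(TKey key, TValue value)
    {
        long stamp;
        lock (locker)
        {
            stamp = ++nextStamp;
            cache[key] = (value, stamp);
        }
        // Fire and forget: the wait uses a timer, not a blocked thread.
        _ = ExpireLaterAsync(key, stamp);
    }

    private async Task ExpireLaterAsync(TKey key, long stamp)
    {
        await Task.Delay(expirationTime).ConfigureAwait(false);
        lock (locker)
        {
            // Remove only if the entry was not replaced in the meantime.
            if (cache.TryGetValue(key, out var entry) && entry.Stamp == stamp)
                cache.Remove(key);
        }
    }

    public bool Remove(TKey key)
    {
        lock (locker) { return cache.Remove(key); }
    }

    public TValue GetValueOrDefault(TKey key)
    {
        lock (locker)
        {
            return cache.TryGetValue(key, out var entry) ? entry.Value : null;
        }
    }
}
```

With this variant, the Add / Remove / Add sequence described above leaves the second value in place until its own expiration time has passed.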
Related
I wanted to measure the performance of ConcurrentDictionary vs. Dictionary plus locks in a multithreaded environment, so I created my own SyncDict class that maps int keys to int[] values. Whenever there is a key match, it adds the incoming int[] array to the stored one element by element, and it locks the whole dictionary with ReaderWriterLockSlim while updating the value.
I replicated the code with ConcurrentDictionary, mainly using the AddOrUpdate() method.
The whole console app code can be found here: https://dotnetfiddle.net/1kFbGy. Copy and paste the code into a console app to run it; it will not run in the fiddle.
After running both versions with the same inputs I see a considerable difference in running time. For example, in one particular run on my machine ConcurrentDictionary took 4.5 seconds while SyncDict took less than 1 second.
I would like to hear any thoughts or suggestions that explain this difference. Am I doing anything wrong here?
class SyncDict<TKey>
{
private ReaderWriterLockSlim cacheLock;
private Dictionary<TKey, int[]> dictionary;
public SyncDict()
{
cacheLock = new ReaderWriterLockSlim();
dictionary = new Dictionary<TKey, int[]>();
}
public Dictionary<TKey, int[]> Dictionary
{
get { return dictionary; }
}
public int[] Read(TKey key)
{
cacheLock.EnterReadLock();
try
{
return dictionary[key];
}
finally
{
cacheLock.ExitReadLock();
}
}
public void Add(TKey key, int[] value)
{
cacheLock.EnterWriteLock();
try
{
dictionary.Add(key, value);
}
finally
{
cacheLock.ExitWriteLock();
}
}
public AddOrUpdateStatus AddOrUpdate(TKey key, int[] value)
{
cacheLock.EnterUpgradeableReadLock();
try
{
int[] result = null;
if (dictionary.TryGetValue(key, out result))
{
if (result == value)
return AddOrUpdateStatus.Unchanged;
else
{
cacheLock.EnterWriteLock();
try
{
Parallel.For(0, value.Length,
(i, state) =>
{
result[i] = result[i] + value[i];
});
}
finally
{
cacheLock.ExitWriteLock();
}
return AddOrUpdateStatus.Updated;
}
}
else
{
Add(key, value);
return AddOrUpdateStatus.Added;
}
}
finally
{
cacheLock.ExitUpgradeableReadLock();
}
}
public void Delete(TKey key)
{
cacheLock.EnterWriteLock();
try
{
dictionary.Remove(key);
}
finally
{
cacheLock.ExitWriteLock();
}
}
public enum AddOrUpdateStatus
{
Added,
Updated,
Unchanged
};
}
There are multiple problems with your test.
1) You are populating a dictionary with ~150,000 different keys, all with the same value.
2) The shared value of all entries is an array of 30,000 integers, and you are updating every element of it in half of the calls to AddOrUpdate. But this only happens when you test the ConcurrentDictionary. In the SyncDict test the condition if (result == value) return AddOrUpdateStatus.Unchanged skips all updates (because the value is shared, the reference comparison is always true).
3) You are feeding the two tests with different random inputs.
4) You are updating the array using a Parallel.For loop while already inside an outer Parallel.For loop, which over-parallelizes your workload.
5) When calling the method AddOrUpdate you ignore the documented fact that the updateValueFactory function is called in a thread-unsafe manner. Since multiple AddOrUpdate calls execute concurrently and the value is shared, you are corrupting the state of the value.
The updateValueFactory delegate is called outside the locks to avoid the problems that can arise from executing unknown code under a lock.
ConcurrentDictionary.AddOrUpdate Method
I suggest that you modify your test to reflect the intended use of the ConcurrentDictionary class.
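As a rough illustration of that intended use, the sketch below keeps each value private to its key and makes the update delegate a pure function, so it does not matter if the delegate runs more than once. The KeyCounter class and its counting workload are invented for this example and are not taken from the linked test.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class KeyCounter
{
    // Counts how often each key occurs, using one int value per key.
    public static ConcurrentDictionary<int, int> CountKeys(int[] keys)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(keys, key =>
        {
            // The update delegate may be invoked more than once under
            // contention, so it only computes a new value from its inputs
            // and never mutates shared state.
            counts.AddOrUpdate(key, 1, (_, current) => current + 1);
        });
        return counts;
    }
}
```

Because the delegate returns a new value instead of mutating an array shared by all entries, a retried update cannot corrupt anything.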
I'm looking for a solution that allows multiple threads to read a shared resource concurrently, but blocks those reading threads once a thread enters a mutating block, to get the best of both worlds.
I've looked up this reference, but its solution seems to be to lock both reading and writing threads.
class Foo {
List<string> sharedResource;
public void reading() // multiple reading threads allowed, concurrency ok, lock this only if a thread enters the mutating block below.
{
}
public void mutating() // this should lock any threads entering this block as well as lock the reading threads above
{
lock(this)
{
}
}
}
Is there such a solution in C#?
Edit
All threads entering both GetMultiton and the constructor should get the same instance. I want them to be thread safe.
class Foo: IFoo {
public static IFoo GetMultiton(string key, Func<IFoo> fooRef)
{
if (instances.TryGetValue(key, out IFoo obj))
{
return obj;
}
return fooRef();
}
public Foo(string key) {
instances.Add(key, this);
}
}
protected static readonly IDictionary<string, IFoo> instances = new ConcurrentDictionary<string, IFoo>();
Use
Foo.GetMultiton("key1", () => new Foo("key1"));
There is a pre-built class for this behavior: ReaderWriterLockSlim.
class Foo {
List<string> sharedResource;
ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
public void reading() // multiple reading threads allowed, concurrency ok, lock this only if a thread enters the mutating block below.
{
_lock.EnterReadLock();
try
{
//Do reading stuff here.
}
finally
{
_lock.ExitReadLock();
}
}
public void mutating() // this should lock any threads entering this block as well as lock the reading threads above
{
_lock.EnterWriteLock();
try
{
//Do writing stuff here.
}
finally
{
_lock.ExitWriteLock();
}
}
}
Multiple threads can hold the read lock at the same time. When a thread requests the write lock, it blocks until all current readers have finished; from then on, new readers and new writers are blocked until the write lock is released.
With your update you don't need locks at all. Just use GetOrAdd from ConcurrentDictionary
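class Foo: IFoo {
// Declared as ConcurrentDictionary (not IDictionary) so that GetOrAdd is available.
protected static readonly ConcurrentDictionary<string, IFoo> instances = new ConcurrentDictionary<string, IFoo>();
public static IFoo GetMultiton(string key, Func<IFoo> fooRef)
{
return instances.GetOrAdd(key, k => fooRef());
}
public Foo(string key) {
// TryAdd instead of Add: Add throws if another thread registered the key first.
instances.TryAdd(key, this);
}
}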
Note that fooRef() may be called more than once, but only the first one to return will be used as the result for all the threads. If you want fooRef() to only be called once it will require slightly more complicated code.
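class Foo: IFoo {
// Declared as ConcurrentDictionary (not IDictionary) so that GetOrAdd is available.
protected static readonly ConcurrentDictionary<string, Lazy<IFoo>> instances = new ConcurrentDictionary<string, Lazy<IFoo>>();
public static IFoo GetMultiton(string key, Func<IFoo> fooRef)
{
return instances.GetOrAdd(key, k => new Lazy<IFoo>(fooRef)).Value;
}
public Foo(string key) {
// TryAdd is a no-op when GetMultiton has already stored the Lazy for this key;
// Add would throw in that case.
instances.TryAdd(key, new Lazy<IFoo>(() => this));
}
}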
The solution depends on your requirements. If the performance of ReaderWriterLockSlim is not sufficient (note that it is approximately twice as slow as a regular lock in the current .NET Framework, so it pays off only when modifications are rare and reads are fairly heavy operations; otherwise the overhead outweighs the gain), you can create a copy of the data, modify it, and atomically swap the reference with the help of the Interlocked class (provided it is not a requirement that each thread sees the most recent data as soon as it changes).
class Foo
{
IReadOnlyList<string> sharedResource = new List<string>();
public void reading()
{
// Here you can safely* read from sharedResource
}
public void mutating()
{
var copyOfData = new List<string>(sharedResource);
// modify copyOfData here
// Following line is correct only in case of single writer:
Interlocked.Exchange(ref sharedResource, copyOfData);
}
}
Benefits of lock-free case:
We have no locks on read, so we get maximum performance.
Drawbacks:
We have to copy data => memory traffic (allocations, garbage collection)
A reader thread may not observe the most recent update (if it reads the reference before it was updated)
If a reader uses the sharedResource reference multiple times, it must first copy the reference to a local variable (for example via Volatile.Read) whenever those usages assume that it is the same collection
If sharedResource is a list of mutable objects, then we must be careful about updating these objects in mutating, since a reader might be using them at the same moment => in this case it's better to make copies of these objects as well
If there are several updater threads, then we must use Interlocked.CompareExchange instead of Interlocked.Exchange in mutating, together with a retry loop
So, if you want to go lock-free, then it's better to use immutable objects. And anyway you will pay with memory allocations/GC for the performance.
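For the immutable route, a minimal sketch could look like the following. It assumes the System.Collections.Immutable package is available (it ships with modern .NET and is a NuGet package for .NET Framework); the SharedNames class and its members are made up for the illustration.

```csharp
using System.Collections.Immutable;
using System.Threading;

public class SharedNames
{
    // The field only ever points at a fully built, immutable list.
    private ImmutableList<string> names = ImmutableList<string>.Empty;

    // Readers take one snapshot and can use it as often as they like.
    public ImmutableList<string> Snapshot()
    {
        return Volatile.Read(ref names);
    }

    public void Add(string name)
    {
        // ImmutableInterlocked.Update re-runs the transformation until its
        // compare-and-swap succeeds, so concurrent writers do not lose updates.
        ImmutableInterlocked.Update(ref names, list => list.Add(name));
    }
}
```

The cost is the same as described above: every modification allocates, and a reader may be working on a snapshot that is already one update behind.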
UPDATE
Here is version that allows multiple writers as well:
class Foo
{
IReadOnlyList<string> sharedResource = new List<string>();
public void reading()
{
// Here you can safely* read from sharedResource
}
public void mutating()
{
IReadOnlyList<string> referenceToCollectionForCopying;
List<string> copyOfData;
do
{
referenceToCollectionForCopying = Volatile.Read(ref sharedResource);
copyOfData = new List<string>(referenceToCollectionForCopying);
// modify copyOfData here
} while (!ReferenceEquals(Interlocked.CompareExchange(ref sharedResource, copyOfData,
referenceToCollectionForCopying), referenceToCollectionForCopying));
}
}
I have the following scenario:
I'm trying to block a thread if its 'custom' id matches the id of a thread that has already entered the locked section of code, but not if the id differs.
I created some sample code to explain the behaviour I want
class A
{
private static Dictionary<int, object> _idLocks = new Dictionary<int, object>();
private static readonly object _DictionaryLock = new object();
private int _id;
public A(int id)
{
_id = id;
}
private object getObject()
{
lock (_DictionaryLock)
{
if (!_idLocks.ContainsKey(_id))
_idLocks.Add(_id, new object());
}
lock (_idLocks[_id])
{
if (TestObject.Exists(_id))
return TestObject(_id);
else
return CreateTestObject(_id);
}
}
}
Now this works 100% for what I intended: a thread with id 1, for example, does not check whether its object has been created while another thread with id 1 is already busy creating that object.
But having two locks and a static dictionary does not seem like the correct way of doing it at all, so I'm hoping someone can show me an improved method of stopping a thread from accessing code only if that thread was created with the same id as the one already busy executing the code in the locked section.
I was looking at the ReaderWriterLockSlim class but to me it didn't really make sense to be used cause I don't want object TestObject(id) to be read at all while it's still being created.
I don't care about locking the thread from accessing a dictionary.
What I'm trying to avoid at all cost is the _id which that thread runs should not be used inside CreateTestObject(_id) while there is already one busy, because files are being created and deleted with that id which will throw exceptions if two threads are trying to access the same files
Which is fixable with just a normal lock, but in this case I still want a thread whose _id is not currently running inside the CreateTestObject(_id) method to be able to enter the code within the lock.
This is all because what happens inside CreateTestObject takes time and performance will be impacted if a thread is waiting to access it.
It looks like you're using this code to populate a dictionary in a thread-safe manner - could you use a ConcurrentDictionary instead?
class A {
private static ConcurrentDictionary<int, object> _dictionary = new ConcurrentDictionary<int, object>();
private int _id;
private object GetObject() {
object output = null;
if(_dictionary.TryGetValue(_id, out output)) {
return output;
} else {
return _dictionary.GetOrAdd(_id, CreateTestObject(_id));
}
}
}
Edit: If you want to completely eliminate the possibility of invoking duplicate CreateTestObject methods then you can store a wrapper in _dictionary that lazily sets object
class Wrapper {
private volatile object _obj = null;
public object GetObj() {
while(_obj == null) {
// spin, or sleep, or whatever
}
return _obj;
}
public void SetObj(object obj) {
_obj = obj;
}
}
class A {
private static ConcurrentDictionary<int, Wrapper> _dictionary = new ConcurrentDictionary<int, Wrapper>();
private int _id;
private object GetObject() {
Wrapper wrapper = null;
if(_dictionary.TryGetValue(_id, out wrapper)) {
return wrapper.GetObj();
} else {
Wrapper newWrapper = new Wrapper();
wrapper = _dictionary.GetOrAdd(_id, newWrapper);
if(wrapper == newWrapper) {
wrapper.SetObj(CreateTestObject(_id));
}
return wrapper.GetObj();
}
}
}
Only one thread will be able to put a new Wrapper in _dictionary at the specified _id; that thread will initialize the object inside the wrapper == newWrapper conditional. Wrapper.GetObj spins until the object is set; this can be rewritten to block instead.
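One way to get the blocking variant without a hand-written wrapper is to store a Lazy<T> in the dictionary, which is a swapped-in technique rather than a rewrite of the Wrapper class above. In this sketch the PerKeyFactory class and its create delegate (standing in for CreateTestObject) are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class PerKeyFactory
{
    private readonly ConcurrentDictionary<int, Lazy<object>> objects =
        new ConcurrentDictionary<int, Lazy<object>>();
    private readonly Func<int, object> create; // stand-in for CreateTestObject

    public PerKeyFactory(Func<int, object> create)
    {
        this.create = create;
    }

    public object Get(int id)
    {
        // Several Lazy instances may be constructed under a race, but only
        // one is stored per key. ExecutionAndPublication makes that one run
        // its factory exactly once while other callers block on Value.
        var lazy = objects.GetOrAdd(id, key => new Lazy<object>(
            () => create(key), LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

Threads asking for different ids do not wait on each other; only callers for the same id block while that id's object is being created.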
This can't work, because Monitor (which is used internally by the lock statement) is re-entrant. That means that a thread can enter any lock it already owns any number of times.
You could solve this by using a Semaphore instead of a Monitor, but stop for a while and listen to what you're asking - you want the thread to block on a lock owned by that same thread. How is that thread ever going to wake up? It will deadlock forever - waiting for the lock to be released, while also being the one holding the lock.
Or are you just trying to handle lazy initialization of some object without having to block all the other threads? That's actually quite simple:
ConcurrentDictionary<int, YourObject> dictionary;
return dictionary.GetOrAdd(id, i => CreateTestObject(i));
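Note that CreateTestObject is called only if the key doesn't exist in the dictionary yet. (Under contention GetOrAdd may still invoke the factory more than once for the same key, but only one result is stored.)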
I have a static in-memory cache that is written to only once an hour (or longer), and read by many threads at an extremely high rate. Conventional wisdom suggests I follow a pattern such as the following:
public static class MyCache
{
private static IDictionary<int, string> _cache;
private static ReaderWriterLockSlim _sharedLock;
static MyCache()
{
_cache = new Dictionary<int, string>();
_sharedLock = new ReaderWriterLockSlim();
}
public static string GetData(int key)
{
_sharedLock.EnterReadLock();
try
{
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
finally
{
_sharedLock.ExitReadLock();
}
}
public static void AddData(int key, string data)
{
_sharedLock.EnterWriteLock();
try
{
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
}
finally
{
_sharedLock.ExitWriteLock();
}
}
}
As an exercise in micro-optimization, how can I shave off even more ticks in the relative expense of shared read locks? Time to write can be expensive, since it rarely happens. I need to make reads as fast as possible. Can I just drop the read locks (below) and remain thread-safe in this scenario? Or is there a lock-free version I can use? I'm familiar with memory-fencing but don't know how to safely apply it in this instance.
Note: I'm not tied to either pattern, so any suggestions are welcome as long as the end result is faster and in C# 4.x.
public static class MyCache2
{
private static IDictionary<int, string> _cache;
private static object _fullLock;
static MyCache2()
{
_cache = new Dictionary<int, string>();
_fullLock = new object();
}
public static string GetData(int key)
{
//Note: There is no locking here... Is that ok?
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
public static void AddData(int key, string data)
{
lock (_fullLock)
{
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
}
}
}
You don't need a lock when threads only ever read from the data structure. So, since writes are so rare (and, I assume, not concurrent), an option might be to make a full copy of the dictionary, make the modifications to the copy, and then atomically exchange the old dictionary with the new one:
public static class MyCache2
{
private static IDictionary<int, string> _cache;
static MyCache2()
{
_cache = new Dictionary<int, string>();
}
public static string GetData(int key)
{
string returnValue;
_cache.TryGetValue(key, out returnValue);
return returnValue;
}
public static void AddData(int key, string data)
{
IDictionary<int, string> clone = new Dictionary<int, string>(_cache);
if (!clone.ContainsKey(key))
clone.Add(key, data);
Interlocked.Exchange(ref _cache, clone);
}
}
I would be looking to go lock free here, and achieve thread safety by simply not changing any published dictionary. What I mean is: when you need to add data, create a complete copy of the dictionary, and append/update/etc the copy. Since this is once an hour this shouldn't be a problem even for large data. Then, when you have made the changes, simply swap the reference from the old dictionary to the new dictionary (reference reads/writes are guaranteed to be atomic).
One caveat: any code that needs consistent state between multiple operations should capture the dictionary into a variable first, i.e.
var snapshot = someField;
// multiple reads on snapshot
This ensures that any related logic is all made using the same version of the data, to avoid confusion when the reference swaps during the operation.
I would also take a lock when writing (not when reading) to ensure no squabbling over the data. There are lock-free multi-writer approaches too (primarily Interlocked.CompareExchange and reapply if it fails), but I would use the simplest approach first, and a single writer is exactly that.
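A minimal sketch of that single-writer, copy-and-swap approach might look like this. The class name SnapshotCache and the volatile field are choices made for the illustration; the syntax is kept C# 4 compatible.

```csharp
using System.Collections.Generic;

public static class SnapshotCache
{
    private static readonly object writeLock = new object();
    // A published dictionary is never modified again; writers replace it.
    private static volatile Dictionary<int, string> cache =
        new Dictionary<int, string>();

    public static string GetData(int key)
    {
        // Capture the reference once so the whole read uses one version.
        Dictionary<int, string> snapshot = cache;
        string value;
        snapshot.TryGetValue(key, out value);
        return value; // null when the key is absent
    }

    public static void AddData(int key, string data)
    {
        lock (writeLock) // serializes writers only; readers never lock
        {
            if (cache.ContainsKey(key)) return;
            var copy = new Dictionary<int, string>(cache);
            copy.Add(key, data);
            cache = copy; // atomic reference swap publishes the new version
        }
    }
}
```

Readers pay for one volatile read and one dictionary lookup; the full copy is paid only on the rare write.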
Alternative option: the .net 1.x Hashtable (essentially Dictionary, minus the generics) has an interesting threading story; the reads are thread safe without locks - you only need to use locks to ensure at most one writer.
So: you might consider using a non-generic Hashtable, no locking on reads, and then take a lock during writes.
This is the main reason I still find myself using Hashtable sometimes, even in .net 4.x applications.
One problem though - it'll cause the int key to be boxed for both storage and query.
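For comparison, the Hashtable variant could be sketched as follows. It relies on the documented guarantee that Hashtable supports multiple readers alongside a single writer; the class name HashtableCache is invented for the example.

```csharp
using System.Collections;

public static class HashtableCache
{
    private static readonly object writeLock = new object();
    private static readonly Hashtable cache = new Hashtable();

    public static string GetData(int key)
    {
        // No lock on the read path. The int key is boxed here, and the
        // indexer returns null when the key is absent.
        return (string)cache[key];
    }

    public static void AddData(int key, string data)
    {
        lock (writeLock) // ensures there is at most one writer at a time
        {
            if (!cache.ContainsKey(key))
                cache.Add(key, data);
        }
    }
}
```

Unlike the copy-and-swap approach, nothing is copied on write, at the price of boxing the key on every call.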
This makes a copy of the dictionary only when data is being added. A lock is used for adding but you can take that out if you don't intend to add from more than one thread. If there's no copy then data is pulled from the original dictionary, otherwise the copy is used while adding.
The copy could be set to null after a reader has checked it and seen that it is not null, but before the reader retrieves the value. For that rare case I added a try/catch that pulls the data from the original dictionary under the lock; again, this should happen very rarely, if at all.
public static class MyCache2
{
private static IDictionary<int, string> _cache;
private static IDictionary<int, string> _cacheClone;
private static Object _lock;
static MyCache2()
{
_cache = new Dictionary<int, string>();
_lock = new Object();
}
public static string GetData(int key)
{
string returnValue;
if (_cacheClone == null)
{
_cache.TryGetValue(key, out returnValue);
}
else
{
try
{
_cacheClone.TryGetValue(key, out returnValue);
}
catch
{
lock (_lock)
{
_cache.TryGetValue(key, out returnValue);
}
}
}
return returnValue;
}
public static void AddData(int key, string data)
{
lock (_lock)
{
_cacheClone = new Dictionary<int, string>(_cache);
if (!_cache.ContainsKey(key))
_cache.Add(key, data);
_cacheClone = null;
}
}
}
You might also look at lock free data structures. http://www.boyet.com/Articles/LockfreeStack.html is a good example
I have the following helper class (simplified):
public static class Cache
{
private static readonly object _syncRoot = new object();
private static Dictionary<Type, string> _lookup = new Dictionary<Type, string>();
public static void Add(Type type, string value)
{
lock (_syncRoot)
{
_lookup.Add(type, value);
}
}
public static string Lookup(Type type)
{
string result;
lock (_syncRoot)
{
_lookup.TryGetValue(type, out result);
}
return result;
}
}
Add will be called roughly 10/100 times in the application and Lookup will be called by many threads, many thousands of times. What I would like is to get rid of the read lock.
How do you normally get rid of the read lock in this situation?
I have the following ideas:
Require that _lookup is stable before the application starts operation. It could be built up from an Attribute. This is done automatically through the static constructor of the type the attribute is assigned to. Requiring the above would mean going through all types that could have the attribute and calling RuntimeHelpers.RunClassConstructor, which is an expensive operation;
Move to COW semantics.
public static void Add(Type type, string value)
{
lock (_syncRoot)
{
var lookup = new Dictionary<Type, string>(_lookup);
lookup.Add(type, value);
_lookup = lookup;
}
}
(With the lock (_syncRoot) removed in the Lookup method.) The problem with this is that it uses an unnecessary amount of memory (which might not be a problem), and I would probably make _lookup volatile, but I'm not sure how this should be applied. (Jon Skeet's comment here gives me pause.)
Using ReaderWriterLock. I believe this would make things worse since the region being locked is small.
Suggestions are very welcome.
UPDATE:
The values of the cache are immutable.
To remove locks completely (slightly different from "lock free", where locks are almost eliminated and the remaining ones are cleverly replaced with Interlocked instructions) you need to make sure that your dictionary is immutable. If items in the dictionary are not immutable (and as a result have their own locks) you probably should not worry about locking at the dictionary level.
Option 1 is the best and easiest solution if you can use it.
Option 2 is reasonable and easy to debug. (Note: as written it does not work well for concurrent adding of the same item. Consider the double-checked locking pattern if needed - Double-checked locking in .NET)
Option 3: I would not do it if 1/2 is an option.
If you can use new 4.0 collections - ConcurrentDictionary there matches your criteria (see http://msdn.microsoft.com/en-us/library/dd997305.aspx and http://blogs.msdn.com/b/pfxteam/archive/2010/01/26/9953725.aspx).
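If ConcurrentDictionary is available, the helper could shrink to something like the sketch below. The class is named TypeCache here to keep it apart from the original Cache, and returning a bool from Add is a choice made for the illustration.

```csharp
using System;
using System.Collections.Concurrent;

public static class TypeCache
{
    private static readonly ConcurrentDictionary<Type, string> lookup =
        new ConcurrentDictionary<Type, string>();

    // Returns false instead of throwing when the type is already registered.
    public static bool Add(Type type, string value)
    {
        return lookup.TryAdd(type, value);
    }

    public static string Lookup(Type type)
    {
        // Reads on ConcurrentDictionary do not take a lock.
        string result;
        lookup.TryGetValue(type, out result);
        return result; // null when the type is not registered
    }
}
```

Writers still synchronize internally, but with only a handful of Add calls that cost should rarely matter.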
At work at the moment, so nothing elegant, came up with this (untested)
public static class Cache
{
private static readonly object _syncRoot = new object();
private static Dictionary<Type, string> _lookup = new Dictionary<Type, string>();
public static class OneToManyLocker
{
private static readonly Object WriteLocker = new Object();
private static readonly List<Object> ReadLockers = new List<Object>();
private static readonly Object myLocker = new Object();
public static Object GetLock(LockType lockType)
{
lock(WriteLocker)
{
if(lockType == LockType.Read)
{
var newReadLocker = new Object();
lock(myLocker)
{
ReadLockers.Add(newReadLocker);
}
return newReadLocker;
}
foreach(var readLocker in ReadLockers)
{
lock(readLocker) { }
}
return WriteLocker;
}
}
public enum LockType {Read, Write};
}
public static void Add(Type type, string value)
{
lock(OneToManyLocker.GetLock(OneToManyLocker.LockType.Write))
{
_lookup.Add(type, value);
}
}
public static string Lookup(Type type)
{
string result;
lock (OneToManyLocker.GetLock(OneToManyLocker.LockType.Read))
{
_lookup.TryGetValue(type, out result);
}
return result;
}
}
You will need some sort of cleanup for the read lockers, but this should be thread-safe, allowing multiple reads at a time while also locking on writes, unless I'm totally missing something.
Either:
Don't use normal locks; go with a spinlock if the lookup is fast (a dictionary lookup is not).
If that is not the case, then use ReaderWriterLock (http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlock.aspx). This allows multiple readers and only one writer.