Locking based on parameters - c#

Suppose I have this method:
void Foo(int bar)
{
    // do stuff
}
Here is the behavior I want Foo to have:
If thread 1 calls Foo(1) and thread 2 calls Foo(2), both threads can run concurrently.
If thread 1 calls Foo(1) and thread 2 calls Foo(1), both threads cannot run concurrently.
Is there a good, standard way in .NET to specify this type of behavior? I have a solution that uses a dictionary of objects to lock on, but that feels kind of messy.

Use a dictionary that provides different lock objects for the different arguments. Set up the dictionary when you instantiate the underlying object (or statically, if applicable):
var locks = new Dictionary<int, object>() {
    {1, new Object()},
    {2, new Object()},
    …
};
And then use it inside your method:
void Foo(int bar) {
    lock (locks[bar]) {
        …
    }
}
I wouldn't say that this solution is messy. On the contrary, fine-grained locking is commendable, and since you can't lock on value types in .NET, a mapping from values to lock objects is the obvious solution.
Be careful though: the above only works as long as the dictionary isn’t concurrently modified and read. It is therefore best to treat the dictionary as read-only after its set-up.

Bottom line: you can't lock on value types.
The dictionary you're using is the best approach I can think of. It's kludgey, but it works.
Personally, I'd pursue an architectural solution that makes the locking unnecessary, but I don't know enough about your system to give you pointers there.

Using a plain Dictionary is not enough if entries can be added while other threads are reading it. In that case you should use ConcurrentDictionary or implement a data structure that supports multi-threaded access.
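A minimal sketch of that suggestion, assuming the set of arguments is not known up front: a ConcurrentDictionary<int, object> hands out one lock object per argument through GetOrAdd, so the map can grow while threads are running. The KeyedWorker class and its per-key counter are hypothetical and only exist to give the lock something to protect. Lock objects are never removed here, so the map grows with the number of distinct keys.

```csharp
using System.Collections.Concurrent;

public class KeyedWorker
{
    // One lock object per distinct argument; GetOrAdd is safe to call concurrently.
    private readonly ConcurrentDictionary<int, object> _locks =
        new ConcurrentDictionary<int, object>();

    // Hypothetical per-key state, present only so the lock guards something observable.
    private readonly ConcurrentDictionary<int, int> _counts =
        new ConcurrentDictionary<int, int>();

    public void Foo(int bar)
    {
        // The factory may run more than once under contention, but every
        // caller receives the single object that ended up stored for this key.
        object gate = _locks.GetOrAdd(bar, _ => new object());
        lock (gate)
        {
            // Read-modify-write that is only safe because calls with the
            // same bar are serialized by the lock above.
            _counts.TryGetValue(bar, out int current);
            _counts[bar] = current + 1;
        }
    }

    public int CountFor(int bar) => _counts.TryGetValue(bar, out int c) ? c : 0;
}
```

Calls such as Foo(1) and Foo(2) take different lock objects and can overlap, while two calls to Foo(1) queue behind each other.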

Creating a Dictionary<> just so that you can lock on a value seems like overkill to me. I got this working using a string. There are people (e.g. Jon Skeet) who do not like this approach, and for valid reasons (see the post "Is it OK to use a string as a lock object?").
But I have a way to mitigate those concerns: intern the string on the fly and combine it with a unique identifier.
// you should insert your own guid here
string lockIdentifier = "a8ef3042-e866-4667-8673-6e2268d5ab8e";
public void Foo(int bar)
{
    lock (string.Intern(string.Format("{0}-{1}", lockIdentifier, bar)))
    {
        // do stuff
    }
}
What happens is that distinct values are stored in the string intern pool, which is shared across AppDomain boundaries. Adding lockIdentifier to the string ensures that it won't collide with interned strings used by other code, meaning the lock will only take effect in your own application.
So the intern pool will return a reference to an interned string - this is ok to lock on.
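To illustrate why the interned string can serve as a lock object, here is a small sketch (the InternDemo class and its method names are made up for this example). Two strings formatted separately at run time are normally distinct objects, but string.Intern maps equal contents to one shared reference, so equal arguments end up locking on the same object. Keep in mind that interned strings tend to stay in memory for the lifetime of the process.

```csharp
using System;

public static class InternDemo
{
    // Same GUID-prefix idea as in the answer; the value is a placeholder.
    private const string LockIdentifier = "a8ef3042-e866-4667-8673-6e2268d5ab8e";

    // Builds the key at run time and returns the pooled instance for it.
    public static string LockKeyFor(int bar) =>
        string.Intern(string.Format("{0}-{1}", LockIdentifier, bar));

    // True when both arguments would lock on the very same object.
    public static bool SameLockObject(int a, int b) =>
        ReferenceEquals(LockKeyFor(a), LockKeyFor(b));
}
```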

Related

What is a practical application of using an immutable type in a thread-safe way that differs from using a mutable type in the same way?

Consider the following code:
class Program
{
    static object locker = new object();
    static string data;
    static void Main(string[] args)
    {
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                await Task.Delay(5000);
                string localCopy;
                lock (locker)
                {
                    localCopy = data;
                }
                // do some read operation with localCopy;
                // write to log file, call a web API, etc
                Log(localCopy);
            }
        });
        while (true)
        {
            // data is written to from time to time on the main thread;
            // can be user input, etc.
            string input = Console.ReadLine();
            lock (locker)
            {
                data = input;
            }
        }
    }
}
Since .NET strings are immutable, and one of the benefits of immutability is thread safety, are the lock statements necessary?
EDIT: I chose an immutable type (string) in the above example just for context. I am generally trying to understand the "thread-safe" property of immutable types, and whether, based on the comments and my own understanding of things, some sort of lock semantics is still necessary in multi-threaded code when such types are used across threads.
What is a practical application of using an immutable type in a thread-safe way that differs from using a mutable type in the same way?
As noted in the comments, it's all about the variables.
If you have multiple threads accessing the same variable, then yes, you have to protect the variable in some way (lock, Interlocked, etc).
The benefit of immutable types comes in when you pass that data to another thread - creating another variable. All you need to do is copy the reference from one variable to another, and now the first variable can change however much it wants; the second variable remains immutable.
I think it's a bit easier to understand with an example like ImmutableStack<string>. Let's say there's a "main" thread that pushes and pops that ImmutableStack<string>; since this is immutable, each push/pop updates its own variable. If our "main" thread wants to give another thread a snapshot, it just copies its current variable to another variable for that thread. Then the "main" thread can continue pushing/popping/updating its own variable with impunity. The "secondary" thread has its own immutable snapshot.
In a more general situation, this can be useful with one or more readers/responders, where each "read" loop starts with capturing the current state of the shared variable and using that local copy for the duration of the loop.
If you wanted to snapshot a mutable value, that would require doing a deep clone. Imagine if string was mutable, like it is in other languages. In that case, copying the value (reference) of the string would be insufficient; one thread could change a single character while another thread was trying to do something else with the value. In order to capture a true snapshot of a mutable string value, you'd have to copy the entire string to a new string.
There are other benefits to immutable types in general (design, etc), but this "reference snapshot" benefit is one that specifically benefits multithreading.
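A small sketch of the "reference snapshot" idea, assuming the System.Collections.Immutable package is available (it ships with modern .NET and is a NuGet package on .NET Framework). SnapshotDemo is a hypothetical name, and the example runs on one thread to keep the semantics visible. Handing the shared variable itself to another thread would still need a lock or Interlocked, as described above.

```csharp
using System.Collections.Immutable;

public static class SnapshotDemo
{
    public static string Run()
    {
        // The "main" thread's variable.
        ImmutableStack<string> stack = ImmutableStack<string>.Empty;
        stack = stack.Push("a");
        stack = stack.Push("b");

        // Taking a snapshot is just a reference copy; no deep clone needed.
        ImmutableStack<string> snapshot = stack;

        // Later pushes and pops replace the main variable with new stacks;
        // the object the snapshot refers to is never modified.
        stack = stack.Pop().Push("c");

        // Stacks enumerate from the top down.
        return string.Join(",", snapshot) + "|" + string.Join(",", stack);
    }
}
```

Run() returns "b,a|c,a": the snapshot still holds the two original items even though the main variable has moved on.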

Should I use a C# Dictionary if I only need fast lookup of keys, and values are irrelevant?

I am in need of a data type that is able to insert entries and then be able to quickly determine if an entry has already been inserted. A Dictionary seems to suit this need (see example). However, I have no use for the dictionary's values. Should I still use a dictionary or is there another better suited data type?
public class Foo
{
    private Dictionary<string, bool> Entities;
    ...
    public void AddEntity(string bar)
    {
        if (!Entities.ContainsKey(bar))
        {
            // bool value true here has no use and is just a placeholder
            Entities.Add(bar, true);
        }
    }
    public string[] GetEntities()
    {
        return Entities.Keys.ToArray();
    }
}
You can use HashSet<T>.
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
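For comparison with the dictionary version in the question, a single-threaded sketch using HashSet<string> could look like this. EntityRegistry is a hypothetical name chosen for the example. HashSet<T>.Add already reports whether the item was new, so the explicit ContainsKey check and the placeholder value both disappear.

```csharp
using System.Collections.Generic;
using System.Linq;

public class EntityRegistry
{
    private readonly HashSet<string> _entities = new HashSet<string>();

    // Returns false (and changes nothing) when the entry is already present.
    public bool AddEntity(string bar) => _entities.Add(bar);

    // Hash-based membership test.
    public bool HasEntity(string bar) => _entities.Contains(bar);

    public string[] GetEntities() => _entities.ToArray();
}
```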
Habib's answer is excellent, but in multi-threaded environments a HashSet<T> has to be protected with locks. I find myself more prone to creating deadlocks with lock statements. Also, locks yield a worse speedup per Amdahl's law, because adding a lock statement reduces the fraction of your code that actually runs in parallel.
For those reasons, a ConcurrentDictionary<T,object> fits the bill in multi-threaded environments. If you end up using one, then wrap it like you did in your question. Just new up objects to toss in as values as needed, since the values won't be important. You can verify that there are no lock statements in its source code.
If you didn't need mutability of the collection then this would be moot. But your question implies that you do need it, since you have an AddEntity method.
Additional info 2017-05-19 - actually, ConcurrentDictionary does use locks internally, although not lock statements per se--it uses Monitor.Enter (check out the TryAddInternal method). However, it seems to lock on individual buckets within the dictionary, which means there will be less contention than putting the entire thing in a lock statement.
So all in all, ConcurrentDictionary is often better for multithreaded environments.
It's actually quite difficult (impossible?) to make a concurrent hash set using only the Interlocked methods. I tried on my own and kept running into the problem of needing to alter two things at the same time--something that only locking can do in general. One workaround I found was to use singly-linked lists for the hash buckets and intentionally create cycles in a list when one thread needed to operate on a node without interference from other threads; this would cause other threads to get caught spinning around in the same spot until that thread was done with its node and undid the cycle. Sure, it technically didn't use locks, but it did not scale well.

Lock and refresh implementation

I have a key-to-task mapping and I need to run the task only if the task for the given key is not already running. Pseudo code follows. I believe there is a lot of scope for improvement. I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? When I'm trying to lock key k1, there is no point in a cache freshener call for key k2 waiting for that lock.
class CacheFreshener
{
    private ConcurrentDictionary<string,bool> lockMap;
    public RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        lock(lockMap)
        {
            if (lockMap.ContainsKey(key))
            {
                // no-op
                return;
            }
            else
            {
                lockMap.Add(key, true);
            }
        }
        // if you are here means task is not already present
        cacheMissAction(key);
        lock(lockMap) // Do we need to lock here??
        {
            lockMap.Remove(key);
        }
    }
}
As requested, here is an elaborated explanation of what I was getting at relative to my comments…
The basic issue here seems to be the question of concurrency, i.e. two or more threads accessing the same object at a time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods of ContainsKey() and Add() separately, then you would need explicit synchronization (but only for that operation…in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure these are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need, and includes the TryAdd() method to accomplish the same, without the explicit synchronization.
<aside>
It is not entirely clear to me the intent behind the code example as given. The code appears to be meant to only store an object in the "cache" for the duration of the invocation of the cacheMissAction delegate. The key is removed immediately after. So it does seem like it's not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will fail to invoke it, but also cannot count on it having completed by the time their call to the RefreshData() method has completed).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
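private readonly ConcurrentDictionary<string,bool> lockMap = new ConcurrentDictionary<string,bool>();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    if (!lockMap.TryAdd(key, true))
    {
        return;
    }
    // if you are here, the task was not already present
    cacheMissAction(key);
    // ConcurrentDictionary has no public Remove(key); TryRemove is the thread-safe call
    lockMap.TryRemove(key, out _);
}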
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
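One thing the lock-free version leaves open is what happens when cacheMissAction throws: the key would stay in the map and block every later refresh for it. A possible variation, offered as a sketch rather than as part of the original answer, wraps the call in try/finally. The bool return value is an addition made here purely so the outcome can be observed.

```csharp
using System;
using System.Collections.Concurrent;

public class CacheFreshener
{
    private readonly ConcurrentDictionary<string, bool> _inProgress =
        new ConcurrentDictionary<string, bool>();

    // Returns true if this call ran the action, false if a refresh for the
    // same key was already running. (Return value added for illustration.)
    public bool RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        // Atomic "check and add": only one caller per key gets past this.
        if (!_inProgress.TryAdd(key, true))
        {
            return false;
        }
        try
        {
            cacheMissAction(key);
            return true;
        }
        finally
        {
            // Runs even when the action throws, so the key never gets stuck.
            _inProgress.TryRemove(key, out _);
        }
    }
}
```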
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is likely not to be large, it's no big deal, but I do find it odd that Microsoft chose to make the same mistake they made originally when in the pre-generics days you had to use the non-generic dictionary object Hashtable to store a set, before HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> in there. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private readonly HashSet<string> lockSet = new HashSet<string>();
private readonly object _lock = new object();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    lock (_lock)
    {
        if (!lockSet.Add(key))
        {
            return;
        }
    }
    // if you are here, the task was not already present
    cacheMissAction(key);
    lock (_lock)
    {
        lockSet.Remove(key);
    }
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.

Lock for ConcurrentDictionary when AddOrUpdate-ing?

I use a ConcurrentDictionary<string, HashSet<string>> to access some data across many threads.
I read in this article (scroll down) that the method AddOrUpdate is not executed in the lock, so it could endanger thread-safety.
My code is as follows:
//keys and bar are not the concern here
ConcurrentDictionary<string, HashSet<string>> foo = new ...;
foreach(var key in keys) {
    // lambda parameter named k to avoid clashing with the loop variable key
    foo.AddOrUpdate(key, new HashSet<string> { bar }, (k, val) => {
        val.Add(bar);
        return val;
    });
}
Should I enclose the AddOrUpdate call in a lock statement in order to be sure everything is thread-safe?
Locking during AddOrUpdate on its own wouldn't help - you'd still have to lock every time you read from the set.
If you're going to treat this collection as thread-safe, you really need the values to be thread-safe too. You need a ConcurrentSet, ideally. Now that doesn't exist within the framework (unless I've missed something) but you could probably create your own ConcurrentSet<T> which used a ConcurrentDictionary<T, int> (or whatever TValue you like) as its underlying data structure. Basically you'd ignore the value within the dictionary, and just treat the presence of the key as the important part.
You don't need to implement everything within ISet<T> - just the bits you actually need.
You'd then create a ConcurrentDictionary<string, ConcurrentSet<string>> in your application code, and you're away - no need for locking.
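A rough sketch of such a wrapper, implementing only a handful of members rather than all of ISet<T>. The member names and the byte placeholder value are choices made for this example, not an existing framework type.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public class ConcurrentSet<T>
{
    // The byte value is a throwaway; only the presence of the key matters.
    private readonly ConcurrentDictionary<T, byte> _items =
        new ConcurrentDictionary<T, byte>();

    // True if the item was newly added, false if it was already present.
    public bool Add(T item) => _items.TryAdd(item, 0);

    public bool Remove(T item) => _items.TryRemove(item, out _);

    public bool Contains(T item) => _items.ContainsKey(item);

    public int Count => _items.Count;

    // Keys returns a point-in-time copy, so callers can enumerate it safely.
    public ICollection<T> Snapshot() => _items.Keys;
}
```

With that in place, the loop body from the question could become foo.GetOrAdd(key, _ => new ConcurrentSet<string>()).Add(bar), where foo is a ConcurrentDictionary<string, ConcurrentSet<string>>.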
You'll need to fix this code; it creates a lot of garbage. You create a new HashSet even when none is required. Use the other overload, the one that accepts the valueFactory delegate, so the HashSet is only created when the key isn't yet present in the dictionary.
The valueFactory might be called multiple times if multiple threads concurrently try to add the same value of key and it is not present. Very low odds but not zero. Only one of these hashsets will be used. Not a problem, creating the HashSet has no side effects that could cause threading trouble, the extra copies just get garbage collected.
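The factory overload described above could be used roughly like this. TagIndex and AddTag are hypothetical names. The lock around the HashSet mutation is an assumption added here, because HashSet<T> is not thread-safe and the update delegate runs outside the dictionary's internal locking; any code that reads the set would have to take the same lock.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class TagIndex
{
    public static void AddTag(
        ConcurrentDictionary<string, HashSet<string>> foo, string key, string bar)
    {
        foo.AddOrUpdate(
            key,
            // Add factory: only invoked when the key looks absent, so no
            // HashSet is allocated for keys that already exist.
            k => new HashSet<string> { bar },
            // Update factory: mutates the stored set in place.
            (k, existing) =>
            {
                // Assumption: guard the non-thread-safe HashSet ourselves.
                lock (existing)
                {
                    existing.Add(bar);
                }
                return existing;
            });
    }
}
```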
The article states that the add delegate is not executed in the dictionary's lock, and that the element you get might not be the element created in that thread by the add delegate. That's not a thread safety issue; the dictionary's state will be consistent and all callers will get the same instance, even if a different instance was created for each of them (and all but one get dropped).
It seems the better answer would be to use Lazy<T>, per this article on the ConcurrentDictionary methods that accept a delegate.
There is also another good article on lazily evaluating the add delegate.
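The linked articles are not reproduced here, but a common form of the Lazy<T> approach looks like the sketch below. LazyCache is a hypothetical wrapper name. The dictionary stores Lazy<TValue> wrappers, so even if several wrappers are created under contention, only the one that ends up in the dictionary ever has its factory executed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _map =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        // Creating a Lazy is cheap and has no side effects; losing wrappers
        // are simply discarded without their factory being invoked.
        Lazy<TValue> lazy = _map.GetOrAdd(
            key,
            k => new Lazy<TValue>(
                () => factory(k),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // All callers read Value from the same stored wrapper, which runs
        // the factory at most once and blocks concurrent readers meanwhile.
        return lazy.Value;
    }
}
```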

Can I lock a collection in the get accessor?

I have a lightly used dictionary which is hardly ever going to be read or updated, since the individual items raise events and return their results with their event args. In fact, the dictionary is always going to be updated from the same thread. I was thinking about adding a simple lock just to be safe. I was wondering if I can just place the lock in the get accessor. Does this work?
Dictionary<string,Indicator> indicators = new Dictionary<string,Indicator>();
Dictionary<string, Indicator> Indicators
{
    get
    {
        lock (indicators)
        {
            return indicators;
        }
    }
}
public void AddIndicator(Indicator i)
{
    lock (indicators)
    {
        indicators.Add(i.Name, i);
    }
}
That doesn't do anything particularly useful, no.
In particular, if you have:
x = foo.Indicators["blah"]
then the indexer will be executed without the thread holding the lock... so it's not thread-safe. Think of the above code like this:
Dictionary<string, Indicator> indicators = foo.Indicators;
// By now, your property getter has completed, and the lock has been released...
x = indicators["blah"];
Do you ever need to do anything with the collection other than access it via the indexer? If not, you might want to just replace the property with a method:
public Indicator GetIndicator(string name)
{
    lock (indicators)
    {
        return indicators[name];
    }
}
(You may want to use TryGetValue instead, etc - it depends on what you're trying to achieve.)
Personally I'd prefer to use a reference to a privately-owned-and-otherwise-unused lock object rather than locking on the collection reference, but that's a separate matter.
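Putting those two remarks together, a sketch with TryGetValue and a private lock object might look like this. The Indicator class here is a minimal stand-in for the asker's type, and IndicatorStore is a made-up container name.

```csharp
using System.Collections.Generic;

// Minimal stand-in for the asker's Indicator type.
public class Indicator
{
    public string Name { get; }
    public Indicator(string name) { Name = name; }
}

public class IndicatorStore
{
    // Private and otherwise unused, so no outside code can lock on it.
    private readonly object _sync = new object();
    private readonly Dictionary<string, Indicator> _indicators =
        new Dictionary<string, Indicator>();

    public void AddIndicator(Indicator i)
    {
        lock (_sync)
        {
            _indicators.Add(i.Name, i);
        }
    }

    // The lookup itself happens while the lock is held, unlike a getter
    // that returns the dictionary and releases the lock before indexing.
    public bool TryGetIndicator(string name, out Indicator indicator)
    {
        lock (_sync)
        {
            return _indicators.TryGetValue(name, out indicator);
        }
    }
}
```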
As mentioned elsewhere, ConcurrentDictionary is your friend if you're using .NET 4, but of course it's not available prior to that :(
In addition to Jon's input, I'd say don't lock on the indicators collection itself anyway. From MSDN:
Use caution when locking on instances, for example lock(this) in C# or SyncLock(Me) in Visual Basic. If other code in your application, external to the type, takes a lock on the object, deadlocks could occur.
It is recommended to use a dedicated object instance to lock onto. There are other places where this is covered with more details and reasons why - even here on SO, should you care to search for the information when you have time.
Alternatively, you could use ConcurrentDictionary which handles the thread safety for you.
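Short answer: yes, you can put a lock in the get accessor.
There is no reason it shouldn't compile and run, but as Jon mentioned, it does not lock as intended when you then use the indexer on the returned dictionary.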
