C# lock free coding sanity check

UPDATED: now using a read-only collection based on comments below
I believe that the following code should be thread safe "lock free" code, but want to make sure I'm not missing something...
public class ViewModel : INotifyPropertyChanged
{
    //INotifyPropertyChanged and other boring stuff goes here...
    // AsReadOnly() returns a ReadOnlyCollection<string>, so the field is typed accordingly
    private volatile ReadOnlyCollection<string> _data;
    public IEnumerable<string> Data
    {
        get { return _data; }
    }
    //this function is called on a timer and runs on a background thread
    private void RefreshData()
    {
        List<string> newData = ACallToAService();
        _data = newData.AsReadOnly();
        OnPropertyChanged("Data"); // yes, this dispatches to the UI thread
    }
}
Specifically, I know that I could use a lock(_lock) or even an Interlocked.Exchange() but I don't believe that there is a need for it in this case. The volatile keyword should be sufficient (to make sure the value isn't cached), no? Can someone please confirm this, or else let me know what I don't understand about threading :)

I have no idea whether that is "safe" or not; it depends on precisely what you mean by "safe". For example, if you define "safe" as "a consistent ordering of all volatile writes is guaranteed to be observed from all threads", then your program is not guaranteed to be "safe" on all hardware.
The best practice here is to use a lock unless you have an extremely good reason not to. What is your extremely good reason to write this risky code?
UPDATE: My point is that low-lock or no-lock code is extremely risky and that only a small number of people in the world actually understand it. Let me give you an example, from Joe Duffy:
// deeply broken, do not use!
class Singleton {
    private static object slock = new object();
    private static Singleton instance;
    private static bool initialized;
    private Singleton() {}
    public static Singleton Instance {
        get {
            if (!initialized) {
                lock (slock) {
                    if (!initialized) {
                        instance = new Singleton();
                        initialized = true;
                    }
                }
            }
            return instance;
        }
    }
}
This code is broken; it is perfectly legal for a correct implementation of the C# compiler to write you a program that returns null for the instance. Can you see how? If not, then you have no business doing low-lock or no-lock programming; you will get it wrong.
I can't figure out this stuff myself; it breaks my brain. That's why I try to never do low-lock programming that departs in any way from standard practices that have been analyzed by experts.
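For comparison, here is a minimal sketch of the conservative version that this advice points to: take the lock on every access and drop the unlocked fast-path check entirely. The names `LockedSingleton` and `LockedSingletonDemo` are illustrative assumptions, not part of Joe Duffy's example.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative sketch: every read of the instance happens under the lock,
// so there is no unlocked fast path whose ordering has to be reasoned about.
public sealed class LockedSingleton
{
    private static readonly object s_lock = new object();
    private static LockedSingleton s_instance;

    private LockedSingleton() { }

    public static LockedSingleton Instance
    {
        get
        {
            lock (s_lock)
            {
                if (s_instance == null)
                {
                    s_instance = new LockedSingleton();
                }
                return s_instance;
            }
        }
    }
}

public static class LockedSingletonDemo
{
    public static void Main()
    {
        // Ask for the instance from several threads at once.
        var results = new LockedSingleton[8];
        Parallel.For(0, results.Length, i => results[i] = LockedSingleton.Instance);

        // Every thread should have received the same object.
        bool allSame = Array.TrueForAll(results, r => ReferenceEquals(r, results[0]));
        Console.WriteLine(allSame);
    }
}
```

The cost is one uncontended lock per access, which is usually cheap compared with the risk of getting a low-lock variant wrong.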

It depends on what the intent is. The get/set of the list is atomic (even without volatile) and non-cached (volatile), but callers can mutate the list, which is not guaranteed thread-safe.
There is also a race condition that could lose data:
obj.Data.Add(value);
Here value could easily be discarded.
I would use an immutable (read-only) collection.
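A minimal sketch of what the read-only approach buys you, assuming the data is exposed as `IEnumerable<string>` as in the question. The helper names `Publish` and `TryMutate` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlySnapshotDemo
{
    // Wraps a freshly built list so that callers cannot mutate it through the wrapper.
    public static ReadOnlyCollection<string> Publish(List<string> fresh)
    {
        return fresh.AsReadOnly();
    }

    // Simulates a caller that casts the exposed sequence back to IList<string>
    // and tries to add to it. Returns true only if the Add succeeded.
    public static bool TryMutate(IEnumerable<string> data)
    {
        var list = data as IList<string>;
        if (list == null) return false;
        try
        {
            list.Add("x");
            return true;
        }
        catch (NotSupportedException)
        {
            // ReadOnlyCollection<T> rejects mutation through IList<T>.
            return false;
        }
    }

    public static void Main()
    {
        var snapshot = Publish(new List<string> { "a", "b" });
        Console.WriteLine(TryMutate(snapshot)); // False: the wrapper rejects Add
        Console.WriteLine(snapshot.Count);      // 2: contents unchanged
    }
}
```

Note that `AsReadOnly()` wraps the original list rather than copying it, so this only helps if the producer never touches the underlying list again after publishing it.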

I think that if you have only two threads like you described, your code is correct and safe. And also you don't need that volatile, it is useless here.
But please don't call it "thread safe", as it is safe only for your two threads using it your special way.

I believe that this is safe in itself (even without volatile), however there may be issues depending on how other threads use the Data property.
Provided that you can guarantee that all other threads read and cache the value of Data once before doing enumeration on it (and don't try to cast it to some broader interface to perform other operations), and make no consistency assumptions for a second access to the property, then you should be ok. If you can't make that guarantee (and it'd be hard to make that guarantee if eg. one of the users is the framework itself via data-binding, and hence code that you do not control), then you can't say that it's safe.
For example, this would be safe:
foreach (var item in x.Data)
{
    // do something with item
}
And this would be safe (provided that the JIT isn't allowed to optimise away the local, which I think is the case):
var data = x.Data;
var item1 = FindItem(data, a);
var item2 = FindItem(data, b);
DoSomething(item1, item2);
The above two might act on stale data, but it will always be consistent data. But this would not necessarily be safe:
var item1 = FindItem(x.Data, a);
var item2 = FindItem(x.Data, b);
DoSomething(item1, item2);
This one could possibly be searching two different states of the collection (before and after some thread replaces it), so it may not be safe to operate on items found in each separate enumeration, as they may not be consistent with each other.
The issue would be worse with a broader interface; eg. if Data exposed IList<T> you'd have to watch for consistency of Count and indexer operations as well.

Related

Is it safe to use Volatile.Read combined with Interlocked.Exchange for concurrently accessing a shared memory location from multiple threads in .NET?

Experts on threading/concurrency/memory model in .NET, could you verify that the following code is correct under all circumstances (that is, regardless of OS, .NET runtime, CPU architecture, etc.)?
class SomeClassWhoseInstancesAreAccessedConcurrently
{
    private Strategy _strategy;
    public SomeClassWhoseInstancesAreAccessedConcurrently()
    {
        _strategy = new SomeStrategy();
    }
    public void DoSomething()
    {
        Volatile.Read(ref _strategy).DoSomething();
    }
    public void ChangeStrategy()
    {
        Interlocked.Exchange(ref _strategy, new AnotherStrategy());
    }
}
This pattern comes up pretty frequently. We have an object which is used concurrently by multiple threads and at some point the value of one of its fields needs to be changed. We want to guarantee that from that point on every access to that field coming from any thread observe the new value.
Considering the example above, we want to make sure that after the point in time when ChangeStrategy is executed, it can't happen that SomeStrategy.DoSomething is called instead of AnotherStrategy.DoSomething because some of the threads don't observe the change and use the old value cached in a register/CPU cache/whatever.
To my knowledge of the topic, we need at least a volatile read to prevent such caching. The main question is whether that is enough, or whether we need Interlocked.CompareExchange(ref _strategy, null, null) instead to achieve the correct behavior.
If a volatile read is enough, a further question arises: do we need Interlocked.Exchange at all, or would even a volatile write be OK in this case?
As I understand it, volatile reads/writes use half-fences, which allow a write followed by a read to be reordered; to be honest, I still can't fully understand the implications of that. However, as per the ECMA-335 specification, section I.12.6.5, "The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations." So, if I understand this correctly, Interlocked.Exchange should create a full fence, which looks like enough.
But, to complicate things further, it seems that not all Interlocked operations were implemented according to the specification on every platform.
I'd be very grateful if someone could clear this up.

Yes, your code is safe. It is functionally equivalent to using a lock like this:
public void DoSomething()
{
    Strategy strategy;
    lock (_locker) strategy = _strategy;
    strategy.DoSomething();
}
public void ChangeStrategy()
{
    Strategy strategy = new AnotherStrategy();
    lock (_locker) _strategy = strategy;
}
Your code is more performant though, because the lock imposes a full fence, while the Volatile.Read imposes a potentially cheaper half fence.
You could improve the performance even more by replacing the Interlocked.Exchange (full fence) with a Volatile.Write (half fence). The only reason to prefer the Interlocked.Exchange over the Volatile.Write is when you want to retrieve the previous strategy as an atomic operation. Apparently this is not needed in your case.
For simplicity you could even get rid of the Volatile.Write/Volatile.Read calls, and just declare the _strategy field as volatile.
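A minimal sketch of that last variant, with the field declared `volatile`. The `Strategy` types below are stand-ins for the ones in the question, and having `DoSomething` return a string is an assumption made only so that the demo has something to print.

```csharp
using System;

public abstract class Strategy
{
    public abstract string DoSomething();
}

public sealed class SomeStrategy : Strategy
{
    public override string DoSomething() { return "some"; }
}

public sealed class AnotherStrategy : Strategy
{
    public override string DoSomething() { return "another"; }
}

public class StrategyHolder
{
    // volatile: reads of this field have acquire semantics, writes have release semantics.
    private volatile Strategy _strategy = new SomeStrategy();

    public string DoSomething()
    {
        // Read the field once, then call through the local copy.
        Strategy current = _strategy;
        return current.DoSomething();
    }

    public void ChangeStrategy()
    {
        _strategy = new AnotherStrategy();
    }
}

public static class StrategyHolderDemo
{
    public static void Main()
    {
        var holder = new StrategyHolder();
        Console.WriteLine(holder.DoSomething()); // some
        holder.ChangeStrategy();
        Console.WriteLine(holder.DoSomething()); // another
    }
}
```

The demo is single-threaded on purpose; it shows the shape of the code, not a proof of its visibility guarantees.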

Proper way to synchronize a property's value in a multi-threaded application

I've recently started revisiting some of my old multi-threaded code and wondering if it's all safe and correct (No issues in production yet...). In particular am I handling object references correctly? I've read a ton of examples using simple primitives like integers, but not a lot pertaining to references and any possible nuances.
First, I recently learned that object reference assignments are atomic, at least on a 64 bit machine which is all I'm focused on for this particular application. Previously, I was locking class properties' get/sets to avoid corrupting the reference as I didn't realize reference assignments were atomic.
For example:
// Immutable collection of options for a Contact
public class ContactOptions
{
    public string Email { get; }
    public string PhoneNumber { get; }
}
// Sample class that implements the Options
public class Contact
{
    private readonly object OptionsLock = new object();
    private ContactOptions _Options;
    public ContactOptions Options
    {
        get { lock (OptionsLock) { return _Options; } }
        set { lock (OptionsLock) { _Options = value; } }
    }
}
Now that I know that a reference assignment is atomic, I thought "great, time to remove these ugly and unnecessary locks!"
Then I read further and learned of synchronization of memory between threads. Now I'm back to keeping the locks to ensure the data doesn't go stale when accessing it. For example, if I access a Contact's Options, I want to ensure I'm always receiving the latest set of Options assigned.
Questions:
Correct me if I'm wrong here, but the above code does ensure that I'm achieving the goal of getting the latest value of Options when I get it in a thread safe manner? Any other issues using this method?
I believe there is some overhead with the lock (Converts to Monitor.Enter/Exit). I thought I could use Interlocked for a nominal performance gain, but more importantly to me, a cleaner set of code. Would the following work to achieve synchronization?
private ContactOptions _Options;
public ContactOptions Options
{
    get { return Interlocked.CompareExchange(ref _Options, null, null); }
    set { Interlocked.Exchange(ref _Options, value); }
}
Since a reference assignment is atomic, is the synchronization (using either lock or Interlocked) necessary when assigning the reference? If I omit the set logic and only maintain the get, will I still maintain atomicity and synchronization? My hopeful thinking is that the lock/Interlock usage in the get would provide the synchronization I'm looking for. I've tried writing sample programs to force stale value scenarios, but I couldn't get it done reliably.
private ContactOptions _Options;
public ContactOptions Options
{
    get { return Interlocked.CompareExchange(ref _Options, null, null); }
    set { _Options = value; }
}
Side Notes:
The ContactOptions class is deliberately immutable as I don't want to have to synchronize or worry about atomicity within the options themselves. They may contain any kind of data type, so I think it's a lot cleaner/safer to assign a new set of Options when a change is necessary.
I'm familiar with the non-atomic implications of getting a value, working with that value, then setting the value. Consider the following snippet:
public class SomeInteger
{
    private readonly object ValueLock = new object();
    private int _Value;
    public int Value
    {
        get { lock (ValueLock) { return _Value; } }
        private set { lock (ValueLock) { _Value = value; } }
    }
    // WRONG
    public void manipulateBad()
    {
        Value++;
    }
    // OK
    public void manipulateOk()
    {
        lock (ValueLock)
        {
            Value++;
            // Or, even better: _Value++; // And remove the lock around the setter
        }
    }
}
Point being, I'm really only focused on the memory synchronization issue.
SOLUTION:
I went with the Volatile.Read and Volatile.Write methods, as they make the code more explicit, they're cleaner than Interlocked and lock, and they're faster than the aforementioned.
// Sample class that implements the Options
public class Contact
{
    public ContactOptions Options
    {
        get { return Volatile.Read(ref _Options); }
        set { Volatile.Write(ref _Options, value); }
    }
    private ContactOptions _Options;
}

Yes, the lock (OptionsLock) ensures that all threads will see the latest value of the Options, because memory barriers are inserted when entering and exiting the lock.
Replacing the lock with methods of the Interlocked or the Volatile class would serve equally well the latest-value-visibility goal. Memory barriers are inserted by these methods as well. I think that using the Volatile communicates better the intentions of the code:
public ContactOptions Options
{
    get => Volatile.Read(ref _Options);
    set => Volatile.Write(ref _Options, value);
}
Omitting the barrier in either the get or the set accessor puts you automatically in the big black forest of memory models, cache coherency protocols and CPU architectures. In order to know if it's safe to omit it, intricate knowledge of the targeted hardware/OS configuration is required. You will need either an expert's advice, or to become an expert yourself. If you prefer to stay in the realm of software development, don't omit the barrier!

Correct me if I'm wrong here, but the above code does ensure that I'm achieving the goal of getting the latest value of Options when I get it in a thread safe manner? Any other issues using this method?
Yes, locks will emit memory barriers, so it will ensure the value is read from memory. There are no real issues other than potentially being more conservative than it has to be. But I have a saying, if in doubt, use a lock.
I believe there is some overhead with the lock (Converts to Monitor.Enter/Exit). I thought I could use Interlocked for a nominal performance gain, but more importantly to me, a cleaner set of code. Would the following work to achieve synchronization?
Interlocked should also emit memory barriers, so I would think this should do more or less the same thing.
Since a reference assignment is atomic, is the synchronization (using either lock or Interlocked) necessary when assigning the reference? If I omit the set logic and only maintain the get, will I still maintain atomicity and synchronization? My hopeful thinking is that the lock/Interlock usage in the get would provide the synchronization I'm looking for. I've tried writing sample programs to force stale value scenarios, but I couldn't get it done reliably.
I would think that just making the field volatile should be sufficient in this scenario. As far as I understand it the problem with "stale values" is somewhat exaggerated, the cache coherency protocols should take care of most issues.
To my knowledge, the main problem is preventing the compiler from just putting the value in a register and not doing any subsequent load at all. Volatile should prevent this, forcing the compiler to issue a load each time the field is read. But this would mostly be an issue when repeatedly checking a value in a loop.
But it is not very useful to look at just a single property. Problems more often crop up when you have multiple values that need to be synchronized. A potential issue is the reordering of instructions by the compiler or processor. Locks and memory barriers prevent such reordering, but if that is a potential issue, it is probably better to lock a larger section of code.
Overall I consider it prudent to be paranoid when dealing with multiple threads. It is probably better to use too much synchronization than too little. One exception would be deadlocks that may be caused by having too many locks. My recommendation regarding this is to be very careful about what you are calling while holding a lock. Ideally a lock should only be held for a short, predictable time.
Also keep using pure functions and immutable data structures. These are a great way to avoid worrying about threading issues.
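One way to combine these last two points, sketched under some assumptions: keep values that must stay consistent with each other inside a single immutable object, and publish that object through one reference. The `Range` and `RangeHolder` types are hypothetical, and `Volatile.Read`/`Volatile.Write` need .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Immutable after construction: Low and High always belong together.
public sealed class Range
{
    public int Low { get; private set; }
    public int High { get; private set; }

    public Range(int low, int high)
    {
        Low = low;
        High = high;
    }
}

public class RangeHolder
{
    private Range _current = new Range(0, 10);

    // A reader gets one reference, and with it a matching Low/High pair.
    public Range Current
    {
        get { return Volatile.Read(ref _current); }
        set { Volatile.Write(ref _current, value); }
    }
}

public static class RangeHolderDemo
{
    public static void Main()
    {
        var holder = new RangeHolder();

        // Writer thread: keeps replacing the whole pair in one assignment.
        var writer = new Thread(() =>
        {
            for (int i = 1; i <= 100000; i++)
            {
                holder.Current = new Range(i, i + 10);
            }
        });
        writer.Start();

        // Reader: takes the reference once per iteration and checks the invariant.
        bool consistent = true;
        while (writer.IsAlive)
        {
            Range snapshot = holder.Current;
            if (snapshot.High - snapshot.Low != 10) consistent = false;
        }
        writer.Join();

        Console.WriteLine(consistent);
    }
}
```

Had `Low` and `High` been two separate mutable fields, a reader could observe one updated and the other not; with a single reference swap there is nothing to tear.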

Does a multithreaded write-once, read-many need a volatile?

Here is the scenario. I've got a class that will be accessed by multiple threads (ASP.NET) that can benefit from storing a result in a write-once, read-many cache. This cached object is the result of an operation that cannot be performed as part of a static initializer, but must wait for the first execution. So I implement a simple null check as seen below. I'm aware that if two threads hit this check at the same moment I will have ExpensiveCalculation calculated twice, but that isn't the end of the world. My question is, do I need to worry about the static _cachedResult still being seen as null by other threads due to optimizations or other thread caching. Once written, the object is only ever read so I don't think full-scale locking is needed.
public class Bippi
{
    private static ExpensiveCalculation _cachedResult;
    public int DoSomething(Something arg)
    {
        // calculate only once. recalculating is not harmful, just wastes time.
        if (_cachedResult == null)
            _cachedResult = new ExpensiveCalculation(arg);
        // additional work with both arg and the results of the precalculated
        // values of _cachedResult.A, _cachedResult.B, and _cachedResult.C
        int someResult = _cachedResult.A + _cachedResult.B + _cachedResult.C + arg.ChangableProp;
        return someResult;
    }
}
public class ExpensiveCalculation
{
    public int A { get; private set; }
    public int B { get; private set; }
    public int C { get; private set; }
    public ExpensiveCalculation(Something arg)
    {
        // arg is used to calculate A, B, and C
    }
}
Additional notes, this is in a .NET 4.0 application.

My question is, do I need to worry about the static _cachedResult still being seen as null by other threads due to optimizations or other thread caching.
Yes, you do. That's one of the primary reasons volatile exists.
And it's worth mentioning that uncontested locks add an entirely negligible performance cost, so there's really no reason not to just lock the null check and resource generation; it's almost certainly not going to cause any performance problems, and it makes the program much easier to reason about.
And the best solution is to avoid the question entirely and use a higher level of abstraction that is specifically designed to solve the exact problem that you have. In this case, that means Lazy<T>. You can create a Lazy<T> object that defines how to create your expensive resource and access it wherever you need the object. The Lazy<T> implementation then becomes responsible for ensuring that the resource is created no more than once, that it is properly exposed to the code asking for it, and that it is handled efficiently.
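A minimal sketch of the Lazy<T> approach. It assumes the inputs to the expensive computation are known when the Lazy<T> is constructed, which is a simplification of the question (there, the first `arg` drives the calculation). `ExpensiveResult`, `LazyCache`, and the seed value 21 are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class ExpensiveResult
{
    // Counts constructor calls so the demo can show the factory ran once.
    public static int ConstructionCount;

    public int A { get; private set; }

    public ExpensiveResult(int seed)
    {
        Interlocked.Increment(ref ConstructionCount);
        A = seed * 2;
    }
}

public static class LazyCache
{
    // With this constructor, Lazy<T> defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication: the factory runs at most once.
    private static readonly Lazy<ExpensiveResult> s_cached =
        new Lazy<ExpensiveResult>(() => new ExpensiveResult(21));

    public static int DoSomething(int arg)
    {
        return s_cached.Value.A + arg;
    }
}

public static class LazyCacheDemo
{
    public static void Main()
    {
        // Hit the cache from many threads at once.
        Parallel.For(0, 16, i => LazyCache.DoSomething(i));

        Console.WriteLine(ExpensiveResult.ConstructionCount); // 1
        Console.WriteLine(LazyCache.DoSomething(1));          // 43
    }
}
```

If the value really has to be built from the first caller's argument, and occasional duplicate work is acceptable, `LazyInitializer.EnsureInitialized` with a factory delegate may be a closer fit.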

You don't need volatile as such; what you need is a memory barrier, so that the processor caches synchronize.

I think you can altogether optimistically avoid locking, and yet avoid volatile performance penalties. You can test for nullability in a two-step fashion.
private static readonly object _cachedResultLock = new object();
...
if (_cachedResult == null)
{
    lock (_cachedResultLock)
    {
        if (_cachedResult == null)
        {
            _cachedResult = new ExpensiveCalculation(arg);
        }
    }
}
Here most of the time you will not reach lock and will not serialize access. You may serialize access only on first access - but will guarantee that work is not wasted (though may cause another thread to wait a bit while first one finishes ExpensiveCalculation).

Can I lock a collection in the get accessor?

I have a lightly used dictionary which is hardly ever going to be read or updated, since the individual items raise events and return their results with their event args. In fact, the dictionary is always going to be updated by the same thread. I was thinking about adding a simple lock just to be safe. I was wondering if I can just place the lock in the get accessor. Does this work?
Dictionary<string, Indicator> indicators = new Dictionary<string, Indicator>();
Dictionary<string, Indicator> Indicators
{
    get
    {
        lock (indicators)
        {
            return indicators;
        }
    }
}
public void AddIndicator(Indicator i)
{
    lock (indicators)
    {
        indicators.Add(i.Name, i);
    }
}

That doesn't do anything particularly useful, no.
In particular, if you have:
x = foo.Indicators["blah"]
then the indexer will be executed without the thread holding the lock... so it's not thread-safe. Think of the above code like this:
Dictionary<string, Indicator> indicators = foo.Indicators;
// By now, your property getter has completed, and the lock has been released...
x = indicators["blah"];
Do you ever need to do anything with the collection other than access it via the indexer? If not, you might want to just replace the property with a method:
public Indicator GetIndicator(string name)
{
    lock (indicators)
    {
        return indicators[name];
    }
}
(You may want to use TryGetValue instead, etc - it depends on what you're trying to achieve.)
Personally I'd prefer to use a reference to a privately-owned-and-otherwise-unused lock object rather than locking on the collection reference, but that's a separate matter.
As mentioned elsewhere, ConcurrentDictionary is your friend if you're using .NET 4, but of course it's not available prior to that :(
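Putting those suggestions together, a minimal sketch with a private lock object and a TryGetValue-based lookup might look like this. `IndicatorRegistry` and the simplified `Indicator` type are assumptions for illustration; the real `Indicator` class is not shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the Indicator type from the question.
public class Indicator
{
    public string Name { get; private set; }
    public Indicator(string name) { Name = name; }
}

public class IndicatorRegistry
{
    // Dedicated lock object: nothing outside this class can lock on it.
    private readonly object _sync = new object();
    private readonly Dictionary<string, Indicator> _indicators =
        new Dictionary<string, Indicator>();

    public void AddIndicator(Indicator indicator)
    {
        lock (_sync)
        {
            _indicators.Add(indicator.Name, indicator);
        }
    }

    // The dictionary itself is never handed out, so every access
    // to it happens while the lock is held.
    public bool TryGetIndicator(string name, out Indicator indicator)
    {
        lock (_sync)
        {
            return _indicators.TryGetValue(name, out indicator);
        }
    }
}

public static class IndicatorRegistryDemo
{
    public static void Main()
    {
        var registry = new IndicatorRegistry();
        registry.AddIndicator(new Indicator("rsi"));

        Indicator found;
        Console.WriteLine(registry.TryGetIndicator("rsi", out found));  // True
        Console.WriteLine(registry.TryGetIndicator("macd", out found)); // False
    }
}
```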

Other than Jon's input, I'll say don't lock on the collection indicators itself anyway. From MSDN:
Use caution when locking on instances, for example lock(this) in C# or SyncLock(Me) in Visual Basic. If other code in your application, external to the type, takes a lock on the object, deadlocks could occur.
It is recommended to use a dedicated object instance to lock onto. There are other places where this is covered with more details and reasons why - even here on SO, should you care to search for the information when you have time.

Alternatively, you could use ConcurrentDictionary which handles the thread safety for you.
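A minimal sketch of that, assuming .NET 4 or later. The `Indicator` type here is a simplified stand-in, and the `Add`/`TryGet` wrappers are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified stand-in for the Indicator type from the question.
public class Indicator
{
    public string Name { get; private set; }
    public Indicator(string name) { Name = name; }
}

public static class ConcurrentIndicators
{
    private static readonly ConcurrentDictionary<string, Indicator> s_indicators =
        new ConcurrentDictionary<string, Indicator>();

    // TryAdd returns false instead of throwing when the key already exists.
    public static bool Add(Indicator indicator)
    {
        return s_indicators.TryAdd(indicator.Name, indicator);
    }

    public static bool TryGet(string name, out Indicator indicator)
    {
        return s_indicators.TryGetValue(name, out indicator);
    }

    public static void Main()
    {
        Console.WriteLine(Add(new Indicator("rsi"))); // True: first insert
        Console.WriteLine(Add(new Indicator("rsi"))); // False: key already present

        Indicator found;
        Console.WriteLine(TryGet("rsi", out found));  // True
    }
}
```

Individual operations are thread-safe, but a sequence of them (check, then add) is not atomic unless you use the combined methods such as `TryAdd` or `GetOrAdd`.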

Short answer: YES.
Why shouldn't that work? But as mentioned by Jon, it does not lock as intended when you use the indexer.

thread safety question

I know we need to take care of thread safety for static member variables inside the class. Do we need to worry about the instance member variables?

It depends on whether you want your type to be thread-safe... and what you mean by that.
Most of the time I think it's entirely reasonable to document that the type isn't thread-safe, but can be used safely from different threads with appropriate synchronization. Most .NET types fall into this category.
That way you can usually make sure that only "coordinating" objects need to worry about synchronization, rather than putting a lock in every method and property - a strategy which is painful, and doesn't really address the wider synchronization issues you're likely to run into anyway.
Of course, types which will naturally be used from multiple threads - ones specifically designed to enable concurrency, or service locators etc. - should be thread-safe, and be documented as such. Likewise, fully immutable types are naturally thread-safe to start with.
Finally, there's the matter of what counts as "thread-safe" to start with. You should read Eric Lippert's blog post on the matter to clarify what sort of thing you should be thinking about and documenting.

Yes you should care because the same class instance method could be passed as callback to multiple threads. Example:
var instance = new Foo();
ThreadPool.QueueUserWorkItem(instance.SomeInstanceMethod);
ThreadPool.QueueUserWorkItem(instance.SomeInstanceMethod);
The instance method now needs to be synchronized because in this case the shared state is the instance itself.

Consider the following code:
public void Execute()
{
    Task.Factory.StartNew(Iterate);
    Task.Factory.StartNew(Add);
}
private List<int> _list = Enumerable.Range(1, 10).ToList();
private void Iterate()
{
    foreach (var item in _list)
    {
        Console.WriteLine(item);
    }
}
private void Add()
{
    _list.Add(_list.Count);
}
Most of the time, this code will result in:
InvalidOperationException: Collection was modified; enumeration operation may not execute.
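One possible fix, sketched with a private lock object so that enumeration and mutation can never overlap. The class name `SafeListOwner` is hypothetical, and `Iterate` sums the items instead of printing them so that the result is easy to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class SafeListOwner
{
    private readonly object _sync = new object();
    private readonly List<int> _list = Enumerable.Range(1, 10).ToList();

    // The whole enumeration runs under the lock, so Add cannot interleave with it.
    public int Iterate()
    {
        int sum = 0;
        lock (_sync)
        {
            foreach (var item in _list)
            {
                sum += item;
            }
        }
        return sum;
    }

    public void Add()
    {
        lock (_sync)
        {
            _list.Add(_list.Count);
        }
    }

    public int Count
    {
        get { lock (_sync) { return _list.Count; } }
    }
}

public static class SafeListOwnerDemo
{
    public static void Main()
    {
        var owner = new SafeListOwner();
        Task iterate = Task.Factory.StartNew(() => owner.Iterate());
        Task add = Task.Factory.StartNew(owner.Add);
        Task.WaitAll(iterate, add);

        Console.WriteLine(owner.Count); // 11: the original ten items plus one added
    }
}
```

Holding the lock for the whole loop is fine for a short list; for long-running enumeration, copying the list under the lock and iterating over the copy keeps the lock hold time short.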
