The System.Threading.Interlocked object allows for Addition (subtraction) and comparison as an atomic operation. It seems that a CompareExchange that just doesn't do equality but also GreaterThan/LessThan as an atomic comparison would be quite valuable.
Would a hypothetical Interlocked.GreaterThan a feature of the IL or is it a CPU-level feature? Both?
Lacking any other option, is it possible to create such a feature in C++ or direct IL code and expose that functionality to C#?
You can build other atomic operations out of InterlockedCompareExchange.
public static bool InterlockedExchangeIfGreaterThan(ref int location, int comparison, int newValue)
{
int initialValue;
do
{
initialValue = location;
if (initialValue >= comparison) return false;
}
while (System.Threading.Interlocked.CompareExchange(ref location, newValue, initialValue) != initialValue);
return true;
}
With these helper methods you can not only exchange value but also detect was it replaced or not.
Usage looks like this:
int currentMin = 10; // can be changed from other thread at any moment
int potentialNewMin = 8;
if (InterlockedExtension.AssignIfNewValueSmaller(ref currentMin, potentialNewMin))
{
Console.WriteLine("New minimum: " + potentialNewMin);
}
And here are methods:
public static class InterlockedExtension
{
public static bool AssignIfNewValueSmaller(ref int target, int newValue)
{
int snapshot;
bool stillLess;
do
{
snapshot = target;
stillLess = newValue < snapshot;
} while (stillLess && Interlocked.CompareExchange(ref target, newValue, snapshot) != snapshot);
return stillLess;
}
public static bool AssignIfNewValueBigger(ref int target, int newValue)
{
int snapshot;
bool stillMore;
do
{
snapshot = target;
stillMore = newValue > snapshot;
} while (stillMore && Interlocked.CompareExchange(ref target, newValue, snapshot) != snapshot);
return stillMore;
}
}
Update to the later post I made here: we found a better way to make the greater comparison by using additional lock object. We wrote many unit tests in order to validate that a lock and Interlocked can be used together, but only for some cases.
How the code works: Interlocked uses memory barriers that a read or write is atomic. The sync-lock is needed to make the greater-than comparison an atomic operation. So the rule now is that inside this class no other operation writes the value without this sync lock.
What we get with this class is an interlocked value which can be read very fast, but write takes a little bit more. Read is about 2-4 times faster in our application.
Here the code as view:
See here: http://files.thekieners.com/blogcontent/2012/ExchangeIfGreaterThan2.png
Here as code to copy&paste:
public sealed class InterlockedValue
{
private long _myValue;
private readonly object _syncObj = new object();
public long ReadValue()
{
// reading of value (99.9% case in app) will not use lock-object,
// since this is too much overhead in our highly multithreaded app.
return Interlocked.Read(ref _myValue);
}
public bool SetValueIfGreaterThan(long value)
{
// sync Exchange access to _myValue, since a secure greater-than comparisons is needed
lock (_syncObj)
{
// greather than condition
if (value > Interlocked.Read(ref _myValue))
{
// now we can set value savely to _myValue.
Interlocked.Exchange(ref _myValue, value);
return true;
}
return false;
}
}
}
This isn't actually true, but it is useful to think of concurrency as coming in 2 forms:
Lock free concurrency
Lock based concurrency
It's not true because software lock based concurrency ends up being implemented using lock free atomic instructions somewhere on the stack (often in the Kernel). Lock free atomic instructions, however, all ultimately end up acquiring a hardware lock on the memory bus. So, in reality, lock free concurrency and lock based concurrency are the same.
But conceptually, at the level of a user application, they are 2 distinct ways of doing things.
Lock based concurrency is based on the idea of "locking" access to a critical section of code. When one thread has "locked" a critical section, no other thread may have code running inside that same critical section. This is usually done by the use of "mutexes", which interface with the os scheduler and cause threads to become un-runnable while waiting to enter a locked critical section. The other approach is to use "spin locks" which cause a thread to spin in a loop, doing nothing useful, until the critical section becomes available.
Lock free concurrency is based on the idea of using atomic instructions (specially supported by the CPU), that are guaranteed by hardware to run atomically. Interlocked.Increment is a good example of lock free concurrency. It just calls special CPU instructions that do an atomic increment.
Lock free concurrency is hard. It gets particularly hard as the length and complexity of critical sections increase. Any step in a critical section can be simultaneously executed by any number of threads at once, and they can move at wildly different speeds. You have to make sure that despite that, the results of a system as a whole remain correct. For something like an increment, it can be simple (the cs is just one instruction). For more complex critical sections, things can get very complex very quickly.
Lock based concurrency is also hard, but not quite as hard as lock free concurrency. It allows you to create arbitrarily complex regions of code and know that only 1 thread is executing it at any time.
Lock free concurrency has one big advantage, however: speed. When used correctly it can be orders of magnitude faster than lock based concurrency. Spin loops are bad for long running critical sections because they waste CPU resources doing nothing. Mutexes can be bad for small critical sections because they introduce a lot of overhead. They involve a mode switch at a minimum, and multiple context switches in the worst case.
Consider implementing the Managed heap. Calling into the OS everytime "new" is called would be horrible. It would destroy the performance of your app. However, using lock free concurrency it's possible to implement gen 0 memory allocations using an interlocked increment (I don't know for sure if that's what the CLR does, but I'd be surprised if it wasn't. That can be a HUGE savings.
There are other uses, such as in lock free data structures, like persistent stacks and avl trees. They usually use "cas" (compare and swap).
The reason, however, that locked based concurrency and lock free concurrency are really equivalent is because of the implementation details of each.
Spin locks usually use atomic instructions (typically cas) in their loop condition. Mutexes need to use either spin locks or atomic updates of internal kernel structures in their implementation.
The atomic instructions are in turn implemented using hardware locks.
In any case, they both have their sets of trade offs, usually centering on perf vs complexity. Mutexes can be both faster and slower than lock free code. Lock free code can be both more and less complex than a mutex. The appropriate mechanism to use depends on specific circumstances.
Now, to answer your question:
A method that did an interlocked compare exchange if less than would imply to callers that it is not using locks. You can't implement it with a single instruction the same way increment or compare exchange can be done. You could simulate it doing a subtraction (to compute less than), with an interlocked compare exchange in a loop. You can also do it with a mutex (but that would imply a lock and so using "interlocked" in the name would be misleading). Is it appropriate to build the "simulated interlocked via cas" version? That depends. If the code is called very frequently, and has very little thread contention then the answer is yes. If not, you can turn a O(1) operation with moderately high constant factors into an infinite (or very long) loop, in which case it would be better to use a mutex.
Most of the time it's not worth it.
What do you think about this implementation:
// this is a Interlocked.ExchangeIfGreaterThan implementation
private static void ExchangeIfGreaterThan(ref long location, long value)
{
// read
long current = Interlocked.Read(ref location);
// compare
while (current < value)
{
// set
var previous = Interlocked.CompareExchange(ref location, value, current);
// if another thread has set a greater value, we can break
// or if previous value is current value, then no other thread has it changed in between
if (previous == current || previous >= value) // note: most commmon case first
break;
// for all other cases, we need another run (read value, compare, set)
current = Interlocked.Read(ref location);
}
}
All interlocked operations have direct support in the hardware.
Interlocked operations and atomic data type are different things.. Atomic type is a library level feature. On some platforms and for some data types atomics are implemented using interlocked instructions. In this case they are very effective.
In other cases when platform does not have interlocked operations at all or they are not available for some particular data type, library implements these operations using appropriate synchronization (crit_sect, mutex, etc).
I am not sure if Interlocked.GreaterThan is really needed. Otherwise it might be already implemented. If your know good example where it can be useful, I am sure that everybody here will be happy to hear this.
Greater/Less than and equal to are already atomic operations. That doesn't address the safe concurrent behavior of your application tho.
There is no point in making them part of the Interlocked family, so the question is: what are you actually trying to achieve?
Related
Experts on threading/concurrency/memory model in .NET, could you verify that the following code is correct under all circumstances (that is, regardless of OS, .NET runtime, CPU architecture, etc.)?
class SomeClassWhoseInstancesAreAccessedConcurrently
{
private Strategy _strategy;
public SomeClassWhoseInstancesAreAccessedConcurrently()
{
_strategy = new SomeStrategy();
}
public void DoSomething()
{
Volatile.Read(ref _strategy).DoSomething();
}
public void ChangeStrategy()
{
Interlocked.Exchange(ref _strategy, new AnotherStrategy());
}
}
This pattern comes up pretty frequently. We have an object which is used concurrently by multiple threads and at some point the value of one of its fields needs to be changed. We want to guarantee that from that point on every access to that field coming from any thread observe the new value.
Considering the example above, we want to make sure that after the point in time when ChangeStrategy is executed, it can't happen that SomeStrategy.DoSomething is called instead of AnotherStrategy.DoSomething because some of the threads don't observe the change and use the old value cached in a register/CPU cache/whatever.
To my knowledge of the topic, we need at least volatile read to prevent such caching. The main question is that is it enough or we need Interlocked.CompareExchange(ref _strategy, null, null) instead to achieve the correct behavior?
If volatile read is enough, a further question arises: do we need Interlocked.Exchange at all or even volatile write would be ok in this case?
As I understand, volatile reads/writes use half-fences which allows a write followed by a read reordered, whose implications I still can't fully understand, to be honest. However, as per ECMA 335 specification, section I.12.6.5, "The class library provides a variety of atomic operations in the
System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange,
and CompareExchange) perform implicit acquire/release operations." So, if I understand this correctly, Interlocked.Exchange should create a full-fence, which looks enough.
But, to complicate things further, it seems that not all Interlocked operations were implemented according to the specification on every platform.
I'd be very grateful if someone could clear this up.
Yes, your code is safe. It is functionally equivalent with using a lock like this:
public void DoSomething()
{
Strategy strategy;
lock (_locker) strategy = _strategy;
strategy.DoSomething();
}
public void ChangeStrategy()
{
Strategy strategy = new AnotherStrategy();
lock (_locker) _strategy = strategy;
}
Your code is more performant though, because the lock imposes a full fence, while the Volatile.Read imposes a potentially cheaper half fence.
You could improve the performance even more by replacing the Interlocked.Exchange (full fence) with a Volatile.Write (half fence). The only reason to prefer the Interlocked.Exchange over the Volatile.Write is when you want to retrieve the previous strategy as an atomic operation. Apparently this is not needed in your case.
For simplicity you could even get rid of the Volatile.Write/Volatile.Read calls, and just declare the _strategy field as volatile.
Considering the following example:
private int sharedState = 0;
private void FirstThread() {
Volatile.Write(ref sharedState, 1);
}
private void SecondThread() {
int sharedStateSnapshot = Volatile.Read(ref sharedState);
Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
When the value in the address sharedStateSnapshot is acquired; sharedState must have been already acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread 2 runs after thread one, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();
private void FirstThread()
{
lock (locker)
{
sharedState = 1;
}
}
private void SecondThread()
{
int sharedStateSnapshot;
lock (locker)
{
sharedStateSnapshot = sharedState;
}
Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable on any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually wrote? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as
FirstThread() really did execute before SecondThread(), this program
could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread but this does not help here because
There are no operations to reorder (just the single read or write in each thread).
The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply mean that the order of operations which you cannot predict anyway would be preserved.
If my understanding is therefore correct, then there is nothing to
prevent the acquisition of sharedState being 'stale', if the write in
FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest
processor memory model, such as ARM or Alpha), that the program will
always print 1? (Or have I made an error in my mental model
somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;
private void FirstThread()
{
sharedState = 1;
// ensure lock is released after the shared state write!
Volatile.Write(ref spinLock, true);
}
private void SecondThread()
{
SpinWait.SpinUntil(() => spinLock);
Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
This code snippet is from ConcurrentQueue implementation given from here.
internal bool TryPeek(out T result)
{
result = default(T);
int lowLocal = Low;
if (lowLocal > High)
return false;
SpinWait spin = new SpinWait();
while (m_state[lowLocal] == 0)
{
spin.SpinOnce();
}
result = m_array[lowLocal];
return true;
}
Is it really lock-free instead of spinning?
Spinning is a lock. This is stated in MSDN, Wikipedia and many other resources.
It's not about word. Lock-free is a guarantee. It doesn't mean that the code shouldn't use lock statement. Algorithm is lock-free if there is guaranteed system-wide progress. I don't see any difference between this code and the code using locks. The only difference is that the spin uses busy wait and thread yielding instead of putting thread in a sleep mode.
I don't see how this guarantees system-wide process, so personally I think that this is not a lock-free implementation. At least not this function.
Lock free means not using locks. Spinwaiting is not locking. There are a number of methods of synchronizing access to data without using locks. Performing spin waits is one (of many) options. Not all lock-free code will use spin-waits.
Spinning places the CPU in a tight loop without yielding the rest of it's current slice of processor time, avoiding problems that a user-provided loop may create. This can be useful if it is known that the state change is imminent. It is rare for this to be the best option for ordinary code, and represents an alternative to locking for this specialized situation.
So yes, the code is lock-free as the term lock is used in the .NET Framework.
http://msdn.microsoft.com/en-us/library/hh228603.aspx
Recently I have seen this code in a WebSite, and my question is the following:
private bool mbTestFinished = false;
private bool IsFinished()
{
lock( mLock )
{
return mbTestFinished;
}
}
internal void SetFinished()
{
lock( mLock )
{
mbTestFinished = true;
}
}
In a multithread environment, is really necessary to lock the access to the mbTestFinished?
Yes, it is needed. .Net environment uses some optimizations, and sometimes if a memory location is accessed frequently, data is moved into CPU registers. So, in this case, if mbTestFinished is in a CPU register, then a thread reading it may get a wrong value. Thus using the volatile key ensures, all accesses to this variable is done at the memory location, not at the registers.
On the otherhand, I have no idea of the frequency of this occurence. This may occur at a very low frequency.
In my opinion, no, the lock is superfluous here, for two reasons:
Boolean variables cannot cause assignment tearing like long for example, hence locking is not necessary.
To solve the visibility problem volatile is enough. It's true that the lock introduces an implicit fence, but since the lock is not required for atomicity, volatile is enough.
If mLock is ONLY for the variable mbTestFinished, then it's a bit of an overkill. Instead, you can use volatile or Interlocked, because both are User-Mode constructs for thread synchronization. lock (or Monitor) is a Hybrid Construct, in the sense that it is well optimized to avoid transiting from/to the Kernel-Mode whenever possible. The book "CLR via C#" has a in-depth discussion of these concepts.
Simplified illustration below, how does .NET deal with such a situation?
and if it would cause problems, would i have to lock/gate access to each and every field/property that might at times be written to + accessed from different threads?
A field somewhere
public class CrossRoads(){
public int _timeouts;
}
A background thread writer
public void TimeIsUp(CrossRoads crossRoads){
crossRoads._timeouts++;
}
Possibly at the same time, trying to read elsewhere
public void HowManyTimeOuts(CrossRoads crossRoads){
int timeOuts = crossRoads._timeouts;
}
The simple answer is that the above code has the ability to cause problems if accessed simultaneously from multiple threads.
The .Net framework provides two solutions: interlocking and thread synchronization.
For simple data type manipulation (i.e. ints), interlocking using the Interlocked class will work correctly and is the recommended approach.
In fact, interlocked provides specific methods (Increment and Decrement) that make this process easy:
Add an IncrementCount method to your CrossRoads class:
public void IncrementCount() {
Interlocked.Increment(ref _timeouts);
}
Then call this from your background worker:
public void TimeIsUp(CrossRoads crossRoads){
crossRoads.IncrementCount();
}
The reading of the value, unless of a 64-bit value on a 32-bit OS, are atomic. See the Interlocked.Read method documentation for more detail.
For class objects or more complex operations, you will need to use thread synchronization locking (lock in C# or SyncLock in VB.Net).
This is accomplished by creating a static synchronization object at the level the lock is to be applied (for example, inside your class), obtaining a lock on that object, and performing (only) the necessary operations inside that lock:
private static object SynchronizationObject = new Object();
public void PerformSomeCriticalWork()
{
lock (SynchronizationObject)
{
// do some critical work
}
}
The good news is that reads and writes to ints are guaranteed to be atomic, so no torn values. However, it is not guaranteed to do a safe ++, and the read could potentially be cached in registers. There's also the issue of instruction re-ordering.
I would use:
Interlocked.Increment(ref crossroads._timeouts);
For the write, which will ensure no values are lost, and;
int timeouts = Interlocked.CompareExchange(ref crossroads._timeouts, 0, 0);
For the read, since this observes the same rules as the increment. Strictly speaking "volatile" is probably enough for the read, but it is so poorly understood that the Interlocked seems (IMO) safer. Either way, we're avoiding a lock.
Well, I'm not a C# developer, but this is how it typically works at this level:
how does .NET deal with such a situation?
Unlocked. Not likely to be guaranteed to be atomic.
Would i have to lock/gate access to each and every field/property that might at times be written to + accessed from different threads?
Yes. An alternative would be to make a lock for the object available to the clients, then tell the clients they must lock the object while using the instance. This will reduce the number of locks acquisitions, and guarantee a more consistent, predictable, state for your clients.
Forget dotnet. At the machine language level, crossRoads._timeouts++ will be implemented as an INC [memory] instruction. This is known as a Read-Modify-Write instruction. These instructions are atomic with respect to multi-threading on a single processor*, (essentially implemented with time-slicing,) but are not atomic with respect to multi-threading using multiple processors or multiple cores.
So:
If you can guarantee that only TimeIsUp() will ever modify crossRoads._timeouts, and if you can guarantee that only one thread will ever execute TimeIsUp(), then it will be safe to do this. The writing in TimeIsUp() will work fine, and the reading in HowManyTimeOuts() (and any place else) will work fine. But if you also modify crossRoads._timeouts elsewhere, or if you ever spawn one more background thread writer, you will be in trouble.
In either case, my advice would be to play it safe and lock it.
(*) They are atomic with respect to multi-threading on a single processor because context switches between threads happen on a periodic interrupt, and on the x86 architectures these instructions are atomic with respect to interrupts, meaning that if an interrupt occurs while the CPU is executing such an instruction, the interrupt will wait until the instruction completes. This does not hold true with more complex instructions, for example those with the REP prefix.
Although an int may be 'native' size to a CPU (dealing in 32 or 64 bits at a time), if you are reading and writing from different threads to the same variable, you are best off locking this variable and synchronizing access.
There is never a guarantee that reads/writes maybe atomic to an int.
You can also use Interlocked.Increment for your purposes here.