Read elimination and concurrency - C#

Given the following simple code:
    using System.Threading;

    class Program
    {
        static bool finish = false;

        static void Main(string[] args)
        {
            new Thread(ThreadProc).Start();
            int x = 0;
            while (!finish)
            {
                x++;
            }
        }

        static void ThreadProc()
        {
            Thread.Sleep(1000);
            finish = true;
        }
    }
and running it in Release mode with MSVS2015 (.NET 4.6), we get an application that never terminates. That happens because the JIT compiler generates code which reads finish only once, hence ignoring any subsequent updates.
The question is: why is the JIT-compiler allowed to do such an optimization? What part of the specification allows it?

This is covered in section 10.5.3 - Volatile Fields in the C# Specification:
(The part relevant to your observation is the restriction on reordering optimizations quoted below.)
10.5.3 Volatile Fields
When a field-declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields.
For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multithreaded programs that access fields without synchronization, such as that provided by the lock-statement (§8.12). These optimizations can be performed by the compiler, by the runtime system, or by hardware. For volatile fields, such reordering optimizations are restricted:
* A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
* A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.
These restrictions ensure that all threads will observe volatile writes performed by any other thread in the order in which they were performed. A conforming implementation is not required to provide a single total ordering of volatile writes as seen from all threads of execution.

The compiler reasons (makes the following promise) as follows:
I will execute your code in the order that you've asked me to; that is, any instructions that run in your single-threaded code will be performed in the order they were written, and based on this I will make any optimizations I please that conform with single-threaded sequentiality.
Well, the compiler sees that the finish variable is not marked volatile, so it assumes the variable will not be changed by other threads; it therefore hoists the read out of the loop, effectively treating the condition as always true.
In Debug mode the JIT is far less aggressive and does not perform this optimization.
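For completeness, a minimal sketch of the standard fix, assuming nothing beyond the original program: marking finish as volatile forbids the JIT from hoisting the read out of the loop, so the loop terminates.

    using System.Threading;

    class Program
    {
        // volatile: the JIT may no longer cache this field in a register or hoist its read
        static volatile bool finish = false;

        static void Main(string[] args)
        {
            new Thread(ThreadProc).Start();
            int x = 0;
            while (!finish) // re-read on every iteration
            {
                x++;
            }
        }

        static void ThreadProc()
        {
            Thread.Sleep(1000);
            finish = true; // this write now becomes visible to Main's loop
        }
    }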

Related

Does volatile prevent introduced reads or writes?

In C#, volatile keyword ensures that reads and writes have acquire and release semantics, respectively. However, does it say anything about introduced reads or writes?
For instance:
    volatile Thing something;
    volatile int aNumber;

    void Method()
    {
        // Are these lines...
        var local = something;
        if (local != null)
            local.DoThings();

        // ...guaranteed not to be transformed into these by compiler, jitter or processor?
        if (something != null)
            something.DoThings(); // <-- Second read!

        // Are these lines...
        if (aNumber == 0)
            aNumber = 1;

        // ...guaranteed not to be transformed into these by compiler, jitter or processor?
        var temp = aNumber;
        if (temp == 0)
            temp = 1;
        aNumber = temp; // <-- An out-of-thin-air write!
    }
Here's what the C# spec¹ has to say about Execution Order:
Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field ...
The execution environment is free to change the order of execution of a C# program, subject to the following constraints:
...
The ordering of side effects is preserved with respect to volatile reads and writes ...
I would certainly consider introducing new side effects to be changing the order of side effects, but it's not explicitly stated like that here.
¹ The link in the answer is to the C# 6 spec, which is listed as a draft. The C# 5 spec isn't a draft but is not available online, only as a download. The wording is identical, so far as I can see, in this section.
This wording from the C# spec:
The ordering of side effects is preserved with respect to volatile reads and writes...
may be interpreted as implying that read and write introductions on volatile variables are not allowed, but it is really ambiguous and it depends on the meaning of "ordering." If it is referring to relative ordering of existing accesses, then introducing new reads or writes does not change that and so it would not violate this part of the spec. If it is referring to the exact position of all memory accesses in program order, then introducing new accesses would violate the spec.
This article says that reads on non-volatile variables might be introduced but does not say explicitly whether this is not allowed on volatile variables.
This Q/A discusses how to prevent read introduction (but no discussion on write introduction).
In the comments under this article, two Microsoft employees (at least at the time the comments were written) explicitly state that read and write introductions on volatile variables are not allowed.
Stephen Toub:
"read introduction" is one mechanism by which a memory reordering might be introduced.
Igor Ostrovsky:
Elsewhere in the C# specification, a volatile read is defined to be a "side effect". As a result, repeating the read of m_paused would be equivalent to adding another side effect, which is not allowed.
I think we can conclude from these comments that introducing a side effect out-of-thin-air in C#, any kind of side effect, anywhere in the code is not allowed.
A related quote from the CLI standard states the following in Section I.12.6.7:
An optimizing compiler that converts CIL to native code shall not remove any volatile operation, nor shall it coalesce multiple volatile operations into a single operation.
As far as I know, the CLI does not explicitly talk about introducing new side effects.
I wonder whether you have misunderstood what volatile means. Volatile can be used with types that can be read or written as an atomic action.
There is no acquire/release of a lock, only a barrier to compile-time and run-time reordering that provides lockless acquire/release semantics (https://preshing.com/20120913/acquire-and-release-semantics/). On non-x86 this may require barrier instructions in the asm, but not taking a lock.
volatile indicates that a field may be modified by other threads, which is why reads and writes of it need to be treated as atomic and not optimised away.
Your question is a little ambiguous.
1/ If you mean, will the compiler transform:
    var local = something;
    if (local != null) local.DoThings();
into:
    if (something != null) something.DoThings();
then the answer is no.
2/ If you mean, will "DoThings()" be called twice on the same object:
    var local = something;
    if (local != null) local.DoThings();
    if (something != null) something.DoThings();
then the answer is mostly yes: "DoThings()" will be invoked twice unless another thread changes the value of "something" between the two calls. The second pattern can also fail at run time: if, after the "if" condition is evaluated and before "DoThings" is called, another thread sets "something" to null, you will get a NullReferenceException. I assume this is why you have your "var local = something;".
3/ If you mean will the following cause two reads:
    if (something != null) something.DoThings();
then yes, one read for the condition and a second read when it invokes DoThings() (assuming that something is not null). Were it not marked volatile then the compiler might manage that with a single read.
In any event the implementation of the function "DoThings()" needs to be aware that it could be called by multiple threads, so would need to consider incorporating a combination of locks and its own volatile members.
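As a sketch of that last point (the Thing class body and its _count field are hypothetical, invented only for illustration), DoThings could guard its own state with a lock:

    class Thing
    {
        private readonly object _sync = new object();
        private int _count;

        public void DoThings()
        {
            lock (_sync) // serialize concurrent callers so internal state stays consistent
            {
                _count++;
            }
        }
    }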

Why make a bool volatile when multithreading in C#?

I understand the volatile keyword in C++ fairly well. But in C#, it appears to take a different meaning, more so related to multi-threading. I thought bool operations were atomic, and I thought that if the operation were atomic, you wouldn't have threading concerns. What am I missing?
https://msdn.microsoft.com/en-us/library/x13ttww7.aspx
I thought bool operations were atomic
They are indeed atomic.
and I thought that if the operation were atomic, you wouldn't have threading concerns.
That is where you have an incomplete picture. Imagine you have two threads running on separate cores, each with its own cache. Thread #1 has foo in its cache, and Thread #2 updates the value of foo. Thread #1 won't see the change unless foo is marked volatile, or you acquire a lock, use the Interlocked class, or explicitly call Thread.MemoryBarrier() -- any of which causes the cached value to be invalidated, guaranteeing that you read the most up-to-date value:
Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
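As a hedged sketch of the Interlocked alternative mentioned above (the Worker class and its field names are invented for the example; an int flag is used because Interlocked has no bool overloads):

    using System.Threading;

    class Worker
    {
        private int _stop; // 0 = keep running, 1 = stop requested

        public void RequestStop()
        {
            Interlocked.Exchange(ref _stop, 1); // atomic write with a full fence
        }

        public void Loop()
        {
            // CompareExchange(ref x, 0, 0) swaps 0 for 0 -- it changes nothing,
            // but returns the current value as an atomic, fenced read
            while (Interlocked.CompareExchange(ref _stop, 0, 0) == 0)
            {
                // do work
            }
        }
    }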
Eric Lippert has a great post about volatility where he explains:
The true semantics of volatile reads and writes are considerably more complex than I've outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other.
Edit: as per xanatos' comment, this doesn't mean that volatile guarantees an immediate read; it guarantees a read of the most up-to-date value.
The volatile keyword in C# is all about reading/writing reordering, so it is something quite esoteric.
http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
(that I consider to be one of the "bibles" about threading) writes:
The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence. These “half-fences” are faster than full fences because they give the runtime and hardware more scope for optimization.
It is something quite unreadable :-)
Now... What it doesn't mean:
It doesn't mean that a value will be read now / will be written now;
it simply means that if you read something from a volatile variable, nothing that is read or written after this "special" read can be moved before it. So it creates a barrier: by reading from a volatile variable, you guarantee that all the reads and writes that follow it in your code really happen after it.
A volatile write is probably even more important. It is in part guaranteed by Intel CPUs, and it is something that wasn't guaranteed by the first version of Java: no write reordering. The problem was:
    object myrefthatissharedwithotherthreads = new MyClass(5);
where
    class MyClass
    {
        public int Value;

        public MyClass(int value)
        {
            Value = value;
        }
    }
Now... that expression can be imagined to be:
    var temp = new MyClass();
    temp.Value = 5;
    myrefthatissharedwithotherthreads = temp;
where temp is something generated by the compiler that you can't see.
If the writes can be reordered, you could have:
    var temp = new MyClass();
    myrefthatissharedwithotherthreads = temp;
    temp.Value = 5;
and another thread could see a partially initialized MyClass, because the reference myrefthatissharedwithotherthreads becomes readable before MyClass has finished initializing.
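A sketch of the usual remedy, reusing the MyClass type from above: declaring the shared reference volatile makes its assignment a volatile (release) write, so the constructor's write of Value cannot be moved after the publication.

    class Publisher
    {
        // volatile: assigning this field is a release write
        static volatile MyClass myrefthatissharedwithotherthreads;

        static void Publish()
        {
            // temp.Value = 5 can no longer be reordered after this assignment,
            // so no thread can observe a half-initialized MyClass through this field
            myrefthatissharedwithotherthreads = new MyClass(5);
        }
    }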

Volatile fields: How can I actually get the latest written value to a field?

Considering the following example:
    private int sharedState = 0;

    private void FirstThread()
    {
        Volatile.Write(ref sharedState, 1);
    }

    private void SecondThread()
    {
        int sharedStateSnapshot = Volatile.Read(ref sharedState);
        Console.WriteLine(sharedStateSnapshot);
    }
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
* Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
* Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
* When sharedState is actually released to all processor cores, everything preceding that write will also be released, and
* When the value in the address of sharedStateSnapshot is acquired, sharedState must already have been acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program prints 1, assuming thread 2 runs after thread 1, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
    private int sharedState = 0;
    private readonly object locker = new object();

    private void FirstThread()
    {
        lock (locker)
        {
            sharedState = 1;
        }
    }

    private void SecondThread()
    {
        int sharedStateSnapshot;
        lock (locker)
        {
            sharedStateSnapshot = sharedState;
        }
        Console.WriteLine(sharedStateSnapshot);
    }
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier does seem to do this, though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will refrain from caching the variable in a CPU register (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread, but this does not help here because:
* There are no operations to reorder (just the single read or write in each thread).
* The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply mean that the order of operations, which you cannot predict anyway, would be preserved.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
* Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
* Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
    private volatile int sharedState = 0;
    private volatile bool spinLock = false;

    private void FirstThread()
    {
        sharedState = 1;
        // ensure lock is released after the shared state write!
        Volatile.Write(ref spinLock, true);
    }

    private void SecondThread()
    {
        SpinWait.SpinUntil(() => spinLock);
        Console.WriteLine(sharedState);
    }
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.

Where to place fences/memory barriers to guarantee a fresh read/committed writes?

Like many other people, I've always been confused by volatile reads/writes and fences. So now I'm trying to fully understand what these do.
So, a volatile read is supposed to (1) exhibit acquire-semantics and (2) guarantee that the value read is fresh, i.e., it is not a cached value. Let's focus on (2).
Now, I've read that, if you want to perform a volatile read, you should introduce an acquire fence (or a full fence) after the read, like this:
    int local = shared;
    Thread.MemoryBarrier();
How exactly does this prevent the read operation from using a previously cached value?
According to the definition of a fence (no reads/stores are allowed to be moved above/below the fence), I would insert the fence before the read, preventing the read from crossing the fence and being moved backwards in time (aka, being cached).
How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
Similarly, I believe that a volatile write should introduce a fence after the write operation, preventing the processor from moving the write forward in time (aka, delaying the write). I believe this would make the processor flush the write to the main memory.
But to my surprise, the C# implementation introduces the fence before the write!
    [MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
    public static void VolatileWrite(ref int address, int value)
    {
        MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
        address = value;
    }
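For comparison, the matching VolatileRead in the same reference source (quoted from memory, so treat it as a sketch) places the barrier after the read:

    [MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
    public static int VolatileRead(ref int address)
    {
        int ret = address;
        MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
        return ret;
    }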
Update
According to this example, apparently taken from "C# 4 in a Nutshell", fence 2, placed after a write, is supposed to force the write to be flushed to main memory immediately, and fence 3, placed before a read, is supposed to guarantee a fresh read:
    class Foo
    {
        int _answer;
        bool _complete;

        void A()
        {
            _answer = 123;
            Thread.MemoryBarrier(); // Barrier 1
            _complete = true;
            Thread.MemoryBarrier(); // Barrier 2
        }

        void B()
        {
            Thread.MemoryBarrier(); // Barrier 3
            if (_complete)
            {
                Thread.MemoryBarrier(); // Barrier 4
                Console.WriteLine(_answer);
            }
        }
    }
The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.
How exactly does this prevent the read operation from using a previously cached value?
It does no such thing. A volatile read does not guarantee that the latest value will be returned. In plain English all it really means is that the next read will return a newer value and nothing more.
How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
Be careful with the terminology here. Volatile is not synonymous with fresh. As I already mentioned above, its real usefulness lies in how two or more volatile reads are chained together. The next read in a sequence of volatile reads will absolutely return a newer value than the previous read of the same address. Lock-free code should be written with this premise in mind. That is, the code should be structured to work on the principle of dealing with a newer value, not the latest value. This is why most lock-free code spins in a loop until it can verify that the operation completed successfully.
The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.
Not really. Remember, volatile != fresh. Yes, if you want a "fresh" read then you need to place an acquire fence before the read. But that is not the same as doing a volatile read. What I am saying is that if the implementation of VolatileRead had the call to Thread.MemoryBarrier before the read instruction, then it would not actually produce a volatile read. It would produce a fresh read, though.
The important thing to understand is that volatile does not only mean "cannot cache value"; it also gives important visibility guarantees. (To be exact, it's entirely possible to have a volatile write that only goes to cache; it depends solely on the hardware and the cache coherency protocol it uses.)
A volatile read gives acquire semantics, while a volatile write has release semantics. An acquire fence means that you cannot reorder reads or writes before the fence, while a release fence means you cannot move them after the fence. The linked answer in the comments explains that actually quite nicely.
Now the question is, if we don't have any memory barrier before the load how is it guaranteed that we'll see the newest value? The answer to that is: Because we also put memory barriers after each volatile write to guarantee that.
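A sketch of that pairing (class and field names invented for the example): the release barrier on the writing thread and the acquire barrier on the reading thread cooperate, one on each side.

    using System;
    using System.Threading;

    class Pairing
    {
        private int _data;
        private int _flag;

        public void Writer()
        {
            _data = 42;
            Volatile.Write(ref _flag, 1); // release: the write to _data cannot move below this
        }

        public void Reader()
        {
            if (Volatile.Read(ref _flag) == 1) // acquire: the read of _data cannot move above this
            {
                Console.WriteLine(_data); // prints 42 once the flag read observed 1
            }
        }
    }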
Doug Lea wrote a great summary on which barriers exist, what they do and where to put them for volatile reads/writes for the JMM as a help for compiler writers, but the text is also quite useful for other people. Volatile reads and writes give the same guarantees in both Java and the CLR so that's generally applicable.
Source -- scroll down to the "Memory Barriers" section. (I'd copy the interesting parts, but the formatting wouldn't survive it.)

volatile with release/acquire semantics

Since Java 5, the volatile keyword has release/acquire semantics to make side-effects visible to other threads (including assignments to non-volatile variables!). Take these two variables, for example:
    int i;
    volatile int v;
Note that i is a regular, non-volatile variable. Imagine thread 1 executing the following statements:
    i = 42;
    v = 0;
At some later point in time, thread 2 executes the following statements:
    int some_local_variable = v;
    print(i);
According to the Java memory model, the write of v in thread 1 followed by the read of v in thread 2 ensures that thread 2 sees the write to i executed in thread 1, so the value 42 is printed.
My question is: does volatile have the same release/acquire semantics in C#?
The semantics of "volatile" in C# are defined in sections 7.10 and 14.5.4 of the C# 6.0 specification. Rather than reproduce them here, I encourage you to look it up in the spec, and then decide that it is too complicated and dangerous to use "volatile", and go back to using locks. That's what I always do.
See "7.10 Execution order" and "14.5.4 Volatile fields" in the C# 6.0 specification.
In the C# 5.0 specification, see "3.10 Execution Order" and "10.5.3 Volatile Fields"
Well, I believe it ensures that if some_local_variable is read as 0 (due to the write to v), i will be read as 42.
The tricky part is "at some later point in time". While commonly volatility is talked about in terms of "flushing" writes, that's not how it's actually defined in the spec (either Java or C#).
From the C# 4 language spec, section 10.5.3:
For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multi-threaded programs that access fields without synchronization such as that provided by the lock-statement (§8.12). These optimizations can be performed by the compiler, by the run-time system, or by hardware. For volatile fields, such reordering optimizations are restricted:
* A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
* A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.
There's then an example which is quite similar to yours, but conditional on the value read from the volatile variable.
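Translated to C# as a sketch (field names kept from the Java example; the two methods are assumed to run on thread 1 and thread 2 respectively):

    using System;

    class Example
    {
        int i;          // regular, non-volatile field
        volatile int v;

        void Thread1()
        {
            i = 42;
            v = 0;      // volatile write: release semantics
        }

        void Thread2()
        {
            int some_local_variable = v; // volatile read: acquire semantics
            Console.WriteLine(i);        // prints 42 if this read of v saw thread 1's write
        }
    }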
And like Eric, I'd strongly avoid relying on volatile myself. It's hard to reason about, and best left to the Joe Duffy/Stephen Toubs of the world.
