I am looking for a reproducible example that can demonstrate how the volatile keyword works: something that works "wrong" without the variable(s) marked as volatile and works "correctly" with them.
I mean an example demonstrating that the order of write/read operations during execution differs from the expected order when the variable is not marked as volatile, and matches it when the variable is marked as volatile.
I thought I had an example, but with help from others I realized it was just a piece of incorrect multithreading code: Why do volatile and MemoryBarrier not prevent operations reordering?
I've also found a link that demonstrates the effect of volatile on the optimizer, but it is different from what I'm looking for: it demonstrates that reads of a variable marked as volatile will not be optimized out. How to illustrate usage of volatile keyword in C#
Here is how far I have got. This code does not show any signs of read/write operation reordering; I'm looking for one that will.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Runtime.CompilerServices;

namespace FlipFlop
{
    class Program
    {
        //Declaring these variables
        static byte a;
        static byte b;

        //Tracks the number of iterations it took to detect operation reordering.
        static long iterations = 0;

        static object locker = new object();

        //Indicates that operation reordering has not been found yet.
        static volatile bool continueTrying = true;

        //Indicates that the Check method should continue.
        static volatile bool continueChecking = true;

        static void Main(string[] args)
        {
            //Restart the test until we catch a reordering.
            while (continueTrying)
            {
                iterations++;
                a = 0;
                b = 0;

                var checker = new Task(Check);
                var writer = new Task(Write);

                lock (locker)
                {
                    continueChecking = true;
                    checker.Start();
                }

                writer.Start();

                checker.Wait();
                writer.Wait();
            }

            Console.ReadKey();
        }

        static void Write()
        {
            //Writing is locked until Main has started the Check() method.
            lock (locker)
            {
                WriteInOneDirection();
                WriteInOtherDirection();

                //Stops the spinning in the Check method.
                continueChecking = false;
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void WriteInOneDirection()
        {
            a = 1;
            b = 10;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void WriteInOtherDirection()
        {
            b = 20;
            a = 2;
        }

        static void Check()
        {
            //Spins until it finds operation reordering or is stopped by the Write method.
            while (continueChecking)
            {
                int tempA = a;
                int tempB = b;

                if (tempB == 10 && tempA == 2)
                {
                    continueTrying = false;
                    Console.WriteLine("Caught when a = {0} and b = {1}", tempA, tempB);
                    Console.WriteLine("In " + iterations + " iterations.");
                    break;
                }
            }
        }
    }
}
Edit:
As I understand it, an optimization that causes reordering can come from the JITter or from the hardware itself. Let me rephrase my question: does the JITter or do x86 CPUs reorder read/write operations, and is there a way to demonstrate it in C# if they do?
The exact semantics of volatile is a jitter implementation detail. The compiler emits the OpCodes.Volatile IL instruction wherever you access a variable that's declared volatile. It does some checking to verify that the variable type is legal (you can't declare value types larger than 4 bytes volatile), but that's where the buck stops.
The C# language specification defines the behavior of volatile, quoted here by Eric Lippert. The 'release' and 'acquire' semantics are something that only makes sense on a processor core with a weak memory model. Those kinds of processors have not done well in the market, probably because they are such an enormous pain to program. The odds that your code will ever run on an Itanium are slim to none.
What's especially bad about the C# language specification's definition is that it doesn't mention at all what really happens. Declaring a variable volatile prevents the jitter optimizer from optimizing the code to store the variable in a CPU register. That is why the code that Marc linked hangs. This only happens with the current x86 jitter, another strong hint that volatile is really a jitter implementation detail.
The poor semantics of volatile has a rich history; it comes from the C language, whose code generators have lots of trouble getting it right as well. Here's an interesting report about it (pdf). It dates from 2008, a good 30+ years of opportunity to get it right. Or wrong: this goes belly-up when the code optimizer forgets that a variable is volatile. Unoptimized code never has a problem with it. Notably, the jitter in the 'shared source' version of .NET (SSCLI20) completely ignores the IL instruction. It can also be argued that the current behavior of the x86 jitter is a bug; I think it is, it is just not easy to bump it into the failure mode. But nobody can argue that it actually is one.
The writing is on the wall: only ever declare a variable volatile if it is stored in a memory-mapped register, the original intention of the keyword. The odds that you'll run into such a usage in the C# language should be vanishingly small; code like that belongs in a device driver. And above all, never assume that it is useful in a multi-threading scenario.
You can use this example to demonstrate the different behavior with and without volatile. This example must be compiled using a Release build and run outside of the debugger¹. Experiment by adding and removing the volatile keyword on the stop flag.
What happens here is that, without volatile, the read of stop in the while loop is moved so that it occurs once, before the loop. This prevents the thread from ending even after the main thread has set the stop flag to true.
using System;
using System.Threading;

class Program
{
    static bool stop = false;

    public static void Main(string[] args)
    {
        var t = new Thread(() =>
        {
            Console.WriteLine("thread begin");
            bool toggle = false;
            while (!stop)
            {
                toggle = !toggle;
            }
            Console.WriteLine("thread end");
        });
        t.Start();
        Thread.Sleep(1000);
        stop = true;
        Console.WriteLine("stop = true");
        Console.WriteLine("waiting...");
        // The Join call should return almost immediately.
        // With volatile it DOES.
        // Without volatile it does NOT.
        t.Join();
    }
}
It should also be noted that subtle changes to this example can reduce the probability that it reproduces. For example, adding Thread.Sleep (perhaps to simulate thread interleaving) will itself introduce a memory barrier and thus semantics similar to those of the volatile keyword. I suspect Console.WriteLine introduces implicit memory barriers as well, or otherwise prevents the jitter from applying this reordering optimization. Just keep that in mind if you start messing with the example too much.
¹ I believe that framework versions prior to 2.0 did not include this reordering optimization. That means you should be able to reproduce this behavior with version 2.0 and higher, but not with the earlier versions.
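If you want to keep the field non-volatile, one alternative fix (a sketch of mine, not part of the original example; it assumes .NET 4.5 or later for the Volatile class) is to route the read through Volatile.Read, which forces a fresh load on every iteration:

// Same program as above, with only the loop changed.
while (!Volatile.Read(ref stop))
{
    toggle = !toggle;
}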
Related
Given the following simple code:
using System.Threading;

class Program
{
    static bool finish = false;

    static void Main(string[] args)
    {
        new Thread(ThreadProc).Start();
        int x = 0;
        while (!finish)
        {
            x++;
        }
    }

    static void ThreadProc()
    {
        Thread.Sleep(1000);
        finish = true;
    }
}
and running it in Release mode with MSVS2015 (.NET 4.6), we get a never-ending application. That happens because the JIT compiler generates code which reads finish only once, hence ignoring any future updates.
The question is: why is the JIT-compiler allowed to do such an optimization? What part of the specification allows it?
This is covered in section 10.5.3 - Volatile Fields in the C# Specification:
(The relevant part covering your observations is quoted below.)
10.5.3 Volatile Fields
When a field-declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields.
For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multithreaded programs that access fields without synchronization, such as that provided by the lock-statement (§8.12). These optimizations can be performed by the compiler, by the runtime system, or by hardware. For volatile fields, such reordering optimizations are restricted:
* A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
* A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.
These restrictions ensure that all threads will observe volatile writes performed by any other thread in the order in which they were performed. A conforming implementation is not required to provide a single total ordering of volatile writes as seen from all threads of execution.
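To make the acquire/release bullets concrete, here is a minimal publish/consume sketch (my illustration, not taken from the specification). Because the volatile write to ready cannot move before the preceding write to data, and the volatile read of ready cannot move after the subsequent read of data, a thread that observes ready == true must also observe data == 42:

using System;

class Publication
{
    int data;                // plain field being published
    volatile bool ready;     // volatile flag guarding the publication

    public void Produce()
    {
        data = 42;           // ordinary write
        ready = true;        // volatile write (release): the data write cannot move below it
    }

    public void Consume()
    {
        if (ready)           // volatile read (acquire): the data read cannot move above it
        {
            Console.WriteLine(data);   // prints 42 whenever ready was observed as true
        }
    }
}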
The compiler thinks (makes the following promises) as follows:
I will execute your code in the order that you've asked me to; that is, any instructions that run in your single-threaded code will be performed in the order they were written, and based on this I will do any optimizations I please that conform with single-threaded sequentiality.
Well, the compiler sees that the finish variable is not marked as volatile, so it assumes the variable will not be changed by other threads, and it optimizes the read away, treating the condition as always true.
In Debug mode, the compiler is more lax and does not perform this optimization.
More on this here.
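For completeness, a minimal sketch of the fix this implies: marking finish as volatile forbids the JIT from hoisting the read out of the loop (ThreadProc stays exactly as above).

static volatile bool finish = false;   // volatile: the condition must re-read the field

static void Main(string[] args)
{
    new Thread(ThreadProc).Start();
    int x = 0;
    while (!finish)   // no longer collapsed into a single read
    {
        x++;
    }
}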
I have read in different places people saying one should always use lock instead of volatile. I found that there are lots of confusing statements about multithreading out there, and even experts have different opinions on some things here.
After lots of research I found that the lock statement will, at minimum, also insert memory barriers.
For example:
private readonly object myLock = new object();   // assumed declaration; not shown in the original
public bool stopFlag;

void Foo()
{
    lock (myLock)
    {
        while (!stopFlag)
        {
            // do something
        }
    }
}
But if I am not totally wrong, the JIT compiler is free to never actually read the variable inside the loop; instead it may only read a cached version of the variable from a register. AFAIK memory barriers won't help if the JIT has assigned the variable to a register; they just ensure that if we read from memory, the value is current.
Unless there is some compiler magic saying something like "if a code block contains a MemoryBarrier, register caching of all variables after the MemoryBarrier is prevented".
Unless it is declared volatile or read with Thread.VolatileRead(), if stopFlag is set from another thread, the loop may still run infinitely. Is this correct?
If yes, wouldn't this apply to ALL variables shared between threads?
Whenever I see a question like this, my knee-jerk reaction is "assume nothing!" The .NET memory model is quite weak, and the C# memory model is especially noteworthy for using language that can only apply to a processor with a weak memory model that isn't even supported anymore. Nothing in there tells you anything about what's going to happen in this code; you can reason about locks and memory barriers until you're blue in the face but you won't get anywhere with it.
The x64 jitter is quite clean and rarely throws a surprise. But its days are numbered; it is going to be replaced by RyuJIT in VS2015, a rewrite that started with the x86 jitter codebase as a starting point. Which is a concern: the x86 jitter can throw you for a loop. Pun intended.
The best thing to do is to just try it and see what happens. Rewrite your code a little bit and make that loop as tight as possible so the jitter optimizer can do anything it wants:
class Test {
public bool myBool;
private static object myLock = new object();
public int Foo() {
lock (myLock) {
int cnt = 0;
while (!myBool) cnt++;
return cnt;
}
}
}
And testing it like this:
static void Main(string[] args) {
var obj = new Test();
new Thread(() => {
Thread.Sleep(1000);
obj.myBool = true;
}).Start();
Console.WriteLine(obj.Foo());
}
Switch to the Release build. Project + Properties, Build tab, tick the "Prefer 32-bit" option. Tools + Options, Debugging, General, untick the "Suppress JIT optimization" option. First run the Debug build. Works fine, program terminates after a second. Now switch to the Release build, run and observe that it deadlocks, the loop never completes. Use Debug + Break All to see that it hangs in the loop.
To see why, look at the generated machine code with Debug + Windows + Disassembly. Focusing on the loop only:
int cnt = 0;
013E26DD xor edx,edx ; cnt = 0
while (!myBool) {
013E26DF movzx eax,byte ptr [esi+4] ; load myBool
013E26E3 test eax,eax ; myBool == true?
013E26E5 jne 013E26EC ; yes => bail out
013E26E7 inc edx ; cnt++
013E26E8 test eax,eax ; myBool == true?
013E26EA jne 013E26E7 ; yes => loop
}
return cnt;
The instruction at address 013E26E8 tells the tale. Note how the myBool variable is stored in the eax register, cnt in the edx register. A standard duty of the jitter optimizer, using the processor registers and avoiding memory loads and stores makes the code much faster. And note that when it tests the value, it still uses the register and does not reload from memory. This loop can therefore never end and it will always hang your program.
The code is pretty fake of course; nobody would ever write this. In practice it tends to work by accident: you'll have more code inside the while() loop, too much to allow the jitter to optimize the variable away entirely. But there are no hard rules that will tell you when this happens; sometimes it does pull it off. Assume nothing. Proper synchronization should never be skipped. You really are only safe with an extra lock for myBool, or an AutoResetEvent/ManualResetEvent, or Interlocked.CompareExchange(). And if you want to cut such a volatile corner, then you must check the generated machine code.
As noted in the comments, try Thread.VolatileRead() instead (you need to use a byte instead of a bool). It still hangs; it is not a synchronization primitive.
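For illustration, here is a sketch of the Interlocked.CompareExchange variant mentioned above (my adaptation of the Test class; note the field becomes an int, since Interlocked.CompareExchange has no bool overload):

class Test {
    public int myBool;   // int instead of bool: required by Interlocked.CompareExchange
    public int Foo() {
        int cnt = 0;
        // CompareExchange(ref f, 0, 0) changes nothing but returns the current
        // value with a full fence, so the jitter cannot cache it in a register.
        while (Interlocked.CompareExchange(ref myBool, 0, 0) == 0) cnt++;
        return cnt;
    }
}
// The writer side would set the flag with Interlocked.Exchange(ref obj.myBool, 1);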
the JIT compiler is free to never actually read the variable inside the loop; instead it may only read a cached version of the variable from a register.
Well, it'll read the variable once, in the first iteration of the loop, but other than that, yes, it will continue to read a cached value, unless there is a memory barrier. Any time the code crosses a memory barrier it cannot use the cached value.
Using Thread.VolatileRead() adds the appropriate memory barriers, as does marking the field as volatile. There are plenty of other things one could do that also implicitly add memory barriers; one of them is entering or leaving a lock statement.
Since your loop stays within the body of a single lock, never entering or leaving it, the JIT is free to continue using the cached value.
Of course, the solution here isn't to add a memory barrier. If you want to wait for another thread to notify you of when you should continue on, use an AutoResetEvent (or another similar synchronization tool specifically designed to allow threads to communicate), as sketched below.
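A minimal sketch of that approach (class and member names are mine): the waiting thread blocks on the event instead of spinning on a flag, and the signaling thread wakes it up.

using System.Threading;

class Waiter
{
    private readonly AutoResetEvent signal = new AutoResetEvent(false);

    public void Foo()
    {
        signal.WaitOne();   // blocks until another thread calls Set()
        // safe to continue here; the wait itself is a synchronization point
    }

    public void Stop()
    {
        signal.Set();       // releases one thread blocked in WaitOne()
    }
}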
How about this?
public class Singleton<T> where T : class, new()
{
    private static T _instance;
    private static object _syncRoot = new Object();

    public static T Instance
    {
        get
        {
            var instance = _instance;
            if (instance == null)
            {
                lock (_syncRoot)
                {
                    instance = Volatile.Read(ref _instance);
                    if (instance == null)
                    {
                        instance = new T();
                    }
                    Volatile.Write(ref _instance, instance);
                }
            }
            return instance;
        }
    }
}
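For what it's worth, a usage sketch (the type argument is arbitrary; anything that is a class with a parameterless constructor works):

var a = Singleton<System.Text.StringBuilder>.Instance;
var b = Singleton<System.Text.StringBuilder>.Instance;
Console.WriteLine(ReferenceEquals(a, b));   // True: the instance is created at most once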
Consider the following example:
private int sharedState = 0;

private void FirstThread() {
    Volatile.Write(ref sharedState, 1);
}

private void SecondThread() {
    int sharedStateSnapshot = Volatile.Read(ref sharedState);
    Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
* Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
* Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
* When sharedState is actually released to all processor cores, everything preceding that write will also be released, and
* when the value at the address of sharedStateSnapshot is acquired, sharedState must already have been acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program prints 1, assuming thread 2 runs after thread 1, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();

private void FirstThread()
{
    lock (locker)
    {
        sharedState = 1;
    }
}

private void SecondThread()
{
    int sharedStateSnapshot;
    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }
    Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this, though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable in any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread, but this does not help here because:
* There are no operations to reorder (just the single read or write in each thread).
* The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply preserve an order of operations which you cannot predict anyway.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
* Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
* Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile; while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;

private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true);
}

private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
Here is the code that I was trying on my workstation.
using System;
using System.Threading;

class Program
{
    public static volatile bool status = true;

    public static void Main()
    {
        Thread FirstStart = new Thread(threadrun);
        FirstStart.Start();
        Thread.Sleep(200);
        Thread thirdstart = new Thread(threadrun2);
        thirdstart.Start();
        Console.ReadLine();
    }

    static void threadrun()
    {
        while (status)
        {
            Console.WriteLine("Waiting..");
        }
    }

    static void threadrun2()
    {
        status = false;
        Console.WriteLine("the bool value is now made FALSE");
    }
}
As you can see, I have fired these threads in Main. Then, using breakpoints, I tracked the threads. My initial conception was that all the threads would be fired simultaneously, but my breakpoint flow showed that the threads executed one after the other (and so did the output, i.e. top-to-bottom execution of threads). Guys, why is that happening?
Additionally, I tried running the same program without the volatile keyword in the declaration, and I found no change in program execution. I suspect the volatile keyword is of no practical use. Am I going wrong somewhere?
Your method of thinking is flawed.
The very nature of threading related issues is that they're non-deterministic. This means that what you have observed is potentially no indicator of what may happen in the future.
This is the very nature of why multithreaded programming is "hard." It often defies ad hoc testing, or even most unit testing. The only way to do it effectively is to understand your entire software and hardware stack, and diagram every possible occurrence through use of state machines.
In summary, threaded programming is not about what you've seen happen, it's about what might possibly happen, no matter how improbable.
OK, I will try to explain a very long story as briefly as possible:
Number 1: Trying to inspect the behavior of threads with the debugger is as useful as repeatedly running a multithreaded program and concluding that it works fine because none of 100 test runs failed: WRONG! Threads behave in a completely nondeterministic (some would say random) way, and you need different methods to make sure such a program will run correctly.
Number 2: The use of volatile becomes clear once you remove it, run your program in Debug mode, and then switch to Release mode. I think you will have a surprise... What happens in Release mode is that the compiler optimizes the code (this includes reordering instructions and caching of values). Now, if your two threads run on different processor cores, the core executing the thread that checks the value of status will cache that value instead of repeatedly reading it. The other thread will set it, but the first one will never see the change: deadlock! volatile prevents this kind of situation from occurring.
In a sense, volatile is a guard in case the code does not actually (and most likely will not) run as you think it will in a multithreaded scenario.
The fact that your simple code doesn't behave differently with volatile doesn't mean anything. Your code is too simple and has nothing to do with volatile. You need to write very computation-intensive code to create a clearly visible memory race condition.
Also, the volatile keyword may be useful on platforms other than x86/x64 that have different memory models (for example, Itanium).
Joe Duffy wrote interesting information about volatile on his blog. I strongly recommend to read it.
Then, using breakpoints, I tracked the threads. My initial conception was that all the threads would be fired simultaneously, but my breakpoint flow showed that the threads executed one after the other (and so did the output, i.e. top-to-bottom execution of threads). Guys, why is that happening?
The debugger is temporarily suspending the threads to make it easier to debug.
I suspect the volatile keyword is of no practical use. Am I going wrong somewhere?
The Console.WriteLine calls are very likely masking the problem. They are most likely generating the necessary memory barriers for you implicitly. Here is a really simple snippet of code that demonstrates that there is, in fact, a problem when volatile is not used to declare the stop variable.
Compile the following code with the Release configuration and run it outside of the debugger.
using System;
using System.Threading;

class Program
{
    static bool stop = false;

    public static void Main(string[] args)
    {
        var t = new Thread(() =>
        {
            Console.WriteLine("thread begin");
            bool toggle = false;
            while (!stop)
            {
                toggle = !toggle;
            }
            Console.WriteLine("thread end");
        });
        t.Start();
        Thread.Sleep(1000);
        stop = true;
        Console.WriteLine("stop = true");
        Console.WriteLine("waiting...");
        t.Join();
    }
}
In "C# 4 in a Nutshell", the author shows that this class can write 0 sometimes without MemoryBarrier, though I can't reproduce in my Core2Duo:
public class Foo
{
    int _answer;
    bool _complete;

    public void A()
    {
        _answer = 123;
        //Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        //Thread.MemoryBarrier(); // Barrier 2
    }

    public void B()
    {
        //Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            //Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine(_answer);
        }
    }
}
private static void ThreadInverteOrdemComandos()
{
    Foo obj = new Foo();

    Task.Factory.StartNew(obj.A);
    Task.Factory.StartNew(obj.B);

    Thread.Sleep(10);
}
This need seems crazy to me. How can I recognize all the possible cases where this can occur? I would think that if the processor changes the order of operations, it needs to guarantee that the behavior doesn't change.
Do you bother to use barriers?
You are going to have a very hard time reproducing this bug. In fact, I would go as far as saying you will never be able to reproduce it using the .NET Framework. The reason is that Microsoft's implementation uses a strong memory model for writes. That means writes are treated as if they were volatile. A volatile write has release semantics, which means that all prior writes must be committed before the current write.
However, the ECMA specification has a weaker memory model. So it is theoretically possible that Mono or even a future version of the .NET Framework might start exhibiting the buggy behavior.
So what I am saying is that it is very unlikely that removing barriers #1 and #2 will have any impact on the behavior of the program. That, of course, is not a guarantee, but an observation based on the current implementation of the CLR only.
Removing barriers #3 and #4 will definitely have an impact. This is actually pretty easy to reproduce; well, not with this example per se, but the following code is one of the better-known demonstrations. It has to be compiled using a Release build and run outside of the debugger. The bug is that the program does not end. You can fix the bug by placing a call to Thread.MemoryBarrier inside the while loop or by marking stop as volatile.
using System;
using System.Threading;

class Program
{
    static bool stop = false;

    public static void Main(string[] args)
    {
        var t = new Thread(() =>
        {
            Console.WriteLine("thread begin");
            bool toggle = false;
            while (!stop)
            {
                toggle = !toggle;
            }
            Console.WriteLine("thread end");
        });
        t.Start();
        Thread.Sleep(1000);
        stop = true;
        Console.WriteLine("stop = true");
        Console.WriteLine("waiting...");
        t.Join();
    }
}
The reason some threading bugs are hard to reproduce is that the same tactics you use to simulate thread interleaving can actually fix the bug. Thread.Sleep is the most notable example, because it generates memory barriers. You can verify that by placing a call inside the while loop and observing that the bug goes away.
You can see my answer here for another analysis of the example from the book you cited.
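To illustrate, here is the loop from the snippet above with the barrier-based fix applied (the rest of the program is unchanged):

while (!stop)
{
    toggle = !toggle;
    Thread.MemoryBarrier();   // full fence: stop must be re-read from memory on each pass
}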
Odds are very good that the first task has completed by the time the second task even starts running. You can only observe this behavior if both threads run that code simultaneously and there are no intervening cache-synchronizing operations. There is one in your code: the StartNew() method will take a lock inside the thread pool manager somewhere.
Getting two threads to run this code simultaneously is very hard. This code completes in a couple of nanoseconds. You would have to try billions of times and introduce variable delays to have any odds. Not much point to this of course; the real problem is that this happens randomly when you don't expect it.
Stay away from this, use the lock statement to write sane multi-threaded code.
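A sketch of what that looks like for the Foo class from the question (the lock object name is mine): with every access to the pair made under the same lock, the lock's own acquire and release fences make the explicit barriers unnecessary.

class Foo
{
    readonly object gate = new object();   // hypothetical name for the lock object

    int _answer;
    bool _complete;

    public void A()
    {
        lock (gate)
        {
            _answer = 123;
            _complete = true;
        }
    }

    public void B()
    {
        lock (gate)
        {
            if (_complete) Console.WriteLine(_answer);
        }
    }
}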
If you use volatile or lock, the memory barrier is built in. But yes, you do need one otherwise. Having said that, I suspect that you need half as many as your example shows.
It's very difficult to reproduce multithreaded bugs; usually you have to run the test code many times (thousands) and have some automated check that will flag if the bug occurs. You might try to add a short Thread.Sleep(10) in between some of the lines, but again, it does not always guarantee that you will get the same issues as without it.
Memory barriers were introduced for people who need to do really hardcore low-level performance optimisation of their multithreaded code. In most cases you will be better off using other synchronisation primitives, i.e. volatile or lock.
I'll just quote one of the great articles on multi-threading:
Consider the following example:
class Foo
{
int _answer;
bool _complete;
void A()
{
_answer = 123;
_complete = true;
}
void B()
{
if (_complete) Console.WriteLine (_answer);
}
}
If methods A and B ran concurrently on different threads, might it be possible for B to write "0"? The answer is yes, for the following reasons:
The compiler, CLR, or CPU may reorder your program's instructions to improve efficiency. The compiler, CLR, or CPU may introduce caching optimizations such that assignments to variables won't be visible to other threads right away. C# and the runtime are very careful to ensure that such optimizations don't break ordinary single-threaded code, or multithreaded code that makes proper use of locks. Outside of these scenarios, you must explicitly defeat these optimizations by creating memory barriers (also called memory fences) to limit the effects of instruction reordering and read/write caching.
Full fences
The simplest kind of memory barrier is a full memory barrier (full fence), which prevents any kind of instruction reordering or caching around that fence. Calling Thread.MemoryBarrier generates a full fence; we can fix our example by applying four full fences as follows:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine(_answer);
        }
    }
}
All the theory behind Thread.MemoryBarrier and why we need to use it in non-blocking scenarios to make the code safe and robust is described nicely here: http://www.albahari.com/threading/part4.aspx
If you are ever touching data from two different threads, this can occur. This is one of the tricks that processors use to increase speed; you could build processors that didn't do this, but they would be much slower, so no one does that anymore. You should probably read something like Hennessy and Patterson to recognize all of the various types of race conditions.
I always use some sort of higher-level tool like a monitor or a lock, but internally they are doing something similar or are implemented with barriers.
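For reference, the lock statement is essentially compiler shorthand for a Monitor pair; the expansion is roughly the following (a sketch, modulo the exact code the compiler emits):

object gate = new object();

// lock (gate) { /* body */ } expands to approximately:
bool lockTaken = false;
try
{
    Monitor.Enter(gate, ref lockTaken);   // acquire fence on entry
    /* body */
}
finally
{
    if (lockTaken) Monitor.Exit(gate);    // release fence on exit
}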