Is volatile still needed inside lock statements? (C#)

I have read in different places that one should always use lock instead of volatile. There are lots of confusing statements about multithreading out there, and even experts disagree on some of the details.
After a lot of research I found that the lock statement will, at the very least, insert memory barriers.
For example:
public bool stopFlag;

void Foo()
{
    lock (myLock)
    {
        while (!stopFlag)
        {
            // do something
        }
    }
}
But if I am not totally wrong, the JIT compiler is free to never actually re-read the variable inside the loop; instead it may keep reading a cached copy of the variable from a register. AFAIK memory barriers won't help if the JIT has assigned the variable to a register; they just ensure that when we do read from memory, the value is current.
Unless there is some compiler magic along the lines of "if a code block contains a memory barrier, register assignment of all variables after the memory barrier is prevented".
Unless the field is declared volatile or read with Thread.VolatileRead(), if stopFlag is set to true from another thread, the loop may still run forever. Is this correct?
If yes, wouldn't this apply to ALL variables shared between threads?
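(For clarity, the volatile alternative I mean is simply declaring the field as such, e.g.:)

    public volatile bool stopFlag;   // the JIT must then perform a real read on every iteration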

Whenever I see a question like this, my knee-jerk reaction is "assume nothing!" The .NET memory model is quite weak, and the C# memory model is especially noteworthy for using language that can only apply to a processor with a weak memory model, one that isn't even supported anymore. Nothing in there tells you what is going to happen in this code; you can reason about locks and memory barriers until you're blue in the face and you won't get anywhere with it.
The x64 jitter is quite clean and rarely throws a surprise. But its days are numbered: it is going to be replaced by RyuJIT in VS2015, a rewrite that used the x86 jitter codebase as its starting point. Which is a concern, because the x86 jitter can throw you for a loop. Pun intended.
Best thing to do is to just try it and see what happens. Rewrite your code a little and make that loop as tight as possible so the jitter optimizer can do anything it wants:
class Test {
    public bool myBool;
    private static object myLock = new object();

    public int Foo() {
        lock (myLock) {
            int cnt = 0;
            while (!myBool) cnt++;
            return cnt;
        }
    }
}
And testing it like this:
static void Main(string[] args) {
    var obj = new Test();
    new Thread(() => {
        Thread.Sleep(1000);
        obj.myBool = true;
    }).Start();
    Console.WriteLine(obj.Foo());
}
First set things up: Project + Properties, Build tab, tick the "Prefer 32-bit" option. Tools + Options, Debugging, General, untick the "Suppress JIT optimization" option. Now run the Debug build: it works fine, the program terminates after a second. Then switch to the Release build, run, and observe that it deadlocks; the loop never completes. Use Debug + Break All to see that it hangs in the loop.
To see why, look at the generated machine code with Debug + Windows + Disassembly. Focusing on the loop only:
                int cnt = 0;
013E26DD  xor   edx,edx                ; cnt = 0
                while (!myBool) {
013E26DF  movzx eax,byte ptr [esi+4]   ; load myBool
013E26E3  test  eax,eax                ; myBool == true?
013E26E5  jne   013E26EC               ; yes => bail out
013E26E7  inc   edx                    ; cnt++
013E26E8  test  eax,eax                ; myBool == true?
013E26EA  je    013E26E7               ; no => loop
                }
                return cnt;
The instruction at address 013E26E8 tells the tale. Note how the myBool variable is stored in the eax register and cnt in the edx register. A standard duty of the jitter optimizer: using the processor registers and avoiding memory loads and stores makes the code much faster. And note that when it tests the value again, it still uses the register and does not reload myBool from memory. This loop can therefore never end, and it will always hang your program.
Code is pretty fake of course; nobody will ever write this. In practice this tends to work by accident: you'll have more code inside the while() loop, usually too much to allow the jitter to optimize the variable away entirely. But there are no hard rules that tell you when this happens; sometimes it does pull it off. Assume nothing. Proper synchronization should never be skipped: you really are only safe with an extra lock for myBool, or an ARE/MRE, or Interlocked.CompareExchange(). And if you want to cut such a volatile corner, then you must check the generated machine code.
As noted in the comments, trying Thread.VolatileRead() instead doesn't help either (you need to use a byte instead of a bool for that overload): it still hangs. It is not a synchronization primitive.
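If you do want a ready-made primitive for this particular stop-flag pattern, a minimal sketch (my own illustration, not from the original answer) could use ManualResetEventSlim:

    using System.Threading;

    class TestWithEvent {
        // illustrative replacement for the plain myBool field
        private readonly ManualResetEventSlim stopRequested = new ManualResetEventSlim(false);

        public int Foo() {
            int cnt = 0;
            // IsSet performs a proper volatile read internally, so the jitter
            // cannot cache the flag in a register across iterations.
            while (!stopRequested.IsSet) cnt++;
            return cnt;
        }

        public void RequestStop() {
            stopRequested.Set();
        }
    }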

the JIT compiler is free to never actually read the variable inside the loop but instead it may only read a cached version of the variable from a register.
Well, it'll read the variable once, in the first iteration of the loop, but other than that, yes, it will continue to read a cached value, unless there is a memory barrier. Any time the code crosses a memory barrier it cannot use the cached value.
Using Thread.VolatileRead() adds the appropriate memory barriers, as does marking the field as volatile. There are plenty of other things one could do that also implicitly add memory barriers; one of them is entering or leaving a lock statement.
Since your loop stays within the body of a single lock, never entering or leaving it, the JIT is free to continue using the cached value.
Of course, the solution here isn't to add a memory barrier by hand. If you want to wait for another thread to notify you when you should continue, use an AutoResetEvent (or another similar synchronization tool specifically designed to let threads communicate).
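A minimal sketch of that approach (names here are illustrative):

    using System.Threading;

    class Worker {
        private readonly AutoResetEvent continueSignal = new AutoResetEvent(false);

        public void WaitForNotification() {
            // Blocks without spinning; the wait itself carries the necessary
            // barrier semantics, so no volatile field is needed.
            continueSignal.WaitOne();
            // ... carry on ...
        }

        public void Notify() {
            continueSignal.Set();
        }
    }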

How about this:
public class Singleton<T> where T : class, new()
{
    private static T _instance;
    private static object _syncRoot = new Object();

    public static T Instance
    {
        get
        {
            var instance = _instance;            // fast path: plain read, no lock
            if (instance == null)
            {
                lock (_syncRoot)
                {
                    // re-check with an acquire read now that we hold the lock
                    instance = Volatile.Read(ref _instance);
                    if (instance == null)
                    {
                        instance = new T();
                        // release write: publishes the fully constructed instance
                        Volatile.Write(ref _instance, instance);
                    }
                }
            }
            return instance;
        }
    }
}
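For comparison, a sketch of the same lazy-initialization guarantee using Lazy<T> (available since .NET 4), which encapsulates the double-checked pattern for you:

    using System;
    using System.Threading;

    public class Singleton<T> where T : class, new()
    {
        // ExecutionAndPublication is the default mode; it guarantees the factory
        // runs once and the instance is published safely to all threads.
        private static readonly Lazy<T> _instance =
            new Lazy<T>(() => new T(), LazyThreadSafetyMode.ExecutionAndPublication);

        public static T Instance
        {
            get { return _instance.Value; }
        }
    }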

Related

When can I guarantee value changed on one thread is visible to other threads?

I understand that a thread can cache a value and ignore changes made on another thread, but I'm wondering about a variation of this. Is it possible for a thread to change a cached value, which then never leaves its cache and so is never visible to other threads?
For example, could this code print "Flag is true" because thread a never sees the change that thread b makes to flag? (I can't make it do so, but I can't prove that it, or some variation of it, wouldn't.)
var flag = true;

var a = new Thread(() => {
    Thread.Sleep(200);
    Console.WriteLine($"Flag is {flag}");
});

var b = new Thread(() => {
    flag = false;
    while (true) {
        // Do something to avoid a memory barrier
    }
});

a.Start();
b.Start();
a.Join();
I can imagine that on thread b flag could be cached in a CPU register where it is then set to false, and when b enters the while loop it never gets the chance to (or never cares to) write the value of flag back to memory, hence a always sees flag as true.
From the memory barrier generators listed in this answer, this seems, to me, to be possible in theory. Am I correct? I haven't been able to demonstrate it in practice. Can anyone come up with an example that does?
Is it possible for a thread to change a cached value, which then never leaves its cache and so is never visible to other threads?
If we're talking literally about the hardware caches, then we need to talk about specific processor families. And if you're working (as seems likely) on x86 (and x64), you need to be aware that those processors actually have a far stronger memory model than is required for .NET. In x86 systems, the caches maintain coherency, and so no write can be ignored by other processors.
If we're talking about the optimization wherein a particular memory location has been read into a processor register and then a subsequent read from memory just reuses the register, then there isn't a similar analogue on the write side. You'll note that there's always at least one read from the actual memory location before we assume that nothing else is changing that memory location and so we can reuse the register.
On the write side, we've been told to push something to a particular memory location. We have to at least push to that location once, and it would likely be a deoptimization to always store the previously known value at that location (especially if our thread never reads from it) in a separate register just to be able to perform a comparison and elide the write operation.
In order, from easiest to do correctly to easiest to screw up (a sketch of the Interlocked option follows the list):
Use locks when reading/writing
Use the functions on the Interlocked class
Use memory barriers
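As a sketch of the middle option (field and type names are mine, purely illustrative):

    using System.Threading;

    class StopFlag
    {
        private int _state; // 0 = clear, 1 = set; Interlocked works on int, not bool

        public void Set()
        {
            Interlocked.Exchange(ref _state, 1);
        }

        public bool IsSet
        {
            // CompareExchange with equal "value" and "comparand" arguments is a
            // common idiom for a read with full-fence semantics.
            get { return Interlocked.CompareExchange(ref _state, 0, 0) == 1; }
        }
    }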
I'm not entirely sure that this answers your question, but here goes.
If you run the release (not debug) version of the following code, it will never terminate because waitForFlag() never sees the changed version of flag.
However, if you uncomment either of the indicated lines, the program will terminate.
It looks like making any call to an external library in the while (flag) loop causes the optimiser to not cache the value of flag.
Also (of course) making flag volatile will prevent such an optimisation.
using System;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    class Program
    {
        void run()
        {
            Task.Factory.StartNew(resetFlagAfter1s);
            var waiter = Task.Factory.StartNew(waitForFlag);
            waiter.Wait();
            Console.WriteLine("Done");
        }

        void resetFlagAfter1s()
        {
            Thread.Sleep(1000);
            flag = false;
        }

        void waitForFlag()
        {
            int x = 0;

            while (flag)
            {
                ++x;
                // Uncommenting this line makes this thread see the changed value of flag.
                // Console.WriteLine("Spinning");
            }
        }

        // Uncommenting "volatile" makes waitForFlag() see the changed value of flag.
        /*volatile*/ bool flag = true;

        static void Main()
        {
            new Program().run();
        }
    }
}
Research Thread.MemoryBarrier and you will be golden.
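For instance, a full fence inside the loop forces a fresh read of flag on every iteration (a sketch against the waitForFlag() method above):

    void waitForFlag()
    {
        while (true)
        {
            Thread.MemoryBarrier(); // full fence: flag must be re-read from memory
            if (!flag) break;
        }
    }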

Volatile fields: How can I actually get the latest written value to a field?

Consider the following example:
private int sharedState = 0;

private void FirstThread() {
    Volatile.Write(ref sharedState, 1);
}

private void SecondThread() {
    int sharedStateSnapshot = Volatile.Read(ref sharedState);
    Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
When the value of sharedState is acquired into sharedStateSnapshot, everything the writing thread released before it must have been acquired as well.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread 2 runs after thread 1, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();

private void FirstThread()
{
    lock (locker)
    {
        sharedState = 1;
    }
}

private void SecondThread()
{
    int sharedStateSnapshot;

    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }

    Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to keep the variable in any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as
FirstThread() really did execute before SecondThread(), this program
could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread but this does not help here because
There are no operations to reorder (just the single read or write in each thread).
The race condition across your threads means that, even if the no-reorder guarantee applied across processors, it would simply preserve an order of operations that you cannot predict anyway.
If my understanding is therefore correct, then there is nothing to
prevent the acquisition of sharedState being 'stale', if the write in
FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest
processor memory model, such as ARM or Alpha), that the program will
always print 1? (Or have I made an error in my mental model
somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;

private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true);
}

private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.

The need for volatile modifier in double checked locking in .NET

Multiple texts say that when implementing double-checked locking in .NET, the field you are lazily initializing should have the volatile modifier applied. But why exactly? Considering the following example:
public sealed class Singleton
{
    private static volatile Singleton instance;
    private static object syncRoot = new Object();

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}
why doesn't "lock (syncRoot)" accomplish the necessary memory consistency? Isn't it true that after the "lock" statement both reads and writes would be volatile, so the necessary consistency would be accomplished?
Volatile is unnecessary. Well, sort of**
volatile is used to create a memory barrier* between reads and writes on the variable.
lock, when used, causes memory barriers to be created around the block inside the lock, in addition to limiting access to the block to one thread.
Memory barriers make it so each thread reads the most current value of the variable (not a local value cached in some register) and that the compiler doesn't reorder statements. Using volatile is unnecessary** because you've already got a lock.
Joseph Albahari explains this stuff way better than I ever could.
And be sure to check out Jon Skeet's guide to implementing the singleton in C#
update:
*volatile causes reads of the variable to be VolatileReads and writes to be VolatileWrites, which on x86 and x64 on CLR, are implemented with a MemoryBarrier. They may be finer grained on other systems.
**my answer is only correct if you are using the CLR on x86 and x64 processors. It might be true in other memory models, like on Mono (and other implementations), Itanium64 and future hardware. This is what Jon is referring to in his article in the "gotchas" for double checked locking.
Doing one of {marking the variable as volatile, reading it with Thread.VolatileRead, or inserting a call to Thread.MemoryBarrier} might be necessary for the code to work properly in a weak memory model situation.
From what I understand, on the CLR (even on IA64), writes are never reordered (writes always have release semantics). However, on IA64, reads may be reordered to come before writes, unless they are marked volatile. Unfortunately, I do not have access to IA64 hardware to play with, so anything I say about it would be speculation.
I've also found these articles helpful:
http://www.codeproject.com/KB/tips/MemoryBarrier.aspx
Vance Morrison's article (everything links to this, it talks about double checked locking)
Chris Brumme's article (everything links to this)
Joe Duffy: Broken Variants of Double Checked Locking
Luis Abreu's series on multithreading gives a nice overview of the concepts too
http://msmvps.com/blogs/luisabreu/archive/2009/06/29/multithreading-load-and-store-reordering.aspx
http://msmvps.com/blogs/luisabreu/archive/2009/07/03/multithreading-introducing-memory-fences.aspx
There is a way to implement it without a volatile field. I'll explain it...
I think that it is the memory-access reordering inside the lock that is dangerous, such that you can get a not completely initialized instance outside of the lock. To avoid this I do this:
public sealed class Singleton
{
    private static Singleton instance;
    private static object syncRoot = new Object();

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            // very fast test, without implicit memory barriers or locks
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                    {
                        var temp = new Singleton();
                        // ensures that the instance is well initialized,
                        // and only then, it assigns the static variable.
                        System.Threading.Thread.MemoryBarrier();
                        instance = temp;
                    }
                }
            }
            return instance;
        }
    }
}
Understanding the code
Imagine that there is some initialization code inside the constructor of the Singleton class. If those instructions are reordered after the field is set with the address of the new object, then you have an incomplete instance... imagine that the class has this code:
private int _value;
public int Value { get { return this._value; } }

private Singleton()
{
    this._value = 1;
}
Now imagine a call to the constructor using the new operator:
instance = new Singleton();
This can be expanded to these operations:
ptr = allocate memory for Singleton;
set ptr._value to 1;
set Singleton.instance to ptr;
What if I reorder these instructions like this:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
set ptr._value to 1;
Does it make a difference? NO if you think of a single thread. YES if you think of multiple threads... what if the thread is interrupted just after set instance to ptr:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
-- thread interrupted here; this can happen inside a lock --
set ptr._value to 1; -- Singleton.instance is not completely initialized
That is what the memory barrier avoids, by not allowing memory access reordering:
ptr = allocate memory for Singleton;
set temp to ptr; // temp is a local variable (that is important)
set ptr._value to 1;
-- memory barrier... cannot reorder writes after this point, or reads before it --
-- Singleton.instance is still null --
set Singleton.instance to temp;
Happy coding!
I don't think anybody has actually answered the question, so I'll give it a try.
The volatile and the first if (instance == null) are not "necessary". The lock will make this code thread-safe.
So the question is: why would you add the first if (instance == null)?
The reason is presumably to avoid executing the locked section of code unnecessarily. While you are executing the code inside the lock, any other thread that tries to also execute that code is blocked, which will slow your program down if you try to access the singleton frequently from many threads. Depending on the language/platform, there could also be overheads from the lock itself that you wish to avoid.
So the first null check is added as a really quick way to see if you need the lock. If you don't need to create the singleton, you can avoid the lock entirely.
But you can't check if the reference is null without locking it in some way, because due to processor caching, another thread could change it and you would read a "stale" value that would lead you to enter the lock unnecessarily. But you're trying to avoid a lock!
So you make the singleton volatile to ensure that you read the latest value, without needing to use a lock.
You still need the inner lock because volatile only protects you during a single access to the variable - you can't test-and-set it safely without using a lock.
Now, is this actually useful?
Well I would say "in most cases, no".
If Singleton.Instance could cause inefficiency due to the locks, then why are you calling it so frequently that this would be a significant problem? The whole point of a singleton is that there is only one, so your code can read and cache the singleton reference once.
The only case I can think of where this caching wouldn't be possible would be when you have a large number of threads (e.g. a server using a new thread to process every request could be creating millions of very short-running threads, each of which would have to call Singleton.Instance once).
So I suspect that double checked locking is a mechanism that has a real place in very specific performance-critical cases, and that everybody else has clambered onto the "this is the proper way to do it" bandwagon without actually thinking about what it does and whether it will be necessary in the case they are using it for.
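For completeness, here is a sketch of a lock-free variant built on Interlocked.CompareExchange; it is illustrative only, relies on the CLR 2.0 model's release-store guarantee discussed above, and may construct an extra instance that loses the publication race and gets discarded:

    public sealed class Singleton
    {
        private static Singleton instance;

        private Singleton() {}

        public static Singleton Instance
        {
            get
            {
                if (instance == null)
                {
                    var candidate = new Singleton();
                    // Publish candidate only if no other thread beat us to it.
                    // CompareExchange is a full fence, so the constructor's writes
                    // cannot be observed after the publication.
                    System.Threading.Interlocked.CompareExchange(ref instance, candidate, null);
                }
                return instance;
            }
        }
    }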
You should use volatile with the double check lock pattern.
Most people point to this article as proof you do not need volatile:
https://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10
But they fail to read to the end:
"A Final Word of Warning - I am only guessing at the x86 memory model from observed behavior on existing processors. Thus low-lock techniques are also fragile because hardware and compilers can get more aggressive over time. Here are some strategies to minimize the impact of this fragility on your code. First, whenever possible, avoid low-lock techniques. (...) Finally, assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees."
If you need more convincing, then read this article on how the ECMA spec will be used for other platforms:
msdn.microsoft.com/en-us/magazine/jj863136.aspx
If you need further convincing, read this newer article on optimizations that may be introduced which prevent it from working without volatile:
msdn.microsoft.com/en-us/magazine/jj883956.aspx
In summary, it "might" work for you without volatile for the moment, but don't chance it; write proper code and either use volatile or the VolatileRead/Write methods. Articles that suggest otherwise sometimes leave out the possible risks of JIT/compiler optimizations that could impact your code, as well as future optimizations that may break it. Also, as the last article mentions, previous assumptions about this working without volatile may already not hold on ARM.
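A sketch of what that looks like with the Volatile class (available from .NET 4.5; the shape follows the Singleton from the question, not any one article's code):

    public sealed class Singleton
    {
        private static Singleton instance;
        private static object syncRoot = new Object();

        private Singleton() {}

        public static Singleton Instance
        {
            get
            {
                // acquire read: safe even on weak memory models such as ARM
                var local = System.Threading.Volatile.Read(ref instance);
                if (local == null)
                {
                    lock (syncRoot)
                    {
                        local = instance; // the lock supplies the barriers here
                        if (local == null)
                        {
                            local = new Singleton();
                            // release write: the constructor's writes cannot
                            // move past this publication
                            System.Threading.Volatile.Write(ref instance, local);
                        }
                    }
                }
                return local;
            }
        }
    }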
AFAIK (and take this with caution, I'm not doing a lot of concurrent stuff), no. The lock just gives you synchronization between multiple contenders (threads).
volatile, on the other hand, tells your machine to re-read the value every time, so that you don't stumble upon a cached (and stale) value.
See http://msdn.microsoft.com/en-us/library/ms998558.aspx and note the following quote:
Also, the variable is declared to be volatile to ensure that assignment to the instance variable completes before the instance variable can be accessed.
A description of volatile: http://msdn.microsoft.com/en-us/library/x13ttww7%28VS.71%29.aspx
I think that I've found what I was looking for. Details are in this article: http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10.
To sum up: in .NET the volatile modifier is indeed not needed in this situation. However, in weaker memory models the writes made in the constructor of the lazily initialized object may be delayed until after the write to the field, so other threads might read a non-null but not fully constructed instance in the first if statement.
The lock is sufficient. The MS language spec (3.0) itself mentions this exact scenario in §8.12, without any mention of volatile:
A better approach is to synchronize
access to static data by locking a
private static object. For example:
class Cache
{
    private static object synchronizationObject = new object();

    public static void Add(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }

    public static void Remove(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }
}
This is a pretty good post about using volatile with double checked locking:
http://tech.puredanger.com/2007/06/15/double-checked-locking/
In Java, if the aim is to protect a variable, you don't need to lock if it's marked as volatile.

When should the volatile keyword be used in C#?

Can anyone provide a good explanation of the volatile keyword in C#? Which problems does it solve and which it doesn't? In which cases will it save me the use of locking?
I don't think there's a better person to answer this than Eric Lippert (emphasis in the original):
In C#, "volatile" means not only "make sure that the compiler and the
jitter do not perform any code reordering or register caching
optimizations on this variable". It also means "tell the processors to
do whatever it is they need to do to ensure that I am reading the
latest value, even if that means halting other processors and making
them synchronize main memory with their caches".
Actually, that last bit is a lie. The true semantics of volatile reads
and writes are considerably more complex than I've outlined here; in
fact they do not actually guarantee that every processor stops what it
is doing and updates caches to/from main memory. Rather, they provide
weaker guarantees about how memory accesses before and after reads and
writes may be observed to be ordered with respect to each other.
Certain operations such as creating a new thread, entering a lock, or
using one of the Interlocked family of methods introduce stronger
guarantees about observation of ordering. If you want more details,
read sections 3.10 and 10.5.3 of the C# 4.0 specification.
Frankly, I discourage you from ever making a volatile field. Volatile
fields are a sign that you are doing something downright crazy: you're
attempting to read and write the same value on two different threads
without putting a lock in place. Locks guarantee that memory read or
modified inside the lock is observed to be consistent, locks guarantee
that only one thread accesses a given chunk of memory at a time, and so
on. The number of situations in which a lock is too slow is very
small, and the probability that you are going to get the code wrong
because you don't understand the exact memory model is very large. I
don't attempt to write any low-lock code except for the most trivial
usages of Interlocked operations. I leave the usage of "volatile" to
real experts.
For further reading see:
Understand the Impact of Low-Lock Techniques in Multithreaded Apps
Sayonara volatile
If you want to get slightly more technical about what the volatile keyword does, consider the following program (I'm using DevStudio 2005):
#include <iostream>

void main()
{
    int j = 0;

    for (int i = 0 ; i < 100 ; ++i)
    {
        j += i;
    }

    for (volatile int i = 0 ; i < 100 ; ++i)
    {
        j += i;
    }

    std::cout << j;
}
Using the standard optimised (release) compiler settings, the compiler creates the following assembler (IA32):
void main()
{
00401000 push ecx
int j = 0;
00401001 xor ecx,ecx
for (int i = 0 ; i < 100 ; ++i)
00401003 xor eax,eax
00401005 mov edx,1
0040100A lea ebx,[ebx]
{
j += i;
00401010 add ecx,eax
00401012 add eax,edx
00401014 cmp eax,64h
00401017 jl main+10h (401010h)
}
for (volatile int i = 0 ; i < 100 ; ++i)
00401019 mov dword ptr [esp],0
00401020 mov eax,dword ptr [esp]
00401023 cmp eax,64h
00401026 jge main+3Eh (40103Eh)
00401028 jmp main+30h (401030h)
0040102A lea ebx,[ebx]
{
j += i;
00401030 add ecx,dword ptr [esp]
00401033 add dword ptr [esp],edx
00401036 mov eax,dword ptr [esp]
00401039 cmp eax,64h
0040103C jl main+30h (401030h)
}
std::cout << j;
0040103E push ecx
0040103F mov ecx,dword ptr [__imp_std::cout (40203Ch)]
00401045 call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (402038h)]
}
0040104B xor eax,eax
0040104D pop ecx
0040104E ret
Looking at the output, the compiler has decided to use the ecx register to store the value of the j variable. For the non-volatile loop (the first) the compiler has assigned i to the eax register. Fairly straightforward. There are a couple of interesting bits though - the lea ebx,[ebx] instruction is effectively a multibyte nop instruction so that the loop jumps to a 16 byte aligned memory address. The other is the use of edx to increment the loop counter instead of using an inc eax instruction. The add reg,reg instruction has lower latency on a few IA32 cores compared to the inc reg instruction, but never has higher latency.
Now for the loop with the volatile loop counter. The counter is stored at [esp] and the volatile keyword tells the compiler the value should always be read from/written to memory and never assigned to a register. The compiler even goes so far as to not do a load/increment/store as three distinct steps (load eax, inc eax, save eax) when updating the counter value, instead the memory is directly modified in a single instruction (an add mem,reg). The way the code has been created ensures the value of the loop counter is always up-to-date within the context of a single CPU core. No operation on the data can result in corruption or data loss (hence not using the load/inc/store since the value can change during the inc thus being lost on the store). Since interrupts can only be serviced once the current instruction has completed, the data can never be corrupted, even with unaligned memory.
Once you introduce a second CPU to the system, the volatile keyword won't guard against the data being updated by another CPU at the same time. In the above example, you would need the data to be unaligned to get a potential corruption. The volatile keyword won't prevent potential corruption if the data cannot be handled atomically, for example, if the loop counter was of type long long (64 bits) then it would require two 32 bit operations to update the value, in the middle of which an interrupt can occur and change the data.
So, the volatile keyword is only good for aligned data which is less than or equal to the size of the native registers such that operations are always atomic.
The volatile keyword was conceived to be used with IO operations where the IO would be constantly changing but had a constant address, such as a memory mapped UART device, and the compiler shouldn't keep reusing the first value read from the address.
If you're handling large data or have multiple CPUs then you'll need a higher level (OS) locking system to handle the data access properly.
If you are using .NET 1.1, the volatile keyword is needed when doing double checked locking. Why? Because prior to .NET 2.0, the following scenario could cause a second thread to access a non-null, yet not fully constructed object:
Thread 1 asks if a variable is null.
//if(this.foo == null)
Thread 1 determines the variable is null, so enters a lock.
//lock(this.bar)
Thread 1 asks AGAIN if the variable is null.
//if(this.foo == null)
Thread 1 still determines the variable is null, so it calls a constructor and assigns the value to the variable.
//this.foo = new Foo();
Prior to .NET 2.0, this.foo could be assigned the new instance of Foo, before the constructor was finished running. In this case, a second thread could come in (during thread 1's call to Foo's constructor) and experience the following:
Thread 2 asks if variable is null.
//if(this.foo == null)
Thread 2 determines the variable is NOT null, so tries to use it.
//this.foo.MakeFoo()
Prior to .NET 2.0, you could declare this.foo as being volatile to get around this problem. Since .NET 2.0, you no longer need to use the volatile keyword to accomplish double checked locking.
Wikipedia actually has a good article on Double Checked Locking, and briefly touches on this topic:
http://en.wikipedia.org/wiki/Double-checked_locking
Sometimes, the compiler will optimize a field and use a register to store it. If thread 1 does a write to the field and another thread accesses it, since the update was stored in a register (and not memory), the 2nd thread would get stale data.
You can think of the volatile keyword as saying to the compiler "I want you to store this value in memory". This guarantees that the 2nd thread retrieves the latest value.
From MSDN:
The volatile modifier is usually used for a field that is accessed by multiple threads without using the lock statement to serialize access. Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
The CLR likes to optimize instructions, so when you access a field in code it might not always access the current value of the field (it might be from the stack, etc). Marking a field as volatile ensures that the current value of the field is accessed by the instruction. This is useful when the value can be modified (in a non-locking scenario) by a concurrent thread in your program or some other code running in the operating system.
You obviously lose some optimization, but it does keep the code more simple.
Simply looking at the official documentation page for the volatile keyword, you can see an example of typical usage.
public class Worker
{
    public void DoWork()
    {
        bool work = false;
        while (!_shouldStop)
        {
            work = !work; // simulate some work
        }
        Console.WriteLine("Worker thread: terminating gracefully.");
    }

    public void RequestStop()
    {
        _shouldStop = true;
    }

    private volatile bool _shouldStop;
}
With the volatile modifier added to the declaration of _shouldStop, you'll always get the same results. However, without that modifier on the _shouldStop member, the behavior is unpredictable.
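A typical way to drive it (a sketch along the lines of the documentation's Main method):

    public static void Main()
    {
        var worker = new Worker();
        var workerThread = new Thread(worker.DoWork);

        workerThread.Start();
        Thread.Sleep(500);       // let the worker spin for a while

        worker.RequestStop();    // the volatile write/read pair makes this visible
        workerThread.Join();     // DoWork observes _shouldStop == true and returns
    }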
So this is definitely not something downright crazy.
Cache coherence protocols exist that are responsible for keeping CPU caches consistent.
Also, if the CPU employs a strong memory model (as x86 does):
As a result, reads and writes of volatile fields require no special instructions on the x86: Ordinary reads and writes (for example, using the MOV instruction) are sufficient.
Example from C# 5.0 specification (chapter 10.5.3)
using System;
using System.Threading;

class Test
{
    public static int result;
    public static volatile bool finished;

    static void Thread2() {
        result = 143;
        finished = true;
    }

    static void Main() {
        finished = false;
        new Thread(new ThreadStart(Thread2)).Start();
        for (;;) {
            if (finished) {
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}
produces the output: result = 143
If the field finished had not been declared volatile, then it would be permissible for the store to result to be visible to the main thread after the store to finished, and hence for the main thread to read the value 0 from the field result.
Volatile behavior is platform dependent, so you should always consider using volatile when needed, case by case, to be sure it satisfies your needs.
Even volatile could not prevent (all kind of) reordering (C# - The C# Memory Model in Theory and Practice, Part 2)
Even though the write to A is volatile and the read from A_Won is also volatile, the fences are both one-directional, and in fact allow this reordering.
So I believe that if you want to know when to use volatile (vs lock vs Interlocked), you should get familiar with memory fences (full and half) and the needs of synchronization. Then you can find the answer for yourself.
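The article's example is a Dekker-style handshake; a condensed sketch of the kind of reordering it demonstrates (my own illustrative shape, not the article's exact code):

    class DekkerSketch
    {
        private volatile bool A, B;

        public void ThreadA()
        {
            A = true;   // volatile write: a one-directional (release) fence
            if (!B)     // volatile read: a one-directional (acquire) fence
            {
                // Both threads can end up here at once, because a release
                // followed by an acquire may still be reordered (store-load).
            }
        }

        public void ThreadB()
        {
            B = true;
            if (!A)
            {
                // ...same on this side...
            }
        }
    }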
I found this article by Joydip Kanjilal very helpful!
When you mark an object or a variable as volatile, it becomes a candidate for volatile reads and writes. It should be noted that in C# all memory writes are volatile irrespective of whether you are writing data to a volatile or a non-volatile object. However, the ambiguity arises when you are reading data. When you are reading data that is non-volatile, the executing thread may or may not always get the latest value. If the object is volatile, the thread always gets the most up-to-date value.
I'll just leave it here for reference
The compiler sometimes changes the order of statements in code to optimize it. Normally this is not a problem in a single-threaded environment, but it might be an issue in a multi-threaded environment. See the following example:
private static int _flag = 0;
private static int _value = 0;

var t1 = Task.Run(() =>
{
    _value = 10; /* compiler could switch these lines */
    _flag = 5;
});

var t2 = Task.Run(() =>
{
    if (_flag == 5)
    {
        Console.WriteLine("Value: {0}", _value);
    }
});
If you run t1 and t2, you would expect no output or "Value: 10" as the result. But it could be that the compiler swaps the two lines inside t1. If t2 then executes, it could be that _flag has the value 5 while _value is still 0, breaking the expected logic.
To fix this you can apply the volatile keyword to the field. It restricts these compiler optimizations so you can force the correct order in your code:
private static volatile int _flag = 0;
You should use volatile only if you really need it, because it forgoes certain compiler optimizations and will hurt performance. It's also not supported by all .NET languages (Visual Basic doesn't support it), so it hinders language interoperability.
So to sum all this up, the correct answer to the question is:
If your code is running on the 2.0 runtime or later, the volatile keyword is almost never needed and does more harm than good if used unnecessarily, i.e. don't ever use it. BUT in earlier versions of the runtime, it IS needed for proper double-checked locking on static fields; specifically, static fields whose class has static class initialization code.
Use volatile when multiple threads can access a variable and every read must observe the latest write to it.
