I read some articles about the volatile keyword but I could not figure out its correct usage. Could you please tell me what it should be used for in C# and in Java?
Consider this example:
int i = 5;
System.out.println(i);
The compiler may optimize this to just print 5, like this:
System.out.println(5);
However, if there is another thread which can change i, this is the wrong behaviour. If another thread changes i to be 6, the optimized version will still print 5.
The volatile keyword prevents such optimization and caching, and thus is useful when a variable can be changed by another thread.
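For instance, here is a minimal C# sketch of that scenario (the Worker class and _stop field are illustrative names, not from the question):

using System;

class Worker
{
    // Without 'volatile', the JIT may hoist the read of _stop out of the
    // loop, so the loop could spin forever without ever observing the
    // change made by another thread.
    private volatile bool _stop;

    public void Run()
    {
        while (!_stop)
        {
            // do work
        }
    }

    public void Stop() { _stop = true; } // called from another thread
}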
For both C# and Java, "volatile" tells the compiler that the value of a variable must never be cached as its value may change outside of the scope of the program itself. The compiler will then avoid any optimisations that may result in problems if the variable changes "outside of its control".
Reads of volatile fields have acquire semantics. This means that the read from the volatile variable is guaranteed to occur before any following memory reads. It blocks the compiler from reordering, and on hardware that requires it (a weakly ordered CPU) it uses a special instruction that makes the hardware flush any reads that occur after the volatile read but were speculatively started early; alternatively, the CPU may prevent them from being issued early in the first place, by not allowing any speculative load between the issue of the load-acquire and its retirement.
Writes of volatile fields have release semantics. This means that a write to a volatile variable is guaranteed to be delayed until all previous memory writes are visible to other processors.
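To see the two guarantees working together, here is a hedged C# sketch (Publisher, _data, and _ready are illustrative names): the consumer can only observe _ready == true after the write to _data has become visible.

using System;

class Publisher
{
    private int _data;            // ordinary field
    private volatile bool _ready; // volatile flag

    public void Produce()
    {
        _data = 42;    // ordinary write
        _ready = true; // volatile write: release semantics, so the write
                       // to _data cannot be delayed past this store
    }

    public void Consume()
    {
        if (_ready)                   // volatile read: acquire semantics
            Console.WriteLine(_data); // guaranteed to observe 42
    }
}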
Consider the following example:
something.foo = new Thing();
If foo is a member variable in a class, and other CPUs have access to the object instance referred to by something, they might see the value foo change before the memory writes in the Thing constructor are globally visible! This is what "weakly ordered memory" means. This could occur even if the compiler has all of the stores in the constructor before the store to foo. If foo is volatile then the store to foo will have release semantics, and the hardware guarantees that all of the writes before the write to foo are visible to other processors before allowing the write to foo to occur.
How is it possible for the writes to foo to be reordered so badly? If the cache line holding foo is in the cache, and the stores in the constructor missed the cache, then it is possible for the store to complete much sooner than the writes to the cache misses.
The (awful) Itanium architecture from Intel had weakly ordered memory. The processor used in the original Xbox 360 had weakly ordered memory. Many ARM processors, including the very popular ARMv7-A, have weakly ordered memory.
Developers often don't see these data races because things like locks will do a full memory barrier, essentially the same thing as acquire and release semantics at the same time. No loads inside the lock can be speculatively executed before the lock is acquired; they are delayed until the lock is acquired. No stores can be delayed across a lock release; the instruction that releases the lock is delayed until all of the writes done inside the lock are globally visible.
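As a sketch of why locks hide the problem (Box and _gate are illustrative names), the fences come for free with the lock statement:

class Box
{
    private readonly object _gate = new object();
    private int _value;

    public void Set(int v)
    {
        lock (_gate) // acquire fence on entry
        {
            _value = v;
        }            // release fence on exit: the write above is
                     // globally visible before the lock is released
    }

    public int Get()
    {
        lock (_gate) // acquire fence: observes the latest published value
        {
            return _value;
        }
    }
}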
A more complete example is the "Double-checked locking" pattern. The purpose of this pattern is to avoid having to always acquire a lock in order to lazy initialize an object.
Snagged from Wikipedia:
public class MySingleton {
    private static object myLock = new object();
    private static volatile MySingleton mySingleton = null;

    private MySingleton() {
    }

    public static MySingleton GetInstance() {
        if (mySingleton == null) { // 1st check
            lock (myLock) {
                if (mySingleton == null) { // 2nd (double) check
                    mySingleton = new MySingleton();
                    // Write-release semantics are implicitly handled by marking
                    // mySingleton with 'volatile', which inserts the necessary memory
                    // barriers between the constructor call and the write to mySingleton.
                    // The barriers created by the lock are not sufficient because
                    // the object is made visible before the lock is released.
                }
            }
        }
        // The barriers created by the lock are not sufficient because not all threads
        // will acquire the lock. A fence for read-acquire semantics is needed between
        // the test of mySingleton (above) and the use of its contents. This fence
        // is automatically inserted because mySingleton is marked as 'volatile'.
        return mySingleton;
    }
}
In this example, the stores in the MySingleton constructor might not be visible to other processors before the store to mySingleton. If that happens, the other threads that peek at mySingleton will not acquire a lock, and they will not necessarily see the writes performed in the constructor.
volatile never prevents caching. What it does is guarantee the order in which other processors "see" writes. A store release will delay a store until all pending writes are complete and a bus cycle has been issued telling other processors to discard/writeback their cache line if they happen to have the relevant lines cached. A load acquire will flush any speculated reads, ensuring that they won't be stale values from the past.
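In C# the same acquire/release pairing can also be spelled out explicitly with the System.Threading.Volatile class, which is equivalent to reads and writes of a volatile field. A sketch under that assumption (Channel, _data, and _flag are illustrative names):

using System.Threading;

class Channel
{
    private int _data;
    private int _flag;

    public void Publish()
    {
        _data = 42;
        Volatile.Write(ref _flag, 1); // store-release: _data becomes visible
                                      // to other processors first
    }

    public bool TryConsume(out int value)
    {
        if (Volatile.Read(ref _flag) == 1) // load-acquire: the read of _data
        {                                  // below cannot be speculated ahead
            value = _data;                 // guaranteed to observe 42
            return true;
        }
        value = 0;
        return false;
    }
}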
To understand what volatile does to a variable, it's important to understand what happens when the variable is not volatile.
Variable is Non-volatile
When two threads A & B are accessing a non-volatile variable, each thread may maintain a local copy of the variable in its local cache. Any changes made by thread A in its local cache won't be visible to thread B.
Variable is volatile
Declaring a variable volatile essentially means that threads must not cache such a variable; in other words, threads should not trust the value of this variable unless it is read directly from main memory.
So, when to make a variable volatile?
When you have a variable which can be accessed by many threads and you want every thread to get the latest updated value of that variable even if the value is updated by any other thread/process/outside of the program.
The volatile keyword has different meanings in Java and C#.
Java
From the Java Language Specification:
A field may be declared volatile, in which case the Java memory model ensures that all threads see a consistent value for the variable.
C#
From the C# Reference (retrieved 2021-03-31):
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are not subject to these optimizations. (...)
In Java, "volatile" is used to tell the JVM that the variable may be used by multiple threads at the same time, so certain common optimizations cannot be applied.
Notably in the situation where two threads accessing the same variable are running on separate CPUs in the same machine. It is very common for CPUs to aggressively cache the data they hold, because memory access is very much slower than cache access. This means that if the data is updated on CPU1, it must immediately go through all caches and to main memory, instead of waiting until the cache decides to flush itself, so that CPU2 can see the updated value (again bypassing all caches on the way).
When you are reading data that is non-volatile, the executing thread may or may not always get the updated value.
But if the object is volatile, the thread always gets the most up-to-date value.
volatile solves a concurrency problem: it keeps a value in sync across threads. The keyword is mostly used in threaded code, when multiple threads are updating the same variable.
Related
Multiple texts say that when implementing double-checked locking in .NET the field you are locking on should have volatile modifier applied. But why exactly? Considering the following example:
public sealed class Singleton
{
    private static volatile Singleton instance;
    private static object syncRoot = new Object();

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}
Why doesn't "lock (syncRoot)" accomplish the necessary memory consistency? Isn't it true that after the "lock" statement both the read and the write would be volatile, so the necessary consistency would be accomplished?
Volatile is unnecessary. Well, sort of**
volatile is used to create a memory barrier* between reads and writes on the variable.
lock, when used, causes memory barriers to be created around the block inside the lock, in addition to limiting access to the block to one thread.
Memory barriers make it so each thread reads the most current value of the variable (not a local value cached in some register) and that the compiler doesn't reorder statements. Using volatile is unnecessary** because you've already got a lock.
Joseph Albahari explains this stuff way better than I ever could.
And be sure to check out Jon Skeet's guide to implementing the singleton in C#
update:
*volatile causes reads of the variable to be VolatileReads and writes to be VolatileWrites, which on x86 and x64 on CLR, are implemented with a MemoryBarrier. They may be finer grained on other systems.
**my answer is only correct if you are using the CLR on x86 and x64 processors. It might be true in other memory models, like on Mono (and other implementations), Itanium64 and future hardware. This is what Jon is referring to in his article in the "gotchas" for double checked locking.
Doing one of {marking the variable as volatile, reading it with Thread.VolatileRead, or inserting a call to Thread.MemoryBarrier} might be necessary for the code to work properly in a weak memory model situation.
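A sketch of those three options on a simple int field (Thread.VolatileRead and Thread.MemoryBarrier are the real APIs mentioned above; the Flag class around them is illustrative):

using System.Threading;

class Flag
{
    // Option 1 would instead declare the field as 'volatile int _value;'.
    private int _value;

    public int ReadViaVolatileRead()
    {
        return Thread.VolatileRead(ref _value); // option 2: explicit volatile read
    }

    public int ReadViaBarrier()
    {
        Thread.MemoryBarrier(); // option 3: full fence before an ordinary read
        return _value;
    }
}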
From what I understand, on the CLR (even on IA64), writes are never reordered (writes always have release semantics). However, on IA64, reads may be reordered to come before writes, unless they are marked volatile. Unfortunately, I do not have access to IA64 hardware to play with, so anything I say about it would be speculation.
I've also found these articles helpful:
http://www.codeproject.com/KB/tips/MemoryBarrier.aspx
Vance Morrison's article (everything links to this; it talks about double checked locking)
Chris Brumme's article (everything links to this)
Joe Duffy: Broken Variants of Double Checked Locking
Luis Abreu's series on multithreading gives a nice overview of the concepts too
http://msmvps.com/blogs/luisabreu/archive/2009/06/29/multithreading-load-and-store-reordering.aspx
http://msmvps.com/blogs/luisabreu/archive/2009/07/03/multithreading-introducing-memory-fences.aspx
There is a way to implement it without a volatile field. I'll explain it...
I think that it is memory access reordering inside the lock that is dangerous, such that you can get a not completely initialized instance outside of the lock. To avoid this, I do this:
public sealed class Singleton
{
    private static Singleton instance;
    private static object syncRoot = new Object();

    private Singleton() {}

    public static Singleton Instance
    {
        get
        {
            // very fast test, without implicit memory barriers or locks
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                    {
                        var temp = new Singleton();
                        // ensures that the instance is well initialized,
                        // and only then, it assigns the static variable.
                        System.Threading.Thread.MemoryBarrier();
                        instance = temp;
                    }
                }
            }
            return instance;
        }
    }
}
Understanding the code
Imagine that there is some initialization code inside the constructor of the Singleton class. If those instructions are reordered after the field is set with the address of the new object, then you have an incomplete instance... imagine that the class has this code:
private int _value;
public int Value { get { return this._value; } }

private Singleton()
{
    this._value = 1;
}
Now imagine a call to the constructor using the new operator:
instance = new Singleton();
This can be expanded to these operations:
ptr = allocate memory for Singleton;
set ptr._value to 1;
set Singleton.instance to ptr;
What if I reorder these instructions like this:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
set ptr._value to 1;
Does it make a difference? NO if you think of a single thread. YES if you think of multiple threads... what if the thread is interrupted just after setting instance to ptr:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
-- thread interrupted here, this can happen inside a lock --
set ptr._value to 1; -- Singleton.instance is not completely initialized
That is what the memory barrier avoids, by not allowing memory access reordering:
ptr = allocate memory for Singleton;
set temp to ptr; // temp is a local variable (that is important)
set ptr._value to 1;
-- memory barrier... cannot reorder writes after this point, or reads before it --
-- Singleton.instance is still null --
set Singleton.instance to temp;
Happy coding!
I don't think anybody has actually answered the question, so I'll give it a try.
The volatile and the first if (instance == null) are not "necessary". The lock will make this code thread-safe.
So the question is: why would you add the first if (instance == null)?
The reason is presumably to avoid executing the locked section of code unnecessarily. While you are executing the code inside the lock, any other thread that tries to also execute that code is blocked, which will slow your program down if you try to access the singleton frequently from many threads. Depending on the language/platform, there could also be overheads from the lock itself that you wish to avoid.
So the first null check is added as a really quick way to see if you need the lock. If you don't need to create the singleton, you can avoid the lock entirely.
But you can't check if the reference is null without locking it in some way, because due to processor caching, another thread could change it and you would read a "stale" value that would lead you to enter the lock unnecessarily. But you're trying to avoid a lock!
So you make the singleton volatile to ensure that you read the latest value, without needing to use a lock.
You still need the inner lock because volatile only protects you during a single access to the variable - you can't test-and-set it safely without using a lock.
Now, is this actually useful?
Well I would say "in most cases, no".
If Singleton.Instance could cause inefficiency due to the locks, then why are you calling it so frequently that this would be a significant problem? The whole point of a singleton is that there is only one, so your code can read and cache the singleton reference once.
The only case I can think of where this caching wouldn't be possible would be when you have a large number of threads (e.g. a server using a new thread to process every request could be creating millions of very short-running threads, each of which would have to call Singleton.Instance once).
So I suspect that double checked locking is a mechanism that has a real place in very specific performance-critical cases, and that everybody has then jumped on the "this is the proper way to do it" bandwagon without actually thinking about what it does and whether it will actually be necessary in the case they are using it for.
You should use volatile with the double check lock pattern.
Most people point to this article as proof you do not need volatile:
https://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10
But they fail to read to the end:
"A Final Word of Warning - I am only guessing at the x86 memory model from observed behavior on existing processors. Thus low-lock techniques are also fragile because hardware and compilers can get more aggressive over time. Here are some strategies to minimize the impact of this fragility on your code. First, whenever possible, avoid low-lock techniques. (...) Finally, assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees."
If you need more convincing, then read this article on how the ECMA spec will be used for other platforms:
msdn.microsoft.com/en-us/magazine/jj863136.aspx
If you need further convincing, read this newer article about optimizations that may be introduced which would prevent it from working without volatile:
msdn.microsoft.com/en-us/magazine/jj883956.aspx
In summary, it "might" work for you without volatile for the moment, but don't chance it: write proper code and either use volatile or the VolatileRead/VolatileWrite methods. Articles that suggest otherwise sometimes leave out some of the possible risks of JIT/compiler optimizations that could impact your code, as well as future optimizations that may break it. Also, as mentioned in the last article, previous assumptions about working without volatile may already not hold on ARM.
AFAIK (and - take this with caution, I'm not doing a lot of concurrent stuff) no. The lock just gives you synchronization between multiple contenders (threads).
volatile on the other hand tells your machine to reevaluate the value every time, so that you don't stumble upon a cached (and wrong) value.
See http://msdn.microsoft.com/en-us/library/ms998558.aspx and note the following quote:
Also, the variable is declared to be volatile to ensure that assignment to the instance variable completes before the instance variable can be accessed.
A description of volatile: http://msdn.microsoft.com/en-us/library/x13ttww7%28VS.71%29.aspx
I think that I've found what I was looking for. Details are in this article - http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10.
To sum up - in .NET the volatile modifier is indeed not needed in this situation. However, in weaker memory models the writes made in the constructor of the lazily initialized object may be delayed until after the write to the field, so other threads might read a corrupt non-null instance in the first if statement.
The lock is sufficient. The MS language spec (3.0) itself mentions this exact scenario in §8.12, without any mention of volatile:
A better approach is to synchronize access to static data by locking a private static object. For example:
class Cache
{
    private static object synchronizationObject = new object();

    public static void Add(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }

    public static void Remove(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }
}
This is a pretty good post about using volatile with double checked locking:
http://tech.puredanger.com/2007/06/15/double-checked-locking/
In Java, if the aim is to protect a variable you don't need to lock if it's marked as volatile
I understand the volatile keyword in C++ fairly well. But in C#, it appears to take a different meaning, more so related to multi-threading. I thought bool operations were atomic, and I thought that if the operation were atomic, you wouldn't have threading concerns. What am I missing?
https://msdn.microsoft.com/en-us/library/x13ttww7.aspx
I thought bool operations were atomic
They are indeed atomic.
and I thought that if the operation were atomic, you wouldn't have threading concerns.
That is where you have an incomplete picture. Imagine you have two threads running on separate cores, each with their own cache layers. Thread #1 has foo in its cache, and Thread #2 updates the value of foo. Thread #1 won't see the change unless foo is marked as volatile, a lock is acquired, the Interlocked class is used, or Thread.MemoryBarrier() is explicitly called - any of which would cause the value to be invalidated in the cache, thus guaranteeing that you read the most up-to-date value:
Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
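A minimal sketch of that fix for a bool flag (the Program class and _done field are illustrative):

using System;
using System.Threading;

class Program
{
    // 'volatile' makes the loop below re-read the field rather than a
    // cached copy, so it observes the other thread's write.
    private static volatile bool _done;

    static void Main()
    {
        var writer = new Thread(() => _done = true); // thread #2
        writer.Start();
        while (!_done) { }                           // thread #1: eventually exits
        Console.WriteLine("observed the write");
    }
}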
Eric Lippert has a great post about volatility where he explains:
The true semantics of volatile reads and writes are considerably more complex than I've outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other.
Edit:
As per xanatos's comment: this doesn't mean that volatile guarantees an immediate read; it guarantees a read of the most up-to-date value.
The volatile keyword in C# is all about reading/writing reordering, so it is something quite esoteric.
http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
(that I consider to be one of the "bibles" about threading) writes:
The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence. These “half-fences” are faster than full fences because they give the runtime and hardware more scope for optimization.
It is something quite unreadable :-)
Now... What it doesn't mean:
It doesn't mean that a value will be read now/will be written now
it simply means that if you read something from a volatile variable, everything that is read/written after this "special" read won't be moved before it, and if you write to a volatile variable, everything read/written before this "special" write won't be moved after it. So it creates a barrier: by writing to a volatile variable, you guarantee that all the writes you have done to any other variable (volatile or not) up to that point are visible.
A volatile write is probably even more important. It is something that is in part guaranteed by the Intel CPU, and something that wasn't guaranteed by the first version of Java: no write reordering. The problem was:
object myrefthatissharedwithotherthreads = new MyClass(5);
where
class MyClass
{
    public int Value;

    public MyClass(int value)
    {
        Value = value;
    }
}
Now... that expression can be imagined to be:
var temp = new MyClass();
temp.Value = 5;
myrefthatissharedwithotherthreads = temp;
where temp is something generated by the compiler that you can't see.
If the writes can be reordered, you could have:
var temp = new MyClass();
myrefthatissharedwithotherthreads = temp;
temp.Value = 5;
and another thread could see a partially initialized MyClass, because the value of myrefthatissharedwithotherthreads is readable before the MyClass instance has finished initializing.
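Marking the shared reference volatile forbids that reordering. A sketch (the Holder class is illustrative; MyClass is the one defined above, assuming its constructor is accessible):

class Holder
{
    // 'volatile' gives the store release semantics: 'Value = 5' becomes
    // globally visible before the reference itself does.
    public static volatile MyClass Shared;

    static void Publisher()
    {
        Shared = new MyClass(5);
    }

    static void Reader()
    {
        var m = Shared; // volatile read: acquire semantics
        if (m != null)
            System.Console.WriteLine(m.Value); // guaranteed to print 5
    }
}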
For the case below, when there is no competition for writes between the worker threads, are locks or volatile still required? Is there any difference in the answer if "Peek" access is not required at position G?
class A
{
    Object _o; // need volatile (position A)?
    int _i;    // need volatile (position B)?

    void Method()
    {
        Object o;
        int i;

        Task[] tasks = new Task[2]
        {
            Task.Factory.StartNew(() => {
                _o = f1(); // use lock() (position C)?
                o  = f2(); // use lock() (position D)?
            }),
            Task.Factory.StartNew(() => {
                _i = g1(); // use lock() (position E)?
                i  = g2(); // use lock() (position F)?
            })
        };

        // "Peek" at _o, _i, o, i (position G)?
        Task.WaitAll(tasks);
        // Use _o, _i, o, i (position H)?
    }
}
The safe thing to do is to not do this in the first place. Don't write a value on one thread and read the value on another thread in the first place. Make a Task<object> and a Task<int> that return the values to the thread that needs them, rather than making tasks that modify variables across threads.
If you are hell bent on writing to variables across threads then you need to guarantee two things. First, that the jitter does not choose optimizations that would cause reads and writes to be moved around in time, and second, that a memory barrier is introduced. The memory barrier limits the processor from moving reads and writes around in time in certain ways.
As Brian Gideon notes in his answer, you get a memory barrier from the WaitAll, but I do not recall offhand if that is a documented guarantee or just an implementation detail.
As I said, I would not do this in the first place. If I were forced to, I would at least mark the variables I was writing to as volatile.
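A sketch of that safer shape (f1 and g1 are the question's undefined functions, stubbed here only for illustration): let each task return its result instead of writing a shared field.

using System;
using System.Threading.Tasks;

class Safer
{
    void Method()
    {
        Task<object> to = Task.Factory.StartNew(() => f1());
        Task<int>    ti = Task.Factory.StartNew(() => g1());

        Task.WaitAll(to, ti);

        object o = to.Result; // WaitAll/Result publish the tasks' writes back
        int    i = ti.Result; // to this thread; no volatile or lock needed
        Console.WriteLine("{0} {1}", o, i);
    }

    object f1() { return new object(); } // stand-ins for the question's f1/g1
    int    g1() { return 42; }
}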
Writes to reference types (i.e. Object) and word-sized value types (i.e. int on a 32-bit system) are atomic. This means that when you peek at the values (position G) you can be sure that you either get the old value or the new value, but not something else (if you had a type such as a large struct, it could be torn - you could read the value when it was halfway through being written). You don't need a lock or volatile, so long as you're willing to accept the potential risk of reading stale values.
Note that because there is no memory barrier introduced at this point (a lock or use of volatile both add one) it's possible that the variable has been updated in the other thread, but the current thread isn't observing that change; it can be reading a "stale" value for (potentially) quite some time after it has been changed in the other thread. The use of volatile will ensure that the current thread can observe changes to the variable sooner.
You can be sure that you'll have the appropriate value after the call to WaitAll, even without a lock or volatile.
Also note that while you can be sure the reference to the reference type is written atomically, your program makes no guarantee about the observed order of any changes to the actual object that the reference refers to. Even if, from the point of view of the background thread, the object is initialized before it is assigned to the instance field, it may not happen in that order. The other thread can therefore observe the write of the reference to the object but then follow that reference and find an object in an uninitialized, or partially initialized, state. Introducing a memory barrier (i.e. through the use of a volatile variable) can potentially allow you to prevent the runtime from making such re-orderings, thus ensuring that doesn't happen. This is why it's better to just not do this in the first place and to just have the two tasks return the results that they generate rather than manipulating a closed-over variable.
WaitAll will introduce a memory barrier, in addition to ensuring that the two tasks are actually finished, which means that you know that the variables are up-to-date and will not have the old stale values.
At position G you may observe that _o and _i retain their initial values, null and 0 respectively, or that they contain the values written by the tasks. It is unpredictable at this position.
However, at position H you force the issue in two different ways. First, you have guaranteed that both tasks finished and thus the writes are completed. Second, Task.WaitAll will generate a memory barrier which will guarantee that the main thread will observe the new values published by the tasks.
So, in this particular example an explicit lock or memory barrier generator (volatile) is not technically required.
In the following code sample, is the memory barrier in FuncA required to ensure that the most up-to-date value is read?
class Foo
{
    DateTime m_bar;

    void FuncA() // invoked by thread X
    {
        Thread.MemoryBarrier(); // is required?
        Console.WriteLine(m_bar);
    }

    void FuncB() // invoked by thread Y
    {
        m_bar = DateTime.Now;
    }
}
EDIT: If not, how can I ensure that FuncA will read the most recent value? (I want to make sure that the recent value is actually stored in the processor's cache.) [without using locks]
Looks like a big "No" to me. Thread.MemoryBarrier() only syncs up memory access within the thread that executed it.
From MSDN:
The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.
I suggest you store the DateTime as a number of ticks (it is of type long, i.e. Int64); you can easily convert from ticks (new DateTime(ticks)) and to ticks (myDateTime.Ticks). Then you can use Interlocked.Read to read the value and Interlocked.Exchange to write it in fast non-locking operations.
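A sketch of that suggestion applied to the Foo class from the question:

using System;
using System.Threading;

class Foo
{
    private long _barTicks; // the DateTime stored as ticks (an Int64)

    void FuncA() // invoked by thread X
    {
        // Interlocked.Read is an atomic 64-bit read (no torn values even on
        // 32-bit systems) and implies a full memory barrier.
        Console.WriteLine(new DateTime(Interlocked.Read(ref _barTicks)));
    }

    void FuncB() // invoked by thread Y
    {
        Interlocked.Exchange(ref _barTicks, DateTime.Now.Ticks);
    }
}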
Yes, the memory barrier is needed so that you can get the most up to date value.
If the memory barrier is not present then it is possible for thread X to read the value of m_bar from its own cache line while that value hasn't been written back to main memory (the change has been made locally by thread Y). You can achieve the same effect by declaring the variable as volatile:
The volatile modifier is usually used for a field that is accessed by multiple threads without using the lock statement to serialize access. Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
A good entry on that matter (probably the best) is this one by Joe Duffy: Volatile reads and writes, and timeliness
A memory barrier does in fact the same thing locking does: it guarantees the field will get its latest value from memory on entering the lock and will be written to memory before exiting the lock. Making sure a field's value is always read from or written to memory, and never optimized by reading or writing it first to the CPU's cache, can also be achieved by using the volatile keyword. Unlike primitive integral types and reference types, DateTime cannot be cached in CPU registers and so need not (and cannot) be declared with the volatile keyword.
This actually doesn't matter, since on 32-bit architectures one can get a torn read in such a situation anyway.