Locking a single bool variable when multithreading?


Recently I saw this code on a website, and my question is the following:
private bool mbTestFinished = false;
private bool IsFinished()
{
    lock( mLock )
    {
        return mbTestFinished;
    }
}
internal void SetFinished()
{
    lock( mLock )
    {
        mbTestFinished = true;
    }
}
In a multithreaded environment, is it really necessary to lock access to mbTestFinished?

Yes, it is needed. The .NET runtime applies some optimizations, and when a memory location is accessed frequently, its value may be kept in a CPU register. If mbTestFinished is held in a register, a thread reading it may get a stale value. Using the volatile keyword ensures that all accesses to this variable go to the memory location, not to a register.
On the other hand, I have no idea how often this occurs. It may happen only very rarely.

In my opinion, no, the lock is superfluous here, for two reasons:
Unlike a long, for example, a Boolean variable cannot be torn during assignment, so locking is not needed for atomicity.
To solve the visibility problem volatile is enough. It's true that the lock introduces an implicit fence, but since the lock is not required for atomicity, volatile is enough.

If mLock is ONLY for the variable mbTestFinished, then it's a bit of an overkill. Instead, you can use volatile or Interlocked, because both are User-Mode constructs for thread synchronization. lock (or Monitor) is a Hybrid Construct, in the sense that it is well optimized to avoid transitioning to and from Kernel-Mode whenever possible. The book "CLR via C#" has an in-depth discussion of these concepts.
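
As a rough illustration of the lock-free alternatives mentioned in these answers, here is a minimal sketch. The class names FinishedFlag and FinishedFlagExplicit are made up for this example and are not from the original code; the first uses a volatile field, the second uses the Volatile class on a plain field.

```csharp
using System.Threading;

// Hypothetical wrapper: every read of a volatile field is an acquire read,
// every write is a release write, so no lock is needed for a single bool.
public sealed class FinishedFlag
{
    private volatile bool _finished;

    public bool IsFinished => _finished;

    public void SetFinished() => _finished = true;
}

// Same idea with a plain field and explicit Volatile.Read / Volatile.Write calls.
public sealed class FinishedFlagExplicit
{
    private bool _finished;

    public bool IsFinished => Volatile.Read(ref _finished);

    public void SetFinished() => Volatile.Write(ref _finished, true);
}
```

Both variants only cover a flag that is set once and polled. If the flag has to change together with other state, a lock is still the simpler choice.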

Related

Is it safe to use Volatile.Read combined with Interlocked.Exchange for concurrently accessing a shared memory location from multiple threads in .NET?

Experts on threading/concurrency/memory model in .NET, could you verify that the following code is correct under all circumstances (that is, regardless of OS, .NET runtime, CPU architecture, etc.)?
class SomeClassWhoseInstancesAreAccessedConcurrently
{
    private Strategy _strategy;
    public SomeClassWhoseInstancesAreAccessedConcurrently()
    {
        _strategy = new SomeStrategy();
    }
    public void DoSomething()
    {
        Volatile.Read(ref _strategy).DoSomething();
    }
    public void ChangeStrategy()
    {
        Interlocked.Exchange(ref _strategy, new AnotherStrategy());
    }
}
This pattern comes up pretty frequently. We have an object which is used concurrently by multiple threads, and at some point the value of one of its fields needs to be changed. We want to guarantee that from that point on, every access to that field from any thread observes the new value.
Considering the example above, we want to make sure that after the point in time when ChangeStrategy is executed, it can't happen that SomeStrategy.DoSomething is called instead of AnotherStrategy.DoSomething because some of the threads don't observe the change and use the old value cached in a register/CPU cache/whatever.
To my knowledge of the topic, we need at least a volatile read to prevent such caching. The main question is: is that enough, or do we need Interlocked.CompareExchange(ref _strategy, null, null) instead to achieve the correct behavior?
If a volatile read is enough, a further question arises: do we need Interlocked.Exchange at all, or would even a volatile write be OK in this case?
As I understand it, volatile reads/writes use half fences, which allow a write followed by a read to be reordered, the implications of which I still can't fully understand, to be honest. However, as per the ECMA-335 specification, section I.12.6.5, "The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations." So, if I understand this correctly, Interlocked.Exchange should create a full fence, which looks sufficient.
But, to complicate things further, it seems that not all Interlocked operations were implemented according to the specification on every platform.
I'd be very grateful if someone could clear this up.

Yes, your code is safe. It is functionally equivalent to using a lock like this:
public void DoSomething()
{
    Strategy strategy;
    lock (_locker) strategy = _strategy;
    strategy.DoSomething();
}
public void ChangeStrategy()
{
    Strategy strategy = new AnotherStrategy();
    lock (_locker) _strategy = strategy;
}
Your code is more performant though, because the lock imposes a full fence, while the Volatile.Read imposes a potentially cheaper half fence.
You could improve the performance even more by replacing the Interlocked.Exchange (full fence) with a Volatile.Write (half fence). The only reason to prefer the Interlocked.Exchange over the Volatile.Write is when you want to retrieve the previous strategy as an atomic operation. Apparently this is not needed in your case.
For simplicity you could even get rid of the Volatile.Write/Volatile.Read calls, and just declare the _strategy field as volatile.
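
To make the Volatile.Write suggestion concrete, here is a minimal sketch. The holder class name StrategyHolder is made up, and the strategies return a string (instead of void as in the question) purely so the effect of the swap can be observed; treat both as assumptions of this example.

```csharp
using System.Threading;

// Hypothetical strategy types; DoSomething returns a string only for illustration.
public abstract class Strategy
{
    public abstract string DoSomething();
}

public sealed class SomeStrategy : Strategy
{
    public override string DoSomething() => "some";
}

public sealed class AnotherStrategy : Strategy
{
    public override string DoSomething() => "another";
}

public sealed class StrategyHolder
{
    private Strategy _strategy = new SomeStrategy();

    // Acquire read: take a snapshot of the reference, then call through the snapshot.
    public string DoSomething() => Volatile.Read(ref _strategy).DoSomething();

    // Release write: the new object is fully constructed before the reference is published.
    public void ChangeStrategy() => Volatile.Write(ref _strategy, new AnotherStrategy());
}
```

Interlocked.Exchange would only be needed here if the caller wanted the previous strategy back as part of the same atomic operation.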

What specifically about double-checked locking fails without volatile in C#? [duplicate]

Multiple texts say that when implementing double-checked locking in .NET the field you are locking on should have volatile modifier applied. But why exactly? Considering the following example:
public sealed class Singleton
{
    private static volatile Singleton instance;
    private static object syncRoot = new Object();
    private Singleton() {}
    public static Singleton Instance
    {
        get
        {
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}
Why doesn't "lock (syncRoot)" accomplish the necessary memory consistency? Isn't it true that after the "lock" statement both the read and the write would be volatile, and so the necessary consistency would be accomplished?

Volatile is unnecessary. Well, sort of**
volatile is used to create a memory barrier* between reads and writes on the variable.
lock, when used, causes memory barriers to be created around the block inside the lock, in addition to limiting access to the block to one thread.
Memory barriers make it so each thread reads the most current value of the variable (not a local value cached in some register) and that the compiler doesn't reorder statements. Using volatile is unnecessary** because you've already got a lock.
Joseph Albahari explains this stuff way better than I ever could.
And be sure to check out Jon Skeet's guide to implementing the singleton in C#
update:
*volatile causes reads of the variable to be VolatileReads and writes to be VolatileWrites, which on x86 and x64 on CLR, are implemented with a MemoryBarrier. They may be finer grained on other systems.
**My answer is only correct if you are using the CLR on x86 and x64 processors. It might not hold under other memory models, such as on Mono (and other implementations), Itanium64 and future hardware. This is what Jon is referring to in his article in the "gotchas" for double-checked locking.
Doing one of {marking the variable as volatile, reading it with Thread.VolatileRead, or inserting a call to Thread.MemoryBarrier} might be necessary for the code to work properly in a weak memory model situation.
From what I understand, on the CLR (even on IA64), writes are never reordered (writes always have release semantics). However, on IA64, reads may be reordered to come before writes, unless they are marked volatile. Unfortunately, I do not have access to IA64 hardware to play with, so anything I say about it would be speculation.
I've also found these articles helpful:
http://www.codeproject.com/KB/tips/MemoryBarrier.aspx
Vance Morrison's article (everything links to this, it talks about double-checked locking)
Chris Brumme's article (everything links to this)
Joe Duffy: Broken Variants of Double Checked Locking
Luis Abreu's series on multithreading gives a nice overview of the concepts too
http://msmvps.com/blogs/luisabreu/archive/2009/06/29/multithreading-load-and-store-reordering.aspx
http://msmvps.com/blogs/luisabreu/archive/2009/07/03/multithreading-introducing-memory-fences.aspx

There is a way to implement it without volatile field. I'll explain it...
I think that it is memory access reordering inside the lock that is dangerous, such that you can get a not completely initialized instance outside of the lock. To avoid this I do this:
public sealed class Singleton
{
    private static Singleton instance;
    private static object syncRoot = new Object();
    private Singleton() {}
    public static Singleton Instance
    {
        get
        {
            // very fast test, without implicit memory barriers or locks
            if (instance == null)
            {
                lock (syncRoot)
                {
                    if (instance == null)
                    {
                        var temp = new Singleton();
                        // ensures that the instance is well initialized,
                        // and only then, it assigns the static variable.
                        System.Threading.Thread.MemoryBarrier();
                        instance = temp;
                    }
                }
            }
            return instance;
        }
    }
}
Understanding the code
Imagine that there is some initialization code inside the constructor of the Singleton class. If these instructions are reordered after the field is set with the address of the new object, then you have an incomplete instance... imagine that the class has this code:
private int _value;
public int Value { get { return this._value; } }
private Singleton()
{
    this._value = 1;
}
Now imagine a call to the constructor using the new operator:
instance = new Singleton();
This can be expanded to these operations:
ptr = allocate memory for Singleton;
set ptr._value to 1;
set Singleton.instance to ptr;
What if I reorder these instructions like this:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
set ptr._value to 1;
Does it make a difference? NO if you think of a single thread. YES if you think of multiple threads... what if the thread is interrupted just after instance is set to ptr:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
-- thread interrupted here, this can happen inside a lock --
set ptr._value to 1; -- Singleton.instance is not completely initialized
That is what the memory barrier avoids, by not allowing memory access reordering:
ptr = allocate memory for Singleton;
set temp to ptr; // temp is a local variable (that is important)
set ptr._value to 1;
-- memory barrier... cannot reorder writes after this point, or reads before it --
-- Singleton.instance is still null --
set Singleton.instance to temp;
Happy coding!

I don't think anybody has actually answered the question, so I'll give it a try.
The volatile and the first if (instance == null) are not "necessary". The lock will make this code thread-safe.
So the question is: why would you add the first if (instance == null)?
The reason is presumably to avoid executing the locked section of code unnecessarily. While you are executing the code inside the lock, any other thread that tries to also execute that code is blocked, which will slow your program down if you try to access the singleton frequently from many threads. Depending on the language/platform, there could also be overheads from the lock itself that you wish to avoid.
So the first null check is added as a really quick way to see if you need the lock. If you don't need to create the singleton, you can avoid the lock entirely.
But you can't check if the reference is null without locking it in some way, because due to processor caching, another thread could change it and you would read a "stale" value that would lead you to enter the lock unnecessarily. But you're trying to avoid a lock!
So you make the singleton volatile to ensure that you read the latest value, without needing to use a lock.
You still need the inner lock because volatile only protects you during a single access to the variable - you can't test-and-set it safely without using a lock.
Now, is this actually useful?
Well I would say "in most cases, no".
If Singleton.Instance could cause inefficiency due to the locks, then why are you calling it so frequently that this would be a significant problem? The whole point of a singleton is that there is only one, so your code can read and cache the singleton reference once.
The only case I can think of where this caching wouldn't be possible would be when you have a large number of threads (e.g. a server using a new thread to process every request could be creating millions of very short-running threads, each of which would have to call Singleton.Instance once).
So I suspect that double checked locking is a mechanism that has a real place in very specific performance-critical cases, and then everybody has clambered on the "this is the proper way to do it" bandwagon without actually thinking what it does and whether it will actually be necessary in the case they are using it for.

You should use volatile with the double check lock pattern.
Most people point to this article as proof you do not need volatile:
https://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10
But they fail to read to the end:
"A Final Word of Warning - I am only guessing at the x86 memory model from observed behavior on existing processors. Thus low-lock techniques are also fragile because hardware and compilers can get more aggressive over time. Here are some strategies to minimize the impact of this fragility on your code. First, whenever possible, avoid low-lock techniques. (...) Finally, assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees."
If you need more convincing, read this article on how the ECMA spec will be used for other platforms:
msdn.microsoft.com/en-us/magazine/jj863136.aspx
If you need further convincing, read this newer article, which explains that optimizations may be introduced that prevent it from working without volatile:
msdn.microsoft.com/en-us/magazine/jj883956.aspx
In summary, it "might" work for you without volatile for the moment, but don't chance it: write proper code and use either volatile or the VolatileRead/VolatileWrite methods. Articles that suggest otherwise sometimes leave out some of the risks of JIT/compiler optimizations that could affect your code, as well as future optimizations that could break it. Also, as mentioned in the last article, earlier assumptions about working without volatile may already fail to hold on ARM.
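
For the "volatile read/write methods" option, here is a minimal sketch of double-checked locking that keeps the field non-volatile and uses Volatile.Read and Volatile.Write at the two unlocked access points. It is one possible arrangement under the assumptions stated in the comments, not the code from any of the linked articles.

```csharp
using System.Threading;

public sealed class Singleton
{
    private static Singleton instance;
    private static readonly object syncRoot = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            // Acquire read outside the lock: if we see a non-null reference,
            // we also see the writes made by its constructor.
            Singleton local = Volatile.Read(ref instance);
            if (local == null)
            {
                lock (syncRoot)
                {
                    // Inside the lock a plain read is enough; the lock orders it.
                    local = instance;
                    if (local == null)
                    {
                        local = new Singleton();
                        // Release write: publish only after construction has completed.
                        Volatile.Write(ref instance, local);
                    }
                }
            }
            return local;
        }
    }
}
```

Returning the local copy rather than re-reading the field keeps the unlocked accesses down to a single acquire read on the fast path.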

AFAIK (and - take this with caution, I'm not doing a lot of concurrent stuff) no. The lock just gives you synchronization between multiple contenders (threads).
volatile on the other hand tells your machine to reevaluate the value every time, so that you don't stumble upon a cached (and wrong) value.
See http://msdn.microsoft.com/en-us/library/ms998558.aspx and note the following quote:
Also, the variable is declared to be volatile to ensure that assignment to the instance variable completes before the instance variable can be accessed.
A description of volatile: http://msdn.microsoft.com/en-us/library/x13ttww7%28VS.71%29.aspx

I think that I've found what I was looking for. Details are in this article - http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10.
To sum up - in .NET the volatile modifier is indeed not needed in this situation. However, in weaker memory models, writes made in the constructor of a lazily initialized object may be delayed until after the write to the field, so other threads might read a corrupt non-null instance in the first if statement.

The lock is sufficient. The MS language spec (3.0) itself mentions this exact scenario in §8.12, without any mention of volatile:
A better approach is to synchronize access to static data by locking a private static object. For example:
class Cache
{
    private static object synchronizationObject = new object();
    public static void Add(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }
    public static void Remove(object x) {
        lock (Cache.synchronizationObject) {
            ...
        }
    }
}

This is a pretty good post about using volatile with double-checked locking:
http://tech.puredanger.com/2007/06/15/double-checked-locking/
In Java, if the aim is to protect a variable, you don't need to lock if it's marked as volatile.

Volatile fields: How can I actually get the latest written value to a field?

Considering the following example:
private int sharedState = 0;
private void FirstThread() {
    Volatile.Write(ref sharedState, 1);
}
private void SecondThread() {
    int sharedStateSnapshot = Volatile.Read(ref sharedState);
    Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
When the value in the address sharedStateSnapshot is acquired; sharedState must have been already acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)

Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread two runs after thread one, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();
private void FirstThread()
{
    lock (locker)
    {
        sharedState = 1;
    }
}
private void SecondThread()
{
    int sharedStateSnapshot;
    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }
    Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.

You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable in any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.

Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread, but this does not help here because:
There are no operations to reorder (just the single read or write in each thread).
The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply mean that the order of operations which you cannot predict anyway would be preserved.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;
private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true);
}
private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
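
If spinning is not desirable, the same "write happens before read" guarantee can be sketched with a blocking signal instead. This swaps in ManualResetEventSlim for the spin flag, which is a substitution made for this example and not the answer's own technique; the class name SharedStateDemo is made up, and SecondThread returns the value instead of printing it so it can be checked.

```csharp
using System.Threading;

public sealed class SharedStateDemo
{
    private int sharedState;

    // Starts unsignaled; Set and Wait are synchronization points,
    // so the plain write below is visible after Wait returns.
    private readonly ManualResetEventSlim written = new ManualResetEventSlim(false);

    public void FirstThread()
    {
        sharedState = 1;
        written.Set();   // signal only after the shared state has been written
    }

    public int SecondThread()
    {
        written.Wait();  // blocks until FirstThread has signaled
        return sharedState;
    }
}
```

As with the spin version, this assumes there are no other writers to sharedState.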

thread-safety of primitive concurrent read and write

Given the simplified illustration below, how does .NET deal with such a situation?
If it would cause problems, would I have to lock/gate access to each and every field/property that might at times be written to and accessed from different threads?
A field somewhere
public class CrossRoads {
    public int _timeouts;
}
A background thread writer
public void TimeIsUp(CrossRoads crossRoads){
    crossRoads._timeouts++;
}
Possibly at the same time, trying to read elsewhere
public void HowManyTimeOuts(CrossRoads crossRoads){
    int timeOuts = crossRoads._timeouts;
}

The simple answer is that the above code has the ability to cause problems if accessed simultaneously from multiple threads.
The .NET Framework provides two solutions: interlocking and thread synchronization.
For simple data type manipulation (e.g. ints), interlocking using the Interlocked class will work correctly and is the recommended approach.
In fact, interlocked provides specific methods (Increment and Decrement) that make this process easy:
Add an IncrementCount method to your CrossRoads class:
public void IncrementCount() {
    Interlocked.Increment(ref _timeouts);
}
Then call this from your background worker:
public void TimeIsUp(CrossRoads crossRoads){
    crossRoads.IncrementCount();
}
Reads of the value are atomic, except for a 64-bit value on a 32-bit OS. See the Interlocked.Read method documentation for more detail.
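
For the 64-bit case, a minimal sketch might look like the following. The ByteCounter class and its members are hypothetical names chosen for this example; the point is only that both the update and the read of the long go through Interlocked, so a 32-bit process cannot observe a torn value.

```csharp
using System.Threading;

// Hypothetical 64-bit counter that stays safe in a 32-bit process.
public sealed class ByteCounter
{
    private long _totalBytes;

    // Atomic read-modify-write on the 64-bit field.
    public void Add(long bytes) => Interlocked.Add(ref _totalBytes, bytes);

    // Atomic 64-bit read; a plain read could be torn on a 32-bit platform.
    public long Total => Interlocked.Read(ref _totalBytes);
}
```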
For class objects or more complex operations, you will need to use thread synchronization locking (lock in C# or SyncLock in VB.Net).
This is accomplished by creating a static synchronization object at the level the lock is to be applied (for example, inside your class), obtaining a lock on that object, and performing (only) the necessary operations inside that lock:
private static object SynchronizationObject = new Object();
public void PerformSomeCriticalWork()
{
    lock (SynchronizationObject)
    {
        // do some critical work
    }
}

The good news is that reads and writes to ints are guaranteed to be atomic, so no torn values. However, it is not guaranteed to do a safe ++, and the read could potentially be cached in registers. There's also the issue of instruction re-ordering.
I would use:
Interlocked.Increment(ref crossRoads._timeouts);
For the write, which will ensure no values are lost, and:
int timeouts = Interlocked.CompareExchange(ref crossRoads._timeouts, 0, 0);
For the read, since this observes the same rules as the increment. Strictly speaking "volatile" is probably enough for the read, but it is so poorly understood that the Interlocked seems (IMO) safer. Either way, we're avoiding a lock.
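
Put together, the approach described in this answer could be sketched as below. Making the field private and moving the methods onto CrossRoads is a restructuring done for this example, and HowManyTimeOutsVolatile is a hypothetical extra method showing the cheaper volatile read mentioned above.

```csharp
using System.Threading;

public class CrossRoads
{
    private int _timeouts;

    // Atomic increment: no update is lost even with several concurrent writers.
    public void TimeIsUp() => Interlocked.Increment(ref _timeouts);

    // Read through a no-op compare-exchange (swap 0 for 0), which acts as a full fence.
    public int HowManyTimeOuts() => Interlocked.CompareExchange(ref _timeouts, 0, 0);

    // Alternative read with acquire semantics only.
    public int HowManyTimeOutsVolatile() => Volatile.Read(ref _timeouts);
}
```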

Well, I'm not a C# developer, but this is how it typically works at this level:
how does .NET deal with such a situation?
Unlocked. Not likely to be guaranteed to be atomic.
Would I have to lock/gate access to each and every field/property that might at times be written to and accessed from different threads?
Yes. An alternative would be to make a lock for the object available to the clients, then tell the clients they must lock the object while using the instance. This will reduce the number of lock acquisitions, and guarantee a more consistent, predictable state for your clients.

Forget dotnet. At the machine language level, crossRoads._timeouts++ will be implemented as an INC [memory] instruction. This is known as a Read-Modify-Write instruction. These instructions are atomic with respect to multi-threading on a single processor*, (essentially implemented with time-slicing,) but are not atomic with respect to multi-threading using multiple processors or multiple cores.
So:
If you can guarantee that only TimeIsUp() will ever modify crossRoads._timeouts, and if you can guarantee that only one thread will ever execute TimeIsUp(), then it will be safe to do this. The writing in TimeIsUp() will work fine, and the reading in HowManyTimeOuts() (and any place else) will work fine. But if you also modify crossRoads._timeouts elsewhere, or if you ever spawn one more background thread writer, you will be in trouble.
In either case, my advice would be to play it safe and lock it.
(*) They are atomic with respect to multi-threading on a single processor because context switches between threads happen on a periodic interrupt, and on the x86 architectures these instructions are atomic with respect to interrupts, meaning that if an interrupt occurs while the CPU is executing such an instruction, the interrupt will wait until the instruction completes. This does not hold true with more complex instructions, for example those with the REP prefix.
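
To see the multi-writer problem in practice, here is a small sketch that runs several threads doing both a plain ++ and an Interlocked.Increment. The class IncrementDemo and its members are made-up names for this example. On a multi-core machine the plain counter will often end up short, while the interlocked counter always reaches the full total.

```csharp
using System.Threading;

public static class IncrementDemo
{
    public static int UnsafeCount;
    public static int SafeCount;

    public static void Run(int threadCount, int incrementsPerThread)
    {
        UnsafeCount = 0;
        SafeCount = 0;

        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < incrementsPerThread; j++)
                {
                    UnsafeCount++;                         // read-modify-write, updates can be lost
                    Interlocked.Increment(ref SafeCount);  // atomic read-modify-write
                }
            });
        }

        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();   // after Join, both totals are safe to read
    }
}
```

Calling IncrementDemo.Run(4, 100000) and then comparing the two fields shows the effect; how many updates the plain counter loses varies from run to run.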

Although an int may be 'native' size to a CPU (dealing in 32 or 64 bits at a time), if you are reading and writing from different threads to the same variable, you are best off locking this variable and synchronizing access.
There is never a guarantee that reads/writes to an int will be atomic.
You can also use Interlocked.Increment for your purposes here.
