Multiple texts say that when implementing double-checked locking in .NET the field you are locking on should have volatile modifier applied. But why exactly? Considering the following example:
public sealed class Singleton
{
private static volatile Singleton instance;
private static object syncRoot = new Object();
private Singleton() {}
public static Singleton Instance
{
get
{
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
instance = new Singleton();
}
}
return instance;
}
}
}
why doesn't "lock (syncRoot)" accomplish the necessary memory consistency? Isn't it true that after "lock" statement both read and write would be volatile and so the necessary consistency would be accomplished?
Volatile is unnecessary. Well, sort of**
volatile is used to create a memory barrier* between reads and writes on the variable.
lock, when used, causes memory barriers to be created around the block inside the lock, in addition to limiting access to the block to one thread.
Memory barriers make it so each thread reads the most current value of the variable (not a local value cached in some register) and that the compiler doesn't reorder statements. Using volatile is unnecessary** because you've already got a lock.
Joseph Albahari explains this stuff way better than I ever could.
And be sure to check out Jon Skeet's guide to implementing the singleton in C#
update:
*volatile causes reads of the variable to be VolatileReads and writes to be VolatileWrites, which on x86 and x64 on CLR, are implemented with a MemoryBarrier. They may be finer grained on other systems.
**my answer is only correct if you are using the CLR on x86 and x64 processors. It might be true in other memory models, like on Mono (and other implementations), Itanium64 and future hardware. This is what Jon is referring to in his article in the "gotchas" for double checked locking.
Doing one of {marking the variable as volatile, reading it with Thread.VolatileRead, or inserting a call to Thread.MemoryBarrier} might be necessary for the code to work properly in a weak memory model situation.
From what I understand, on the CLR (even on IA64), writes are never reordered (writes always have release semantics). However, on IA64, reads may be reordered to come before writes, unless they are marked volatile. Unfortuantely, I do not have access to IA64 hardware to play with, so anything I say about it would be speculation.
i've also found these articles helpful:
http://www.codeproject.com/KB/tips/MemoryBarrier.aspx
vance morrison's article (everything links to this, it talks about double checked locking)
chris brumme's article (everything links to this)
Joe Duffy: Broken Variants of Double Checked Locking
luis abreu's series on multithreading give a nice overview of the concepts too
http://msmvps.com/blogs/luisabreu/archive/2009/06/29/multithreading-load-and-store-reordering.aspx
http://msmvps.com/blogs/luisabreu/archive/2009/07/03/multithreading-introducing-memory-fences.aspx
There is a way to implement it without volatile field. I'll explain it...
I think that it is memory access reordering inside the lock that is dangerous, such that you can get a not completelly initialized instance outside of the lock. To avoid this I do this:
public sealed class Singleton
{
private static Singleton instance;
private static object syncRoot = new Object();
private Singleton() {}
public static Singleton Instance
{
get
{
// very fast test, without implicit memory barriers or locks
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
{
var temp = new Singleton();
// ensures that the instance is well initialized,
// and only then, it assigns the static variable.
System.Threading.Thread.MemoryBarrier();
instance = temp;
}
}
}
return instance;
}
}
}
Understanding the code
Imagine that there are some initialization code inside the constructor of the Singleton class. If these instructions are reordered after the field is set with the address of the new object, then you have an incomplete instance... imagine that the class has this code:
private int _value;
public int Value { get { return this._value; } }
private Singleton()
{
this._value = 1;
}
Now imagine a call to the constructor using the new operator:
instance = new Singleton();
This can be expanded to these operations:
ptr = allocate memory for Singleton;
set ptr._value to 1;
set Singleton.instance to ptr;
What if I reorder these instructions like this:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
set ptr._value to 1;
Does it make a difference? NO if you think of a single thread. YES if you think of multiple threads... what if the thread is interruped just after set instance to ptr:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
-- thread interruped here, this can happen inside a lock --
set ptr._value to 1; -- Singleton.instance is not completelly initialized
That is what the memory barrier avoids, by not allowing memory access reordering:
ptr = allocate memory for Singleton;
set temp to ptr; // temp is a local variable (that is important)
set ptr._value to 1;
-- memory barrier... cannot reorder writes after this point, or reads before it --
-- Singleton.instance is still null --
set Singleton.instance to temp;
Happy coding!
I don't think anybody has actually answered the question, so I'll give it a try.
The volatile and the first if (instance == null) are not "necessary". The lock will make this code thread-safe.
So the question is: why would you add the first if (instance == null)?
The reason is presumably to avoid executing the locked section of code unnecessarily. While you are executing the code inside the lock, any other thread that tries to also execute that code is blocked, which will slow your program down if you try to access the singleton frequently from many threads. Depending on the language/platform, there could also be overheads from the lock itself that you wish to avoid.
So the first null check is added as a really quick way to see if you need the lock. If you don't need to create the singleton, you can avoid the lock entirely.
But you can't check if the reference is null without locking it in some way, because due to processor caching, another thread could change it and you would read a "stale" value that would lead you to enter the lock unnecessarily. But you're trying to avoid a lock!
So you make the singleton volatile to ensure that you read the latest value, without needing to use a lock.
You still need the inner lock because volatile only protects you during a single access to the variable - you can't test-and-set it safely without using a lock.
Now, is this actually useful?
Well I would say "in most cases, no".
If Singleton.Instance could cause inefficiency due to the locks, then why are you calling it so frequently that this would be a significant problem? The whole point of a singleton is that there is only one, so your code can read and cache the singleton reference once.
The only case I can think of where this caching wouldn't be possible would be when you have a large number of threads (e.g. a server using a new thread to process every request could be creating millions of very short-running threads, each of which would have to call Singleton.Instance once).
So I suspect that double checked locking is a mechanism that has a real place in very specific performance-critical cases, and then everybody has clambered on the "this is the proper way to do it" bandwagon without actually thinking what it does and whether it will actually be necessary in the case they are using it for.
You should use volatile with the double check lock pattern.
Most people point to this article as proof you do not need volatile:
https://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10
But they fail to read to the end:
"A Final Word of Warning - I am only guessing at the x86 memory model from observed behavior on existing processors. Thus low-lock techniques are also fragile because hardware and compilers can get more aggressive over time. Here are some strategies to minimize the impact of this fragility on your code. First, whenever possible, avoid low-lock techniques. (...) Finally, assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees."
If you need more convincing then read this article on the ECMA spec will be used for other platforms:
msdn.microsoft.com/en-us/magazine/jj863136.aspx
If you need further convincing read this newer article that optimizations may be put in that prevent it from working without volatile:
msdn.microsoft.com/en-us/magazine/jj883956.aspx
In summary it "might" work for you without volatile for the moment, but don't chance it write proper code and either use volatile or the volatileread/write methods. Articles that suggest to do otherwise are sometimes leaving out some of the possible risks of JIT/compiler optimizations that could impact your code, as well us future optimizations that may happen that could break your code. Also as mentioned assumptions in the last article previous assumptions of working without volatile already may not hold on ARM.
AFAIK (and - take this with caution, I'm not doing a lot of concurrent stuff) no. The lock just gives you synchronization between multiple contenders (threads).
volatile on the other hand tells your machine to reevaluate the value every time, so that you don't stumble upon a cached (and wrong) value.
See http://msdn.microsoft.com/en-us/library/ms998558.aspx and note the following quote:
Also, the variable is declared to be volatile to ensure that assignment to the instance variable completes before the instance variable can be accessed.
A description of volatile: http://msdn.microsoft.com/en-us/library/x13ttww7%28VS.71%29.aspx
I think that I've found what I was looking for. Details are in this article - http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10.
To sum up - in .NET volatile modifier is indeed not needed in this situation. However in weaker memory models writes made in constructor of lazily initiated object may be delayed after write to the field, so other threads might read corrupt non-null instance in the first if statement.
The lock is sufficient. The MS language spec (3.0) itself mentions this exact scenario in §8.12, without any mention of volatile:
A better approach is to synchronize
access to static data by locking a
private static object. For example:
class Cache
{
private static object synchronizationObject = new object();
public static void Add(object x) {
lock (Cache.synchronizationObject) {
...
}
}
public static void Remove(object x) {
lock (Cache.synchronizationObject) {
...
}
}
}
This a pretty good post about using volatile with double checked locking:
http://tech.puredanger.com/2007/06/15/double-checked-locking/
In Java, if the aim is to protect a variable you don't need to lock if it's marked as volatile
Related
Experts on threading/concurrency/memory model in .NET, could you verify that the following code is correct under all circumstances (that is, regardless of OS, .NET runtime, CPU architecture, etc.)?
class SomeClassWhoseInstancesAreAccessedConcurrently
{
private Strategy _strategy;
public SomeClassWhoseInstancesAreAccessedConcurrently()
{
_strategy = new SomeStrategy();
}
public void DoSomething()
{
Volatile.Read(ref _strategy).DoSomething();
}
public void ChangeStrategy()
{
Interlocked.Exchange(ref _strategy, new AnotherStrategy());
}
}
This pattern comes up pretty frequently. We have an object which is used concurrently by multiple threads and at some point the value of one of its fields needs to be changed. We want to guarantee that from that point on every access to that field coming from any thread observe the new value.
Considering the example above, we want to make sure that after the point in time when ChangeStrategy is executed, it can't happen that SomeStrategy.DoSomething is called instead of AnotherStrategy.DoSomething because some of the threads don't observe the change and use the old value cached in a register/CPU cache/whatever.
To my knowledge of the topic, we need at least volatile read to prevent such caching. The main question is that is it enough or we need Interlocked.CompareExchange(ref _strategy, null, null) instead to achieve the correct behavior?
If volatile read is enough, a further question arises: do we need Interlocked.Exchange at all or even volatile write would be ok in this case?
As I understand, volatile reads/writes use half-fences which allows a write followed by a read reordered, whose implications I still can't fully understand, to be honest. However, as per ECMA 335 specification, section I.12.6.5, "The class library provides a variety of atomic operations in the
System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange,
and CompareExchange) perform implicit acquire/release operations." So, if I understand this correctly, Interlocked.Exchange should create a full-fence, which looks enough.
But, to complicate things further, it seems that not all Interlocked operations were implemented according to the specification on every platform.
I'd be very grateful if someone could clear this up.
Yes, your code is safe. It is functionally equivalent with using a lock like this:
public void DoSomething()
{
Strategy strategy;
lock (_locker) strategy = _strategy;
strategy.DoSomething();
}
public void ChangeStrategy()
{
Strategy strategy = new AnotherStrategy();
lock (_locker) _strategy = strategy;
}
Your code is more performant though, because the lock imposes a full fence, while the Volatile.Read imposes a potentially cheaper half fence.
You could improve the performance even more by replacing the Interlocked.Exchange (full fence) with a Volatile.Write (half fence). The only reason to prefer the Interlocked.Exchange over the Volatile.Write is when you want to retrieve the previous strategy as an atomic operation. Apparently this is not needed in your case.
For simplicity you could even get rid of the Volatile.Write/Volatile.Read calls, and just declare the _strategy field as volatile.
Multiple texts say that when implementing double-checked locking in .NET the field you are locking on should have volatile modifier applied. But why exactly? Considering the following example:
public sealed class Singleton
{
private static volatile Singleton instance;
private static object syncRoot = new Object();
private Singleton() {}
public static Singleton Instance
{
get
{
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
instance = new Singleton();
}
}
return instance;
}
}
}
why doesn't "lock (syncRoot)" accomplish the necessary memory consistency? Isn't it true that after "lock" statement both read and write would be volatile and so the necessary consistency would be accomplished?
Volatile is unnecessary. Well, sort of**
volatile is used to create a memory barrier* between reads and writes on the variable.
lock, when used, causes memory barriers to be created around the block inside the lock, in addition to limiting access to the block to one thread.
Memory barriers make it so each thread reads the most current value of the variable (not a local value cached in some register) and that the compiler doesn't reorder statements. Using volatile is unnecessary** because you've already got a lock.
Joseph Albahari explains this stuff way better than I ever could.
And be sure to check out Jon Skeet's guide to implementing the singleton in C#
update:
*volatile causes reads of the variable to be VolatileReads and writes to be VolatileWrites, which on x86 and x64 on CLR, are implemented with a MemoryBarrier. They may be finer grained on other systems.
**my answer is only correct if you are using the CLR on x86 and x64 processors. It might be true in other memory models, like on Mono (and other implementations), Itanium64 and future hardware. This is what Jon is referring to in his article in the "gotchas" for double checked locking.
Doing one of {marking the variable as volatile, reading it with Thread.VolatileRead, or inserting a call to Thread.MemoryBarrier} might be necessary for the code to work properly in a weak memory model situation.
From what I understand, on the CLR (even on IA64), writes are never reordered (writes always have release semantics). However, on IA64, reads may be reordered to come before writes, unless they are marked volatile. Unfortuantely, I do not have access to IA64 hardware to play with, so anything I say about it would be speculation.
i've also found these articles helpful:
http://www.codeproject.com/KB/tips/MemoryBarrier.aspx
vance morrison's article (everything links to this, it talks about double checked locking)
chris brumme's article (everything links to this)
Joe Duffy: Broken Variants of Double Checked Locking
luis abreu's series on multithreading give a nice overview of the concepts too
http://msmvps.com/blogs/luisabreu/archive/2009/06/29/multithreading-load-and-store-reordering.aspx
http://msmvps.com/blogs/luisabreu/archive/2009/07/03/multithreading-introducing-memory-fences.aspx
There is a way to implement it without volatile field. I'll explain it...
I think that it is memory access reordering inside the lock that is dangerous, such that you can get a not completelly initialized instance outside of the lock. To avoid this I do this:
public sealed class Singleton
{
private static Singleton instance;
private static object syncRoot = new Object();
private Singleton() {}
public static Singleton Instance
{
get
{
// very fast test, without implicit memory barriers or locks
if (instance == null)
{
lock (syncRoot)
{
if (instance == null)
{
var temp = new Singleton();
// ensures that the instance is well initialized,
// and only then, it assigns the static variable.
System.Threading.Thread.MemoryBarrier();
instance = temp;
}
}
}
return instance;
}
}
}
Understanding the code
Imagine that there are some initialization code inside the constructor of the Singleton class. If these instructions are reordered after the field is set with the address of the new object, then you have an incomplete instance... imagine that the class has this code:
private int _value;
public int Value { get { return this._value; } }
private Singleton()
{
this._value = 1;
}
Now imagine a call to the constructor using the new operator:
instance = new Singleton();
This can be expanded to these operations:
ptr = allocate memory for Singleton;
set ptr._value to 1;
set Singleton.instance to ptr;
What if I reorder these instructions like this:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
set ptr._value to 1;
Does it make a difference? NO if you think of a single thread. YES if you think of multiple threads... what if the thread is interruped just after set instance to ptr:
ptr = allocate memory for Singleton;
set Singleton.instance to ptr;
-- thread interruped here, this can happen inside a lock --
set ptr._value to 1; -- Singleton.instance is not completelly initialized
That is what the memory barrier avoids, by not allowing memory access reordering:
ptr = allocate memory for Singleton;
set temp to ptr; // temp is a local variable (that is important)
set ptr._value to 1;
-- memory barrier... cannot reorder writes after this point, or reads before it --
-- Singleton.instance is still null --
set Singleton.instance to temp;
Happy coding!
I don't think anybody has actually answered the question, so I'll give it a try.
The volatile and the first if (instance == null) are not "necessary". The lock will make this code thread-safe.
So the question is: why would you add the first if (instance == null)?
The reason is presumably to avoid executing the locked section of code unnecessarily. While you are executing the code inside the lock, any other thread that tries to also execute that code is blocked, which will slow your program down if you try to access the singleton frequently from many threads. Depending on the language/platform, there could also be overheads from the lock itself that you wish to avoid.
So the first null check is added as a really quick way to see if you need the lock. If you don't need to create the singleton, you can avoid the lock entirely.
But you can't check if the reference is null without locking it in some way, because due to processor caching, another thread could change it and you would read a "stale" value that would lead you to enter the lock unnecessarily. But you're trying to avoid a lock!
So you make the singleton volatile to ensure that you read the latest value, without needing to use a lock.
You still need the inner lock because volatile only protects you during a single access to the variable - you can't test-and-set it safely without using a lock.
Now, is this actually useful?
Well I would say "in most cases, no".
If Singleton.Instance could cause inefficiency due to the locks, then why are you calling it so frequently that this would be a significant problem? The whole point of a singleton is that there is only one, so your code can read and cache the singleton reference once.
The only case I can think of where this caching wouldn't be possible would be when you have a large number of threads (e.g. a server using a new thread to process every request could be creating millions of very short-running threads, each of which would have to call Singleton.Instance once).
So I suspect that double checked locking is a mechanism that has a real place in very specific performance-critical cases, and then everybody has clambered on the "this is the proper way to do it" bandwagon without actually thinking what it does and whether it will actually be necessary in the case they are using it for.
You should use volatile with the double check lock pattern.
Most people point to this article as proof you do not need volatile:
https://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10
But they fail to read to the end:
"A Final Word of Warning - I am only guessing at the x86 memory model from observed behavior on existing processors. Thus low-lock techniques are also fragile because hardware and compilers can get more aggressive over time. Here are some strategies to minimize the impact of this fragility on your code. First, whenever possible, avoid low-lock techniques. (...) Finally, assume the weakest memory model possible, using volatile declarations instead of relying on implicit guarantees."
If you need more convincing then read this article on the ECMA spec will be used for other platforms:
msdn.microsoft.com/en-us/magazine/jj863136.aspx
If you need further convincing read this newer article that optimizations may be put in that prevent it from working without volatile:
msdn.microsoft.com/en-us/magazine/jj883956.aspx
In summary it "might" work for you without volatile for the moment, but don't chance it write proper code and either use volatile or the volatileread/write methods. Articles that suggest to do otherwise are sometimes leaving out some of the possible risks of JIT/compiler optimizations that could impact your code, as well us future optimizations that may happen that could break your code. Also as mentioned assumptions in the last article previous assumptions of working without volatile already may not hold on ARM.
AFAIK (and - take this with caution, I'm not doing a lot of concurrent stuff) no. The lock just gives you synchronization between multiple contenders (threads).
volatile on the other hand tells your machine to reevaluate the value every time, so that you don't stumble upon a cached (and wrong) value.
See http://msdn.microsoft.com/en-us/library/ms998558.aspx and note the following quote:
Also, the variable is declared to be volatile to ensure that assignment to the instance variable completes before the instance variable can be accessed.
A description of volatile: http://msdn.microsoft.com/en-us/library/x13ttww7%28VS.71%29.aspx
I think that I've found what I was looking for. Details are in this article - http://msdn.microsoft.com/en-us/magazine/cc163715.aspx#S10.
To sum up - in .NET volatile modifier is indeed not needed in this situation. However in weaker memory models writes made in constructor of lazily initiated object may be delayed after write to the field, so other threads might read corrupt non-null instance in the first if statement.
The lock is sufficient. The MS language spec (3.0) itself mentions this exact scenario in §8.12, without any mention of volatile:
A better approach is to synchronize
access to static data by locking a
private static object. For example:
class Cache
{
private static object synchronizationObject = new object();
public static void Add(object x) {
lock (Cache.synchronizationObject) {
...
}
}
public static void Remove(object x) {
lock (Cache.synchronizationObject) {
...
}
}
}
This a pretty good post about using volatile with double checked locking:
http://tech.puredanger.com/2007/06/15/double-checked-locking/
In Java, if the aim is to protect a variable you don't need to lock if it's marked as volatile
Considering the following example:
private int sharedState = 0;
private void FirstThread() {
Volatile.Write(ref sharedState, 1);
}
private void SecondThread() {
int sharedStateSnapshot = Volatile.Read(ref sharedState);
Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
When the value in the address sharedStateSnapshot is acquired; sharedState must have been already acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread 2 runs after thread one, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();
private void FirstThread()
{
lock (locker)
{
sharedState = 1;
}
}
private void SecondThread()
{
int sharedStateSnapshot;
lock (locker)
{
sharedStateSnapshot = sharedState;
}
Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable on any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually wrote? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as
FirstThread() really did execute before SecondThread(), this program
could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread but this does not help here because
There are no operations to reorder (just the single read or write in each thread).
The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply mean that the order of operations which you cannot predict anyway would be preserved.
If my understanding is therefore correct, then there is nothing to
prevent the acquisition of sharedState being 'stale', if the write in
FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest
processor memory model, such as ARM or Alpha), that the program will
always print 1? (Or have I made an error in my mental model
somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;
private void FirstThread()
{
sharedState = 1;
// ensure lock is released after the shared state write!
Volatile.Write(ref spinLock, true);
}
private void SecondThread()
{
SpinWait.SpinUntil(() => spinLock);
Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
I read some articles about the volatile keyword but I could not figure out its correct usage. Could you please tell me what it should be used for in C# and in Java?
Consider this example:
int i = 5;
System.out.println(i);
The compiler may optimize this to just print 5, like this:
System.out.println(5);
However, if there is another thread which can change i, this is the wrong behaviour. If another thread changes i to be 6, the optimized version will still print 5.
The volatile keyword prevents such optimization and caching, and thus is useful when a variable can be changed by another thread.
For both C# and Java, "volatile" tells the compiler that the value of a variable must never be cached as its value may change outside of the scope of the program itself. The compiler will then avoid any optimisations that may result in problems if the variable changes "outside of its control".
Reads of volatile fields have acquire semantics. This means that it is guaranteed that the memory read from the volatile variable will occur before any following memory reads. It blocks the compiler from doing the reordering, and if the hardware requires it (weakly ordered CPU), it will use a special instruction to make the hardware flush any reads that occur after the volatile read but were speculatively started early, or the CPU could prevent them from being issued early in the first place, by preventing any speculative load from occurring between the issue of the load acquire and its retirement.
Writes of volatile fields have release semantics. This means that it is guaranteed that any memory writes to the volatile variable are guaranteed to be delayed until all previous memory writes are visible to other processors.
Consider the following example:
something.foo = new Thing();
If foo is a member variable in a class, and other CPUs have access to the object instance referred to by something, they might see the value foo change before the memory writes in the Thing constructor are globally visible! This is what "weakly ordered memory" means. This could occur even if the compiler has all of the stores in the constructor before the store to foo. If foo is volatile then the store to foo will have release semantics, and the hardware guarantees that all of the writes before the write to foo are visible to other processors before allowing the write to foo to occur.
How is it possible for the writes to foo to be reordered so badly? If the cache line holding foo is in the cache, and the stores in the constructor missed the cache, then it is possible for the store to complete much sooner than the writes to the cache misses.
The (awful) Itanium architecture from Intel had weakly ordered memory. The processor used in the original XBox 360 had weakly ordered memory. Many ARM processors, including the very popular ARMv7-A have weakly ordered memory.
Developers often don't see these data races because things like locks will do a full memory barrier, essentially the same thing as acquire and release semantics at the same time. No loads inside the lock can be speculatively executed before the lock is acquired, they are delayed until the lock is acquired. No stores can be delayed across a lock release, the instruction that releases the lock is delayed until all of the writes done inside the lock are globally visible.
A more complete example is the "Double-checked locking" pattern. The purpose of this pattern is to avoid having to always acquire a lock in order to lazy initialize an object.
Snagged from Wikipedia:
public class MySingleton {
private static object myLock = new object();
private static volatile MySingleton mySingleton = null;
private MySingleton() {
}
public static MySingleton GetInstance() {
if (mySingleton == null) { // 1st check
lock (myLock) {
if (mySingleton == null) { // 2nd (double) check
mySingleton = new MySingleton();
// Write-release semantics are implicitly handled by marking
// mySingleton with 'volatile', which inserts the necessary memory
// barriers between the constructor call and the write to mySingleton.
// The barriers created by the lock are not sufficient because
// the object is made visible before the lock is released.
}
}
}
// The barriers created by the lock are not sufficient because not all threads
// will acquire the lock. A fence for read-acquire semantics is needed between
// the test of mySingleton (above) and the use of its contents. This fence
// is automatically inserted because mySingleton is marked as 'volatile'.
return mySingleton;
}
}
In this example, the stores in the MySingleton constructor might not be visible to other processors before the store to mySingleton. If that happens, the other threads that peek at mySingleton will not acquire a lock and they will not necessarily pick up the writes to the constructor.
volatile never prevents caching. What it does is guarantee the order in which other processors "see" writes. A store release will delay a store until all pending writes are complete and a bus cycle has been issued telling other processors to discard/writeback their cache line if they happen to have the relevant lines cached. A load acquire will flush any speculated reads, ensuring that they won't be stale values from the past.
To understand what volatile does to a variable, it's important to understand what happens when the variable is not volatile.
Variable is Non-volatile
When two threads A & B are accessing a non-volatile variable, each thread will maintain a local copy of the variable in it's local cache. Any changes done by thread A in it's local cache won't be visible to the thread B.
Variable is volatile
When variables are declared volatile it essentially means that threads should not cache such a variable or in other words threads should not trust the values of these variables unless they are directly read from the main memory.
So, when to make a variable volatile?
When you have a variable which can be accessed by many threads and you want every thread to get the latest updated value of that variable even if the value is updated by any other thread/process/outside of the program.
The volatile keyword has different meanings in both Java and C#.
Java
From the Java Language Spec :
A field may be declared volatile, in which case the Java memory model ensures that all threads see a consistent value for the variable.
C#
From the C# Reference (retrieved 2021-03-31):
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are not subject to these optimizations. (...)
In Java, "volatile" is used to tell the JVM that the variable may be used by multiple threads at the same time, so certain common optimizations cannot be applied.
Notably the situation where the two threads accessing the same variable are running on separate CPU's in the same machine. It is very common for CPU's to cache aggressively the data it holds because memory access is very much slower than cache access. This means that if the data is updated in CPU1 it must immediately go through all caches and to main memory instead of when the cache decides to clear itself, so that CPU2 can see the updated value (again by disregarding all caches on the way).
When you are reading data that is non-volatile, the executing thread may or may not always get the updated value.
But if the object is volatile, the thread always gets the most up-to-date value.
Volatile is solving concurrency problem. To make that value in sync. This keyword is mostly use in a threading. When multiple thread updating same variable.
When should I use volatile/Thread.MemoryBarrier() for thread safety?
You use volatile/Thread.MemoryBarrier() when you want to access a variable across threads without locking.
Variables that are atomic, like an int for example, are always read and written whole at once. That means that you will never get half of the value before another thread changes it and the other half after it has changed. Because of that you can safely read and write the value in different threads without syncronising.
However, the compiler may optimize away some reads and writes, which you prevent with the volatile keyword. If you for example have a loop like this:
sum = 0;
foreach (int value in list) {
sum += value;
}
The compiler may actually do the calculations in a processor register and only write the value to the sum variable after the loop. If you make the sum variable volatile, the compiler will generate code that reads and writes the variable for every change, so that it's value is up to date throughout the loop.
What's wrong with
private static readonly object syncObj = new object();
private static int counter;
public static int NextValue()
{
lock (syncObj)
{
return counter++;
}
}
?
This does all necessary locking, memory barriers, etc. for you. It's well understood and more readable than any custom synchronization code based on volatile and Thread.MemoryBarrier().
EDIT
I can't think of a scenario in which I'd use volatile or Thread.MemoryBarrier(). For example
private static volatile int counter;
public static int NextValue()
{
return counter++;
}
is not equivalent to the code above and is not thread-safe (volatile doesn't make ++ magically become thread-safe).
In a case like this:
private static volatile bool done;
void Thread1()
{
while (!done)
{
// do work
}
}
void Thread2()
{
// do work
done = true;
}
(which should work) I'd use a ManualResetEvent to signal when Thread2 is done.
Basically if you're using any other kind of synchronization to make your code threadsafe then you don't need to.
Most of the lock mechanisms (including lock) automatically imply a memory barrier so that multiple processor can get the correct information.
Volatile and MemoryBarrier are mostly used in lock free scenarios where you're trying to avoid the performance penalty of locking.
Edit: You should read this article by Joe Duffy about the CLR 2.0 memory model, it clarifies a lot of things (if you're really interested you should read ALL the article from Joe Duffie who is by large the most expert person in parallelism in .NET)
As the name implies volatile guarantees that cache value are flushed to memory so that all the threads see the same value. For example, if I have an integer whose latest write is saved in the cache, other threads may not see that. They may even see their cache copy of that integer. Marking a variable as volatile makes it to be read directly from the memory.
Sriwantha Sri Aravinda Attanayake