I understand the volatile keyword in C++ fairly well. But in C#, it appears to take a different meaning, more so related to multi-threading. I thought bool operations were atomic, and I thought that if the operation were atomic, you wouldn't have threading concerns. What am I missing?
https://msdn.microsoft.com/en-us/library/x13ttww7.aspx
I thought bool operations were atomic
They are indeed atomic.
and I thought that if the operation were atomic, you wouldn't have threading concerns.
That is where you have an incomplete picture. Imagine you have two threads running on separate cores, each with its own cache layers. Thread #1 has foo in its cache, and Thread #2 updates the value of foo. Thread #1 won't see the change unless foo is marked as volatile, a lock is acquired, the Interlocked class is used, or Thread.MemoryBarrier() is explicitly called; any of these causes the cached value to be invalidated, guaranteeing that you read the most up-to-date value:
Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
Eric Lippert has a great post about volatility where he explains:
The true semantics of volatile reads
and writes are considerably more complex than I've outlined here; in
fact they do not actually guarantee that every processor stops what it
is doing and updates caches to/from main memory. Rather, they provide
weaker guarantees about how memory accesses before and after reads and
writes may be observed to be ordered with respect to each other.
Edit:
As per xanatos' comment, this doesn't mean that volatile guarantees an immediate read; it guarantees a read of the most up-to-date value.
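To make the cache/visibility point concrete, here is a minimal sketch (the class and field names are invented for the example): a worker thread spins on a bool flag that another thread sets. Marking the flag volatile is what guarantees the spinning thread ever observes the write; without it, the JIT may legally hoist the read out of the loop.
using System;
using System.Threading;

class Worker
{
    // Without 'volatile', the JIT may hoist the read of _stop out of the loop
    // (effectively caching it in a register), so the loop might never observe
    // the write made by the other thread.
    private volatile bool _stop;

    public void Run()
    {
        var thread = new Thread(() =>
        {
            while (!_stop) { /* do work */ }   // volatile read: re-read the field each iteration
            Console.WriteLine("Worker observed the stop request.");
        });

        thread.Start();
        Thread.Sleep(100);   // let the worker spin for a moment
        _stop = true;        // volatile write: becomes visible to the worker
        thread.Join();
    }
}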
The volatile keyword in C# is all about reading/writing reordering, so it is something quite esoteric.
http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
(that I consider to be one of the "bibles" about threading) writes:
The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence. These “half-fences” are faster than full fences because they give the runtime and hardware more scope for optimization.
It is something quite unreadable :-)
Now... What it doesn't mean:
It doesn't mean that a value will be read now/will be written now
it simply means that if you read something from a volatile variable, everything else that is read/written after this "special" read won't be moved before this "special" read. So it creates a barrier (the acquire-fence from the quote above): everything that follows the read in your code is guaranteed to happen after it.
A volatile write is probably more important: it guarantees that writes are not reordered, something that is partly guaranteed by Intel CPUs on their own and that was not guaranteed by the first version of the Java memory model. The problem was:
object myrefthatissharedwithotherthreads = new MyClass(5);
where
class MyClass
{
    public int Value;

    public MyClass(int value)
    {
        Value = value;
    }
}
Now... that expression can be imagined to be:
var temp = new MyClass();
temp.Value = 5;
myrefthatissharedwithotherthreads = temp;
where temp is something generated by the compiler that you can't see.
If the writes can be reordered, you could have:
var temp = new MyClass();
myrefthatissharedwithotherthreads = temp;
temp.Value = 5;
and another thread could see a partially initialized MyClass, because the value of myrefthatissharedwithotherthreads is readable before the class MyClass has finished initializing.
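One way to prevent exactly this reordering is to make the shared reference a volatile field (or to publish it with Volatile.Write). A rough sketch, reusing the MyClass example above (the Publisher class is invented for illustration):
using System.Threading;

class Publisher
{
    // The assignment below becomes a volatile (release) write: the writes done
    // inside the MyClass constructor cannot be moved after the write that
    // publishes the reference.
    private volatile MyClass _shared;

    public void Publish()
    {
        _shared = new MyClass(5);
        // Roughly equivalent for a plain (non-volatile) field:
        // Volatile.Write(ref _plainShared, new MyClass(5));
    }

    public int? ReadValue()
    {
        MyClass local = _shared;   // volatile (acquire) read
        return local?.Value;       // either null, or a fully constructed object
    }
}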
Related
I have read a lot of contradictory information (MSDN, SO, etc.) about volatile and VolatileRead (ReadAcquireFence).
I understand the memory-access reordering restrictions they imply; what I'm still completely confused about is the freshness guarantee, which is very important to me.
msdn doc for volatile mentions:
(...) This ensures that the most up-to-date value is present in the field at all times.
msdn doc for volatile fields mentions:
A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
.NET code for VolatileRead is:
public static int VolatileRead(ref int address)
{
    int ret = address;
    MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    return ret;
}
According to the MSDN MemoryBarrier doc, a memory barrier prevents reordering. However, this doesn't seem to have any implications for freshness - correct?
How then can one get a freshness guarantee?
And is there a difference between marking a field volatile and accessing it with VolatileRead and VolatileWrite semantics? I'm currently doing the latter in my performance-critical code that needs to guarantee freshness, yet readers sometimes get a stale value. I'm wondering if marking the field volatile would make the situation different.
EDIT1:
What I'm trying to achieve is a guarantee that reader threads will get as recent a value of the shared variable (written by multiple writers) as possible - ideally no older than the cost of a context switch or other operations that may postpone the immediate write of the state.
If volatile or a higher-level construct (e.g. lock) has this guarantee (does it?), then how is it achieved?
EDIT2:
The very condensed question should have been: how do I get a guarantee of as fresh a value as possible during reads? Ideally without locking (as exclusive access is not needed and there is potential for high contention).
From what I learned here I'm wondering if this might be the solution (solving(?) line is marked with comment):
private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);

public void Update(SharedState newValue)
{
    bool lockTaken = false;
    _spinLock.Enter(ref lockTaken);
    _sharedState = newValue;
    if (lockTaken)
    {
        _spinLock.Exit();
    }
}

public SharedState GetFreshSharedState
{
    get
    {
        Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
        var value = _sharedState;
        Thread.MemoryBarrier();
        return value;
    }
}
The MemoryBarrier calls were added to make sure both reads and writes are wrapped by full fences (the same as lock code, as indicated here http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword in the 'Memory barriers and locking' section).
Does this look correct or is it flawed?
EDIT3:
Thanks to very interesting discussions here I learned quite a few things and I actually was able to distill to the simplified unambiguous question that I have about this topic. It's quite different from the original one so I rather posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing
I think this is a good question. But, it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt some semblance of an answer here anyway.
So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?
I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.
public static T InterlockedOperation<T>(ref T location, T operand)
    where T : class // required by the generic Interlocked.CompareExchange overload
{
    T initial, computed;
    do
    {
        initial = location;
        computed = op(initial, operand); // where op is replaced with a specific implementation
    }
    while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
    return computed;
}
In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.1
A lot of lock-free code I have seen works on this principle. That is, the operations are usually wrapped in loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt uses the latest value, nor that every use of the value is the latest. It only assumes that the value is newer after each read.
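For example, here is a hedged sketch of one such operation (InterlockedMax is an illustrative name, not a framework method): a "store the maximum" that simply retries until the CompareExchange succeeds, relying only on each retry seeing a newer value.
using System.Threading;

static class InterlockedEx
{
    // Atomically store 'value' into 'location' if it is larger than what is
    // already there. A failed CompareExchange means another thread won the
    // race; the loop re-reads a newer value and tries again.
    public static int InterlockedMax(ref int location, int value)
    {
        int initial, computed;
        do
        {
            initial = location;
            computed = initial >= value ? initial : value;
            if (computed == initial)
                return initial;   // stored value is already >= value, nothing to do
        }
        while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
        return computed;
    }
}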
Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
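A rough sketch of that queue-based approach (class and member names are made up): writers enqueue every update and the reader drains them, so no write is ever missed.
using System;
using System.Collections.Concurrent;

class StateChangeProcessor
{
    private readonly ConcurrentQueue<int> _pendingWrites = new ConcurrentQueue<int>();

    // Writers enqueue every update; nothing is lost even under contention.
    public void Publish(int newState) => _pendingWrites.Enqueue(newState);

    // The reader dequeues them one by one, in FIFO order.
    public void Drain()
    {
        while (_pendingWrites.TryDequeue(out int state))
        {
            Console.WriteLine("processing state " + state);
        }
    }
}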
If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
1The ABA problem opens up a new can of worms.
A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reordering properties that a barrier guarantees.
By freshness you probably mean that a read returns the value of the most recent write.
Let's say we have these operations, each on a different thread:
x = 1
x = 2
print(x)
How could we possibly print a value other than 2? Without volatile the read can move one slot upwards and return 1. Volatile prevents reorderings, though. The write cannot move backwards in time.
In short, volatile guarantees you to see the most recent value.
Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.
Consider the following example:
private int sharedState = 0;

private void FirstThread() {
    Volatile.Write(ref sharedState, 1);
}

private void SecondThread() {
    int sharedStateSnapshot = Volatile.Read(ref sharedState);
    Console.WriteLine(sharedStateSnapshot);
}
Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.
However, my understanding now is that:
Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.
Or, to put it another way:
When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
When the value in the address sharedStateSnapshot is acquired, sharedState must already have been acquired.
If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.
If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread 2 runs after thread 1, you need two fences on each thread.
The easiest way to achieve that is using the lock keyword:
private int sharedState = 0;
private readonly object locker = new object();

private void FirstThread()
{
    lock (locker)
    {
        sharedState = 1;
    }
}

private void SecondThread()
{
    int sharedStateSnapshot;
    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }
    Console.WriteLine(sharedStateSnapshot);
}
I'd like to quote Eric Lippert:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
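To make that comparison concrete, here is a small sketch (field names invented): the volatile modifier gives you on every access what the Volatile.Read/Volatile.Write calls make you spell out by hand.
using System.Threading;

class Flags
{
    private volatile bool _readyVolatileField;   // the modifier applies to every access
    private bool _readyPlainField;               // plain field, explicit calls required

    public void Signal()
    {
        _readyVolatileField = true;                   // volatile write, implicit
        Volatile.Write(ref _readyPlainField, true);   // same semantics, written out by hand
    }

    public bool IsReady()
    {
        return _readyVolatileField                    // volatile read, implicit
            && Volatile.Read(ref _readyPlainField);   // same semantics, written out by hand
    }
}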
You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.
The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable on any of its registers (see Hans Passant's answer).
But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the values are actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.
Until recently, I was under the impression that, as long as
FirstThread() really did execute before SecondThread(), this program
could not output anything but 1.
As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread but this does not help here because
There are no operations to reorder (just the single read or write in each thread).
The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply mean that the order of operations which you cannot predict anyway would be preserved.
If my understanding is therefore correct, then there is nothing to
prevent the acquisition of sharedState being 'stale', if the write in
FirstThread() has not already been released.
That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.
If this is true, how can we actually ensure (assuming the weakest
processor memory model, such as ARM or Alpha), that the program will
always print 1? (Or have I made an error in my mental model
somewhere?)
To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:
Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.
So in the simplest case you could for example have:
private volatile int sharedState = 0;
private volatile bool spinLock = false;

private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true);
}

private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}
Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
Like many other people, I've always been confused by volatile reads/writes and fences. So now I'm trying to fully understand what these do.
So, a volatile read is supposed to (1) exhibit acquire-semantics and (2) guarantee that the value read is fresh, i.e., it is not a cached value. Let's focus on (2).
Now, I've read that, if you want to perform a volatile read, you should introduce an acquire fence (or a full fence) after the read, like this:
int local = shared;
Thread.MemoryBarrier();
How exactly does this prevent the read operation from using a previously cached value?
According to the definition of a fence (no reads/stores are allowed to be moved above/below the fence), I would insert the fence before the read, preventing the read from crossing the fence and being moved backwards in time (aka, being cached).
How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
Similarly, I believe that a volatile write should introduce a fence after the write operation, preventing the processor from moving the write forward in time (aka, delaying the write). I believe this would make the processor flush the write to the main memory.
But to my surprise, the C# implementation introduces the fence before the write!
[MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
public static void VolatileWrite(ref int address, int value)
{
    MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    address = value;
}
Update
According to this example, apparently taken from "C# 4 in a Nutshell", fence 2, placed after a write, is supposed to force the write to be flushed to main memory immediately, and fence 3, placed before a read, is supposed to guarantee a fresh read:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine(_answer);
        }
    }
}
The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.
How exactly does this prevent the read operation from using a
previously cached value?
It does no such thing. A volatile read does not guarantee that the latest value will be returned. In plain English all it really means is that the next read will return a newer value and nothing more.
How does preventing the read from being moved forwards in time (or
subsequent instructions from being moved backwards in time) guarantee
a volatile (fresh) read? How does it help?
Be careful with the terminology here. Volatile is not synonymous with fresh. As I already mentioned above, its real usefulness lies in how two or more volatile reads are chained together. The next read in a sequence of volatile reads will absolutely return a newer value than the previous read of the same address. Lock-free code should be written with this premise in mind. That is, the code should be structured to work on the principle of dealing with a newer value and not the latest value. This is why most lock-free code spins in a loop until it can verify that the operation completed successfully.
The ideas in this book (and my own personal beliefs) seem to
contradict the ideas behind C#'s VolatileRead and VolatileWrite
implementations.
Not really. Remember volatile != fresh. Yes, if you want a "fresh" read then you need to place an acquire-fence before the read. But that is not the same as doing a volatile read. What I am saying is that if the implementation of VolatileRead had the call to Thread.MemoryBarrier before the read instruction then it would not actually produce a volatile read. It would produce a fresh read though.
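A small sketch of that distinction (field and method names are invented): the fence goes before the load if you want the "fresh" read described above, and after the load if you want the acquire semantics of a volatile read.
using System.Threading;

static class ReadExamples
{
    private static int _shared;

    // "Fresh" read: the full fence before the load keeps it from being
    // satisfied by an earlier, already-issued read.
    public static int FreshRead()
    {
        Thread.MemoryBarrier();
        return _shared;
    }

    // Volatile (acquire) read: the fence after the load keeps anything that
    // follows from being moved before it.
    public static int AcquireRead()
    {
        int value = _shared;
        Thread.MemoryBarrier();
        return value;
    }
}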
The important thing to understand is that volatile does not only mean "cannot cache the value"; it also gives important visibility guarantees (to be exact, it's entirely possible for a volatile write to only go to cache; it depends solely on the hardware and the cache-coherency protocols it uses).
A volatile read gives acquire semantics, while a volatile write has release semantics. An acquire fence means that you cannot reorder reads or writes before the fence, while a release fence means you cannot move them after the fence. The linked answer in the comments explains that actually quite nicely.
Now the question is, if we don't have any memory barrier before the load how is it guaranteed that we'll see the newest value? The answer to that is: Because we also put memory barriers after each volatile write to guarantee that.
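Putting the two halves together, here is a minimal publish/consume sketch (the class and fields are illustrative): the release write and the acquire read pair up, so once the flag is observed as true, the payload written before it is also visible.
using System;
using System.Threading;

class Publication
{
    private int _payload;
    private bool _ready;

    public void Producer()
    {
        _payload = 42;
        // Release semantics: the write to _payload cannot move after this write.
        Volatile.Write(ref _ready, true);
    }

    public void Consumer()
    {
        // Acquire semantics: the read of _payload cannot move before this read.
        if (Volatile.Read(ref _ready))
        {
            Console.WriteLine(_payload);   // guaranteed to print 42 once _ready is seen as true
        }
    }
}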
Doug Lea wrote a great summary on which barriers exist, what they do and where to put them for volatile reads/writes for the JMM as a help for compiler writers, but the text is also quite useful for other people. Volatile reads and writes give the same guarantees in both Java and the CLR so that's generally applicable.
Source - scroll down to the "Memory Barriers" section (I'd copy the interesting parts, but the formatting doesn't survive it..)
I read some articles about the volatile keyword but I could not figure out its correct usage. Could you please tell me what it should be used for in C# and in Java?
Consider this example:
int i = 5;
System.out.println(i);
The compiler may optimize this to just print 5, like this:
System.out.println(5);
However, if there is another thread which can change i, this is the wrong behaviour. If another thread changes i to be 6, the optimized version will still print 5.
The volatile keyword prevents such optimization and caching, and thus is useful when a variable can be changed by another thread.
For both C# and Java, "volatile" tells the compiler that the value of a variable must never be cached as its value may change outside of the scope of the program itself. The compiler will then avoid any optimisations that may result in problems if the variable changes "outside of its control".
Reads of volatile fields have acquire semantics. This means that it is guaranteed that the memory read from the volatile variable will occur before any following memory reads. It blocks the compiler from doing the reordering, and if the hardware requires it (weakly ordered CPU), it will use a special instruction to make the hardware flush any reads that occur after the volatile read but were speculatively started early, or the CPU could prevent them from being issued early in the first place, by preventing any speculative load from occurring between the issue of the load acquire and its retirement.
Writes of volatile fields have release semantics. This means that any memory write to the volatile variable is guaranteed to be delayed until all previous memory writes are visible to other processors.
Consider the following example:
something.foo = new Thing();
If foo is a member variable in a class, and other CPUs have access to the object instance referred to by something, they might see the value foo change before the memory writes in the Thing constructor are globally visible! This is what "weakly ordered memory" means. This could occur even if the compiler has all of the stores in the constructor before the store to foo. If foo is volatile then the store to foo will have release semantics, and the hardware guarantees that all of the writes before the write to foo are visible to other processors before allowing the write to foo to occur.
How is it possible for the writes to foo to be reordered so badly? If the cache line holding foo is already in the cache, and the stores in the constructor missed the cache, then it is possible for the store to foo to complete much sooner than the stores that missed the cache.
The (awful) Itanium architecture from Intel had weakly ordered memory. The processor used in the original XBox 360 had weakly ordered memory. Many ARM processors, including the very popular ARMv7-A have weakly ordered memory.
Developers often don't see these data races because things like locks will do a full memory barrier, essentially the same thing as acquire and release semantics at the same time. No loads inside the lock can be speculatively executed before the lock is acquired, they are delayed until the lock is acquired. No stores can be delayed across a lock release, the instruction that releases the lock is delayed until all of the writes done inside the lock are globally visible.
A more complete example is the "Double-checked locking" pattern. The purpose of this pattern is to avoid having to always acquire a lock in order to lazy initialize an object.
Snagged from Wikipedia:
public class MySingleton {
    private static object myLock = new object();
    private static volatile MySingleton mySingleton = null;

    private MySingleton() {
    }

    public static MySingleton GetInstance() {
        if (mySingleton == null) { // 1st check
            lock (myLock) {
                if (mySingleton == null) { // 2nd (double) check
                    mySingleton = new MySingleton();
                    // Write-release semantics are implicitly handled by marking
                    // mySingleton with 'volatile', which inserts the necessary memory
                    // barriers between the constructor call and the write to mySingleton.
                    // The barriers created by the lock are not sufficient because
                    // the object is made visible before the lock is released.
                }
            }
        }
        // The barriers created by the lock are not sufficient because not all threads
        // will acquire the lock. A fence for read-acquire semantics is needed between
        // the test of mySingleton (above) and the use of its contents. This fence
        // is automatically inserted because mySingleton is marked as 'volatile'.
        return mySingleton;
    }
}
In this example, the stores in the MySingleton constructor might not be visible to other processors before the store to mySingleton. If that happens, the other threads that peek at mySingleton will not acquire a lock and they will not necessarily pick up the writes to the constructor.
volatile never prevents caching. What it does is guarantee the order in which other processors "see" writes. A store release will delay a store until all pending writes are complete and a bus cycle has been issued telling other processors to discard/writeback their cache line if they happen to have the relevant lines cached. A load acquire will flush any speculated reads, ensuring that they won't be stale values from the past.
To understand what volatile does to a variable, it's important to understand what happens when the variable is not volatile.
Variable is Non-volatile
When two threads A & B are accessing a non-volatile variable, each thread will maintain a local copy of the variable in its local cache. Any changes done by thread A in its local cache won't be visible to thread B.
Variable is volatile
When variables are declared volatile it essentially means that threads should not cache such a variable or in other words threads should not trust the values of these variables unless they are directly read from the main memory.
So, when to make a variable volatile?
When you have a variable which can be accessed by many threads and you want every thread to get the latest updated value of that variable even if the value is updated by any other thread/process/outside of the program.
The volatile keyword has different meanings in both Java and C#.
Java
From the Java Language Spec :
A field may be declared volatile, in which case the Java memory model ensures that all threads see a consistent value for the variable.
C#
From the C# Reference (retrieved 2021-03-31):
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are not subject to these optimizations. (...)
In Java, "volatile" is used to tell the JVM that the variable may be used by multiple threads at the same time, so certain common optimizations cannot be applied.
Notably the situation where the two threads accessing the same variable are running on separate CPUs in the same machine. It is very common for CPUs to aggressively cache the data they hold, because memory access is very much slower than cache access. This means that if the data is updated on CPU1, it must immediately go through all caches and out to main memory (instead of waiting until the cache decides to flush itself) so that CPU2 can see the updated value (again, bypassing all the caches on the way).
When you are reading data that is non-volatile, the executing thread may or may not always get the updated value.
But if the object is volatile, the thread always gets the most up-to-date value.
Volatile addresses a concurrency problem: it keeps the value in sync. The keyword is mostly used in threading, when multiple threads update the same variable.
New to this website, so let me know if I'm not posting in an accepted manner.
I've frequently coded something along the lines of the sample below (with stuff like Dispose omitted for clarity). My question is, are the volatiles needed as shown? Or does ManualResetEvent.Set have an implicit memory barrier as I've read Thread.Start does? Or would an explicit MemoryBarrier call be better than the volatiles? Or is it completely wrong? Also, the fact that the "implicit memory barrier behavior" of some operations is not documented as far as I've seen is quite frustrating; is there a list of these operations somewhere?
Thanks,
Tom
class OneUseBackgroundOp
{
    // background args
    private string _x;
    private object _y;
    private long _z;

    // background results
    private volatile DateTime _a;
    private volatile double _b;
    private volatile object _c;

    // thread control
    private Thread _task;
    private ManualResetEvent _completedSignal;
    private volatile bool _completed;

    public bool DoSomething(string x, object y, long z, int initialWaitMs)
    {
        bool doneWithinWait;

        _x = x;
        _y = y;
        _z = z;

        _completedSignal = new ManualResetEvent(false);

        _task = new Thread(new ThreadStart(Task));
        _task.IsBackground = true;
        _task.Start();

        doneWithinWait = _completedSignal.WaitOne(initialWaitMs);

        return doneWithinWait;
    }

    public bool Completed
    {
        get
        {
            return _completed;
        }
    }

    /* public getters for the result fields go here, with an exception
       thrown if _completed is not true; */

    private void Task()
    {
        // args x, y, and z are written once, before the Thread.Start
        // implicit memory barrier so they may be accessed freely.

        // possibly long-running work goes here

        // with the work completed, assign the result fields _a, _b, _c here

        _completed = true;
        _completedSignal.Set();
    }
}
Note that this is off the cuff, without studying your code closely. I don't think Set performs a memory barrier, but I don't see how that's relevant in your code. It seems more important whether Wait performs one, which it does. So unless I missed something in the 10 seconds I devoted to looking at your code, I don't believe you need the volatiles.
Edit: Comments are too restrictive. I'm now referring to Matt's edit.
Matt did a good job with his evaluation, but he's missing a detail. First, let's provide some definitions of things thrown around, but not clarified here.
A volatile read reads a value and then invalidates the CPU cache. A volatile write flushes the cache, and then writes the value. A memory barrier flushes the cache and then invalidates it.
The .NET memory model ensures that all writes are volatile. Reads, by default, are not, unless an explicit VolatileRead is made, or the volatile keyword is specified on the field. Further, interlocked methods force cache coherency, and all of the synchronization concepts (Monitor, ReaderWriterLock, Mutex, Semaphore, AutoResetEvent, ManualResetEvent, etc.) call interlocked methods internally, and thus ensure cache coherency.
Again, all of this is from Jeffrey Richter's book, "CLR via C#".
I said, initially, that I didn't think Set performed a memory barrier. However, upon further reflection about what Mr. Richter said, Set would be performing an interlocked operation, and would thus also ensure cache coherency.
I stand by my original assertion that volatile is not needed here.
Edit 2: It looks as if you're building a "future". I'd suggest you look into PFX, rather than rolling your own.
The volatile keyword should not be confused with making _a, _b, and _c thread-safe. See here for a better explanation. Further, the ManualResetEvent does not have any bearing on the thread-safety of _a, _b, and _c. You have to manage that separately.
EDIT: With this edit, I am attempting to distill all of the information that has been put in various answers and comments regarding this question.
The basic question is whether or not the result variables (_a, _b, and _c) will be 'visible' at the time the flag variable (_completed) returns true.
For a moment, let's assume that none of the variables are marked volatile. In this case, it would be possible for the result variables to be set after the flag variable is set in Task(), like this:
private void Task()
{
    // possibly long-running work goes here
    _completed = true;
    _a = result1;
    _b = result2;
    _c = result3;
    _completedSignal.Set();
}
This is clearly not what we want, so how do we deal with this?
If these variables are marked volatile, then this reordering will be prevented. But that is what prompted the original question - are the volatiles required or does the ManualResetEvent provide an implicit memory barrier such that reordering does not occur, in which case the volatile keyword is not really necessary?
If I understand correctly, wekempf's position is that the WaitOne() function provides an implicit memory barrier which fixes the problem. BUT that doesn't seem sufficient to me. The main and background threads could be executing on two separate processors. So, if Set() does not also provide an implicit memory barrier, then the Task() function could end up being executed like this on one of the processors (even with the volatile variables):
private void Task()
{
    // possibly long-running work goes here
    _completedSignal.Set();
    _a = result1;
    _b = result2;
    _c = result3;
    _completed = true;
}
I have searched high and low for information regarding memory barriers and the EventWaitHandles, and I have come up with nothing. The only reference I have seen is the one wekempf has made to Jeffrey Richter's book. The problem I have with this is that the EventWaitHandle is meant to synchronize threads, not access to data. I have never seen any example where EventWaitHandle (e.g., ManualResetEvent) is used to synchronize access to data. As such, I'm hard-pressed to believe that EventWaitHandle does anything with regard to memory barriers. Otherwise, I would expect to find some reference to this on the internet.
EDIT #2: This is a response to wekempf's response to my response... ;)
I managed to read the section from Jeffrey Richter's book at amazon.com. From page 628 (wekempf quotes this too):
Finally, I should point out that whenever a thread calls an interlocked method, the CPU forces cache coherency. So if you are manipulating variables via interlocked methods, you do not have to worry about all of this memory model stuff. Furthermore, all thread synchronization locks (Monitor, ReaderWriterLock, Mutex, Semaphore, AutoResetEvent, ManualResetEvent, etc.) call interlocked methods internally.
So it would seem, as wekempf pointed out, that the result variables do not require the volatile keyword in the example as shown, since the ManualResetEvent ensures cache coherency.
Before closing this edit, there are two additional points I'd like to make.
First, my initial assumption was that the background thread would potentially run multiple times. I obviously overlooked the name of the class (OneUseBackgroundOp)! Given that it is only run once, it is not clear to me why the DoSomething() function calls WaitOne() in the manner that it does. What is the point of waiting initialWaitMs milliseconds if the background thread may or may not be done at the time DoSomething() returns? Why not just kickoff the background thread and use a lock to synchronize access to the results variables OR simply execute the contents of the Task() function as part of the thread that calls DoSomething()? Is there a reason not to do this?
Second, it seems to me that not using some kind of locking mechanism on the results variables is still a bad approach. True, it is not needed in the code as shown. But at some point down the road, another thread may come along and try to access the data. It would be better in my mind to prepare for this possibility now rather than try to track down mysterious behavior anomalies later.
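For what it's worth, a minimal sketch of that lock-based alternative (the types mirror the question's result fields; the names are made up):
using System;

class GuardedResults
{
    private readonly object _resultLock = new object();
    private DateTime _a;
    private double _b;
    private object _c;

    // The writer publishes all results under one lock.
    public void SetResults(DateTime a, double b, object c)
    {
        lock (_resultLock)
        {
            _a = a;
            _b = b;
            _c = c;
        }
    }

    // Readers take the same lock, so they see either none or all of the writes.
    public double B
    {
        get { lock (_resultLock) { return _b; } }
    }
}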
Thanks to everyone for bearing with me on this. I've certainly learned a lot by participating in this discussion.
Wait functions have an implicit memory barrier. See http://msdn.microsoft.com/en-us/library/ms686355(v=vs.85).aspx
First, I'm not sure if I should "Answer my own question" or use a comment for this, but here goes:
My understanding is that volatile prevents code/memory optimizations from moving the accesses to my result variables (and the completed boolean), such that the thread that reads the result will see up-to-date data.
You wouldn't want the _completed boolean to become visible to all threads after the Set() due to compiler or memory optimizations/reordering. Likewise, you wouldn't want the writes to the results _a, _b, _c to be seen after the Set().
EDIT: Further explanation/clarification on the question, in regard to items mentioned by Matt Davis:
Finally, I should point out that
whenever a thread calls an interlocked
method, the CPU forces cache
coherency. So if you are manipulating
variables via interlocked methods, you
do not have to worry about all of this
memory model stuff. Furthermore, all
thread synchronization locks (Monitor,
ReaderWriterLock, Mutex, Semaphore,
AutoResetEvent, ManualResetEvent,
etc.) call interlocked methods
internally.
So it would seem, as wekempf pointed out, that the result variables do not require the volatile keyword in the example as shown, since the ManualResetEvent ensures cache coherency.
So you are both in agreement that such an operation takes care of caching between processors or in registers etc.
But does it prevent reordering, so as to guarantee that BOTH the results are assigned before the completed flag, and that the completed flag is assigned true before the ManualResetEvent is Set?
First, my initial assumption was that
the background thread would
potentially run multiple times. I
obviously overlooked the name of the
class (OneUseBackgroundOp)! Given that
it is only run once, it is not clear
to me why the DoSomething() function
calls WaitOne() in the manner that it
does. What is the point of waiting
initialWaitMs milliseconds if the
background thread may or may not be
done at the time DoSomething()
returns? Why not just kickoff the
background thread and use a lock to
synchronize access to the results
variables OR simply execute the
contents of the Task() function as
part of the thread that calls
DoSomething()? Is there a reason not
to do this?
The concept of the sample is to execute a possibly long-running task. If the task can be completed within an acceptable amount of time, then the calling thread will get access to the result and continue with normal processing. But sometimes a task can take quite a long time, and the calling thread cannot be blocked for that period and can take reasonable steps to deal with that. That can include checking back later on the operation using the Completed property.
A concrete example: a DNS resolve is often very quick (subsecond) and worth waiting for even from a GUI, but sometimes it can take many, many seconds. So by using a utility class like the sample, one can easily get a result, from the point of view of the caller, 95% of the time and not lock up the GUI the other 5%. One could use a BackgroundWorker, but that can be overkill for an operation that the vast majority of the time doesn't need all that plumbing.
Second, it seems to me that not using
some kind of locking mechanism on the
results variables is still a bad
approach. True, it is not needed in
the code as shown.
The result (and completed flag) data is meant to be write-once, read-many. If I added a lock to assign the results and flag, I'd also have to lock in my result getters, and I never liked seeing getters lock just to return a data point. From my reading, such fine-grained locking is not appropriate. If an operation has 5 or 6 results, the caller has to take and release the lock 5 or 6 times needlessly.
But at some point
down the road, another thread may come
along and try to access the data. It
would be better in my mind to prepare
for this possibility now rather than
try to track down mysterious behavior
anomalies later.
Because I have a volatile completed flag that is guaranteed to be set only after the volatile results are, and the only access to the results is through the getters, and, as mentioned in the sample, an exception is thrown if a getter is called while the operation is not yet complete, I'd expect that the Completed and result getters CAN be invoked by a thread other than the one that called DoSomething(). That's my hope anyway. I believe this to be true with the volatiles anyway.
Based on what you've shown, I would say that, no, the volatiles are not required in that code.
The ManualResetEvent itself doesn't have an implicit memory barrier. However, the fact that the main thread is waiting for the signal means that it can't modify any variables. At least, it can't modify any variables while it's waiting. So I guess you could say that waiting on a synchronization object is an implicit memory barrier.
Note, however, that other threads, if they exist and have access to those variables, could modify them.
From your question, it seems that you're missing the point of what volatile does. All volatile does is tell the compiler that the variable might be modified by other threads asynchronously, so it shouldn't optimize code that accesses the variable. volatile does not in any way synchronize access to the variable.
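To illustrate that last point with a small sketch (names are invented): a volatile counter keeps individual reads and writes visible, but a compound read-modify-write like ++ is still a race; for that you need Interlocked or a lock.
using System.Threading;

class Counter
{
    private volatile int _count;   // visible to other threads, but not synchronized
    private int _safeCount;

    public void UnsafeIncrement()
    {
        _count++;   // read, add, write: two threads can interleave and lose an update
    }

    public void SafeIncrement()
    {
        Interlocked.Increment(ref _safeCount);   // atomic read-modify-write
    }
}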